Hi All,

We have 8 identical (I hope) ROACH-2 boards.  We have been using 4 of them
for a long time, and I just brought two more online, but one of them is
behaving differently than the others.  One problem is that its sensor list
is different.  If I run the katcp request to get the sensor list:

reply, sensors = ro[5].fpga.blocking_request(Message.request('sensor-list'))

the message returns quickly but has a short list:

sensors:
[<Message inform sensor-list (mode, current_mode, none, discrete, raw)>,
 <Message inform sensor-list (raw.temp.ambient,
Ambient_board_temperature, millidegrees, integer, -2147483648,
2147483647, -2147483648, 2147483647)>,
 <Message inform sensor-list (raw.temp.ppc, PowerPC_temperature,
millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
 <Message inform sensor-list (raw.temp.fpga, FPGA_temperature,
millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
 <Message inform sensor-list (raw.temp.inlet, Inlet_ambient_temperature,
millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
 <Message inform sensor-list (raw.temp.outlet,
Outlet_ambient_temperature, millidegrees, integer, -2147483648,
2147483647, -2147483648, 2147483647)>]


If I run the katcp request to get the sensor values, the command takes
~8 s and returns bad values:

reply, vals = ro[5].fpga.blocking_request(Message.request('sensor-value'), timeout=10)

vals:
[<Message inform sensor-value (1446089462529, 1, mode, unknown, raw)>,
 <Message inform sensor-value (1446089463542, 1, raw.temp.ambient, nominal,
-1)>,
 <Message inform sensor-value (1446089464554, 1, raw.temp.ppc, nominal,
-1)>,
 <Message inform sensor-value (1446089465566, 1, raw.temp.fpga, nominal,
-1)>,
 <Message inform sensor-value (1446089468602, 1, raw.temp.inlet, nominal,
0)>,
 <Message inform sensor-value (1446089471638, 1, raw.temp.outlet, nominal,
0)>]
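
The timestamps in that reply show where the time goes. Here is a minimal sketch that differences them, with the millisecond timestamps copied by hand from the informs above; the helper name read_delays is made up for illustration and is not part of katcp:

```python
# Millisecond timestamps copied from the bad board's sensor-value informs.
bad_board = [
    ("mode", 1446089462529),
    ("raw.temp.ambient", 1446089463542),
    ("raw.temp.ppc", 1446089464554),
    ("raw.temp.fpga", 1446089465566),
    ("raw.temp.inlet", 1446089468602),
    ("raw.temp.outlet", 1446089471638),
]


def read_delays(samples):
    """Return (sensor name, ms elapsed since the previous sensor's timestamp)."""
    return [(name, ts - prev_ts)
            for (_, prev_ts), (name, ts) in zip(samples, samples[1:])]


delays = read_delays(bad_board)
for name, ms in delays:
    print("%-18s %5d ms" % (name, ms))

# Elapsed time between the first and last timestamps in the reply.
total_ms = bad_board[-1][1] - bad_board[0][1]
print("total: %d ms" % total_ms)
```

Each of the first three temperature sensors takes about 1 s and the inlet and outlet sensors about 3 s each.  My guess, and it is only a guess, is that each read is waiting on a timeout rather than failing straight away.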

Doing the same on a good board returns immediately and gives a much longer
list of sensors, with sensible values:

vals:
[<Message inform sensor-value (1446089513976, 1, mode, unknown, raw)>,
 <Message inform sensor-value (1446089513984, 1, raw.temp.ambient, nominal,
34000)>,
 <Message inform sensor-value (1446089513984, 1, raw.temp.ppc, nominal,
49000)>,
 <Message inform sensor-value (1446089513984, 1, raw.temp.fpga, nominal,
59000)>,
 <Message inform sensor-value (1446089513987, 1, raw.fan.chs1, nominal,
7650)>,
 <Message inform sensor-value (1446089513990, 1, raw.fan.chs2, nominal,
7650)>,
 <Message inform sensor-value (1446089513993, 1, raw.fan.fpga, nominal,
5730)>,
 <Message inform sensor-value (1446089513996, 1, raw.fan.chs0, nominal,
7650)>,
 <Message inform sensor-value (1446089513998, 1, raw.temp.inlet, nominal,
34000)>,
 <Message inform sensor-value (1446089514000, 1, raw.temp.outlet, nominal,
32750)>,
 <Message inform sensor-value (1446089514006, 1, raw.voltage.1v, nominal,
1004)>,
 <Message inform sensor-value (1446089514007, 1, raw.voltage.1v5, nominal,
1498)>,
 <Message inform sensor-value (1446089514007, 1, raw.voltage.1v8, nominal,
1808)>,
 <Message inform sensor-value (1446089514007, 1, raw.voltage.2v5, nominal,
2497)>,
 <Message inform sensor-value (1446089514008, 1, raw.voltage.3v3, nominal,
3360)>,
 <Message inform sensor-value (1446089514008, 1, raw.voltage.5v, nominal,
5098)>,
 <Message inform sensor-value (1446089514008, 1, raw.voltage.12v, nominal,
3936)>,
 <Message inform sensor-value (1446089514009, 1, raw.voltage.3v3aux,
nominal, 3388)>,
 <Message inform sensor-value (1446089514015, 1, raw.voltage.5vaux,
nominal, 5055)>,
 <Message inform sensor-value (1446089514015, 1, raw.current.3v3, nominal,
420)>,
 <Message inform sensor-value (1446089514015, 1, raw.current.2v5, nominal,
1009)>,
 <Message inform sensor-value (1446089514016, 1, raw.current.1v8, nominal,
500)>,
 <Message inform sensor-value (1446089514016, 1, raw.current.1v5, error,
7050)>,
 <Message inform sensor-value (1446089514016, 1, raw.current.1v, error,
31240)>,
 <Message inform sensor-value (1446089514017, 1, raw.current.5v, nominal,
7777)>,
 <Message inform sensor-value (1446089514017, 1, raw.current.12v, nominal,
16127)>]
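
To make the difference between the two boards explicit, here is a minimal sketch that compares the sensor names.  The names are copied by hand from the two replies above; with live replies I assume they could be taken from the third argument of each inform (msg.arguments[2]), but I have not shown that here:

```python
# Sensor names reported by the bad board (copied from its reply).
bad = {
    "mode", "raw.temp.ambient", "raw.temp.ppc", "raw.temp.fpga",
    "raw.temp.inlet", "raw.temp.outlet",
}

# Sensor names reported by a good board, in the order they were returned.
good = [
    "mode", "raw.temp.ambient", "raw.temp.ppc", "raw.temp.fpga",
    "raw.fan.chs1", "raw.fan.chs2", "raw.fan.fpga", "raw.fan.chs0",
    "raw.temp.inlet", "raw.temp.outlet",
    "raw.voltage.1v", "raw.voltage.1v5", "raw.voltage.1v8",
    "raw.voltage.2v5", "raw.voltage.3v3", "raw.voltage.5v",
    "raw.voltage.12v", "raw.voltage.3v3aux", "raw.voltage.5vaux",
    "raw.current.3v3", "raw.current.2v5", "raw.current.1v8",
    "raw.current.1v5", "raw.current.1v", "raw.current.5v",
    "raw.current.12v",
]

# Sensors present on the good board but absent on the bad one.
missing = [name for name in good if name not in bad]

# Group the missing sensors by their second name component (fan, voltage, ...).
missing_groups = sorted({name.split(".")[1] for name in missing})

print("%d of %d sensors missing" % (len(missing), len(good)))
print("missing groups: %s" % ", ".join(missing_groups))
```

So the bad board reports only the mode and temperature sensors; every fan, voltage, and current sensor is absent.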

I also tried telnetting into the bad ROACH, and got the same short list,
long sensor-value request time, and bad sensor values.

Another symptom is that setting the ADC registers on the KATADC board does
not seem to work; at least, one of the ADCs is misbehaving in the same way
the ADCs did when we were not setting the registers correctly.

Has anyone seen this before?  Is it hardware, firmware, software?

Thanks,
Dale
