We have about 100 ROACH2s deployed, both on our site and in various lab 
systems, and have not seen this sort of behaviour in any of our boards. But we
don't have ADCs connected to any of them. There are only power, 4x 10G SFP
copper cables and 1x 1G STP going into the back. The boards are grounded via 
the ground pin of the IEC power connector.

We have observed that after an AC power restore, some of the boards do not 
automatically power back up, and we've got a few whose PPCs periodically stop
responding on the network and need to be power-cycled to recover. But
I believe these are all hardware failures, since they happen on the same boards 
each time and represent a small percentage (~5%) of all the boards.

Jason Manley
Functional Manager: DSP
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 18 Apr 2018, at 2:33, Matt Dexter <mdex...@berkeley.edu> wrote:

> Hi Jonathan & Dan,
> 
> I had the same sort of idea regarding the FTDI USB and PPC USB ports.
> 
> Not sure how a port being held in some partially-on, partially-off
> state could create problems for an ADC, but maybe it somehow corrupts
> the power-on reset or something connected, even indirectly, to
> the FTDI IC U33.  Or to the PPC IC, in the PPC USB port case.
> 
> Is either of the ROACH2's USB ports connected to something that stays
> powered up during the attempts to power down?
> 
> Matt
> 
> On Tue, 17 Apr 2018, Dan Werthimer wrote:
> 
>> Date: Tue, 17 Apr 2018 17:16:22 -0700
>> From: Dan Werthimer <d...@ssl.berkeley.edu>
>> Reply-To: casper@lists.berkeley.edu
>> To: CASPER Mailing List <casper@lists.berkeley.edu>
>> Subject: Re: [casper] temporary ROACH2 faults after power dips and spikes
>>
>> hi jonathan,
>> here's a remote possibility that might explain some of the behaviour you are
>> seeing: when the power goes down to your roach2's, does the power also go
>> down on the sample clock or 1 PPS distribution? if the sample clock
>> continues to be fed to the ADC's, then the CMOS adc chips can continue to
>> be powered via the clock, or perhaps via 1 PPS, and because the voltages are
>> low, the adc's can get in a weird mode...
>> you might need to power off the 1 PPS and sample clock? or after power is
>> restored, issue a reset to the ADC?
>> dan
>>
>> On Tue, Apr 17, 2018 at 4:22 PM, Jonathan Weintroub
>> <jweintr...@cfa.harvard.edu> wrote:
>>      Hi CASPERites,
>> 
>>      We have had experience with quite a few ROACH2s in the lab and in the
>>      field for some years, and a pattern has emerged which warrants a
>>      question to the ROACH2 experts on this list. The SAO team has seen
>>      strange faults happen on multiple ROACH2 units after power failures,
>>      dips and lightning storms.  I’ll list the various weirdnesses below,
>>      but the key point is that a full power cycle, including removing
>>      power from the line input, does not reset and cure the units, whereas
>>      an extended power down (like overnight, or 24 hours, or more) does
>>      seem to bring the units back to life again.  This was discovered
>>      serendipitously, and has happened often enough that the pattern
>>      seems repeatable (though controlled experiments aren’t really
>>      possible, as we try not to stress our equipment this way).
>> 
>>      Has anyone else seen this, and does someone perhaps have a
>>      suggestion as to root cause, or some way to accelerate the reset?
>> 
>>      Example faults have included:
>> 
>>      - ADC5G clock not being correctly received, or not being transmitted
>>      to FPGA, or being transmitted at incorrect speed.
>> 
>>      - A particular ADC would refuse to calibrate its digital interface
>>      to the FPGA.
>> 
>>      - QDRs which don’t calibrate.
>> 
>>      - After a lightning storm on Maunakea we have two units with a
>>      single SFP+ port among 8 failing to transmit packets, though we
>>      have yet to see if an extended power down will cure this.
>> 
>>      Again these faults have been distributed across multiple units, and
>>      in all cases have eventually been cleared, after extended power
>>      down.  Which is good, but the pathology worries us.
>> 
>>      Thanks in advance for any light that might be cast on this issue.
>> 
>>      Jonathan and André
>>      EHT/SMA
>> 
>>      --
>>      You received this message because you are subscribed to the Google
>>      Groups "casper@lists.berkeley.edu" group.
>>      To unsubscribe from this group and stop receiving emails from it,
>>      send an email to casper+unsubscr...@lists.berkeley.edu.
>>      To post to this group, send email to casper@lists.berkeley.edu.