Thanks for the feedback!

I'd also like to understand this problem a little better.

Q13 sits on the 5V rail. It's a P-channel MOSFET rated at -11A with a 13mohm
on-resistance, so on a 5V rail it's good for switching over 50W. Was there any
sign of heat damage on the failed parts (from overloading, or maybe because the
heat sinking was bad)?
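
A rough sketch of the arithmetic behind those figures, assuming the 50W number
is simply the 5V rail times the 11A rating, and that the quoted on-resistance
applies as-is (it rises with temperature and with weaker gate drive, so the
dissipation figure is optimistic). The function and constant names are
illustrative only.

```python
# Rough numbers for Q13, using only the ratings quoted in this thread.
RAIL_VOLTS = 5.0      # Q13 switches the 5V rail
RATED_AMPS = 11.0     # magnitude of the -11A drain current rating
RDS_ON_OHMS = 0.013   # 13 mohm on-resistance (assumed constant here)


def switched_power(volts, amps):
    """Power delivered to the load through the switch: P = V * I."""
    return volts * amps


def conduction_loss(amps, rds_on):
    """Heat dissipated in the MOSFET itself while on: P = I^2 * R."""
    return amps ** 2 * rds_on


load_w = switched_power(RAIL_VOLTS, RATED_AMPS)
loss_w = conduction_loss(RATED_AMPS, RDS_ON_OHMS)

print(f"load power at rated current: {load_w:.1f} W")
print(f"dissipation in the FET:      {loss_w:.2f} W")
```

Even at the full rated current the part itself would only dissipate on the
order of a watt or two, which is why visible heat damage would point to
something well outside normal operation.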

Or did the part look fine and just not turn on anymore? There is a resistor
between the gate and source, so the gate doesn't float even when not in use. It
really shouldn't have broken due to ESD after it was installed.

We have had batches of boards with bad components before... once even due to 
passives (termination resistors that didn't all have the right resistance) 
which is the last thing we expect to fail. These are usually all caught in the 
factory during the standard off-the-line tests.

Can you supply the serial numbers of these boards so we can add this issue to 
our database? We'll then monitor to see if it re-occurs on any other boards.

Jason

On 19 Jun 2012, at 19:05, John Ford wrote:

>> Jason R. and John,
>> Was the roach running a particularly intensive design at the time
>> around the failure? Just wondering why this part would be failing. Is
>> the current limit somehow being exceeded?
> 
> We don't know about the first one, because it came to us from Socorro, but
> the second roach was being used to test the tutorials, so I don't think it
> was particularly heavily loaded.
> 
> I had a thought that we should check the serial numbers and see if they
> are from the same batch.  Maybe some bad parts or ESD damage?
> 
> John
> 
>> Thanks,
>> Glenn
>> 
>> On Tue, Jun 19, 2012 at 9:52 AM, Jason Ray <j...@nrao.edu> wrote:
>>> The first time I was troubleshooting this problem, I did see a fault on
>>> the 1V supply with roach_monitor.py.  I didn't check roach_monitor.py on
>>> the second roach because the problem was so fresh in our minds we just
>>> jumped to the finish line and checked the mosfet with a meter, then
>>> replaced it.
>>> 
>>> For reference, the part in question is Q13 (FD6675BZ).
>>> 
>>> Thanks,
>>> Jason
>>> 
>>> 
>>> 
>>> At 09:33 AM 6/19/2012, Jason Manley wrote:
>>>> 
>>>> Good sleuthing!
>>>> 
>>>> FWIW, roach_monitor.py is supposed to be able to pull the log out of
>>>> the Actel Fusion, which should have logged a fault on the 1V rail
>>>> before shutting down the board. This should work independently of PPC
>>>> or dmesg states. I'm afraid I have little faith in the Fusion/Xport
>>>> combo to reliably catch these issues, but it has helped me a few times.
>>>> 
>>>> If it works, it only retrieves the reason for the last shutdown, so
>>>> you'll have to plug a laptop into the Xport to query it directly after
>>>> it shuts itself down.
>>>> 
>>>> Jason
>>>> 
>>>> On 19 Jun 2012, at 15:23, John Ford wrote:
>>>> 
>>>>> Hi all.  We've had a couple of ROACH failures with identical causes.
>>>>> Maybe some of you have seen this, but it's worth keeping in mind in
>>>>> case you have a problem.
>>>>> 
>>>>> The symptom is that the ROACH would sort of power on, but then turn
>>>>> off spontaneously.  On one, as soon as the bof was loaded the roach
>>>>> would turn off.  The other one would come on for a few seconds and
>>>>> then turn off, or it would cycle on and off.  The monitor readout in
>>>>> dmesg gave nonsense readings.
>>>>> 
>>>>> In any event, the cause was traced to the +1 volt supply MOSFET
>>>>> switch.  Replacing that mosfet fixed both roaches.  Kudos to Jason Ray
>>>>> for finding the problem originally.
>>>>> 
>>>>> John
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>> 
>> 
> 
> 

