Below is a message I wrote with more detail about the problems we had at
NRAO; it did not make it to the list. By the way, others at NRAO
are using a recent version of the repository and have had better luck,
but based on your experience I wonder if there is still some subtle
issue with marginal signals or timing on some boards.

Glenn

Previous message:

The problem was caused by some errors that crept into the ska-sa
repository. I had to revert to a commit BEFORE this one:
https://github.com/ska-sa/mlib_devel/commit/bad95b18fe79146d288607e5fe3c0360c071c2ad
(easy to remember since the hash starts with 'bad' :)
Something about this EPB to OPB optimization they did messes things
up. In theory they reverted these changes, but the problem was still
present the last time I looked. And this is of course the least fun
kind of problem to keep re-checking...
Note that I had other issues with ROACH1s with this commit too.
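
For the byte-versus-word access issue John describes in the quoted
thread below, one defensive pattern on the client side is to issue
only whole-word reads and trim the result afterwards. This is a
minimal sketch, not what any CASPER library actually does: the 4-byte
word size is an assumption, and read_fn is a hypothetical callable
taking (name, length, offset) and returning bytes, standing in for
whatever register-read call your client provides.

```python
WORD = 4  # bytes per bus word; assumed 32-bit accesses


def aligned_span(offset, size, word=WORD):
    """Return (start, length) of the smallest word-aligned span
    that covers bytes [offset, offset + size)."""
    if offset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    start = offset - (offset % word)   # round start down to a word boundary
    end = offset + size
    end += (-end) % word               # round end up to a word boundary
    return start, end - start


def read_aligned(read_fn, name, size, offset=0, word=WORD):
    """Read `size` bytes at `offset` using only whole-word accesses.

    read_fn(name, length, start) is a hypothetical callable that
    returns `length` bytes starting at byte offset `start`.
    """
    if size == 0:
        return b""
    start, length = aligned_span(offset, size, word)
    data = read_fn(name, length, start)
    skip = offset - start              # drop the padding before the request
    return data[skip:skip + size]
```

A 3-byte read at offset 5 becomes a single 4-byte read at offset 4,
so the bus never sees a sub-word access. This only avoids one
suspected trigger; it does not fix the underlying gateware problem.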


On Thu, Jul 11, 2013 at 8:23 PM, Ryan Monroe <ryan.m.mon...@gmail.com> wrote:
> Thanks!  Sounds good
>
> Also: I take back the deterministic part.  The other roach started having
> the problem too, and it might have something to do with read lengths.  More
> to follow (eventually)
>
> --Ryan Monroe
> 626.773.0805
>
>
> On 07/11/2013 05:20 PM, John Ford wrote:
>>
>> Hi Ryan.  We had this problem, which appeared to be a "lockup".  I think
>> that Glenn and some others corresponded about it, and it was due to trying
>> to read/write bytes instead of words over the opb bus with a buggy kernel
>> or a buggy library.
>>
>> You might search through the mailing list for Glenn's name in about
>> November of last year.
>>
>> John
>>
>>
>>> Hey all,
>>>
>>> I'm trying to test out a new bit file (it uses the "pcore" feature and
>>> has 4 black boxes under the hood for what it's worth). *On one ROACH2 it
>>> works just fine* (in the context of this problem).
>>>
>>> On the other one, for ~1/5 of the registers, upon reading that register
>>> the ROACH2 stops responding to all katcp commands.  From dmesg, it looks
>>> like tcpborphserver is crashing.  It appears that the registers which
>>> kill it are deterministic across programmings. It also looks like the
>>> registers which fail are all shared_brams, but there is nothing
>>> exceptional about the ones which fail, imho
>>>
>>> Attached are the results of a Python script run on the two ROACHes, a
>>> dmesg output from the failed board, and pictures of the
>>> configuration for both ROACHes.
>>>
>>> Anyone seen this before?
>>>
>>> --
>>> --Ryan Monroe
>>> 626.773.0805
>>>
>>>
>>
>
>
