Hi Ross,

I think I have a solution to your problem. The cause may be something
else, or you may need/want to solve it in a different way, but this is
how I do it.

The problem is most likely that the shared memory segments and semaphore
arrays are being locked, so when a change is made to the size of the
semaphore-controlled shared memory buffers, or something else happens
behind the scenes in hashpipe, the existing interprocess-communication
(IPC) objects can't be changed. I was unfamiliar with IPC objects and the
'ipcs' tool before this issue, and I recommend learning about them if you
haven't already.

Enter 'ipcs' in your terminal and you should see shared memory segments
and semaphore arrays listed alongside your username, with the exact number
of blocks that you initialized. You might also see quite a few other
entries, depending on the activity on your server. Once you've spotted
your segments and arrays, you'll need to remove them using their IDs. The
commands to do so are:

ipcrm -m <shared memory ID>   -> removes the shared memory segment with that ID
ipcrm -s <semaphore ID>       -> removes the semaphore array with that ID

I was also able to write a shell script that removes all of the segments
and arrays associated with your account/username. You run it once, with no
need to look up any IDs, which simplifies the process quite a bit (it comes
down to entering one line in the terminal). This may not be your issue,
though, so I can send you the script if you need it. Or you can write your
own, whichever works.
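
For reference, a script along those lines might look like the sketch
below. It is not the script mentioned above, only a minimal illustration.
It assumes the Linux (util-linux) 'ipcs' column layout, where each data row
reads "key id owner perms ...", and the function names (ids_owned_by,
clean_ipcs) and the --run switch are made up for this example. Long
usernames may be shortened in the owner column, in which case the match
would need adjusting.

```shell
#!/bin/sh
# Sketch: remove every System V shared memory segment and semaphore array
# owned by one user. Assumes util-linux 'ipcs' output, where data rows are
#   key  id  owner  perms  ...
# All names below are hypothetical, chosen for this example.

# Read 'ipcs' output on stdin and print the ID (column 2) of every row
# whose owner (column 3) equals $1. Header and title lines are skipped
# because their second field is not a number.
ids_owned_by() {
    awk -v user="$1" '$3 == user && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Remove all shared memory segments and semaphore arrays owned by $1
# (defaults to the user running the script).
clean_ipcs() {
    owner="${1:-$(id -un)}"
    for id in $(ipcs -m | ids_owned_by "$owner"); do
        ipcrm -m "$id"      # shared memory segment
    done
    for id in $(ipcs -s | ids_owned_by "$owner"); do
        ipcrm -s "$id"      # semaphore array
    done
}

# Nothing is removed unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    clean_ipcs
fi
```

Saved under a name such as clean_ipcs.sh (also made up), it would be run
as 'sh clean_ipcs.sh --run'. It removes everything owned by the user, so
it should only be run when no hashpipe instance or other program of yours
is still using those segments.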

Hopefully this helps. If this isn't your problem, please provide a few
more details if you can, and I may be able to help.

Good luck!

Mark Ruzindana

On Wed, Dec 16, 2020 at 11:08 PM Ross Andrew Donnachie <
radonnac...@gmail.com> wrote:

> Good day all,
>
> Been working on a hashpipe with a pipeline of network, transposition and
> then disk-dump threads. We have 24 data-buffers that we rotate through.
>
> An inconsistent (happens after various amounts of time) crash occurs with
> this printout:
> -----------------------------------------------------
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_filled): semctl
> error [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_free_timeout):
> semop error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pktsock_thread): error
> waiting for free databuf [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_free): semctl error
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_filled_timeout):
> semop error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pkt_to_FTP_transpose):
> error waiting for input buffer, rv: -2 [Invalid argument]
> -----------------------------------------------------------
>
> If this looks at all familiar to anyone then let's tackle the issue at
> this juncture. If not, then there is a little more to tell!
>
> Other times an error is caught but no full printout from hashpipe_error()
> is made:
>
> Code calls:
> ++++++++++++++++++++++++++++
> hpguppi_databuf_data(struct hpguppi_input_databuf *d, int block_id) {
>     if(block_id < 0 || d->header.n_block < block_id) {
>         hashpipe_error(__FUNCTION__,
>             "block_id %s out of range [0, %d)",
>             block_id, d->header.n_block);
>         return NULL;
> ....
> ++++++++++++++++++++++++++++
>
> Printout:
> ============
> Tue Dec 15 17:37:19 2020 : Error
> (hpguppi_databuf_data)~/src/hpguppi_daq/src:
> ============
>
> Only once have I seen the above printout complete showing that
> d->header.n_block = -23135124... Which indicates some deep rooted rot
> somewhere.
>
> I have made some small changes that have gotten us past the occurrence of
> this issue. I'll be retracing the differences to find the critical fix. At
> that point I'll share.
>
> Kind Regards,
> Ross
>
> --
> You received this message because you are subscribed to the Google Groups "
> casper@lists.berkeley.edu" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to casper+unsubscr...@lists.berkeley.edu.
> To view this discussion on the web visit
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/bd10cd82-63d1-4d3b-bf4e-78903471df59n%40lists.berkeley.edu
> <https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/bd10cd82-63d1-4d3b-bf4e-78903471df59n%40lists.berkeley.edu?utm_medium=email&utm_source=footer>
> .
>
