Hi, Ross,

> On Dec 16, 2020, at 22:08, Ross Andrew Donnachie <radonnac...@gmail.com> 
> wrote:
> 
> Been working on a hashpipe with a pipeline of network, transposition and then 
> disk-dump threads. We have 24 data-buffers that we rotate through. 
> 
> An inconsistent (happens after various amounts of time) crash occurs with 
> this printout:
> -----------------------------------------------------
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_filled): semctl error 
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_free_timeout): semop 
> error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pktsock_thread): error 
> waiting for free databuf [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_free): semctl error 
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_filled_timeout): 
> semop error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pkt_to_FTP_transpose): 
> error waiting for input buffer, rv: -2 [Invalid argument] 
> -----------------------------------------------------------

This can happen if data are erroneously written to the header of the data 
buffer and clobber the semaphore ID that is stored there.  One way (but 
certainly not the only way) this can happen is through bad pointer 
arithmetic.  You can check for this "corruption" by running 
"hashpipe_check_databuf".  It should show something like the following example 
(though obviously with values specific to your application):

$ hashpipe_check_databuf -K /root
databuf 1 stats:
  data_type='unknown'
  header_size=4096

  block_size=134422528
  n_block=24
  shmid=32769
  semid=0

semaphore mask: 000000

Specifically, the shmid and semid values shown should match the values 
displayed by "ipcs -a".
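
If it helps to narrow down when the header gets clobbered, the same kind of 
check can be done from inside the process.  The following is only a rough 
sketch: "struct demo_databuf_header" and the demo_* functions are hypothetical 
names, and the struct layout is an assumption modeled on the fields printed 
above, not hashpipe's actual definition.  The idea is to copy the header once 
at startup and compare the live header against that copy at suspect points in 
each thread.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical stand-in for the databuf header.  The field names follow the
 * hashpipe_check_databuf output; the layout itself is an assumption. */
struct demo_databuf_header {
    char   data_type[64];
    size_t header_size;
    size_t block_size;
    int    n_block;
    int    shmid;
    int    semid;
};

/* Keep a private copy of the header, taken right after the buffer is set up. */
void demo_header_snapshot(struct demo_databuf_header *saved,
                          const struct demo_databuf_header *live)
{
    memcpy(saved, live, sizeof(*saved));
}

/* Compare the live header with the saved copy.  Reports every field that
 * differs on stderr and returns the number of differing fields. */
int demo_header_check(const struct demo_databuf_header *saved,
                      const struct demo_databuf_header *live,
                      const char *where)
{
    int bad = 0;

    if (memcmp(live->data_type, saved->data_type, sizeof(saved->data_type)) != 0) {
        fprintf(stderr, "%s: data_type changed\n", where);
        bad++;
    }
    if (live->header_size != saved->header_size) {
        fprintf(stderr, "%s: header_size %zu != %zu\n",
                where, live->header_size, saved->header_size);
        bad++;
    }
    if (live->block_size != saved->block_size) {
        fprintf(stderr, "%s: block_size %zu != %zu\n",
                where, live->block_size, saved->block_size);
        bad++;
    }
    if (live->n_block != saved->n_block) {
        fprintf(stderr, "%s: n_block %d != %d\n",
                where, live->n_block, saved->n_block);
        bad++;
    }
    if (live->shmid != saved->shmid) {
        fprintf(stderr, "%s: shmid %d != %d\n",
                where, live->shmid, saved->shmid);
        bad++;
    }
    if (live->semid != saved->semid) {
        fprintf(stderr, "%s: semid %d != %d\n",
                where, live->semid, saved->semid);
        bad++;
    }
    return bad;
}
```

Calling demo_header_check() before and after each block write, with "where" 
set to the thread name, would show which thread first sees a changed header.  
It does not show who did the writing, only roughly when it happened.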

> Other times an error is caught but no full printout from hashpipe_error() is 
> made:
> 
> Code calls:
> ++++++++++++++++++++++++++++
> hpguppi_databuf_data(struct hpguppi_input_databuf *d, int block_id) {
>     if(block_id < 0 || d->header.n_block < block_id) {
>         hashpipe_error(__FUNCTION__,
>             "block_id %s out of range [0, %d)",
>             block_id, d->header.n_block);
>         return NULL;
> ....
> ++++++++++++++++++++++++++++
> 
> Printout:
> ============
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_databuf_data)~/src/hpguppi_daq/src: 
> ============
> 
> Only once have I seen the above printout complete showing that 
> d->header.n_block = -23135124... Which indicates some deep rooted rot 
> somewhere.

Indeed, it looks like corruption of the data buffer header (which can also be 
verified as shown above).
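
On the range check itself, here is a minimal sketch of an accessor for the 
half-open range [0, n_block).  "struct demo_databuf" and demo_block_ptr() are 
hypothetical names used only for illustration, not the hpguppi or hashpipe 
API.  Two details matter: block_id == n_block has to be rejected as well, and 
integer arguments need a %d conversion, since passing an int to %s is 
undefined behavior and could cut a message short or crash.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified view of a data buffer: n_block blocks of
 * block_size bytes each, stored contiguously starting at "blocks". */
struct demo_databuf {
    int    n_block;
    size_t block_size;
    char  *blocks;
};

/* Return a pointer to the start of block "block_id", or NULL if block_id
 * lies outside the half-open range [0, n_block). */
char *demo_block_ptr(const struct demo_databuf *d, int block_id)
{
    if (block_id < 0 || block_id >= d->n_block) {
        /* Both arguments are ints, so both conversions are %d. */
        fprintf(stderr, "block_id %d out of range [0, %d)\n",
                block_id, d->n_block);
        return NULL;
    }
    /* Do the offset arithmetic in size_t to avoid int overflow for
     * large block sizes. */
    return d->blocks + (size_t)block_id * d->block_size;
}
```

A check like this only catches bad block IDs at the call site.  It will not 
prevent a write that runs past the end of a block (or before the start of 
block 0) from reaching the header.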

HTH,
Dave
