> > this will produce reams of output, but will provide you with the next
> > hint.
> >
> > reams of output collected...
>
> problem: the output will affect scheduling, which might make the
> problem go away. i recommend outputting it to a file if possible, and
> viewing after the fact.

I have big files for jack and ardour, and the problem did not go away. I
have pored over the huge files but I don't seem to find anything that
looks like a glitch. Will keep looking...
I have some data but I have no idea if it is significant. This is
running rt_monitor as a safeguard, then jack and finally ardour. To
speed up the death of the patient (in my experience), a "tar cvf
usr.tar /usr" was also run in a separate terminal...

I looked for time transients in the cycle counts of the listings. This
is the tail of it for ardour (abs cycles followed by delta cycles):

25316620570960   151478724
25314593862618   170551720
25316905845228   174448260
25283295747490  1690268954

So the last delta time is much bigger, a good candidate. Finding that
time in the listing gives me this:

jack:27813:25281569457264 client.c:jack_client_thread:556: client polling on event_fd and graph_wait_fd...
jack:27813:25281605113512 client.c:jack_client_thread:664: client 27813 signalled at 25281605055172, awake for process at 25281605109372 (delay = 32.071006 usecs) (wakeup on graph_wait_fd==30)
jack:27813:25281605240108 client.c:jack_client_thread:686: client finished processing at 25281605238028 (elapsed = 76.127811 usecs), writing on graph_next_fd==31
jack:27813:25281605450304 client.c:jack_client_thread:694: client sent message to next stage by 25281605450388, client reading on graph_wait_fd==30
jack:27813:25281605461784 client.c:jack_client_thread:699: reading cleanup byte from pipe
jack:27813:25281605472500 client.c:jack_client_thread:710: process cycle fully complete
jack:27813:25281605478536 client.c:jack_client_thread:556: client polling on event_fd and graph_wait_fd...

GLITCH! (this is not part of the listing :-)

jack:27813:25283295747490 client.c:jack_client_thread:556: client polling on event_fd and graph_wait_fd...
jack:27813:25283359653444 client.c:jack_client_thread:664: client 27813 signalled at 25281641118776, awake for process at 25283359647196 (delay = 1016880.700592 usecs) (wakeup on graph_wait_fd==30)
jack:27813:25283359829424 client.c:jack_client_thread:686: client finished processing at 25283359827480 (elapsed = 106.676923 usecs), writing on graph_next_fd==31
jack:27813:25283360445692 client.c:jack_client_thread:694: client sent message to next stage by 25283360445776, client reading on graph_wait_fd==30
jack:27813:25283360457440 client.c:jack_client_thread:699: reading cleanup byte from pipe
jack:27813:25283360468308 client.c:jack_client_thread:710: process cycle fully complete

So there are two "client polling" in a row with nothing in between...
Presumably the second one happens when rt_monitor downgrades the
process to SCHED_OTHER? Or maybe that is what is "hanging"?

On the jack side of things:

25317681129070   143623260
25316620498912   151380488
25314593789218   170399340
25283359590208  1718469688

jack:27772:25281605266960 engine.c:jack_process:584: reading byte from subgraph_wait_fd==13
jack:27772:25281641074368 engine.c:jack_process:465: considering client alsa_pcm for processing
jack:27772:25281641107540 engine.c:jack_process:498: in-process client has no process() function
jack:27772:25281641114096 engine.c:jack_process:465: considering client ardour for processing
jack:27772:25281641120520 engine.c:jack_process:516: **** alsa_pcm: xrun of at least 996.047 msecs calling process() on an OOP subgraph, fd==7

GLITCH!
jack:27772:25283359590208 engine.c:jack_process:535: waiting on fd==13 for process() subgraph to finish
jack:27772:25283359952396 engine.c:jack_process:584: reading byte from subgraph_wait_fd==13
jack:27772:25283396656700 engine.c:jack_process:465: considering client alsa_pcm for processing
jack:27772:25283396688968 engine.c:jack_process:498: in-process client has no process() function
jack:27772:25283396695880 engine.c:jack_process:465: considering client ardour for processing

I did another test with the sleep in rt_monitor changed from 2 seconds
to 10 seconds (to keep the machine catatonic for a longer time), but
the result was not as interesting: jack timed out when it recovered
and killed itself and ardour. In that test I found three back-to-back
polls with big gaps in between, and two more somewhere else in the
file.

-- Fernando
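P.S. For what it's worth, the delta scan I did by hand on the listings
can be sketched in a few lines of Python. The function name and the
trace list below are mine (the sample lines are copied from the ardour
listing above); it pulls the cycle stamp out of each jack trace line
and sorts the consecutive-line deltas so the glitch stands out:

```python
# Rough sketch of the by-hand delta scan: parse the cycle stamp from
# each "jack:<pid>:<cycles> ..." trace line and return the deltas
# between consecutive lines, smallest first.
def cycle_deltas(lines):
    stamps = []
    for line in lines:
        parts = line.split(":")
        if len(parts) > 2:
            try:
                stamps.append(int(parts[2].split()[0]))
            except ValueError:
                continue
    return sorted((b - a, b) for a, b in zip(stamps, stamps[1:]))

# Sample lines taken from the ardour listing in this mail.
trace = [
    "jack:27813:25281605472500 client.c:jack_client_thread:710: process cycle fully complete",
    "jack:27813:25281605478536 client.c:jack_client_thread:556: client polling on event_fd and graph_wait_fd...",
    "jack:27813:25283295747490 client.c:jack_client_thread:556: client polling on event_fd and graph_wait_fd...",
]
for delta, at in reversed(cycle_deltas(trace)):
    print(at, delta)   # the glitch tops the list: 25283295747490 1690268954
```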
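P.P.S. A quick sanity check on the units, assuming the trace stamps are
raw CPU cycle counts: dividing each cycle gap by the delay jack itself
reported gives the same cycles-per-usec rate for the normal wakeup and
the glitch wakeup, so the big ardour delta really is about one second
of wall time, consistent with the ~996 msec xrun jack printed:

```python
# Assumption: the trace stamps are raw CPU cycle counts.  Both wakeups
# in the listing imply the same counter rate (~1690 cycles/usec, i.e.
# roughly a 1.69 GHz clock).
normal = (25281605109372 - 25281605055172) / 32.071006        # cycles/usec
glitch = (25283359647196 - 25281641118776) / 1016880.700592   # cycles/usec
print(round(normal), round(glitch))  # both come out at 1690

# At that rate the 1690268954-cycle delta in the ardour listing is
# about one second, matching the "xrun of at least 996.047 msecs".
print(round(1690268954 / 1690 / 1e6, 3))
```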