On Fri, May 16, 2025 at 03:34:55PM -0700, Marc Herbert wrote:
> On 2025-05-15 23:08, Dan Williams wrote:
> > alison.schofield@ wrote:
>
> >>
> >> +wait_for_logfile_update()
> >> +{
> >> + local file="$1"
> >> + local prev_size="$2"
> >> + local timeout=30
> >> + local i=0
> >> +
> >> + # prev_size is always zero because start_monitor truncates it.
> >> + # Set and check against it anyway to future proof.
> >> + while [ $i -lt $timeout ]; do
> >> + local new_size=$(stat -c%s "$file" 2>/dev/null || echo 0)
> >> + if [ "$new_size" -gt "$prev_size" ]; then
> >> + return 0
> >> + fi
> >> + sleep 0.1
> >> + i=$((i+1))
> >> + done
> >> +
> >> + echo "logfile not updated within 3 seconds"
> >> + err "$LINENO"
> >
> > Hmm... not a fan of this open coded "wait for file to change" bash
> > function. This feels like something that a tool can do... (searches)
> >
> > Does inotifywait fit the bill here?
> >
> > https://linux.die.net/man/1/inotifywait
>
> If inotify works, go for it. Blocking is always better than polling. It
> might be tricky because the file does not exist yet. Create an empty
> file yourself first, would that work? Probably not if ndctl monitor
> creates a brand new file.
>
> If inotify does not work, consider adding to test/common this generic
> polling function that lets you poll in bash literally anything:
>
> https://github.com/pmem/run_qemu/pull/177/files
>
> It would require making `prev_size` global which does not look like an
> issue to me.
>
> Before adding it run_qemu, that polling function has been used for years
> and thousands of runs in
> https://github.com/thesofproject/sof-test/blob/main/case-lib/lib.sh
> I mean it's been extremely well tested.
>
> Even if you don't need polling here, it's unfortunately fairly common to
> have to poll from shell scripts. Why I'm suggesting a test/common
> location.
Thanks. I'm not sure what other polling type work is going on in
the tests, but I'm sure if any, it'll come out of the woodwork
now that we're mentioning it.
I did drop the polling. Not that I thought polling was so horrid,
but mostly because I found something better to 'look' for, so using
'tail -F' to watch the logfile made more sense.
>
> >> +}
> >> +
> >> start_monitor()
> >> {
> >> logfile=$(mktemp)
> >> $NDCTL monitor -c "$monitor_conf" -l "$logfile" $1 &
> >> monitor_pid=$!
> >> - sync; sleep 3
> >> + sync
> >> + for i in {1..30}; do
> >> + if ps -p "$monitor_pid" > /dev/null; then
> >> + sleep 0.1
> >> + break
> >> + fi
> >> + sleep 0.1
> >> + done
> >> + if ! ps -p "$monitor_pid" > /dev/null; then
> >> + echo "monitor not ready within 3 seconds"
> >> + err "$LINENO"
> >> + fi
> >
> > This does not make sense to. The shell got the pid from the launching
> > the executable. This is effectively testing that bash command execution
> > works. About the only use I can imagine for this is checking that the
> > monitor did not die early, but that should be determined by other parts
> > of the test.
>
> Agreed: I'm afraid the only thing this code does is sleeping 0.1s only
> once instead of 3s. Because not sleeping at all worked for you, no
> surprise a single sleep 0.1 works too.
>
> I suspect the only case where the "for" loop actually iterates is when the
> "monitor" process crashes extremely fast, faster than the
> "sync". Basically racing with its parent to crash before the latter
> notices. That race does not look like a "feature" to me.
>
> I agree this should be replaced by observing side-effects from the
> monitor. Dunno what. grep something in the ndctl monitor -v output?
>
>
> By the way, UNTESTED:
>
Is this because in your testing the monitor was left dangling and
the test hanging? I suspect the that the cleanup on err was missing,
and I added it my v3 patch.
> --- a/test/monitor.sh
> +++ b/test/monitor.sh
> @@ -67,7 +67,19 @@ check_result()
>
> stop_monitor()
> {
> - kill $monitor_pid
> + kill $monitor_pid || die "monitor $monitor_pid was already dead"
> +
> + local ret=0
> + timeout 3 wait $monitor_pid || ret=$?
> + case "$ret") in
> + 124) # timeout
> + die "monitor $monitor_pid ignored SIGTERM" ;;
> + 0|127) # either success or killed fast
> + : ;;
> + *)
> + die "unexpected monitor exit status:$ret" ;;
> + esac
> +
> rm "$logfile"
> }
>
>
> Something like that...
>
> This is all assuming the monitor does not have any kind of "remote
> control"; that would be much better.
>
> I don't know the "monitor", but if it has neither obvious
> side-effects nor any kind of "remote control", then it does a "great"
> job... getting in the way of tests! :-( Maybe the monitor is what should
> be fixed rather than adding shell script "creativity"?