Thanks for the tips on how to approach debugging. I will try that. It might 
take some time, but I will try to report back here with the result or when I 
get stuck.

An alternative would of course be to try a newer kernel, e.g. the one
Daniel Vidal is apparently using successfully.

So if anyone has additional experience, or versions that "work for me",
that information would still be highly valuable.

Best,
Nikolaus




-----Original Message-----
From: sf...@users.sourceforge.net [mailto:sf...@users.sourceforge.net] 
Sent: Tuesday, 17 January 2017 04:31
To: Demmel Nikolaus (BOSP/PAR) <nikolaus.dem...@de.bosch.com>
Cc: aufs-users@lists.sourceforge.net
Subject: Re: AUFS and PREEMPT_RT boot issue


"Demmel Nikolaus (BOSP/PAR)":
> I'm assuming from your response that in general you expect AUFS to work with
> PREEMPT_RT, or is this not the case?

Although I myself don't use RT patch, yes it should work. Of course,
some workaround may be necessary. It won't be clear until lots of tests
and diving into the patches.


> What I actually mean is that in about 40% of the time when booting into a
> kernel with the RT patch, the boot hangs at
>
>       mount -t aufs -o "dirs=/rw=rw:/ro=ro" aufs $ROOT_MOUNT
>
> in our init script and does not appear to return at all. The other 60% it
> works as expected without delay.

I misunderstood. Now it is clear that
- mounting aufs sometimes hangs, and you can do nothing but reboot.
- sometimes it doesn't hang.

Such problems are often caused by uninitialized data, such as lock
objects. But of course we cannot be sure yet. The cause may be
something like that, or something totally different. There may also be
multiple causes.
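
One possible way to test the "uninitialized lock" theory is to rebuild the
RT kernel with the kernel's lock debugging options enabled. The fragment
below is only a sketch: the selection of options is an assumption, not
something confirmed for this setup, and the extra checks change timing, so
an intermittent hang may become rarer or disappear.

```
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_DETECT_HUNG_TASK=y
```

If one of these checks fires, the kernel log should contain a warning with a
backtrace pointing at the lock involved.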


> The exact same configuration just without PREEMPT_RT patch appears to work
> 100% of the time.
>
> Does your answer still apply? Should we try the strace?

Yes. strace will show us which system call hangs. The most suspicious one
is mount(2), but it is better to confirm.
Once the system call is identified, we can dive into kernel space.
Usually embedding printk or using Magic SysRq is a good debugging method
to see what is going on and identify the root cause. These days,
ftrace and other tracing features are good choices too, though I don't
have much experience with them.
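
As a rough sketch of the strace step, the mount in the init script could be
wrapped as below. This assumes strace is available in the boot environment
and that the log location is writable there. The log path and the function
names trace_mount and last_syscall are made up for illustration.

```shell
#!/bin/sh
# Sketch only: LOG path and function names are illustrative assumptions.
LOG="${LOG:-/tmp/aufs-mount.strace}"

# Run the aufs mount under strace.
#   -f  follow child processes
#   -tt prefix each line with a timestamp
#   -o  write the trace to a file instead of stderr
trace_mount() {
    strace -f -tt -o "$LOG" \
        mount -t aufs -o "dirs=/rw=rw:/ro=ro" aufs "$ROOT_MOUNT"
}

# Print the name of the last system call recorded in a strace log.
# A leading pid and timestamp, if present, are skipped.
last_syscall() {
    tail -n 1 "$1" | sed -E 's/^[0-9:. ]*([a-z_0-9]+)\(.*/\1/'
}
```

Running trace_mount in the background and calling last_syscall "$LOG" a few
seconds later should name the call that has not returned. If it is mount(2),
the next step is on the kernel side.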

For debugging the RT patch, git-bisect may be a good choice, for example:
- prepare a linux-4.1.30 git tree.
- apply and git-commit all patches except RT.
- apply the RT patch series and git-commit the patches one by one.
- run 'git bisect start HEAD "just before RT"'
  + HEAD is the last patch/commit in the RT series
  + "just before RT" is the commit of 'apply all patches except RT'
  + repeat the rebuild and test based on the bisection.
  + git-bisect will tell you the suspicious patch, if everything goes
    well.

The RT patch series may not be bisectable. In that case, git-bisect
won't help.
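
The bisection steps could be sketched roughly as follows. The tag name
before-rt, the function names, and the use of git quiltimport to commit the
series one patch at a time are assumptions for illustration. The sketch
expects the RT patches to be unpacked as a quilt series (a directory of
patches plus a 'series' file).

```shell
#!/bin/sh
# Sketch only: tag name, function names and patch layout are assumptions.

# Import the RT quilt series as one commit per patch. The commit just
# before it ("all patches except RT") is tagged as the known-good point.
import_rt_series() {
    patches_dir="$1"    # directory with the patches and the 'series' file
    git tag before-rt
    git quiltimport --patches "$patches_dir"
}

# Start the bisection: HEAD (last RT patch) is bad, before-rt is good.
start_bisect() {
    git bisect start HEAD before-rt
}

# Report the verdict after each rebuild and round of test boots.
#   hang    -> at least one boot hung at the aufs mount
#   ok      -> no boot hung
#   nobuild -> this commit does not build or boot, so skip it
report() {
    case "$1" in
        hang)    git bisect bad ;;
        ok)      git bisect good ;;
        nobuild) git bisect skip ;;
        *)       echo "usage: report hang|ok|nobuild" >&2; return 2 ;;
    esac
}
```

Because the hang shows up in only about 40% of boots, a single clean boot
says little. Ten clean boots in a row would happen by chance in roughly 0.6%
of cases (0.6^10) if the bug were present, so something on that order seems
a reasonable bar before calling "report ok".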

Choose whichever debugging method you like and keep at it; you should be
able to find the root cause and fix it.


> Do you suggest that we should try to change it to the patch you linked?

No, because I don't currently know which one is correct.


J. R. Okajima
