Stephan,

Picking this back up: I am having difficulty reproducing the problem
consistently. The setup is Debian 8.4, kernel 4.4.15, OpenAFS master
f14d263a73f0be75e4de92f62e836fb2e55680dd. I see the gerrit for the revert on
master is not in yet, so that's not it. I also tried increasing the frequency
of afs_ShakeLooseVCaches.

A smaller git repo (e.g. openafs-robotest) never seems to trip the CWD bug.

Test method:
[vagrant@openafs-debian-dev:/afs/.robotest/test] $ mkdir g; cd g; git clone git://gerrit.openafs.org/openafs.git; sleep 180; git log
Cloning into 'openafs'...
remote: Counting objects: 192945, done.
remote: Compressing objects: 100% (46882/46882), done.
remote: Total 192945 (delta 159381), reused 177218 (delta 145040)
Receiving objects: 100% (192945/192945), 71.80 MiB | 7.31 MiB/s, done.
Resolving deltas: 100% (159381/159381), done.
Checking connectivity... done.
Checking out files: 100% (5563/5563), done.
fatal: Unable to read current working directory: No such file or directory
[vagrant@openafs-debian-dev:/afs/.robotest/test/g] 3m31s 128 $ date
Fri Jul 15 21:39:16 UTC 2016
[vagrant@openafs-debian-dev:/afs/.robotest/test/g] $

I don't see anything obvious in the log files, though I am not sure what I
would be looking for.
FileLog:
==> /usr/afs/logs/FileLog <==
Fri Jul 15 21:37:43 2016 SAFS_GiveUpCallBacks (Noffids=32)
Fri Jul 15 21:37:43 2016 SAFS_GiveUpCallBacks (Noffids=32)
Fri Jul 15 21:37:43 2016 SAFS_GiveUpCallBacks (Noffids=32)
Fri Jul 15 21:37:43 2016 SAFS_GiveUpCallBacks (Noffids=17)
Fri Jul 15 21:38:51 2016 SRXAFS_FetchData, Fid = 536870915.1.1
Fri Jul 15 21:38:51 2016 SAFS_FetchStatus,  Fid = 536870915.4.3, Host 10.0.2.15:7001, Id 1
Fri Jul 15 21:38:51 2016 SRXAFS_FetchData, Fid = 536870915.4.3
Fri Jul 15 21:38:51 2016 SRXAFS_FetchData, Fid = 536870918.1.1
Fri Jul 15 21:38:51 2016 SRXAFS_FetchData, Fid = 536870918.3.12879
Fri Jul 15 21:39:33 2016 SAFS_GiveUpCallBacks (Noffids=2)

So how do we repeat this more reliably?
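
One way to put a number on "reliably" would be to script the test and count
how often the cwd error shows up. A minimal sketch, assuming a POSIX shell;
count_cwd_failures is a hypothetical helper name, not part of git or OpenAFS,
and the error text it searches for is the one from the transcript above:

```shell
# Error text git prints when it cannot read the cwd (from the transcript).
CWD_ERR='Unable to read current working directory'

# count_cwd_failures N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print how many of those runs
# wrote the cwd error to stderr.
count_cwd_failures() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Capture stderr only; stdout is discarded so that ordinary
        # command output is never searched.
        err=$("$@" 2>&1 >/dev/null) || true
        case $err in
            *"$CWD_ERR"*) failures=$((failures + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (not run here; each attempt takes over three minutes). Every
# attempt clones into a fresh directory, waits, then runs "git log"
# without entering the clone, as in the test method above:
#
# count_cwd_failures 5 sh -c 'mkdir "g.$$" && cd "g.$$" &&
#     git clone git://gerrit.openafs.org/openafs.git; sleep 180; git log'
```

The printed count over a fixed number of attempts would make it easier to
compare kernels, cache sizes, or patch sets against each other.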

Cheers,
Joe


On Sat, Jun 18, 2016 at 2:27 PM, Stephan Wiesand <stephan.wies...@desy.de>
wrote:

> Joe,
>
> thanks for the feedback.
>
> On Jun 17, 2016, at 23:39 , Joe Gorse wrote:
> > FWIW, I am able to reproduce a "cwd" message for "git log" command on
> > Fedora 23, 4.5.6-200.fc23.x86_64. "git log" reads:
> >
> > fatal: Unable to read current working directory: No such file or directory
> >
> > Though it should read:
> >
> > fatal: Not a git repository (or any of the parent directories): .git
>
> Exactly the problem I'm seeing.
> >
> > However, I am not having any trouble with the git checkout. It seems to
> > consistently work on Fedora 23. Even the "git checkout
> > openafs-stable-1_6_18". Perhaps try on 4.5.6 on Fedora?
>
> That's where I started, see the first message in this thread. And in all
> cases, the git checkout actually works. I'm just using it to trigger the
> cwd problem - there are probably many other ways. Note also that "git log"
> is just one way of exposing the problem.
>
> > Though I have seen some more of this issue on Debian 8 with kernel 4.6.0.
> > Three of three tests failed to check out the openafs tree on this system.
> > I will test some other kernels on this system later and note anything
> > interesting.
>
> Sounds even considerably worse :-(
>
> Any errors logged? Does the client actually have some variant of gerrit
> 12228 applied?
>
> Cheers,
>         Stephan
>
> > Cheers,
> > Joe
> >
> > On Fri, Jun 17, 2016 at 11:30 AM, Stephan Wiesand <stephan.wies...@desy.de>
> > wrote:
> >
> >>
> >> On Jun 17, 2016, at 04:45 , Benjamin Kaduk wrote:
> >>
> >>> On Thu, 16 Jun 2016, Stephan Wiesand wrote:
> >>>
> >>>> I smoke tested what was planned to be OpenAFS 1.6.18.1, as discussed in
> >>>> yesterday's release team meeting, on a Fedora 23 x86_64 VM with kernel
> >>>> 4.5.6-200 today. The result was disappointing:
> >>>>
> >>>> git clone git://gerrit.openafs.org/openafs.git
> >>>
> >>> Is the pwd the root of a volume?
> >>
> >> No, everything happens at least one level below.
> >>
> >>>> cd openafs
> >>>> git log
> >>>> # scrolled through a few dozen changes, took a couple of seconds
> >>>> git checkout openafs-stable-1_6_18
> >>>>
> >>>> At this point I got the following error:
> >>>>
> >>>> fatal: Unable to read current working directory: No such file or directory
> >>>>
> >>>> A "cd; cd -" cures this for a while, and there's no apparent data
> >>>> corruption. I'm still worried. The problem isn't 100% reproducible, but it
> >>>> doesn't take too many tries checking out random tags or branches.
> >>>>
> >>>> This was plain 1.6.18 + gerrit 12300 12301 12302 12274.
> >>>>
> >>>> Cache is on ext4, no separate partition, default size as set by our RPM
> >>>> (I think 100MB, but I don't have access to the VM right now to check).
> >>>>
> >>>> The small cache size may contribute to the problem. But I found no
> >>>> errors logged anywhere, and this shouldn't happen no matter how small the
> >>>> cache is.
> >>>
> >>> Please check if the cmdebug output is empty (I expect it is, but it is
> >>> good to check).
> >>
> >> It is empty.
> >>
> >>>> NB we have a user report of exactly this problem happening frequently
> >>>> while just editing files in a local git repo in AFS space. The data is a
> >>>> bit sketchy, but it's probably Ubuntu 14.04 with its current default kernel
> >>>> and the openafs packages from Anders' ppa. I'll try to get us more data.
> >>>>
> >>>>
> >>>> Any thoughts? For the time being I'm considering this a showstopper for
> >>>> 1.6.18.1, and it looks like we're not quite there yet regarding Linux
> >>>> 4.5, let alone 4.6 or the 4.7 due in a few weeks :-(
> >>>
> >>> Can you run the same test on a 4.4 kernel for comparison?
> >>
> >> I tried under the last F22 kernel, 4.4.6-200.fc22. And ok, it's not 4.5
> >> specific, though it seems to happen more frequently with 4.5.2 than with
> >> 4.4.6.
> >>
> >> By chance I found a pretty reliable reproducer:
> >>
> >>        cd /vol/ume/root
> >>        mkdir g; cd g
> >>        git clone git://gerrit.openafs.org/openafs.git; sleep 180; git log
> >>
> >> Note indeed no "cd openafs". Of course this should complain about the cwd
> >> not being a git repo. But most of the time it will complain about the cwd
> >> issue instead.
> >>
> >> I'm planning to verify that plain 1.6.18 behaves the same on 4.4.6, and if
> >> it does I'll proceed with the 1.6.18.1 release.
> >>
> >> I couldn't reproduce this with any EL clients, but those have larger
> >> caches (it's indeed 100 MB on that Fedora VM), so there's more to test.
> >> Help welcome...
> >>
> >>
> >> _______________________________________________
> >> OpenAFS-devel mailing list
> >> OpenAFS-devel@openafs.org
> >> https://lists.openafs.org/mailman/listinfo/openafs-devel
> >>
> >
> >
> >
> > --
> > Joe Gorse
> >
> > C: 440-552-0730
> > LI: Joe Gorse <http://www.linkedin.com/pub/joe-gorse/7/12/397>
>
> --
> Stephan Wiesand
> DESY -DV-
> Platanenenallee 6
> 15738 Zeuthen, Germany
>
>


-- 
Joe Gorse

C: 440-552-0730
LI: Joe Gorse <http://www.linkedin.com/pub/joe-gorse/7/12/397>
