On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1...@126.com> wrote:
> Hi Atin,
> Thank you for your reply.
> Actually it is very difficult to reproduce because I don't know when there
> was an ongoing commit happening. It was just a coincidence.
> But I want to find the root cause.
> I'll give it another try to see if this situation can be simulated/reproduced,
> and will keep you posted.
>
> So I would be grateful if you could answer my questions below.
>
> You said in a comment that "This issue is hit at part of the negative testing
> where while gluster volume set was executed at the same point of time
> glusterd in another instance was brought down. In the faulty node we could
> see /var/lib/glusterd/vols/<volname>info file been empty whereas the
> info.tmp file has the correct contents."
>
> I have two questions for you.
>
> 1. Could you reproduce this issue by running gluster volume set while
> glusterd was brought down?
> 2. Are you certain that this issue is caused by a rename() being interrupted
> in the kernel?
>
> In my case there are two files, info and 10.32.1.144.-opt-lvmdir-c2-brick,
> that are both empty.
> But in my view only one rename can be running at a time because of the big
> lock.
> Why are both files empty?
>
> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick")
> be running in two threads?
>
> Thanks,
> Xin
>
> On 2016-11-11 15:27:03, "Atin Mukherjee" <amukh...@redhat.com> wrote:
>
> On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1...@126.com> wrote:
>>
>> Hi Atin,
>> Thank you for your reply.
>>
>> As you said, the info file can only be changed in glusterd_store_volinfo(),
>> sequentially, because of the big lock.
>>
>> I have found the similar issue you mentioned:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487
>>
>
> Great, so this is what I was actually trying to refer to in my first email,
> that I saw a similar issue. Have you had a chance to look at
> https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ?
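For readers following the thread, the write-to-.tmp-then-rename pattern under
discussion can be sketched as below. This is a minimal standalone
illustration, not glusterd's actual code, and the helper name
save_atomically is made up. Because rename() atomically replaces the target,
a concurrent reader sees either the complete old file or the complete new
one, never a partial write.

```c
/* Minimal sketch of the atomic-update pattern (NOT glusterd's code):
 * write the full new contents to a temporary file, then rename() it
 * over the real file in a single atomic step. */
#include <stdio.h>

static int save_atomically(const char *path, const char *tmp_path,
                           const char *contents)
{
    FILE *fp = fopen(tmp_path, "w");
    if (!fp)
        return -1;
    if (fputs(contents, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;
    /* Atomic replacement: on success, "path" now holds the new contents
     * and the temporary file no longer exists. */
    return rename(tmp_path, path);
}
```

Under this pattern, and with the big lock serializing writers, neither the
tmp file nor the target should ever be observed empty by a reader.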
> But in your case, did you try to bring down glusterd while there was an
> ongoing commit happening?
>
>>
>> You said in a comment that "This issue is hit at part of the negative
>> testing where while gluster volume set was executed at the same point of
>> time glusterd in another instance was brought down. In the faulty node we
>> could see /var/lib/glusterd/vols/<volname>info file been empty whereas the
>> info.tmp file has the correct contents."
>>
>> I have two questions for you.
>>
>> 1. Could you reproduce this issue by running gluster volume set while
>> glusterd was brought down?
>> 2. Are you certain that this issue is caused by a rename() being
>> interrupted in the kernel?
>>
>> In my case there are two files, info and 10.32.1.144.-opt-lvmdir-c2-brick,
>> that are both empty.
>> But in my view only one rename can be running at a time because of the big
>> lock.
>> Why are both files empty?
>>
>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick")
>> be running in two threads?
>>
>> Thanks,
>> Xin
>>
>> On 2016-11-11 14:36:40, "Atin Mukherjee" <amukh...@redhat.com> wrote:
>>
>> On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1...@126.com> wrote:
>>
>>> Hi Atin,
>>>
>>> Thank you for your reply.
>>> I have two questions for you.
>>>
>>> 1. Are the two files info and info.tmp only created or changed in
>>> glusterd_store_volinfo()? I did not find any other place where the two
>>> files are changed.
>>
>> If we are talking about the volume's info file, then yes, the mentioned
>> function takes care of it.
>>
>>> 2. I found that glusterd_store_volinfo() is called from many places in
>>> glusterd. Is there a thread-synchronization problem here? If so, one
>>> thread may open the same info.tmp file with the O_TRUNC flag while
>>> another thread is writing info.tmp. Could this happen?
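The truncation hazard Xin asks about can be seen in isolation with a small
standalone sketch (an illustration of O_TRUNC semantics, not glusterd code;
size_after_trunc_open is a made-up helper): opening an existing file with
O_TRUNC zeroes it immediately, before any new byte is written, so an
unsynchronized concurrent opener, or a process killed in that window, would
leave an empty file behind. This is exactly why the store path needs to be
serialized.

```c
/* Demonstrates that O_TRUNC empties a file at open() time, before any
 * write happens.  A racer or a crash between open() and write() would
 * therefore leave a zero-length file. */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the size of "path" observed right after an O_TRUNC open,
 * without writing anything. */
static long size_after_trunc_open(const char *path)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);                /* closed before anything was written */
    return (long)st.st_size;  /* 0: the old contents are already gone */
}
```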
>> In glusterd, threads are protected by the big lock, so I don't see a
>> possibility (theoretically) of two glusterd_store_volinfo() calls running
>> at the same point in time.
>>
>>> Thanks,
>>> Xin
>>>
>>> At 2016-11-10 21:41:06, "Atin Mukherjee" <amukh...@redhat.com> wrote:
>>>
>>> Did you run out of disk space by any chance? AFAIK, the code writes the
>>> new content to a .tmp file and then renames it over the original file.
>>> In case of a disk-space issue I would expect both files to be of non-zero
>>> size. That said, I vaguely remember a similar issue (in the form of a bug
>>> or an email) landing once, but we couldn't reproduce it, so my guess is
>>> that something is wrong with the atomic update here. I'll be glad if you
>>> have a reproducer for it, and then we can dig into it further.
>>>
>>> On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1...@126.com> wrote:
>>>
>>>> Hi,
>>>> Some errors happened when I started glusterd.
>>>> The log follows.
>>>>
>>>> [2016-11-08 07:58:34.989365] I [MSGID: 100030] [glusterfsd.c:2318:main]
>>>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
>>>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>>>> [2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init]
>>>> 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init]
>>>> 0-management: Using /system/glusterd as working directory
>>>> [2016-11-08 07:58:35.024508] I [MSGID: 106514]
>>>> [glusterd-store.c:2075:glusterd_restore_op_version] 0-management:
>>>> Upgrade detected.
>>>> Setting op-version to minimum : 1
>>>> [2016-11-08 07:58:35.025356] E [MSGID: 106206]
>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management:
>>>> Failed to get next store iter
>>>> [2016-11-08 07:58:35.025401] E [MSGID: 106207]
>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management:
>>>> Failed to update volinfo for c_glusterfs volume
>>>> [2016-11-08 07:58:35.025463] E [MSGID: 106201]
>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
>>>> Unable to restore volume: c_glusterfs
>>>> [2016-11-08 07:58:35.025544] E [MSGID: 101019]
>>>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>>>> 'management' failed, review your volfile again
>>>> [2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init]
>>>> 0-management: initializing translator failed
>>>> [2016-11-08 07:58:35.025629] E [graph.c:661:glusterfs_graph_activate]
>>>> 0-graph: init failed
>>>> [2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit]
>>>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718]
>>>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8]
>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-:
>>>> received signum (0), shutting down
>>>>
>>>> I then found that the size of vols/volume_name/info is 0, which caused
>>>> glusterd to shut down, while vols/volume_name/info.tmp is not 0.
>>>> I also found a brick file vols/volume_name/bricks/xxxx.brick whose
>>>> size is 0, while vols/volume_name/bricks/xxxx.brick.tmp is not 0.
>>>>
>>>> I read the code of glusterd_store_volinfo() in glusterd-store.c.
>>>> I know that info.tmp is renamed to info in
>>>> glusterd_store_volume_atomic_update().
>>>>
>>>> But my question is: why is the info file 0 bytes while info.tmp is not?
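One explanation often discussed for "the rename succeeded but the target is
empty" (offered here only as an assumption, not a confirmed root cause for
this report) is a crash or power loss on a journaling filesystem with
delayed allocation: the rename can be committed to the journal before the
tmp file's data blocks reach disk, so after reboot the renamed file comes
back zero-length, while a later run's freshly written .tmp file survives
intact. The usual defence is to fsync() the temporary file before renaming
it; save_durably below is a hypothetical sketch of that, not glusterd's
actual code.

```c
/* Hypothetical sketch: fsync() the tmp file before rename() so the data
 * is on disk before the atomic swap.  Without the fsync(), a crash in
 * the window after the rename can leave a zero-length target file on
 * filesystems with delayed allocation. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int save_durably(const char *path, const char *tmp_path,
                        const char *contents, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, contents, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    return rename(tmp_path, path);  /* data is on disk before the swap */
}
```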
>>>> Thanks,
>>>> Xin
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>> --
>>> ~ Atin (atinm)
>>
>> --
>> ~ Atin (atinm)
>
> --
> ~ Atin (atinm)

--
~ Atin (atinm)