On Tue, Nov 15, 2016 at 8:58 AM, songxin <songxin_1...@126.com> wrote:
> Hi Atin, > I have some clues about this issue. > I could reproduce this issue use the scrip that mentioned in > https://bugzilla.redhat.com/show_bug.cgi?id=1308487 . > I really appreciate your help in trying to nail down this issue. While I am at your email and going through the code to figure out the possible cause for it, unfortunately I don't see any script in the attachment of the bug. Could you please cross check? > > After I added some debug print,which like below, in glusterd-store.c and I > found that the /var/lib/glusterd/vols/xxx/info and > /var/lib/glusterd/vols/xxx/bricks/* > are removed. > But other files in /var/lib/glusterd/vols/xxx/ will not be remove. > > int32_t > glusterd_store_volinfo (glusterd_volinfo_t *volinfo, > glusterd_volinfo_ver_ac_t ac) > { > int32_t ret = -1; > > GF_ASSERT (volinfo) > > ret = access("/var/lib/glusterd/vols/gv0/info", F_OK); > if(ret < 0) > { > gf_msg (THIS->name, GF_LOG_ERROR, 0, 0, "info is not > exit(%d)", errno); > } > else > { > ret = stat("/var/lib/glusterd/vols/gv0/info", &buf); > if(ret < 0) > { > gf_msg (THIS->name, GF_LOG_ERROR, 0, 0, "stat info > error"); > } > else > { > gf_msg (THIS->name, GF_LOG_ERROR, 0, 0, "info size > is %lu, inode num is %lu", buf.st_size, buf.st_ino); > } > } > > glusterd_perform_volinfo_version_action (volinfo, ac); > ret = glusterd_store_create_volume_dir (volinfo); > if (ret) > goto out; > > ... > } > > So it is easy to understand why the info or 10.32.1.144.-opt-lvmdir-c2- > brick sometimes is empty. > It is becaue the info file is not exist, and it will be create by “fd = > open (path, O_RDWR | O_CREAT | O_APPEND, 0600);” in function > gf_store_handle_new. > And the info file is empty before rename. > So the info file is empty if glusterd shutdown before rename. > > > My question is following. > 1.I did not find the point the info is removed.Could you tell me the point > where the info and /bricks/* are removed? > 2.why the file info and bricks/* is removed?But other files in > var/lib/glusterd/vols/xxx/ > are not be removed? > AFAIK, we never delete the info file and hence this file is opened with O_APPEND flag. As I said I will go back and cross check the code once again. > Thanks, > Xin > > > 在 2016-11-11 20:34:05,"Atin Mukherjee" <amukh...@redhat.com> 写道: > > > > On Fri, Nov 11, 2016 at 4:00 PM, songxin <songxin_1...@126.com> wrote: > >> Hi Atin, >> >> Thank you for your support. >> Sincerely wait for your reply. >> >> By the way, could you make sure that the issue, file info is empty, cause >> by rename is interrupted in kernel? >> > > As per my RCA on that bug, it looked to be. > > >> >> Thanks, >> Xin >> >> 在 2016-11-11 15:49:02,"Atin Mukherjee" <amukh...@redhat.com> 写道: >> >> >> >> On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1...@126.com> wrote: >> >>> Hi Atin, >>> Thank you for your reply. >>> Actually it is very difficult to reproduce because I don't know when there >>> was an ongoing commit happening.It is just a coincidence. >>> But I want to make sure the root cause. >>> >> >> I'll give it a another try and see if this situation can be >> simulated/reproduced and will keep you posted. >> >> >>> >>> So I would be grateful if you could answer my questions below. >>> >>> You said that "This issue is hit at part of the negative testing where >>> while gluster volume set was executed at the same point of time glusterd in >>> another instance was brought down. In the faulty node we could see >>> /var/lib/glusterd/vols/<volname>info file been empty whereas the >>> info.tmp file has the correct contents." in comment. >>> >>> I have two questions for you. >>> >>> 1.Could you reproduce this issue by gluster volume set glusterd which was >>> brought down? >>> 2.Could you be certain that this issue is cause by rename is interrupted in >>> kernel? >>> >>> In my case there are two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, >>> are both empty. >>> But in my view only one rename can be running at the same time because of >>> the big lock. >>> Why there are two files are empty? >>> >>> >>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick") >>> be running in two thread? >>> >>> Thanks, >>> Xin >>> >>> >>> 在 2016-11-11 15:27:03,"Atin Mukherjee" <amukh...@redhat.com> 写道: >>> >>> >>> >>> On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1...@126.com> wrote: >>> >>>> >>>> Hi Atin, >>>> Thank you for your reply. >>>> >>>> As you said that the info file can only be changed in the >>>> glusterd_store_volinfo() >>>> sequentially because of the big lock. >>>> >>>> I have found the similar issue as below that you mentioned. >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487 >>>> >>> >>> Great, so this is what I was actually trying to refer in my first email >>> that I saw a similar issue. Have you got a chance to look at >>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ? But in your >>> case, did you try to bring down glusterd when there was an ongoing commit >>> happening? >>> >>> >>>> >>>> You said that "This issue is hit at part of the negative testing where >>>> while gluster volume set was executed at the same point of time glusterd in >>>> another instance was brought down. In the faulty node we could see >>>> /var/lib/glusterd/vols/<volname>info file been empty whereas the >>>> info.tmp file has the correct contents." in comment. >>>> >>>> I have two questions for you. >>>> >>>> 1.Could you reproduce this issue by gluster volume set glusterd which was >>>> brought down? >>>> 2.Could you be certain that this issue is cause by rename is interrupted >>>> in kernel? >>>> >>>> In my case there are two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, >>>> are both empty. >>>> But in my view only one rename can be running at the same time because of >>>> the big lock. >>>> Why there are two files are empty? >>>> >>>> >>>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick") >>>> be running in two thread? >>>> >>>> Thanks, >>>> Xin >>>> >>>> >>>> >>>> >>>> 在 2016-11-11 14:36:40,"Atin Mukherjee" <amukh...@redhat.com> 写道: >>>> >>>> >>>> >>>> On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1...@126.com> wrote: >>>> >>>>> Hi Atin, >>>>> >>>>> Thank you for your reply. >>>>> I have two questions for you. >>>>> >>>>> 1.Are the two files info and info.tmp are only to be created or >>>>> changed in function glusterd_store_volinfo()? I did not find other point >>>>> in >>>>> which the two file are changed. >>>>> >>>> >>>> If we are talking about info file volume then yes, the mentioned >>>> function actually takes care of it. >>>> >>>> >>>>> 2.I found that glusterd_store_volinfo() will be call in many point by >>>>> glusterd.Is there a problem of thread synchronization?If so, one thread >>>>> may >>>>> open a same file info.tmp using O_TRUNC flag when another thread is >>>>> writing the info,tmp.Could this case happen? >>>>> >>>> >>>> In glusterd threads are big lock protected and I don't see a >>>> possibility (theoretically) to have two glusterd_store_volinfo () calls at >>>> a given point of time. >>>> >>>> >>>>> >>>>> Thanks, >>>>> Xin >>>>> >>>>> >>>>> At 2016-11-10 21:41:06, "Atin Mukherjee" <amukh...@redhat.com> wrote: >>>>> >>>>> Did you run out of disk space by any chance? AFAIK, the code is like >>>>> we write new stuffs to .tmp file and rename it back to the original file. >>>>> In case of a disk space issue I expect both the files to be of non zero >>>>> size. But having said that I vaguely remember a similar issue (in the form >>>>> of a bug or an email) landed up once but we couldn't reproduce it, so >>>>> something is wrong with the atomic update here is what I guess. I'll be >>>>> glad if you have a reproducer for the same and then we can dig into it >>>>> further. >>>>> >>>>> On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1...@126.com> wrote: >>>>> >>>>>> Hi, >>>>>> When I start the glusterd some error happened. >>>>>> And the log is following. >>>>>> >>>>>> [2016-11-08 07:58:34.989365] I [MSGID: 100030] >>>>>> [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running >>>>>> /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p >>>>>> /var/run/glusterd.pid --log-level INFO) >>>>>> [2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init] >>>>>> 0-management: Maximum allowed open file descriptors set to 65536 >>>>>> [2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init] >>>>>> 0-management: Using /system/glusterd as working directory >>>>>> [2016-11-08 07:58:35.024508] I [MSGID: 106514] >>>>>> [glusterd-store.c:2075:glusterd_restore_op_version] 0-management: >>>>>> Upgrade detected. Setting op-version to minimum : 1 >>>>>> *[2016-11-08 07:58:35.025356] E [MSGID: 106206] >>>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: >>>>>> Failed >>>>>> to get next store iter * >>>>>> *[2016-11-08 07:58:35.025401] E [MSGID: 106207] >>>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: >>>>>> Failed >>>>>> to update volinfo for c_glusterfs volume * >>>>>> *[2016-11-08 07:58:35.025463] E [MSGID: 106201] >>>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management: >>>>>> Unable to restore volume: c_glusterfs * >>>>>> *[2016-11-08 07:58:35.025544] E [MSGID: 101019] >>>>>> [xlator.c:428:xlator_init] 0-management: Initialization of volume >>>>>> 'management' failed, review your volfile again * >>>>>> *[2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init] >>>>>> 0-management: initializing translator failed * >>>>>> *[2016-11-08 07:58:35.025629] E >>>>>> [graph.c:661:glusterfs_graph_activate] 0-graph: init failed * >>>>>> [2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit] >>>>>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718] >>>>>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8] >>>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-: >>>>>> received signum (0), shutting down >>>>>> >>>>>> >>>>>> And then I found that the size of vols/volume_name/info is 0.It cause >>>>>> glusterd shutdown. >>>>>> But I found that vols/volume_name_info.tmp is not 0. >>>>>> And I found that there is a brick file vols/volume_name/bricks/xxxx.brick >>>>>> is 0, but vols/volume_name/bricks/xxxx.brick.tmp is not 0. >>>>>> >>>>>> I read the function code glusterd_store_volinfo () in >>>>>> glusterd-store.c . >>>>>> I know that the info.tmp will be rename to info in function >>>>>> glusterd_store_volume_atomic_update(). >>>>>> >>>>>> But my question is that why the info file is 0 but info.tmp is not 0. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Xin >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users@gluster.org >>>>>> http://www.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> ~ Atin (atinm) >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> ~ Atin (atinm) >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> >>> ~ Atin (atinm) >>> >>> >>> >>> >>> >> >> >> >> -- >> >> ~ Atin (atinm) >> >> >> >> >> > > > > -- > > ~ Atin (atinm) > > > > > -- ~ Atin (atinm)
_______________________________________________ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users