Re: [Gluster-devel] [Gluster-users] gluster-block v0.4 is alive!
On Mon, May 20, 2019 at 9:05 PM Vlad Kopylov wrote:
>
> Thank you Prasanna.
>
> Do we have architecture somewhere?

Vlad,

Although the complete set of details might be missing in one place right now, some pointers to start with are available at
https://github.com/gluster/gluster-block#gluster-block and
https://pkalever.wordpress.com/2019/05/06/starting-with-gluster-block,
which should hopefully give some clarity about the project. Also check out the man pages.

> Does it bypass Fuse and go directly over gfapi?

Yes, we don't use Fuse access with gluster-block. Management as well as IO happens over gfapi. Please go through the docs pointed to above; if you have any specific queries, feel free to ask them here or on GitHub.

Best Regards,
--
Prasanna

> v
>
> On Mon, May 20, 2019, 8:36 AM Prasanna Kalever wrote:
>>
>> Hey Vlad,
>>
>> Thanks for trying gluster-block. Appreciate your feedback.
>>
>> Here is the patch which should fix the issue you have noticed:
>> https://github.com/gluster/gluster-block/pull/233
>>
>> Thanks!
>> --
>> Prasanna
>>
>> On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov wrote:
>> >
>> > straight from
>> >
>> > ./autogen.sh && ./configure && make -j install
>> >
>> > CentOS Linux release 7.6.1810 (Core)
>> >
>> > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such file or directory
>> > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr.
>> > May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] CRIT: trying to change logDir from /var/log/gluster-block to /var/log/gluster-block [at utils.c+495 :]
>> > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path /backstores/user:glfs
>> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process exited, code=exited, status=1/FAILURE
>> > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed state.
>> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed.
>> >
>> > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote:
>> >>
>> >> Hello Gluster folks,
>> >>
>> >> The gluster-block team is happy to announce the v0.4 release [1].
>> >>
>> >> This is the new stable version of gluster-block; lots of new and
>> >> exciting features and interesting bug fixes are made available as part
>> >> of this release.
>> >> Please find the big list of release highlights and notable fixes at [2].
>> >>
>> >> Details about installation can be found in the easy install guide at
>> >> [3]. Find the details about prerequisites and the setup guide at [4].
>> >> If you are a new user, check out the demo video attached in the README
>> >> doc [5], which is a good introduction to the project.
>> >> There are good examples of how to use gluster-block both in the man
>> >> pages [6] and the test file [7] (also in the README).
>> >>
>> >> gluster-block is part of the Fedora package collection; an updated
>> >> package with release version v0.4 will be made available soon. The
>> >> community-provided packages will also soon be available at [8].
>> >>
>> >> Please spend a minute to report any kind of issue that comes to your
>> >> notice with this handy link [9].
>> >> We look forward to your feedback, which will help gluster-block get
>> >> better!
>> >>
>> >> We would like to thank all our users and contributors for bug filing
>> >> and fixes, and the whole team involved in the huge effort of
>> >> pre-release testing.
>> >>
>> >> [1] https://github.com/gluster/gluster-block
>> >> [2] https://github.com/gluster/gluster-block/releases
>> >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL
>> >> [4] https://github.com/gluster/gluster-block#usage
>> >> [5] https://github.com/gluster/gluster-block/blob/master/README.md
>> >> [6] https://github.com/gluster/gluster-block/tree/master/docs
>> >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t
>> >> [8] https://download.gluster.org/pub/gluster/gluster-block/
>> >> [9] https://github.com/gluster/gluster-block/issues/new
>> >>
>> >> Cheers,
>> >> Team Gluster-Block!
>> >> ___
>> >> Gluster-users mailing list
>> >> gluster-us...@gluster.org
>> >> https://lists.gluster.org/mailman/listinfo/gluster-users

___
Community Meeting Calendar:
APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017
NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] tests are timing out in master branch
It looks like things are back to normal after reverting a patch to the RPC layer reconnection logic (https://review.gluster.org/22750). Those who submitted a patch in the last week, please resubmit; that should take care of rebasing on top of this revert.

This event proves that there are very delicate races in our RPC layer which can trigger random failures, as was briefly discussed earlier. We need to debug this further and come up with possible next actions. Volunteers welcome. I recommend using https://github.com/gluster/glusterfs/issues/391 to capture our observations and continuing the discussion on GitHub from there.

-Amar

On Wed, May 15, 2019 at 11:46 AM Sankarshan Mukhopadhyay <sankarshan.mukhopadh...@gmail.com> wrote:
> On Wed, May 15, 2019 at 11:24 AM Atin Mukherjee wrote:
> >
> > There are random tests which are timing out after 200 secs. My belief is
> > this is a major regression introduced by some recent commit, or the
> > builders have become extremely slow, which I highly doubt. I'd request
> > that we first figure out the cause, get master back to its proper health
> > and then get back to the review/merge queue.
>
> For such dire situations, we also need to consider a proposal to back
> out patches in order to keep master healthy. The outcome we seek
> is a healthy master - the isolation of the cause allows us to not
> repeat the same offense.
>
> Sanju has already started looking into
> /tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t to understand
> what test is specifically hanging and consuming more time.
--
Amar Tumballi (amarts)
Re: [Gluster-devel] [Gluster-users] VMs blocked for more than 120 seconds
Hi Martin,

Glad it worked! And yes, 3.7.6 is really old! :)

So the issue occurs when the VM flushes outstanding data to disk, and this takes > 120s because there is a lot of buffered writes to flush, possibly followed by an fsync which needs to sync them to disk (volume profile output would have been helpful in confirming this).

All these two options do is truly honor the O_DIRECT flag (which is what we want anyway, given the VMs are opened with the 'cache=none' qemu option). This skips write-caching on the gluster client side and also bypasses the page cache on the gluster bricks, so data gets flushed faster, thereby eliminating these timeouts.

-Krutika

On Mon, May 20, 2019 at 3:38 PM Martin wrote:
>
> Hi Krutika,
>
> Also, gluster version please?
>
> I am running old 3.7.6. (Yes, I know I should upgrade asap.)
>
> I applied "network.remote-dio off" first; behaviour did not change and
> VMs got stuck after some time again. Then I set
> "performance.strict-o-direct on" and the problem completely disappeared.
> No more hangs at all (7 days without any problems at all). This SOLVED
> the issue.
>
> Can you explain what the remote-dio and strict-o-direct variables changed
> in the behaviour of my Gluster? It would be great for the archive/later
> users to understand what solved my issue and why.
>
> Anyway, thanks a LOT!!!
>
> BR,
> Martin
>
> On 13 May 2019, at 10:20, Krutika Dhananjay wrote:
>
> OK. In that case, can you check if the following two changes help:
>
> # gluster volume set $VOL network.remote-dio off
> # gluster volume set $VOL performance.strict-o-direct on
>
> preferably one option changed at a time, its impact tested, and then the
> next change applied and tested.
>
> Also, gluster version please?
>
> -Krutika
>
> On Mon, May 13, 2019 at 1:02 PM Martin Toth wrote:
>
>> Cache in qemu is none. That should be correct.
>> This is the full command:
>>
>> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 -no-user-config -nodefaults
>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on
>> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
>> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5
>> -drive file=/var/lib/one//datastores/116/312/*disk.0*,format=raw,if=none,id=drive-virtio-disk1,cache=none
>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1
>> -drive file=gluster://localhost:24007/imagestore/*7b64d6757acc47a39503f68731f89b8e*,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none
>> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
>> -drive file=/var/lib/one//datastores/116/312/*disk.1*,format=raw,if=none,id=drive-ide0-0-0,readonly=on
>> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
>> -netdev tap,fd=26,id=hostnet0
>> -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3
>> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
>> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait
>> -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>> -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>>
>> I’ve highlighted the disks.
>> The first is the VM context disk (Fuse used), the second is SDA, where
>> the OS is installed (libgfapi used), and the third is SWAP (Fuse used).
>>
>> Krutika,
>> I will start profiling on the Gluster volumes and wait for the next VM
>> to fail, then attach/send the profiling info after some VM has failed.
>> I suppose this is the correct profiling strategy.
>
> About this, how many VMs do you need to recreate it? A single VM? Or
> multiple VMs doing IO in parallel?
>
>> Thanks,
>> BR!
>> Martin
>>
>> On 13 May 2019, at 09:21, Krutika Dhananjay wrote:
>>
>> Also, what's the caching policy that qemu is using on the affected VMs?
>> Is it cache=none? Or something else? You can get this information in the
>> command line of the qemu-kvm process corresponding to your VM in the ps
>> output.
>>
>> -Krutika
>>
>> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay wrote:
>>
>>> What version of gluster are you using?
>>> Also, can you capture and share volume-profile output for a run where
>>> you manage to recreate this issue?
>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
>>> Let me know if you have any
[Gluster-devel] glusterfs coredump--mempool
Hi glusterfs experts,

I hit a glusterfs process coredump again in my env, shortly after glusterfs startup. The frame's local is NULL, but it seems this frame has not been destroyed yet, since the magic number (GF_MEM_HEADER_MAGIC) is still untouched.

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --acl --volfile-server=mn-0.local --volfile-server=mn-1.loc'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x7f867fcd2971 in client3_3_inodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f8654008830) at client-rpc-fops.c:1510
1510 CLIENT_STACK_UNWIND (inodelk, frame, rsp.op_ret,
[Current thread is 1 (Thread 0x7f867d6d4700 (LWP 3046))]
Missing separate debuginfos, use: dnf debuginfo-install glusterfs-fuse-3.12.15-1.wos2.wf29.x86_64
(gdb) bt
#0 0x7f867fcd2971 in client3_3_inodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f8654008830) at client-rpc-fops.c:1510
#1 0x7f8685ea5584 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f8678070030, pollin=pollin@entry=0x7f86702833e0) at rpc-clnt.c:782
#2 0x7f8685ea587b in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f8678070060, event=<optimized out>, data=0x7f86702833e0) at rpc-clnt.c:975
#3 0x7f8685ea1b83 in rpc_transport_notify (this=this@entry=0x7f8678070270, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f86702833e0) at rpc-transport.c:538
#4 0x7f8680b99867 in socket_event_poll_in (notify_handled=_gf_true, this=0x7f8678070270) at socket.c:2260
#5 socket_event_handler (fd=<optimized out>, idx=3, gen=1, data=0x7f8678070270, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=<optimized out>) at socket.c:2645
#6 0x7f8686132911 in event_dispatch_epoll_handler (event=0x7f867d6d3e6c, event_pool=0x55e1b2792b00) at event-epoll.c:583
#7 event_dispatch_epoll_worker (data=0x7f867805ece0) at event-epoll.c:659
#8 0x7f8684ea65da in start_thread () from /lib64/libpthread.so.0
#9 0x7f868474eeaf in clone () from /lib64/libc.so.6
(gdb) print *(call_frame_t*)myframe
$3 = {root = 0x7f86540271a0, parent = 0x0, frames = {next = 0x7f8654027898, prev = 0x7f8654027898},
local = 0x0, this = 0x7f8678013080, ret = 0x0, ref_count = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' , __align = 0}}, cookie = 0x0, complete = _gf_false, xid = 0, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
(gdb) x/4xw 0x7f8654008810
0x7f8654008810: 0xcafebabe 0x00000000 0x00000000 0x00000000
(gdb) p *(pooled_obj_hdr_t *)0x7f8654008810
$2 = {magic = 3405691582, next = 0x0, pool_list = 0x7f8654000b80, power_of_two = 8}

I added a "uint32_t xid" field to struct _call_frame, and set it according to rpcreq->xid in the __save_frame function. Normally this xid should only be 0 immediately after create_frame hands the frame out from the memory pool. But in this case the xid is 0, so it seems the frame was handed out for use again before being freed. Do you have any idea how this can happen?

cynthia