Re: [Gluster-devel] [Gluster-users] gluster-block v0.4 is alive!

2019-05-21 Thread Prasanna Kalever
On Mon, May 20, 2019 at 9:05 PM Vlad Kopylov  wrote:
>
> Thank you Prasanna.
>
> Do we have architecture somewhere?

Vlad,

Although the complete set of details might not be in one place right
now, some pointers to start with are available at
https://github.com/gluster/gluster-block#gluster-block and
https://pkalever.wordpress.com/2019/05/06/starting-with-gluster-block;
hopefully those give some clarity about the project. Also check out
the man pages.
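
To give a quick feel for the CLI those docs describe, a typical session
looks roughly like the following (the volume name, block name, host IP
and size here are illustrative, not taken from this thread; see the
README and man pages for the authoritative syntax):

```shell
# Create a 1 GiB block device backed by the gluster volume 'block-test',
# exported from one host (ha 1). gluster-block provisions the backing
# file on the volume over gfapi and exports it as an iSCSI LUN.
gluster-block create block-test/sample-block ha 1 192.168.1.11 1GiB

# List and inspect block devices on that volume
gluster-block list block-test
gluster-block info block-test/sample-block

# Delete it when done
gluster-block delete block-test/sample-block
```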

> Does it bypass Fuse and go directly over gfapi?

Yes, we don't use Fuse access with gluster-block. Both the management
and the IO happen over gfapi.

Please go through the docs pointed to above; if you have any specific
queries, feel free to ask them here or on GitHub.

Best Regards,
--
Prasanna

>
> v
>
> On Mon, May 20, 2019, 8:36 AM Prasanna Kalever  wrote:
>>
>> Hey Vlad,
>>
>> Thanks for trying gluster-block. Appreciate your feedback.
>>
>> Here is the patch which should fix the issue you have noticed:
>> https://github.com/gluster/gluster-block/pull/233
>>
>> Thanks!
>> --
>> Prasanna
>>
>> On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov  wrote:
>> >
>> >
>> > straight from
>> >
>> > ./autogen.sh && ./configure && make -j install
>> >
>> >
>> > CentOS Linux release 7.6.1810 (Core)
>> >
>> >
>> > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such 
>> > file or directory
>> > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr.
>> > May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] 
>> > CRIT: trying to change logDir from /var/log/gluster-block to 
>> > /var/log/gluster-block [at utils.c+495 :]
>> > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path 
>> > /backstores/user:glfs
>> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process 
>> > exited, code=exited, status=1/FAILURE
>> > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed 
>> > state.
>> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed.
>> >
>> >
>> >
>> > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever  
>> > wrote:
>> >>
>> >> Hello Gluster folks,
>> >>
>> >> Gluster-block team is happy to announce the v0.4 release [1].
>> >>
>> >> This is the new stable version of gluster-block; lots of new and
>> >> exciting features and interesting bug fixes are available as part of
>> >> this release.
>> >> Please find the full list of release highlights and notable fixes at [2].
>> >>
>> >> Details about installation can be found in the easy install guide at
>> >> [3]. Find the details about prerequisites and the setup guide at [4].
>> >> If you are a new user, check out the demo video linked in the README
>> >> doc [5], which is a good introduction to the project.
>> >> There are good examples of how to use gluster-block in both the man
>> >> pages [6] and the test file [7] (also in the README).
>> >>
>> >> gluster-block is part of the Fedora package collection; an updated
>> >> package with release version v0.4 will be made available soon. The
>> >> community-provided packages will also soon be available at [8].
>> >>
>> >> Please spend a minute to report any kind of issue that comes to your
>> >> notice with this handy link [9].
>> >> We look forward to your feedback, which will help gluster-block get 
>> >> better!
>> >>
>> >> We would like to thank all our users and contributors for filing bugs
>> >> and providing fixes, and the whole team involved in the huge
>> >> pre-release testing effort.
>> >>
>> >>
>> >> [1] https://github.com/gluster/gluster-block
>> >> [2] https://github.com/gluster/gluster-block/releases
>> >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL
>> >> [4] https://github.com/gluster/gluster-block#usage
>> >> [5] https://github.com/gluster/gluster-block/blob/master/README.md
>> >> [6] https://github.com/gluster/gluster-block/tree/master/docs
>> >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t
>> >> [8] https://download.gluster.org/pub/gluster/gluster-block/
>> >> [9] https://github.com/gluster/gluster-block/issues/new
>> >>
>> >> Cheers,
>> >> Team Gluster-Block!
>> >> ___
>> >> Gluster-users mailing list
>> >> gluster-us...@gluster.org
>> >> https://lists.gluster.org/mailman/listinfo/gluster-users
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] tests are timing out in master branch

2019-05-21 Thread Amar Tumballi Suryanarayan
Looks like after reverting a patch on RPC layer reconnection logic (
https://review.gluster.org/22750) things are back to normal.

For those who submitted a patch in the last week, please resubmit (which
should take care of rebasing on top of this revert).

This event shows that there are very delicate races in our RPC layer
which can trigger random failures. While this was discussed briefly
earlier, we need to debug it further and come up with possible next
actions. Volunteers welcome.

I recommend using https://github.com/gluster/glusterfs/issues/391 to
capture our observations, and continuing the discussion on GitHub from there.

-Amar


On Wed, May 15, 2019 at 11:46 AM Sankarshan Mukhopadhyay <
sankarshan.mukhopadh...@gmail.com> wrote:

> On Wed, May 15, 2019 at 11:24 AM Atin Mukherjee 
> wrote:
> >
> > There are random tests which are timing out after 200 secs. My belief is
> > this is a major regression introduced by some recent commit, or the
> > builders have become extremely slow, which I highly doubt. I'd request that
> > we first figure out the cause, get master back to its proper health and
> > then get back to the review/merge queue.
> >
>
> For such dire situations, we also need to consider a proposal to back
> out patches in order to keep the master healthy. The outcome we seek
> is a healthy master - the isolation of the cause allows us to not
> repeat the same offense.
>
> > Sanju has already started looking into
> > /tests/bugs/glusterd/optimized-basic-testcases-in-cluster.t to understand
> > which test specifically is hanging and consuming more time.

-- 
Amar Tumballi (amarts)



Re: [Gluster-devel] [Gluster-users] VMs blocked for more than 120 seconds

2019-05-21 Thread Krutika Dhananjay
Hi Martin,

Glad it worked! And yes, 3.7.6 is really old! :)

So the issue occurs when the vm flushes outstanding data to disk, and
this takes > 120s because there are a lot of buffered writes to flush,
possibly followed by an fsync which needs to sync them to disk (the
volume profile would have been helpful in confirming this). All these
two options do is truly honor the O_DIRECT flag (which is what we want
anyway, given the vms are opened with the 'cache=none' qemu option).
This skips write-caching on the gluster client side and also bypasses
the page-cache on the gluster bricks, so data gets flushed faster,
thereby eliminating these timeouts.

-Krutika


On Mon, May 20, 2019 at 3:38 PM Martin  wrote:

> Hi Krutika,
>
> Also, gluster version please?
>
> I am running old 3.7.6. (Yes I know I should upgrade asap)
>
> I first applied "network.remote-dio off"; the behaviour did not change,
> and VMs got stuck after some time again.
> Then I set "performance.strict-o-direct on" and the problem completely
> disappeared. No more hangs at all (7 days without any problems at all).
> This SOLVED the issue.
>
> Can you explain what the remote-dio and strict-o-direct options changed
> in the behaviour of my Gluster? It would be great for the archives and
> future users to understand what solved my issue, and why.
>
> Anyway, Thanks a LOT!!!
>
> BR,
> Martin
>
> On 13 May 2019, at 10:20, Krutika Dhananjay  wrote:
>
> OK. In that case, can you check if the following two changes help:
>
> # gluster volume set $VOL network.remote-dio off
> # gluster volume set $VOL performance.strict-o-direct on
>
> preferably one option changed at a time, its impact tested and then the
> next change applied and tested.
>
> Also, gluster version please?
>
> -Krutika
>
> On Mon, May 13, 2019 at 1:02 PM Martin Toth  wrote:
>
>> Cache in qemu is none. That should be correct. This is full command :
>>
>> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine
>> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp
>> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1
>> -no-user-config -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
>> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>>
>> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
>> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5
>> -drive file=/var/lib/one//datastores/116/312/*disk.0*
>> ,format=raw,if=none,id=drive-virtio-disk1,cache=none
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1
>> -drive file=gluster://localhost:24007/imagestore/
>> *7b64d6757acc47a39503f68731f89b8e*
>> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none
>> -device
>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
>> -drive file=/var/lib/one//datastores/116/312/*disk.1*
>> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on
>> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
>>
>> -netdev tap,fd=26,id=hostnet0
>> -device 
>> e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3
>> -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0
>> -chardev 
>> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait
>> -device
>> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>> -vnc 0.0.0.0:312,password -device
>> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>>
>> I’ve highlighted disks. First is VM context disk - Fuse used, second is
>> SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used.
>>
>> Krutika,
>> I will start profiling on the Gluster volumes and wait for the next VM to
>> fail. Then I will attach/send the profiling info after a VM has failed. I
>> suppose this is the correct profiling strategy.
>>
>
> About this, how many vms do you need to recreate it? A single vm? Or
> multiple vms doing IO in parallel?
>
>
>> Thanks,
>> BR!
>> Martin
>>
>> On 13 May 2019, at 09:21, Krutika Dhananjay  wrote:
>>
>> Also, what's the caching policy that qemu is using on the affected vms?
>> Is it cache=none? Or something else? You can get this information in the
>> command line of qemu-kvm process corresponding to your vm in the ps output.
>>
>> -Krutika
>>
>> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay 
>> wrote:
>>
>>> What version of gluster are you using?
>>> Also, can you capture and share volume-profile output for a run where
>>> you manage to recreate this issue?
>>>
>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
>>> Let me know if you have any 

[Gluster-devel] glusterfs coredump--mempool

2019-05-21 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi glusterfs experts,
I am hitting a glusterfs process coredump again in my env, shortly after
glusterfs process startup. The frame's local has become NULL, but it seems
this frame has not been destroyed yet, since the magic number
(GF_MEM_HEADER_MAGIC) is still untouched.
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --acl --volfile-server=mn-0.local 
--volfile-server=mn-1.loc'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f867fcd2971 in client3_3_inodelk_cbk (req=, 
iov=, count=, myframe=0x7f8654008830)
at client-rpc-fops.c:1510
1510  CLIENT_STACK_UNWIND (inodelk, frame, rsp.op_ret,
[Current thread is 1 (Thread 0x7f867d6d4700 (LWP 3046))]
Missing separate debuginfos, use: dnf debuginfo-install 
glusterfs-fuse-3.12.15-1.wos2.wf29.x86_64
(gdb) bt
#0  0x7f867fcd2971 in client3_3_inodelk_cbk (req=, 
iov=, count=, myframe=0x7f8654008830)
at client-rpc-fops.c:1510
#1  0x7f8685ea5584 in rpc_clnt_handle_reply 
(clnt=clnt@entry=0x7f8678070030, pollin=pollin@entry=0x7f86702833e0) at 
rpc-clnt.c:782
#2  0x7f8685ea587b in rpc_clnt_notify (trans=, 
mydata=0x7f8678070060, event=, data=0x7f86702833e0) at 
rpc-clnt.c:975
#3  0x7f8685ea1b83 in rpc_transport_notify (this=this@entry=0x7f8678070270, 
event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
data=data@entry=0x7f86702833e0) at rpc-transport.c:538
#4  0x7f8680b99867 in socket_event_poll_in (notify_handled=_gf_true, 
this=0x7f8678070270) at socket.c:2260
#5  socket_event_handler (fd=, idx=3, gen=1, 
data=0x7f8678070270, poll_in=, poll_out=,
poll_err=) at socket.c:2645
#6  0x7f8686132911 in event_dispatch_epoll_handler (event=0x7f867d6d3e6c, 
event_pool=0x55e1b2792b00) at event-epoll.c:583
#7  event_dispatch_epoll_worker (data=0x7f867805ece0) at event-epoll.c:659
#8  0x7f8684ea65da in start_thread () from /lib64/libpthread.so.0
#9  0x7f868474eeaf in clone () from /lib64/libc.so.6
(gdb) print *(call_frame_t*)myframe
$3 = {root = 0x7f86540271a0, parent = 0x0, frames = {next = 0x7f8654027898, 
prev = 0x7f8654027898}, local = 0x0, this = 0x7f8678013080, ret = 0x0,
  ref_count = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 
0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' , __align = 0}}, cookie = 0x0, complete = _gf_false, xid = 0,
  op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, 
tv_usec = 0}, wind_from = 0x0, wind_to = 0x0, unwind_from = 0x0, unwind_to = 
0x0}
(gdb) x/4xw  0x7f8654008810
0x7f8654008810:   0xcafebabe 0x 0x 0x
(gdb) p *(pooled_obj_hdr_t *)0x7f8654008810
$2 = {magic = 3405691582, next = 0x0, pool_list = 0x7f8654000b80, power_of_two 
= 8}

I added a "uint32_t xid" field to struct _call_frame, and set it from
rpcreq->xid in the __save_frame function. Normally this xid should be 0
only immediately after create_frame hands the frame out from the memory
pool. But in this case the xid is 0, so it seems the frame was recycled
(freed and handed out again) before the callback ran. Do you have any
idea how this can happen?


cynthia