Re: [Gluster-users] Gluster 3.7.13 NFS Crash
Also, could you print local->fop please?

-Krutika

On Fri, Aug 5, 2016 at 10:46 AM, Krutika Dhananjay wrote:

> Were the images being renamed (specifically to a pathname that already
> exists) while they were being written to?
>
> -Krutika
>
> On Thu, Aug 4, 2016 at 1:14 PM, Mahdi Adnan wrote:
>
>> Hi,
>>
>> Kindly check the following link for all 7 bricks logs;
>>
>> https://db.tt/YP5qTGXk
>>
>> --
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> From: kdhan...@redhat.com
>> Date: Thu, 4 Aug 2016 13:00:43 +0530
>> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
>> To: mahdi.ad...@outlook.com
>> CC: gluster-users@gluster.org
>>
>> Could you also attach the brick logs please?
>>
>> -Krutika
>>
>> On Thu, Aug 4, 2016 at 12:48 PM, Mahdi Adnan wrote:
>>
>> appreciate your help,
>>
>> (gdb) frame 2
>> #2  0x7f195deb1787 in shard_common_inode_write_do
>>     (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716
>> 3716        anon_fd = fd_anonymous (local->inode_list[i]);
>> (gdb) p local->inode_list[0]
>> $4 = (inode_t *) 0x7f195c532b18
>> (gdb) p local->inode_list[1]
>> $5 = (inode_t *) 0x0
>> (gdb)
>>
>> --
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> From: kdhan...@redhat.com
>> Date: Thu, 4 Aug 2016 12:43:10 +0530
>> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
>> To: mahdi.ad...@outlook.com
>> CC: gluster-users@gluster.org
>>
>> OK.
>> Could you also print the values of the following variables from the
>> original core:
>> i. i
>> ii. local->inode_list[0]
>> iii. local->inode_list[1]
>>
>> -Krutika
>>
>> On Wed, Aug 3, 2016 at 9:01 PM, Mahdi Adnan wrote:
>>
>> Hi,
>>
>> Unfortunately no, but i can setup a test bench and see if it gets the
>> same results.
>>
>> --
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> From: kdhan...@redhat.com
>> Date: Wed, 3 Aug 2016 20:59:50 +0530
>> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
>> To: mahdi.ad...@outlook.com
>> CC: gluster-users@gluster.org
>>
>> Do you have a test case that consistently recreates this problem?
>>
>> -Krutika
>>
>> On Wed, Aug 3, 2016 at 8:32 PM, Mahdi Adnan wrote:
>>
>> Hi,
>>
>> So i have updated to 3.7.14 and i still have the same issue with NFS.
>> based on what i have provided so far from logs and dumps do you think
>> it's an NFS issue ? should i switch to nfs-ganesha ?
>> the problem is, the current setup is used in a production environment,
>> and switching the mount point of +50 VMs from native nfs to nfs-ganesha is
>> not going to be smooth and without downtime, so i really appreciate your
>> thoughts on this matter.
>>
>> --
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> From: mahdi.ad...@outlook.com
>> To: kdhan...@redhat.com
>> Date: Tue, 2 Aug 2016 08:44:16 +0300
>> CC: gluster-users@gluster.org
>> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
>>
>> Hi,
>>
>> The NFS just crashed again, latest bt;
>>
>> (gdb) bt
>> #0  0x7f0b71a9f210 in pthread_spin_lock () from /lib64/libpthread.so.0
>> #1  0x7f0b72c6fcd5 in fd_anonymous (inode=0x0) at fd.c:804
>> #2  0x7f0b64ca5787 in shard_common_inode_write_do
>>     (frame=0x7f0b707c062c, this=0x7f0b6002ac10) at shard.c:3716
>> #3  0x7f0b64ca5a53 in shard_common_inode_write_post_lookup_shards_handler
>>     (frame=, this=) at shard.c:3769
>> #4  0x7f0b64c9eff5 in shard_common_lookup_shards_cbk
>>     (frame=0x7f0b707c062c, cookie=, this=0x7f0b6002ac10, op_ret=0,
>>     op_errno=, inode=, buf=0x7f0b51407640,
>>     xdata=0x7f0b72f57648, postparent=0x7f0b514076b0) at shard.c:1601
>> #5  0x7f0b64efe141 in dht_lookup_cbk (frame=0x7f0b7075fcdc,
>>     cookie=, this=, op_ret=0, op_errno=0, inode=0x7f0b5f1d1f58,
>>     stbuf=0x7f0b51407640, xattr=0x7f0b72f57648,
>>     postparent=0x7f0b514076b0) at dht-common.c:2174
>> #6  0x7f0b651871f3 in afr_lookup_done (frame=frame@entry=0x7f0b7079a4c8,
>>     this=this@entry=0x7f0b60023ba0) at afr-common.c:1825
>> #7  0x7f0b65187b84 in afr_lookup_metadata_heal_check
>>     (frame=frame@entry=0x7f0b7079a4c8, this=0x7f0b60023ba0,
>>     this@entry=0xca0bd88259f5a800) at afr-common.c:2068
>> #8  0x7f0b6518834f in afr_lookup_entry_heal
>>     (frame=frame@entry=0x7f0b7079a4c8, this=0xca0bd88259f5a800,
>>     this@entry=0x7f0b60023ba0) at afr-common.c:2157
>> #9  0x7f0b6518867d in afr_lookup_cbk (frame=0x7f0b7079a4c8,
>>     cookie=, this=0x7f0b60023ba0, op_ret=, op_errno=,
>>     inode=, buf=0x7f0b564e9940, xdata=0x7f0b72f708c8,
>>     postparent=0x7f0b564e99b0) at afr-common.c:2205
>> #10 0x7f0b653d6e42 in client3_3_lookup_cbk (req=, iov=,
>>     count=, myframe=0x7f0b7076354c) at client-rpc-fops.c:2981
>> #11 0x7f0b72a00a30 in rpc_clnt_handle_reply
>>     (clnt=clnt@entry=0x7f0b603393c0,
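The variables requested in this thread (frame 2's i, local->fop, local->inode_list[0] and [1]) can be pulled from each new core non-interactively instead of retyping the gdb session. A minimal dry-run sketch, assuming hypothetical paths for the core file and the glusterfs binary that produced it:

```shell
# Dry-run sketch: script the manual gdb session from this thread so the
# same variables can be pulled from every new core in one command.
# CORE and BIN are placeholders -- point them at your actual core file
# and the glusterfs binary that crashed.
CORE=${CORE:-/var/crash/core.glusterfs}
BIN=${BIN:-/usr/sbin/glusterfs}

GDB_CMD="gdb --batch $BIN $CORE \
 -ex 'frame 2' \
 -ex 'p i' \
 -ex 'p local->fop' \
 -ex 'p local->inode_list[0]' \
 -ex 'p local->inode_list[1]'"

# print the command; replace echo with: eval "$GDB_CMD"  to run it
echo "$GDB_CMD"
```

The echo keeps the sketch runnable anywhere; on the affected server, eval the command instead and paste its output into the thread.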
Re: [Gluster-users] Gluster not saturating 10gb network
Hi, I was mistaken about my server specs. My actual specs for each server are 12 x 4.0TB 3.5" LFF NL-SAS 6G, 128MB, 7.2K rpm HDDs (as a data store, set up as RAID 6 for 36.0TB of usable storage), not the WD RED as I mentioned earlier. I would expect a higher transfer rate with these drives in; 400 MB/s is a bit too slow in my opinion. Any help I can get will be greatly appreciated, as I'm not sure where I should start debugging this issue.

On Fri, Aug 5, 2016 at 2:44 AM, Leno Vo wrote: > i got 1.2 gb/s on seagate sshd ST1000LX001 raid 5 x3 (but with the > dreaded cache array on) and 1.1 gb/s on samsung pro ssd 1tb x3 raid5 (no > array caching on for it's not compatible on proliant---not enterprise ssd). > > > On Thursday, August 4, 2016 5:23 AM, Kaamesh Kamalaaharan < > kaam...@novocraft.com> wrote: > > > hi, > thanks for the reply. I have hardware raid 5 storage servers with 4TB WD > red drives. I think they are capable of 6GB/s transfers so it shouldnt be a > drive speed issue. Just for testing i tried to do a dd test directy into > the brick mounted from the storage server itself and got around 800mb/s > transfer rate which is double what i get when the brick is mounted on the > client. Are there any other options or tests that i can perform to figure > out the root cause of my problem as i have exhaused most google searches > and tests. > > Kaamesh > > On Wed, Aug 3, 2016 at 10:58 PM, Leno Vo wrote: > > your 10G nic is capable, the problem is the disk speed, fix ur disk speed > first, use ssd or sshd or sas 15k in a raid 0 or raid 5/6 x4 at least. > > > On Wednesday, August 3, 2016 2:40 AM, Kaamesh Kamalaaharan < > kaam...@novocraft.com> wrote: > > > Hi , > I have gluster 3.6.2 installed on my server network. Due to internal > issues we are not allowed to upgrade the gluster version. All the clients > are on the same version of gluster.
When transferring files to/from the > clients or between my nodes over the 10gb network, the transfer rate is > capped at 450Mb/s .Is there any way to increase the transfer speeds for > gluster mounts? > > Our server setup is as following: > > 2 gluster servers -gfs1 and gfs2 > volume name : gfsvolume > 3 clients - hpc1, hpc2,hpc3 > gluster volume mounted on /export/gfsmount/ > > > > > The following is the average results what i did so far: > > 1) test bandwith with iperf between all machines - 9.4 GiB/s > 2) test write speed with dd > > dd if=/dev/zero of=/export/gfsmount/testfile bs=1G count=1 > > result=399Mb/s > > > 3) test read speed with dd > > dd if=/export/gfsmount/testfile of=/dev/zero bs=1G count=1 > > > result=284MB/s > > > My gluster volume configuration: > > > Volume Name: gfsvolume > > Type: Replicate > > Volume ID: a29bd2fb-b1ef-4481-be10-c2f4faf4059b > > Status: Started > > Number of Bricks: 1 x 2 = 2 > > Transport-type: tcp > > Bricks: > > Brick1: gfs1:/export/sda/brick > > Brick2: gfs2:/export/sda/brick > > Options Reconfigured: > > performance.quick-read: off > > network.ping-timeout: 30 > > network.frame-timeout: 90 > > performance.cache-max-file-size: 2MB > > cluster.server-quorum-type: none > > nfs.addr-namelookup: off > > nfs.trusted-write: off > > performance.write-behind-window-size: 4MB > > cluster.data-self-heal-algorithm: diff > > performance.cache-refresh-timeout: 60 > > performance.cache-size: 1GB > > cluster.quorum-type: fixed > > auth.allow: 172.* > > cluster.quorum-count: 1 > > diagnostics.latency-measurement: on > > diagnostics.count-fop-hits: on > > cluster.server-quorum-ratio: 50% > > > Any help would be appreciated. > > Thanks, > > Kaamesh > > > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users > > > > > ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
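A note on the dd figures quoted in this thread: a single 1 GiB write with no sync flag largely measures the page cache, not the volume. A hedged sketch of a cache-aware variant follows; TARGET defaults to a temp file only so the sketch runs anywhere, and should point at a file on the gluster mount (e.g. /export/gfsmount/testfile) for a real measurement:

```shell
# Cache-aware dd sketch. conv=fdatasync makes dd flush data before
# reporting a rate, so the number reflects the mounted volume rather
# than RAM. TARGET is a placeholder defaulting to a temp file purely
# so the sketch is runnable as-is.
TARGET=${TARGET:-$(mktemp)}

# write test: 64 MiB, flushed to the target before the rate is printed
dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fdatasync

# read test -- for a true read rate, drop the page cache first (as root):
#   sync; echo 3 > /proc/sys/vm/drop_caches
dd if="$TARGET" of=/dev/null bs=1M
```

With the cache taken out of the picture, the gap between the brick-local and client-side numbers becomes a much fairer comparison.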
Re: [Gluster-users] 3.7.13 two node ssd solid rock
Ahhh, I'm giving up running gluster in production... good thing I have replication and can do hybrid networking with my DR site, but there's an app that is network-sensitive, so I have to bump my bandwidth; hopefully I'm not going to pay extra for it. By the way, my two-node HP SAN also died in the blackout but never had corruption.

On Wednesday, August 3, 2016 6:18 PM, Leno Vo wrote:

I had to reboot each node to get it working, with a time interval of 5-8 mins; after that it got stable, but lots of shards still didn't heal, though there's no split-brain. Some VMs lost their vmx, so I created a new VM and pointed it at the storage to get it working, wew!!! Sharding is still faulty, won't recommend it yet. Going back without it.

On Wednesday, August 3, 2016 4:34 PM, Leno Vo wrote:

My mistake, the corruption happened after 6 hours; some VMs with sharding won't heal, but there's no split-brain.

On Wednesday, August 3, 2016 11:13 AM, Leno Vo wrote:

One of my gluster 3.7.13 setups is on two nodes only, with Samsung SSD 1TB Pro, RAID 5 x3. It already crashed two times because of a brownout and a blackout; it has production VMs on it, about 1.3TB. Never got split-brain, and healed quickly. Can we say 3.7.13 on two nodes with SSD is solid rock, or just lucky?

My other gluster is on 3 nodes of 3.7.13, but one node never came up (old ProLiant server that wants to retire), SSD RAID 5 combined with an SSHD (lol, a laptop Seagate). It never healed about 586 occurrences, but there's no split-brain either, and the VMs are intact too, working fine and fast.

Ahh, never turn on caching on the array, or the ESX might not come up right away; you need to go into setup first to make it work, restart, then go to the array setup (HP array, F8) and turn off caching. Then ESX finally boots up.
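For reference, the heal-occurrence and split-brain figures discussed above come from the volume heal family of commands. A dry-run sketch (VOL is a placeholder volume name; drop the variable indirection and run the printed strings on a server node to use them for real):

```shell
# Dry-run sketch of the self-heal inspection commands behind the heal
# counts mentioned in this thread. VOL is a placeholder.
VOL=${VOL:-datavol}
CMD_INFO="gluster volume heal $VOL info"              # pending entries per brick
CMD_SPLIT="gluster volume heal $VOL info split-brain" # entries actually in split-brain
CMD_FULL="gluster volume heal $VOL full"              # kick off a full self-heal crawl
printf '%s\n' "$CMD_INFO" "$CMD_SPLIT" "$CMD_FULL"
```

Distinguishing "entries pending heal" from "entries in split-brain" matters here: a large heal backlog with an empty split-brain list is recoverable, which matches what is reported above.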
Re: [Gluster-users] Gluster not saturating 10gb network
There is no way you’ll see 6GB/s out of a single disk. I think you’re referring to the rated SATA speed, which has nothing to do with the actual data rates you’ll see from the spinning rust. You might see ~130-150MB/s from a single platter in really nice, artificial workloads, more in RAID configurations that can read from multiple disks. I have 6 WD Red 6TBs in a RAIDZ2 array (ZFS software RAID, nothing even vaguely approaching high-end hardware otherwise) and for typical file-serving workloads, I see about 120-130MB/s from it. In contrast, I have a Samsung 950 Pro NVMe SSD, and do see over 1GB/s throughput in some real-world workloads with it. But it costs >8x the price per storage unit.

-j

> On Aug 4, 2016, at 2:23 AM, Kaamesh Kamalaaharan wrote: > > hi, > thanks for the reply. I have hardware raid 5 storage servers with 4TB WD red > drives. I think they are capable of 6GB/s transfers so it shouldnt be a drive > speed issue. Just for testing i tried to do a dd test directy into the brick > mounted from the storage server itself and got around 800mb/s transfer rate > which is double what i get when the brick is mounted on the client. Are there > any other options or tests that i can perform to figure out the root cause of > my problem as i have exhaused most google searches and tests. > > Kaamesh > > On Wed, Aug 3, 2016 at 10:58 PM, Leno Vo wrote: > your 10G nic is capable, the problem is the disk speed, fix ur disk speed > first, use ssd or sshd or sas 15k in a raid 0 or raid 5/6 x4 at least. > > > On Wednesday, August 3, 2016 2:40 AM, Kaamesh Kamalaaharan > wrote: > > > Hi , > I have gluster 3.6.2 installed on my server network. Due to internal issues > we are not allowed to upgrade the gluster version. All the clients are on the > same version of gluster. When transferring files to/from the clients or > between my nodes over the 10gb network, the transfer rate is capped at > 450Mb/s .Is there any way to increase the transfer speeds for gluster mounts?
> > Our server setup is as following: > > 2 gluster servers -gfs1 and gfs2 > volume name : gfsvolume > 3 clients - hpc1, hpc2,hpc3 > gluster volume mounted on /export/gfsmount/ > > > > The following is the average results what i did so far: > > 1) test bandwith with iperf between all machines - 9.4 GiB/s > 2) test write speed with dd > dd if=/dev/zero of=/export/gfsmount/testfile bs=1G count=1 > > result=399Mb/s > > 3) test read speed with dd > dd if=/export/gfsmount/testfile of=/dev/zero bs=1G count=1 > > result=284MB/s > > My gluster volume configuration: > > Volume Name: gfsvolume > Type: Replicate > Volume ID: a29bd2fb-b1ef-4481-be10-c2f4faf4059b > Status: Started > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: gfs1:/export/sda/brick > Brick2: gfs2:/export/sda/brick > Options Reconfigured: > performance.quick-read: off > network.ping-timeout: 30 > network.frame-timeout: 90 > performance.cache-max-file-size: 2MB > cluster.server-quorum-type: none > nfs.addr-namelookup: off > nfs.trusted-write: off > performance.write-behind-window-size: 4MB > cluster.data-self-heal-algorithm: diff > performance.cache-refresh-timeout: 60 > performance.cache-size: 1GB > cluster.quorum-type: fixed > auth.allow: 172.* > cluster.quorum-count: 1 > diagnostics.latency-measurement: on > diagnostics.count-fop-hits: on > cluster.server-quorum-ratio: 50% > > Any help would be appreciated. > Thanks, > Kaamesh > > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users > > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
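To check the ~130-150 MB/s per-platter estimate above against the actual hardware, the member disks can be benchmarked directly, taking both gluster and the array controller out of the picture. A read-only dry-run sketch (DEV is an assumption; substitute one raw member disk, and keep these as read tests only):

```shell
# Read-only dry-run sketch for measuring a raw member disk, bypassing
# gluster and the filesystem. DEV is a placeholder device node.
DEV=${DEV:-/dev/sda}
CMD_HDPARM="hdparm -t $DEV"                    # buffered sequential read timing
CMD_DIRECT="dd if=$DEV of=/dev/null bs=1M count=1024 iflag=direct"  # O_DIRECT reads
printf '%s\n' "$CMD_HDPARM" "$CMD_DIRECT"
```

If the raw disk delivers ~150 MB/s, anything above that on the mount is cache or RAID striping, which helps interpret the 800mb/s brick-local dd figure quoted in this thread.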
Re: [Gluster-users] gluster 3.8.1 issue in compiling from source tarball
On 04/08/2016 18:13, Yannick Perret wrote:
> On 04/08/2016 17:39, Niels de Vos wrote:
>> On Thu, Aug 04, 2016 at 05:13:40PM +0200, Yannick Perret wrote:
>>> On 03/08/2016 17:01, Kaleb S. KEITHLEY wrote:
>>>> On 08/03/2016 10:42 AM, Yannick Perret wrote:
>>>>> On 03/08/2016 15:33, Amudhan P wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to install gluster 3.8.1 from tarball in Ubuntu 14.04.
>>>>>>
>>>>>> 1. when i run "./configure --disable-tiering", at the end it shows the msg
>>>>>>
>>>>>> configure: WARNING: cache variable ac_cv_build contains a newline
>>>>>> configure: WARNING: cache variable ac_cv_host contains a newline
>>>>>>
>>>>>> 2. running the "make" command throws the below msg and stops
>>>>>>
>>>>>> Makefile:90: *** missing separator. Stop.
>>>>>>
>>>>> Got the same problem when trying to compile it on Debian 8.2.
>>>>>
>>>> try ./autogen.sh && ./configure --disable-tiering
>>>>
>>>> (works for me on my debian 8 box)
>>>>
>>> Thanks. Worked fine.
>>>
>>> I did not have to do that on 3.6.x and 3.7.x series, btw.
>>
>> Could you check with the latest 3.7.14 release? Both tarballs are
>> generated with the same autotools version. If there is an issue with only
>> the 3.8.x release, we might be able to fix it. If it happens on both, I
>> guess it'll be more difficult.
>>
>> Also, patches welcome :)
>>
> I just tested on 3.7.14 and I don't have the problem:
>
> wget https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.14/glusterfs-3.7.14.tar.gz
> tar zxvf glusterfs-3.7.14.tar.gz
> cd glusterfs-3.7.14
> ./configure --disable-tiering → no error/warning
> make → no error

Note: the concerned variables (ac_cv_build/host) seem to be extracted from
the scripts ./config.guess and ./config.sub. On 3.7.14 these are real
scripts, but on 3.8.1 they both contain:

#!/bin/sh
#
# This script is intentionally left empty. Distributions that package GlusterFS
# may want to replace it with an updated copy from the automake project.
#
cat << EOM
It is not expected to execute this script. When you are building from a
released tarball (generated with 'make dist'), you are expected to pass
--build=... and --host=... to ./configure or replace this config.sub script
in the sources with an updated version.
EOM
exit 0

Don't know why, but I guess it is related to the problem :)

--
Y.

> Regards,
> --
> Y.
>> Thanks,
>> Niels
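Pulling the thread's findings together, there are three ways around the stub config.guess/config.sub in the 3.8.1 tarball. A sketch that just prints the recipe (the x86_64-pc-linux-gnu triplet is an example; substitute your platform's triplet, and note autogen.sh requires the autotools to be installed):

```shell
# The three workarounds discussed in this thread for the glusterfs-3.8.1
# tarball, printed as a recipe rather than executed.
RECIPE=$(cat <<'EOF'
# 1) regenerate the build system (what worked in this thread):
./autogen.sh && ./configure --disable-tiering
# 2) or tell configure the triplets the stub config.guess no longer detects:
./configure --disable-tiering --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
# 3) or overwrite config.guess/config.sub with real copies shipped by automake
EOF
)
echo "$RECIPE"
```

Options 2 and 3 follow directly from the text of the stub script quoted in this thread; option 1 is the one confirmed to work on Debian 8.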
Re: [Gluster-users] Failed file system
The "replace-brick ... commit force" command can be used. If you are on glusterfs 3.7.3 and above, self-heal will be automatically triggered from the good bricks to the newly added brick. But you can't replace a brick on the same path as before; your new brick path will have to be different from the existing ones in the volume. - Original Message - > From: "Mahdi Adnan" > To: "Andres E. Moya" , "gluster-users" > > Sent: Thursday, August 4, 2016 1:25:59 AM > Subject: Re: [Gluster-users] Failed file system > > Hi, > > I'm no expert in Gluster but, i think it would be better to replace the > downed brick with a new one. > Maybe start from here; > > https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick > > > -- > > Respectfully > Mahdi A. Mahdi > > > > > Date: Wed, 3 Aug 2016 15:39:35 -0400 > From: am...@moyasolutions.com > To: gluster-users@gluster.org > Subject: Re: [Gluster-users] Failed file system > > Does anyone else have input? > > we are currently only running off 1 node and one node is offline in replicate > brick. > > we are not experiencing any downtime because the 1 node is up. > > I do not understand which is the best way to bring up a second node. > > Do we just re create a file system on the node that is down and the mount > points and allow gluster to heal( my concern with this is whether the node > that is down will some how take precedence and wipe out the data on the > healthy node instead of vice versa) > > Or do we fully wipe out the config on the node that is down, re create the > file system and re add the node that is down into gluster using the add > brick command replica 3, and then wait for it to heal then run the remove > brick command for the failed brick > > which would be the safest and easiest to accomplish > > thanks for any input > > > > > From: "Leno Vo" > To: "Andres E.
Moya" > Cc: "gluster-users" > Sent: Tuesday, August 2, 2016 6:45:27 PM > Subject: Re: [Gluster-users] Failed file system
> If you don't want any downtime (in the case that your node 2 really died), you have to create a new gluster SAN (if you have the resources, of course — 3 nodes as much as possible this time), and then just migrate your VMs (or files); therefore no downtime, but you have to cross your fingers that the only remaining node will not die too... Also, without sharding, the VM migration (especially an RDP one) will give users slow access until it has migrated.
> You have to start testing sharding, it's fast and cool...
>
> On Tuesday, August 2, 2016 2:51 PM, Andres E. Moya wrote:
> Couldn't we just add a new server by
> gluster peer probe
> gluster volume add-brick replica 3 (will this command succeed with 1 current failed brick?)
> let it heal, then
> gluster volume remove-brick
>
> From: "Leno Vo" > To: "Andres E. Moya" , "gluster-users" > Sent: Tuesday, August 2, 2016 1:26:42 PM > Subject: Re: [Gluster-users] Failed file system
> You need to have downtime to recreate the second node. Two nodes is actually not good for production, and you should have put RAID 1 or RAID 5 as your gluster storage. When you recreate the second node you might try running some VMs that need to be up and keep the rest of the VMs down, but stop all backups, and if you have replication, stop it too. If you have a 1G NIC, 2 CPUs and less than 8 GB RAM, then I suggest turning all the VMs off during recreation of the second node. Someone said that if you have sharding with 3.7.x, maybe some VIP VMs can stay up...
> If it's just a filesystem, then just turn off the backup service until you recreate the second node. Depending on your resources and how big your storage is, recreating it might take hours or even days...
> Here's my process for recreating the second or third node (copied and modified from the net):
>
> #make sure the partition is already added
> This procedure is for replacing a failed server, IF your newly installed server has the same hostname as the failed one:
> (If your new server will have a different hostname, see this article instead.)
> For the purposes of this example, the server that crashed will be server3 and the other servers will be server1 and server2.
> On both server1 and server2, make sure the hostname server3 resolves to the correct IP address of the new replacement server.
> #On either server1 or server2, do
> grep server3 /var/lib/glusterd/peers/*
> This will return a uuid followed by ":hostname1=server3"
> #On server3, make sure glusterd is stopped, then do
> echo UUID={uuid from previous step} > /var/lib/glusterd/glusterd.info
> #actual testing below,
> [root@node1 ~]# cat
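The UUID carry-over step in the procedure above is the easy part to get wrong, so here is a dry-run of just that step using scratch files instead of the real /var/lib/glusterd (the directory layout and the UUID below are made up for illustration):

```shell
# Dry-run of the glusterd.info UUID swap from the procedure above.
# A mktemp scratch directory stands in for /var/lib/glusterd; the UUID is fabricated.
workdir=$(mktemp -d)

# Simulate what `grep server3 /var/lib/glusterd/peers/*` finds on a surviving
# peer: the failed server's uuid line.
echo "uuid=3f1a2b3c-0000-1111-2222-333344445555" > "$workdir/peer-server3"
uuid=$(sed -n 's/^uuid=//p' "$workdir/peer-server3")

# On the rebuilt server3 (with glusterd stopped), that UUID goes into
# glusterd.info so the surviving peers recognise the new install as the old node:
echo "UUID=$uuid" > "$workdir/glusterd.info"
cat "$workdir/glusterd.info"
```

On a real rebuild the same echo targets /var/lib/glusterd/glusterd.info, and glusterd is started only after the file is in place.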
Re: [Gluster-users] Gluster 3.7.13 NFS Crash
Hi,
Kindly check the following link for all 7 bricks logs;
https://db.tt/YP5qTGXk
--
Respectfully
Mahdi A. Mahdi

From: kdhan...@redhat.com Date: Thu, 4 Aug 2016 13:00:43 +0530 Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash To: mahdi.ad...@outlook.com CC: gluster-users@gluster.org
Could you also attach the brick logs please?
-Krutika
On Thu, Aug 4, 2016 at 12:48 PM, Mahdi Adnan wrote:
Appreciate your help,
(gdb) frame 2
#2  0x7f195deb1787 in shard_common_inode_write_do (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716
3716            anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p local->inode_list[0]
$4 = (inode_t *) 0x7f195c532b18
(gdb) p local->inode_list[1]
$5 = (inode_t *) 0x0
(gdb)
--
Respectfully
Mahdi A. Mahdi

From: kdhan...@redhat.com Date: Thu, 4 Aug 2016 12:43:10 +0530 Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash To: mahdi.ad...@outlook.com CC: gluster-users@gluster.org
OK. Could you also print the values of the following variables from the original core:
i. i
ii. local->inode_list[0]
iii. local->inode_list[1]
-Krutika
On Wed, Aug 3, 2016 at 9:01 PM, Mahdi Adnan wrote:
Hi,
Unfortunately no, but I can set up a test bench and see if it gets the same results.
--
Respectfully
Mahdi A. Mahdi

From: kdhan...@redhat.com Date: Wed, 3 Aug 2016 20:59:50 +0530 Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash To: mahdi.ad...@outlook.com CC: gluster-users@gluster.org
Do you have a test case that consistently recreates this problem?
-Krutika
On Wed, Aug 3, 2016 at 8:32 PM, Mahdi Adnan wrote:
Hi,
So I have updated to 3.7.14 and I still have the same issue with NFS. Based on what I have provided so far from logs and dumps, do you think it's an NFS issue? Should I switch to nfs-ganesha?
The problem is, the current setup is used in a production environment, and switching the mount point of 50+ VMs from native NFS to nfs-ganesha is not going to be smooth and without downtime, so I really appreciate your thoughts on this matter.
--
Respectfully
Mahdi A. 
Mahdi

From: mahdi.ad...@outlook.com To: kdhan...@redhat.com Date: Tue, 2 Aug 2016 08:44:16 +0300 CC: gluster-users@gluster.org Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
Hi,
The NFS just crashed again, latest bt;
(gdb) bt
#0  0x7f0b71a9f210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x7f0b72c6fcd5 in fd_anonymous (inode=0x0) at fd.c:804
#2  0x7f0b64ca5787 in shard_common_inode_write_do (frame=0x7f0b707c062c, this=0x7f0b6002ac10) at shard.c:3716
#3  0x7f0b64ca5a53 in shard_common_inode_write_post_lookup_shards_handler (frame=, this=) at shard.c:3769
#4  0x7f0b64c9eff5 in shard_common_lookup_shards_cbk (frame=0x7f0b707c062c, cookie=, this=0x7f0b6002ac10, op_ret=0, op_errno=, inode=, buf=0x7f0b51407640, xdata=0x7f0b72f57648, postparent=0x7f0b514076b0) at shard.c:1601
#5  0x7f0b64efe141 in dht_lookup_cbk (frame=0x7f0b7075fcdc, cookie=, this=, op_ret=0, op_errno=0, inode=0x7f0b5f1d1f58, stbuf=0x7f0b51407640, xattr=0x7f0b72f57648, postparent=0x7f0b514076b0) at dht-common.c:2174
#6  0x7f0b651871f3 in afr_lookup_done (frame=frame@entry=0x7f0b7079a4c8, this=this@entry=0x7f0b60023ba0) at afr-common.c:1825
#7  0x7f0b65187b84 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f0b7079a4c8, this=0x7f0b60023ba0, this@entry=0xca0bd88259f5a800) at afr-common.c:2068
#8  0x7f0b6518834f in afr_lookup_entry_heal (frame=frame@entry=0x7f0b7079a4c8, this=0xca0bd88259f5a800, this@entry=0x7f0b60023ba0) at afr-common.c:2157
#9  0x7f0b6518867d in afr_lookup_cbk (frame=0x7f0b7079a4c8, cookie=, this=0x7f0b60023ba0, op_ret=, op_errno=, inode=, buf=0x7f0b564e9940, xdata=0x7f0b72f708c8, postparent=0x7f0b564e99b0) at afr-common.c:2205
#10 0x7f0b653d6e42 in client3_3_lookup_cbk (req=, iov=, count=, myframe=0x7f0b7076354c) at client-rpc-fops.c:2981
#11 0x7f0b72a00a30 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f0b603393c0, pollin=pollin@entry=0x7f0b50c1c2d0) at rpc-clnt.c:764
#12 0x7f0b72a00cef in rpc_clnt_notify (trans=, mydata=0x7f0b603393f0, event=, data=0x7f0b50c1c2d0) at rpc-clnt.c:925
#13 0x7f0b729fc7c3 in rpc_transport_notify (this=this@entry=0x7f0b60349040, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f0b50c1c2d0) at rpc-transport.c:546
#14 0x7f0b678c39a4 in socket_event_poll_in (this=this@entry=0x7f0b60349040) at socket.c:2353
#15 0x7f0b678c65e4 in socket_event_handler (fd=fd@entry=29, idx=idx@entry=17, data=0x7f0b60349040, poll_in=1, poll_out=0, poll_err=0) at socket.c:2466
#16 0x7f0b72ca0f7a in event_dispatch_epoll_handler (event=0x7f0b564e9e80, event_pool=0x7f0b7349bf20) at event-epoll.c:575
#17 event_dispatch_epoll_worker (data=0x7f0b60152d40) at event-epoll.c:678
#18 0x7f0b71a9adc5 in start_thread () from
Re: [Gluster-users] Gluster 3.7.13 NFS Crash
Appreciate your help,
(gdb) frame 2
#2  0x7f195deb1787 in shard_common_inode_write_do (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716
3716            anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p local->inode_list[0]
$4 = (inode_t *) 0x7f195c532b18
(gdb) p local->inode_list[1]
$5 = (inode_t *) 0x0
(gdb)
--
Respectfully
Mahdi A. Mahdi

From: kdhan...@redhat.com Date: Thu, 4 Aug 2016 12:43:10 +0530 Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash To: mahdi.ad...@outlook.com CC: gluster-users@gluster.org
OK. Could you also print the values of the following variables from the original core:
i. i
ii. local->inode_list[0]
iii. local->inode_list[1]
-Krutika
On Wed, Aug 3, 2016 at 9:01 PM, Mahdi Adnan wrote:
Hi,
Unfortunately no, but I can set up a test bench and see if it gets the same results.
--
Respectfully
Mahdi A. Mahdi

From: kdhan...@redhat.com Date: Wed, 3 Aug 2016 20:59:50 +0530 Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash To: mahdi.ad...@outlook.com CC: gluster-users@gluster.org
Do you have a test case that consistently recreates this problem?
-Krutika
On Wed, Aug 3, 2016 at 8:32 PM, Mahdi Adnan wrote:
Hi,
So I have updated to 3.7.14 and I still have the same issue with NFS. Based on what I have provided so far from logs and dumps, do you think it's an NFS issue? Should I switch to nfs-ganesha?
The problem is, the current setup is used in a production environment, and switching the mount point of 50+ VMs from native NFS to nfs-ganesha is not going to be smooth and without downtime, so I really appreciate your thoughts on this matter.
--
Respectfully
Mahdi A. 
Mahdi

From: mahdi.ad...@outlook.com To: kdhan...@redhat.com Date: Tue, 2 Aug 2016 08:44:16 +0300 CC: gluster-users@gluster.org Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
Hi,
The NFS just crashed again, latest bt;
(gdb) bt
#0  0x7f0b71a9f210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x7f0b72c6fcd5 in fd_anonymous (inode=0x0) at fd.c:804
#2  0x7f0b64ca5787 in shard_common_inode_write_do (frame=0x7f0b707c062c, this=0x7f0b6002ac10) at shard.c:3716
#3  0x7f0b64ca5a53 in shard_common_inode_write_post_lookup_shards_handler (frame=, this=) at shard.c:3769
#4  0x7f0b64c9eff5 in shard_common_lookup_shards_cbk (frame=0x7f0b707c062c, cookie=, this=0x7f0b6002ac10, op_ret=0, op_errno=, inode=, buf=0x7f0b51407640, xdata=0x7f0b72f57648, postparent=0x7f0b514076b0) at shard.c:1601
#5  0x7f0b64efe141 in dht_lookup_cbk (frame=0x7f0b7075fcdc, cookie=, this=, op_ret=0, op_errno=0, inode=0x7f0b5f1d1f58, stbuf=0x7f0b51407640, xattr=0x7f0b72f57648, postparent=0x7f0b514076b0) at dht-common.c:2174
#6  0x7f0b651871f3 in afr_lookup_done (frame=frame@entry=0x7f0b7079a4c8, this=this@entry=0x7f0b60023ba0) at afr-common.c:1825
#7  0x7f0b65187b84 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f0b7079a4c8, this=0x7f0b60023ba0, this@entry=0xca0bd88259f5a800) at afr-common.c:2068
#8  0x7f0b6518834f in afr_lookup_entry_heal (frame=frame@entry=0x7f0b7079a4c8, this=0xca0bd88259f5a800, this@entry=0x7f0b60023ba0) at afr-common.c:2157
#9  0x7f0b6518867d in afr_lookup_cbk (frame=0x7f0b7079a4c8, cookie=, this=0x7f0b60023ba0, op_ret=, op_errno=, inode=, buf=0x7f0b564e9940, xdata=0x7f0b72f708c8, postparent=0x7f0b564e99b0) at afr-common.c:2205
#10 0x7f0b653d6e42 in client3_3_lookup_cbk (req=, iov=, count=, myframe=0x7f0b7076354c) at client-rpc-fops.c:2981
#11 0x7f0b72a00a30 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f0b603393c0, pollin=pollin@entry=0x7f0b50c1c2d0) at rpc-clnt.c:764
#12 0x7f0b72a00cef in rpc_clnt_notify (trans=, mydata=0x7f0b603393f0, event=, data=0x7f0b50c1c2d0) at rpc-clnt.c:925
#13 0x7f0b729fc7c3 in rpc_transport_notify (this=this@entry=0x7f0b60349040, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f0b50c1c2d0) at rpc-transport.c:546
#14 0x7f0b678c39a4 in socket_event_poll_in (this=this@entry=0x7f0b60349040) at socket.c:2353
#15 0x7f0b678c65e4 in socket_event_handler (fd=fd@entry=29, idx=idx@entry=17, data=0x7f0b60349040, poll_in=1, poll_out=0, poll_err=0) at socket.c:2466
#16 0x7f0b72ca0f7a in event_dispatch_epoll_handler (event=0x7f0b564e9e80, event_pool=0x7f0b7349bf20) at event-epoll.c:575
#17 event_dispatch_epoll_worker (data=0x7f0b60152d40) at event-epoll.c:678
#18 0x7f0b71a9adc5 in start_thread () from /lib64/libpthread.so.0
#19 0x7f0b713dfced in clone () from /lib64/libc.so.6
--
Respectfully
Mahdi A. Mahdi

From: mahdi.ad...@outlook.com To: kdhan...@redhat.com Date: Mon, 1 Aug 2016 16:31:50 +0300 CC: gluster-users@gluster.org Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
Many thanks,
here's the results;
(gdb) p cur_block
$15 = 4088
(gdb) p last_block
$16 = 4088
(gdb) p local->first_block
$17 = 4087
(gdb) p odirect
$18 =
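The inspection steps requested back and forth in this thread (the backtrace, frame 2, the loop index and inode pointers, and local->fop) can be batched into a single non-interactive gdb run against the core, so each new crash produces the full set in one go. The binary and core paths below are placeholders for the actual gluster NFS server binary and its core file:

```shell
# Batch the gdb inspection requested in the thread (paths are placeholders;
# point them at the actual glusterfs binary and the crashed NFS server's core).
gdb --batch /usr/sbin/glusterfs /var/crash/core.glusterfs \
    -ex 'bt' \
    -ex 'frame 2' \
    -ex 'print i' \
    -ex 'print local->fop' \
    -ex 'print local->inode_list[0]' \
    -ex 'print local->inode_list[1]'
```

Each -ex command runs in order and the output can be redirected to a file to attach to the list.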
Re: [Gluster-users] GlusterFS-3.7.14 released
On Thu, Aug 4, 2016 at 11:30 AM, Serkan Çoban wrote: > Thanks Pranith, > I am waiting for RPMs to show, I will do the tests as soon as possible > and inform you. > I guess on 3.7.x the RPMs are not automatically built. Let me find out how it can be done. I will inform you after finding that out. Give me a day. > > On Wed, Aug 3, 2016 at 11:19 PM, Pranith Kumar Karampuri > wrote: > > > > > > On Thu, Aug 4, 2016 at 1:47 AM, Pranith Kumar Karampuri > > wrote: > >> > >> > >> > >> On Thu, Aug 4, 2016 at 12:51 AM, Serkan Çoban > >> wrote: > >>> > >>> I use rpms for installation. Redhat/Centos 6.8. > >> > >> > >> http://review.gluster.org/#/c/15084 is the patch. In some time the rpms > >> will be built actually. > > > > > > In the same URL above it will actually post the rpms for fedora/el6/el7 > at > > the end of the page. > > > >> > >> > >> Use gluster volume set <volname> disperse.shd-max-threads <count (range: 1-64)> > >> > >> While testing this I thought of ways to decrease the number of crawls as > >> well. But they are a bit involved. Try to create same set of data and > see > >> what is the time it takes to complete heals using number of threads as > you > >> increase the number of parallel heals from 1 to 64. > >> > >>> > >>> On Wed, Aug 3, 2016 at 10:16 PM, Pranith Kumar Karampuri > >>> wrote: > >>> > > >>> > > >>> > On Thu, Aug 4, 2016 at 12:45 AM, Serkan Çoban > > >>> > wrote: > >>> >> > >>> >> I prefer 3.7 if it is ok for you. Can you also provide build > >>> >> instructions? > >>> > > >>> > > >>> > 3.7 should be fine. Do you use rpms/debs/anything-else? > >>> > > >>> >> > >>> >> > >>> >> On Wed, Aug 3, 2016 at 10:12 PM, Pranith Kumar Karampuri > >>> >> wrote: > >>> >> > > >>> >> > > >>> >> > On Thu, Aug 4, 2016 at 12:37 AM, Serkan Çoban > >>> >> > > >>> >> > wrote: > >>> >> >> > >>> >> >> Yes, but I can create 2+1(or 8+2) ec using two servers right? I > >>> >> >> have > >>> >> >> 26 disks on each server. 
> >>> >> > > >>> >> > > >>> >> > On which release-branch do you want the patch? I am testing it on > >>> >> > master-branch now. > >>> >> > > >>> >> >> > >>> >> >> > >>> >> >> On Wed, Aug 3, 2016 at 9:59 PM, Pranith Kumar Karampuri > >>> >> >> wrote: > >>> >> >> > > >>> >> >> > > >>> >> >> > On Thu, Aug 4, 2016 at 12:23 AM, Serkan Çoban > >>> >> >> > > >>> >> >> > wrote: > >>> >> >> >> > >>> >> >> >> I have two of my storage servers free, I think I can use them > >>> >> >> >> for > >>> >> >> >> testing. Is two server testing environment ok for you? > >>> >> >> > > >>> >> >> > > >>> >> >> > I think it would be better if you have at least 3. You can test > >>> >> >> > it > >>> >> >> > with > >>> >> >> > 2+1 > >>> >> >> > ec configuration. > >>> >> >> > > >>> >> >> >> > >>> >> >> >> > >>> >> >> >> On Wed, Aug 3, 2016 at 9:44 PM, Pranith Kumar Karampuri > >>> >> >> >> wrote: > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> > On Wed, Aug 3, 2016 at 6:01 PM, Serkan Çoban > >>> >> >> >> > > >>> >> >> >> > wrote: > >>> >> >> >> >> > >>> >> >> >> >> Hi, > >>> >> >> >> >> > >>> >> >> >> >> May I ask if multi-threaded self heal for distributed > >>> >> >> >> >> disperse > >>> >> >> >> >> volumes > >>> >> >> >> >> implemented in this release? > >>> >> >> >> > > >>> >> >> >> > > >>> >> >> >> > Serkan, > >>> >> >> >> > At the moment I am a bit busy with different work, > Is > >>> >> >> >> > it > >>> >> >> >> > possible > >>> >> >> >> > for you to help test the feature if I provide a patch? > >>> >> >> >> > Actually > >>> >> >> >> > the > >>> >> >> >> > patch > >>> >> >> >> > should be small. Testing is where lot of time will be spent > >>> >> >> >> > on. 
> >>> >> >> >> > > >>> >> >> >> >> > >>> >> >> >> >> > >>> >> >> >> >> Thanks, > >>> >> >> >> >> Serkan > >>> >> >> >> >> > >>> >> >> >> >> On Tue, Aug 2, 2016 at 5:30 PM, David Gossage > >>> >> >> >> >> wrote: > >>> >> >> >> >> > On Tue, Aug 2, 2016 at 6:01 AM, Lindsay Mathieson > >>> >> >> >> >> > wrote: > >>> >> >> >> >> >> > >>> >> >> >> >> >> On 2/08/2016 5:07 PM, Kaushal M wrote: > >>> >> >> >> >> >>> > >>> >> >> >> >> >>> GlusterFS-3.7.14 has been released. This is a regular > >>> >> >> >> >> >>> minor > >>> >> >> >> >> >>> release. > >>> >> >> >> >> >>> The release-notes are available at > >>> >> >> >> >> >>> > >>> >> >> >> >> >>> > >>> >> >> >> >> >>> > >>> >> >> >> >> >>> > >>> >> >> >> >> >>> > >>> >> >> >> >> >>> > >>> >> >> >> >> >>> > https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.14.md > >>> >> >> >> >> >> > >>> >> >> >> >> >> > >>> >> >> >> >> >> Thanks Kaushal, I'll check it out > >>> >> >> >> >> >> > >>> >> >> >> >> > > >>> >> >> >> >> > So far on my test box its working as expected. At
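The test Pranith describes above — creating the same data set and timing full heals while stepping disperse.shd-max-threads from 1 to 64 — could be scripted roughly like this. This is a sketch only: "testvol" is a placeholder volume name, the 10-second poll interval is arbitrary, and it assumes a scratch cluster whose data can be re-created between runs:

```shell
# Sketch: time a full heal at each disperse.shd-max-threads value.
# Run only on a test cluster; re-create the same damaged data set before
# each iteration so the timings are comparable.
for threads in 1 2 4 8 16 32 64; do
    gluster volume set testvol disperse.shd-max-threads "$threads"
    start=$(date +%s)
    gluster volume heal testvol full
    # Poll until heal info no longer reports pending entries on any brick.
    while gluster volume heal testvol info | grep -q 'Number of entries: [1-9]'; do
        sleep 10
    done
    echo "threads=$threads heal_seconds=$(( $(date +%s) - start ))"
done
```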
Re: [Gluster-users] Gluster 3.7.13 NFS Crash
OK. Could you also print the values of the following variables from the original core:
i. i
ii. local->inode_list[0]
iii. local->inode_list[1]
-Krutika
On Wed, Aug 3, 2016 at 9:01 PM, Mahdi Adnan wrote: > Hi, > > Unfortunately no, but I can set up a test bench and see if it gets the same > results. > > -- > > Respectfully > *Mahdi A. Mahdi* > > > > -- > From: kdhan...@redhat.com > Date: Wed, 3 Aug 2016 20:59:50 +0530 > > Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash > To: mahdi.ad...@outlook.com > CC: gluster-users@gluster.org > > Do you have a test case that consistently recreates this problem? > > -Krutika > > On Wed, Aug 3, 2016 at 8:32 PM, Mahdi Adnan > wrote: > > Hi, > > So I have updated to 3.7.14 and I still have the same issue with NFS. > Based on what I have provided so far from logs and dumps, do you think it's > an NFS issue? Should I switch to nfs-ganesha? > The problem is, the current setup is used in a production environment, and > switching the mount point of 50+ VMs from native NFS to nfs-ganesha is not > going to be smooth and without downtime, so I really appreciate your > thoughts on this matter. > > -- > > Respectfully > *Mahdi A. 
Mahdi* > > > > -- > From: mahdi.ad...@outlook.com > To: kdhan...@redhat.com > Date: Tue, 2 Aug 2016 08:44:16 +0300 > > CC: gluster-users@gluster.org > Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash > > Hi, > > The NFS just crashed again, latest bt; > > (gdb) bt > #0 0x7f0b71a9f210 in pthread_spin_lock () from /lib64/libpthread.so.0 > #1 0x7f0b72c6fcd5 in fd_anonymous (inode=0x0) at fd.c:804 > #2 0x7f0b64ca5787 in shard_common_inode_write_do > (frame=0x7f0b707c062c, this=0x7f0b6002ac10) at shard.c:3716 > #3 0x7f0b64ca5a53 in shard_common_inode_write_post_lookup_shards_handler > (frame=, this=) at shard.c:3769 > #4 0x7f0b64c9eff5 in shard_common_lookup_shards_cbk > (frame=0x7f0b707c062c, cookie=, this=0x7f0b6002ac10, > op_ret=0, > op_errno=, inode=, buf=0x7f0b51407640, > xdata=0x7f0b72f57648, postparent=0x7f0b514076b0) at shard.c:1601 > #5 0x7f0b64efe141 in dht_lookup_cbk (frame=0x7f0b7075fcdc, > cookie=, this=, op_ret=0, op_errno=0, > inode=0x7f0b5f1d1f58, > stbuf=0x7f0b51407640, xattr=0x7f0b72f57648, postparent=0x7f0b514076b0) > at dht-common.c:2174 > #6 0x7f0b651871f3 in afr_lookup_done (frame=frame@entry=0x7f0b7079a4c8, > this=this@entry=0x7f0b60023ba0) at afr-common.c:1825 > #7 0x7f0b65187b84 in afr_lookup_metadata_heal_check (frame=frame@entry > =0x7f0b7079a4c8, this=0x7f0b60023ba0, this@entry=0xca0bd88259f5a800) > at afr-common.c:2068 > #8 0x7f0b6518834f in afr_lookup_entry_heal > (frame=frame@entry=0x7f0b7079a4c8, > this=0xca0bd88259f5a800, this@entry=0x7f0b60023ba0) at afr-common.c:2157 > #9 0x7f0b6518867d in afr_lookup_cbk (frame=0x7f0b7079a4c8, > cookie=, this=0x7f0b60023ba0, op_ret=, > op_errno=, inode=, buf=0x7f0b564e9940, > xdata=0x7f0b72f708c8, postparent=0x7f0b564e99b0) at afr-common.c:2205 > #10 0x7f0b653d6e42 in client3_3_lookup_cbk (req=, > iov=, count=, myframe=0x7f0b7076354c) > at client-rpc-fops.c:2981 > #11 0x7f0b72a00a30 in rpc_clnt_handle_reply > (clnt=clnt@entry=0x7f0b603393c0, > pollin=pollin@entry=0x7f0b50c1c2d0) at rpc-clnt.c:764 > 
#12 0x7f0b72a00cef in rpc_clnt_notify (trans=, > mydata=0x7f0b603393f0, event=, data=0x7f0b50c1c2d0) at > rpc-clnt.c:925 > #13 0x7f0b729fc7c3 in rpc_transport_notify > (this=this@entry=0x7f0b60349040, > event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry= > 0x7f0b50c1c2d0) > at rpc-transport.c:546 > #14 0x7f0b678c39a4 in socket_event_poll_in > (this=this@entry=0x7f0b60349040) > at socket.c:2353 > #15 0x7f0b678c65e4 in socket_event_handler (fd=fd@entry=29, > idx=idx@entry=17, data=0x7f0b60349040, poll_in=1, poll_out=0, poll_err=0) > at socket.c:2466 > #16 0x7f0b72ca0f7a in event_dispatch_epoll_handler > (event=0x7f0b564e9e80, event_pool=0x7f0b7349bf20) at event-epoll.c:575 > #17 event_dispatch_epoll_worker (data=0x7f0b60152d40) at event-epoll.c:678 > #18 0x7f0b71a9adc5 in start_thread () from /lib64/libpthread.so.0 > #19 0x7f0b713dfced in clone () from /lib64/libc.so.6 > > > -- > > Respectfully > *Mahdi A. Mahdi* > > -- > From: mahdi.ad...@outlook.com > To: kdhan...@redhat.com > Date: Mon, 1 Aug 2016 16:31:50 +0300 > CC: gluster-users@gluster.org > Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash > > Many thanks, > > here's the results; > > > (gdb) p cur_block > $15 = 4088 > (gdb) p last_block > $16 = 4088 > (gdb) p local->first_block > $17 = 4087 > (gdb) p odirect > $18 = _gf_false > (gdb) p fd->flags > $19 = 2 > (gdb) p local->call_count > $20 = 2 > > > If you need more core dumps, i have several files i can upload. > > -- > > Respectfully > *Mahdi A. Mahdi* > > > > -- > From: kdhan...@redhat.com
Re: [Gluster-users] Glusterfs 3.7.13 node suddenly stops healing
Hi,
Please attach the logs and "gluster volume info $VOLUMENAME" output here.
--
Respectfully
Mahdi A. Mahdi
> From: davy.croo...@smartbit.be > To: gluster-users@gluster.org > Date: Wed, 3 Aug 2016 13:01:36 + > Subject: [Gluster-users] Glusterfs 3.7.13 node suddenly stops healing > > Hi all, > > About a month ago we deployed a Glusterfs 3.7.13 cluster with 6 nodes (3 x 2 > replication). Suddenly since this week one node in the cluster started > reporting unsynced entries once a day. If I then run a gluster volume heal > full command the unsynced entries disappear until the next day. For > completeness the reported unsynced entries are always different. > > I checked all logs but couldn't find a clue as to what's causing this. Anybody any > ideas? > > Kind regards > Davy > > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Glusterfs 3.7.13 node suddenly stops healing
Hi all,
About a month ago we deployed a Glusterfs 3.7.13 cluster with 6 nodes (3 x 2 replication). Suddenly, since this week, one node in the cluster started reporting unsynced entries once a day. If I then run a "gluster volume heal full" command the unsynced entries disappear until the next day. For completeness, the reported unsynced entries are always different.
I checked all logs but couldn't find a clue as to what's causing this. Anybody any ideas?
Kind regards
Davy
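Since the unsynced entries disappear after each heal, it may help to log the per-brick counts as they appear. A small sketch that sums the "Number of entries:" counters from heal-info output — the sample output below is canned so the sketch runs anywhere; on a real node the function body would instead run "gluster volume heal <volname> info":

```shell
# Sum the "Number of entries:" counters from `gluster volume heal <vol> info`.
# The heal_info function returns canned sample output for illustration; on a
# real node, replace its body with the actual gluster command.
heal_info() {
cat <<'EOF'
Brick gfs1:/export/sda/brick
Number of entries: 3

Brick gfs2:/export/sda/brick
Number of entries: 0
EOF
}

# "Number of entries: N" -> field 4 is the count; sum across bricks.
unsynced=$(heal_info | awk '/^Number of entries:/ { sum += $4 } END { print sum + 0 }')
echo "$(date '+%F %T') unsynced entries: $unsynced"
```

Run from cron shortly before the daily heal, this builds a history of which entries turn up unsynced and when.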
Re: [Gluster-users] GlusterFS-3.7.14 released
Thanks Pranith, I am waiting for RPMs to show, I will do the tests as soon as possible and inform you. On Wed, Aug 3, 2016 at 11:19 PM, Pranith Kumar Karampuri wrote: > > > On Thu, Aug 4, 2016 at 1:47 AM, Pranith Kumar Karampuri > wrote: >> >> >> >> On Thu, Aug 4, 2016 at 12:51 AM, Serkan Çoban >> wrote: >>> >>> I use rpms for installation. Redhat/Centos 6.8. >> >> >> http://review.gluster.org/#/c/15084 is the patch. In some time the rpms >> will be built actually. > > > In the same URL above it will actually post the rpms for fedora/el6/el7 at > the end of the page. > >> >> >> Use gluster volume set <volname> disperse.shd-max-threads <count (range: 1-64)> >> >> While testing this I thought of ways to decrease the number of crawls as >> well. But they are a bit involved. Try to create same set of data and see >> what is the time it takes to complete heals using number of threads as you >> increase the number of parallel heals from 1 to 64. >> >>> >>> On Wed, Aug 3, 2016 at 10:16 PM, Pranith Kumar Karampuri >>> wrote: >>> > >>> > >>> > On Thu, Aug 4, 2016 at 12:45 AM, Serkan Çoban >>> > wrote: >>> >> >>> >> I prefer 3.7 if it is ok for you. Can you also provide build >>> >> instructions? >>> > >>> > >>> > 3.7 should be fine. Do you use rpms/debs/anything-else? >>> > >>> >> >>> >> >>> >> On Wed, Aug 3, 2016 at 10:12 PM, Pranith Kumar Karampuri >>> >> wrote: >>> >> > >>> >> > >>> >> > On Thu, Aug 4, 2016 at 12:37 AM, Serkan Çoban >>> >> > >>> >> > wrote: >>> >> >> >>> >> >> Yes, but I can create 2+1(or 8+2) ec using two servers right? I >>> >> >> have >>> >> >> 26 disks on each server. >>> >> > >>> >> > >>> >> > On which release-branch do you want the patch? I am testing it on >>> >> > master-branch now. 
>>> >> > >>> >> >> >>> >> >> >>> >> >> On Wed, Aug 3, 2016 at 9:59 PM, Pranith Kumar Karampuri >>> >> >> wrote: >>> >> >> > >>> >> >> > >>> >> >> > On Thu, Aug 4, 2016 at 12:23 AM, Serkan Çoban >>> >> >> > >>> >> >> > wrote: >>> >> >> >> >>> >> >> >> I have two of my storage servers free, I think I can use them >>> >> >> >> for >>> >> >> >> testing. Is two server testing environment ok for you? >>> >> >> > >>> >> >> > >>> >> >> > I think it would be better if you have at least 3. You can test >>> >> >> > it >>> >> >> > with >>> >> >> > 2+1 >>> >> >> > ec configuration. >>> >> >> > >>> >> >> >> >>> >> >> >> >>> >> >> >> On Wed, Aug 3, 2016 at 9:44 PM, Pranith Kumar Karampuri >>> >> >> >> wrote: >>> >> >> >> > >>> >> >> >> > >>> >> >> >> > On Wed, Aug 3, 2016 at 6:01 PM, Serkan Çoban >>> >> >> >> > >>> >> >> >> > wrote: >>> >> >> >> >> >>> >> >> >> >> Hi, >>> >> >> >> >> >>> >> >> >> >> May I ask if multi-threaded self heal for distributed >>> >> >> >> >> disperse >>> >> >> >> >> volumes >>> >> >> >> >> implemented in this release? >>> >> >> >> > >>> >> >> >> > >>> >> >> >> > Serkan, >>> >> >> >> > At the moment I am a bit busy with different work, Is >>> >> >> >> > it >>> >> >> >> > possible >>> >> >> >> > for you to help test the feature if I provide a patch? >>> >> >> >> > Actually >>> >> >> >> > the >>> >> >> >> > patch >>> >> >> >> > should be small. Testing is where lot of time will be spent >>> >> >> >> > on. >>> >> >> >> > >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> Thanks, >>> >> >> >> >> Serkan >>> >> >> >> >> >>> >> >> >> >> On Tue, Aug 2, 2016 at 5:30 PM, David Gossage >>> >> >> >> >> wrote: >>> >> >> >> >> > On Tue, Aug 2, 2016 at 6:01 AM, Lindsay Mathieson >>> >> >> >> >> > wrote: >>> >> >> >> >> >> >>> >> >> >> >> >> On 2/08/2016 5:07 PM, Kaushal M wrote: >>> >> >> >> >> >>> >>> >> >> >> >> >>> GlusterFS-3.7.14 has been released. This is a regular >>> >> >> >> >> >>> minor >>> >> >> >> >> >>> release. 
>>> >> >> >> >> >>> The release-notes are available at >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> >>> >> >> >> >> >>> https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.14.md >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> Thanks Kaushal, I'll check it out >>> >> >> >> >> >> >>> >> >> >> >> > >>> >> >> >> >> > So far on my test box its working as expected. At least >>> >> >> >> >> > the >>> >> >> >> >> > issues >>> >> >> >> >> > that >>> >> >> >> >> > prevented it from running as before have disappeared. Will >>> >> >> >> >> > need >>> >> >> >> >> > to >>> >> >> >> >> > see >>> >> >> >> >> > how >>> >> >> >> >> > my test VM behaves after a few days. >>> >> >> >> >> > >>> >> >> >> >> > >>> >> >> >> >> > >>> >> >> >> >> >> -- >>> >> >> >> >> >> Lindsay Mathieson >>> >> >> >> >> >> >>> >> >> >> >> >> ___ >>> >> >> >> >> >> Gluster-users