Re: [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-16 Thread Martin Toth
Thanks for the clarification; one more question.

When I recover (boot) the failed node and this peer becomes available
again to the remaining two nodes, how do I tell gluster to mark this brick
as failed?

I mean, I’ve booted the failed node back without networking. The disk partition
(a ZFS pool on other disks) where the brick lived before the failure is lost.
Can I start gluster even though I no longer have the ZFS pool where the failed
brick was before?

Won't this be a problem when I connect this node back to the cluster (before
the brick replace/reset command is issued)?

Thanks. BR!
Martin

> On 11 Apr 2019, at 15:40, Karthik Subrahmanya  wrote:
> 
> 
> 
> On Thu, Apr 11, 2019 at 6:38 PM Martin Toth wrote:
> Hi Karthik,
> 
>> On Thu, Apr 11, 2019 at 12:43 PM Martin Toth wrote:
>> Hi Karthik,
>> 
>> moreover, I would like to ask if there are some recommended 
>> settings/parameters for SHD in order to achieve good or fair I/O while 
>> the volume is being healed after I replace the brick (this should trigger 
>> the healing process).
>> 
>> If I understand your concern correctly, you need to get fair I/O performance 
>> for clients while healing takes place as part of the replace-brick 
>> operation. For this you can turn off the "data-self-heal" and 
>> "metadata-self-heal" options until the heal completes on the new brick.
> 
> This is exactly what I mean. I am running VM disks on the remaining 2 (out of 3 - 
> one failed, as mentioned) nodes and I need to ensure there will be fair I/O 
> performance on these two nodes while the replace-brick operation 
> heals the volume.
> I will not run any VMs on the node where the replace-brick operation will be running. 
> So if I understand correctly, when I set:
> 
> # gluster volume set <volname> cluster.data-self-heal off
> # gluster volume set <volname> cluster.metadata-self-heal off
> 
> this will tell Gluster clients (libgfapi and FUSE mount) not to read from the 
> node where the replace-brick operation is in place but from the remaining two 
> healthy nodes. Is this correct? Thanks for the clarification.
> 
> The reads will be served from one of the good bricks, since the file will 
> either not be present on the replaced brick at the time of the read, or it will be 
> present but marked for heal if it is not already healed. If already healed by 
> SHD, then it could be served from the new brick as well, and there won't be 
> any problem in reading from there in that scenario.
> By setting these two options, whenever a read comes from a client it will not 
> try to heal the file's data/metadata. Otherwise it would try to heal (if 
> not already healed by SHD) when the read comes in, slowing down 
> the client.
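As a rough sketch of the surrounding steps (<volname> below is a placeholder for the actual volume name), heal progress on the replaced brick can be watched with the standard heal-info command, and the two options turned back on once nothing is left pending:

# gluster volume heal <volname> info
# gluster volume set <volname> cluster.data-self-heal on
# gluster volume set <volname> cluster.metadata-self-heal on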
> 
>> Turning off client-side healing doesn't compromise data integrity and 
>> consistency. During a read request from the client, the pending xattrs are 
>> evaluated for the replica copies and the read is served only from a correct copy. 
>> During writes, IO will continue on both replicas; SHD will take care of healing the files.
>> After replacing the brick, we strongly recommend that you consider upgrading 
>> your gluster to one of the maintained versions. We have many stability-related 
>> fixes there, which can handle some critical issues and corner cases 
>> which you could hit during these kinds of scenarios.
> 
> This will be the first priority in the infrastructure after getting this cluster 
> back to a fully functional replica 3. I will upgrade to 3.12.x and then to version 5 
> or 6.
> Sounds good.
> 
> If you are planning to have the same name for the new brick and you get 
> an error like "Brick may be containing or be contained by an existing brick" 
> even after using the force option, try using a different name. That should 
> work.
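For reference, a replace-brick with a differently named brick would look roughly like the following; the host and brick paths here are purely illustrative:

# gluster volume replace-brick <volname> <host>:/bricks/brick_old <host>:/bricks/brick_new commit force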
> 
> Regards,
> Karthik 
> 
> BR, 
> Martin
> 
>> Regards,
>> Karthik
>> I had some problems in the past when healing was triggered: VM disks became 
>> unresponsive because healing took most of the I/O. My volume contains only big 
>> files with VM disks.
>> 
>> Thanks for suggestions.
>> BR, 
>> Martin
>> 
>>> On 10 Apr 2019, at 12:38, Martin Toth wrote:
>>> 
>>> Thanks, this looks OK to me. I will reset the brick because I don't have any 
>>> data left on the failed node, so I can use the same path / brick name.
>>> 
>>> Is reset-brick a dangerous command? Should I be worried about some 
>>> possible failure that would impact the remaining two nodes? I am running the 
>>> really old but stable 3.7.6 version.
>>> 
>>> Thanks,
>>> BR!
>>> 
>>> Martin
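For the same-path case discussed above, the reset-brick flow (available in gluster releases newer than the 3.7 series; volume name, host and path below are placeholders) would look roughly like:

# gluster volume reset-brick <volname> <host>:/bricks/brick1 start
# gluster volume reset-brick <volname> <host>:/bricks/brick1 <host>:/bricks/brick1 commit force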
>>>  
>>> 
 On 10 Apr 2019, at 12:20, Karthik Subrahmanya wrote:
 
 Hi Martin,
 
 After you add the new disks and create the RAID array, you can run the 
 following command to replace the old brick with the new one:
 
 - If you are going to use a different name for the new brick, you can run
 gluster volume replace-brick <volname> <old-brick> <new-brick> commit force
 
 - If you are planning to use the same name for the new 

Re: [Gluster-users] Volume stuck unable to add a brick

2019-04-16 Thread Karthik Subrahmanya
You're welcome!

On Tue 16 Apr, 2019, 7:12 PM Boris Goldowsky,  wrote:

> That worked!  Thank you SO much!
>
>
>
> Boris
>
>
>
>
>
> *From: *Karthik Subrahmanya 
> *Date: *Tuesday, April 16, 2019 at 8:20 AM
> *To: *Boris Goldowsky 
> *Cc: *Atin Mukherjee , Gluster-users <
> gluster-users@gluster.org>
> *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick
>
>
>
> Hi Boris,
>
>
>
> Thank you for providing the logs.
>
> The problem here is because of the "auth.allow: 127.0.0.1" setting on the
> volume.
>
> When you try to add a new brick to the volume, the replication module
> internally tries to set some metadata on the existing bricks to mark pending
> heal on the new brick, by creating a temporary mount. Because of the
> auth.allow setting, that mount gets permission errors, as seen in the logs
> below, leading to the add-brick failure.
>
>
>
> From data-gluster-dockervols.log-webserver9 :
>
> [2019-04-15 14:00:34.226838] I [addr.c:55:compare_addr_and_update]
> 0-/data/gluster/dockervols: allowed = "127.0.0.1", received addr =
> "192.168.200.147"
>
> [2019-04-15 14:00:34.226895] E [MSGID: 115004]
> [authenticate.c:224:gf_authenticate] 0-auth: no authentication module is
> interested in accepting remote-client (null)
>
> [2019-04-15 14:00:34.227129] E [MSGID: 115001]
> [server-handshake.c:848:server_setvolume] 0-dockervols-server: Cannot
> authenticate client from
> webserver8.cast.org-55674-2019/04/15-14:00:20:495333-dockervols-client-2-0-0
> 3.12.2 [Permission denied]
>
>
>
> From dockervols-add-brick-mount.log :
>
> [2019-04-15 14:00:20.672033] W [MSGID: 114043]
> [client-handshake.c:1109:client_setvolume_cbk] 0-dockervols-client-2:
> failed to set the volume [Permission denied]
>
> [2019-04-15 14:00:20.672102] W [MSGID: 114007]
> [client-handshake.c:1138:client_setvolume_cbk] 0-dockervols-client-2:
> failed to get 'process-uuid' from reply dict [Invalid argument]
>
> [2019-04-15 14:00:20.672129] E [MSGID: 114044]
> [client-handshake.c:1144:client_setvolume_cbk] 0-dockervols-client-2:
> SETVOLUME on remote-host failed: Authentication failed [Permission denied]
>
> [2019-04-15 14:00:20.672151] I [MSGID: 114049]
> [client-handshake.c:1258:client_setvolume_cbk] 0-dockervols-client-2:
> sending AUTH_FAILED event
>
>
>
> This is a known issue and we are planning to fix it. For the time being
> there is a workaround:
>
> - Before you try adding the brick, set the auth.allow option to its default,
> i.e. "*"; you can do this by running "gluster v reset <volname> auth.allow"
>
> - Add the brick
>
> - After it succeeds, set the auth.allow option back to the previous value.
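Put together, the workaround is roughly the following sequence; the volume name, replica count and new brick below are placeholders, and 127.0.0.1 is the previous auth.allow value on this volume:

# gluster volume reset <volname> auth.allow
# gluster volume add-brick <volname> replica <N> <newhost>:<brickpath>
# gluster volume set <volname> auth.allow 127.0.0.1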
>
>
>
> Regards,
>
> Karthik
>
>
>
> On Tue, Apr 16, 2019 at 5:20 PM Boris Goldowsky 
> wrote:
>
> OK, log files attached.
>
>
>
> Boris
>
>
>
>
>
> *From: *Karthik Subrahmanya 
> *Date: *Tuesday, April 16, 2019 at 2:52 AM
> *To: *Atin Mukherjee , Boris Goldowsky <
> bgoldow...@cast.org>
> *Cc: *Gluster-users 
> *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee 
> wrote:
>
> +Karthik Subrahmanya 
>
>
>
> Didn't we fix this problem recently? "Failed to set extended attribute"
> indicates that the temp mount is failing and we don't have a quorum number of
> bricks up.
>
>
>
> We had two fixes which handles two kind of add-brick scenarios.
>
> [1] Fails add-brick when increasing the replica count if any of the brick
> is down to avoid data loss. This can be overridden by using the force
> option.
>
> [2] Allow add-brick to set the extended attributes by the temp mount if
> the volume is already mounted (has clients).
>
>
>
> They are in version 3.12.2 so, patch [1] is present there. But since they
> are using the force option it should not have any problem even if they have
> any brick down. The error message they are getting is also different, so it
> is not because of any brick being down I guess.
>
> Patch [2] is not present in 3.12.2 and it is not the conversion from plain
> distribute to replicate volume. So the scenario is different here.
>
> It seems like they are hitting some other issue.
>
>
>
> @Boris,
>
> Can you attach the add-brick's temp mount log. The file name should look
> something like "dockervols-add-brick-mount.log". Can you also provide all
> the brick logs of that volume during that time.
>
>
>
> [1] https://review.gluster.org/#/c/glusterfs/+/16330/
>
> [2] https://review.gluster.org/#/c/glusterfs/+/21791/
>
>
>
> Regards,
>
> Karthik
>
>
>
> Boris - which gluster version are you using?
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky 
> wrote:
>
> Atin, thank you for the reply.  Here are all of those pieces of
> information:
>
>
>
> [bgoldowsky@webserver9 ~]$ gluster --version
>
> glusterfs 3.12.2
>
> (same on all nodes)
>
>
>
> [bgoldowsky@webserver9 ~]$ sudo gluster peer status
>
> Number of Peers: 3
>
>
>
> Hostname: webserver11.cast.org
>
> Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
>
> State: Peer in Cluster (Connected)
>

Re: [Gluster-users] Reg: Gluster

2019-04-16 Thread Aravinda
On Tue, 2019-04-16 at 11:27 +0530, Poornima Gurusiddaiah wrote:
> +Sunny
> 
> On Wed, Apr 10, 2019, 9:02 PM Gomathi Nayagam <
> gomathinayaga...@gmail.com> wrote:
> > Hi User,
> > 
> > We are testing geo-replication of gluster. It is taking nearly 8 minutes
> > to transfer 16 GB of data between the DCs, while transferring the same
> > data over plain rsync took only 2 minutes. Can you tell us if we are
> > missing something?
> > 
> > 
> > 
> > 
> > Thanks & Regards,
> > Gomathi Nayagam.D
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


Geo-replication does many things to keep track of already-synced data
and of new changes happening in the master volume. Geo-replication
shines when doing incremental sync, that is, when new data is created
or existing data is modified in the master volume.

Are you observing slowness even during incremental sync? (Current time
minus Last Synced time in the status output shows how far the slave
volume is lagging behind the master volume.)
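The lag can be read from the geo-replication status output; a rough example, with the master volume, slave host and slave volume names as placeholders:

# gluster volume geo-replication <mastervol> <slavehost>::<slavevol> status detail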

-- 
regards
Aravinda

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-16 Thread Hu Bert
Hi Poornima,

thanks for your efforts. I ran a couple of tests and the results are the
same, so the options are not related. Anyway, I'm not able to
reproduce the problem on my testing system, although the volume
options are the same.

About 1.5 hours ago I set performance.quick-read to on again and
watched: load/iowait went up (not bad at the moment, little traffic),
and network traffic went up from <20 MBit/s to 160 MBit/s. After
deactivating quick-read, traffic dropped to <20 MBit/s again.
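The toggling here is presumably just the usual volume-set calls, roughly (<volname> is a placeholder):

# gluster volume set <volname> performance.quick-read on
  (watch the traffic graphs for a while)
# gluster volume set <volname> performance.quick-read off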

munin graph: https://abload.de/img/network-client4s0kle.png

The 2nd peak is from the last test.


Thx,
Hubert

On Tue, 16 Apr 2019 at 09:43, Hu Bert wrote:
>
> In my first test on my testing setup the traffic was on a normal
> level, so i thought i was "safe". But on my live system the network
> traffic was a multiple of the traffic one would expect.
> performance.quick-read was enabled in both, the only difference in the
> volume options between live and testing are:
>
> performance.read-ahead: testing on, live off
> performance.io-cache: testing on, live off
>
> I ran another test on my testing setup, deactivated both and copied 9
> GB of data. Now the traffic went up as well, from before ~9-10 MBit/s
> up to 100 MBit/s with both options off. Does performance.quick-read
> require one of those options set to 'on'?
>
> I'll start another test shortly and activate one of those two options;
> maybe there's a connection between those three options?
>
>
> Best Regards,
> Hubert
>
> On Tue, 16 Apr 2019 at 08:57, Poornima Gurusiddaiah wrote:
> >
> > Thank you for reporting this. I had done testing on my local setup and the 
> > issue was resolved even with quick-read enabled. Let me test it again.
> >
> > Regards,
> > Poornima
> >
> > On Mon, Apr 15, 2019 at 12:25 PM Hu Bert  wrote:
> >>
> >> fyi: after setting performance.quick-read to off network traffic
> >> dropped to normal levels, client load/iowait back to normal as well.
> >>
> >> client: https://abload.de/img/network-client-afterihjqi.png
> >> server: https://abload.de/img/network-server-afterwdkrl.png
> >>
>> On Mon, 15 Apr 2019 at 08:33, Hu Bert wrote:
> >> >
> >> > Good Morning,
> >> >
> >> > today i updated my replica 3 setup (debian stretch) from version 5.5
> >> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and
> >> > i could re-activate 'performance.quick-read' again. See release notes:
> >> >
> >> > https://review.gluster.org/#/c/glusterfs/+/22538/
> >> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795
> >> >
> >> > Upgrade went fine, and then i was watching iowait and network traffic.
> >> > It seems that the network traffic went up after upgrade and
> >> > reactivation of performance.quick-read. Here are some graphs:
> >> >
> >> > network client1: https://abload.de/img/network-clientfwj1m.png
> >> > network client2: https://abload.de/img/network-client2trkow.png
> >> > network server: https://abload.de/img/network-serverv3jjr.png
> >> >
> >> > gluster volume info: https://pastebin.com/ZMuJYXRZ
> >> >
> >> > Just wondering if the network traffic bug really got fixed or if this
> >> > is a new problem. I'll wait a couple of minutes and then deactivate
> >> > performance.quick-read again, just to see if network traffic goes down
> >> > to normal levels.
> >> >
> >> >
> >> > Best regards,
> >> > Hubert
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Volume stuck unable to add a brick

2019-04-16 Thread Karthik Subrahmanya
Hi Boris,

Thank you for providing the logs.
The problem here is because of the "auth.allow: 127.0.0.1" setting on the
volume.
When you try to add a new brick to the volume, the replication module
internally tries to set some metadata on the existing bricks to mark pending
heal on the new brick, by creating a temporary mount. Because of the auth.allow
setting, that mount gets permission errors, as seen in the logs below,
leading to the add-brick failure.

From data-gluster-dockervols.log-webserver9 :
[2019-04-15 14:00:34.226838] I [addr.c:55:compare_addr_and_update]
0-/data/gluster/dockervols: allowed = "127.0.0.1", received addr =
"192.168.200.147"
[2019-04-15 14:00:34.226895] E [MSGID: 115004]
[authenticate.c:224:gf_authenticate] 0-auth: no authentication module is
interested in accepting remote-client (null)
[2019-04-15 14:00:34.227129] E [MSGID: 115001]
[server-handshake.c:848:server_setvolume] 0-dockervols-server: Cannot
authenticate client from
webserver8.cast.org-55674-2019/04/15-14:00:20:495333-dockervols-client-2-0-0
3.12.2 [Permission denied]

From dockervols-add-brick-mount.log :
[2019-04-15 14:00:20.672033] W [MSGID: 114043]
[client-handshake.c:1109:client_setvolume_cbk] 0-dockervols-client-2:
failed to set the volume [Permission denied]
[2019-04-15 14:00:20.672102] W [MSGID: 114007]
[client-handshake.c:1138:client_setvolume_cbk] 0-dockervols-client-2:
failed to get 'process-uuid' from reply dict [Invalid argument]
[2019-04-15 14:00:20.672129] E [MSGID: 114044]
[client-handshake.c:1144:client_setvolume_cbk] 0-dockervols-client-2:
SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2019-04-15 14:00:20.672151] I [MSGID: 114049]
[client-handshake.c:1258:client_setvolume_cbk] 0-dockervols-client-2:
sending AUTH_FAILED event

This is a known issue and we are planning to fix it. For the time being
there is a workaround:
- Before you try adding the brick, set the auth.allow option to its default,
i.e. "*"; you can do this by running "gluster v reset <volname> auth.allow"
- Add the brick
- After it succeeds, set the auth.allow option back to the previous value.

Regards,
Karthik

On Tue, Apr 16, 2019 at 5:20 PM Boris Goldowsky  wrote:

> OK, log files attached.
>
>
>
> Boris
>
>
>
>
>
> *From: *Karthik Subrahmanya 
> *Date: *Tuesday, April 16, 2019 at 2:52 AM
> *To: *Atin Mukherjee , Boris Goldowsky <
> bgoldow...@cast.org>
> *Cc: *Gluster-users 
> *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee 
> wrote:
>
> +Karthik Subrahmanya 
>
>
>
> Didn't we fix this problem recently? "Failed to set extended attribute"
> indicates that the temp mount is failing and we don't have a quorum number of
> bricks up.
>
>
>
> We had two fixes which handles two kind of add-brick scenarios.
>
> [1] Fails add-brick when increasing the replica count if any of the brick
> is down to avoid data loss. This can be overridden by using the force
> option.
>
> [2] Allow add-brick to set the extended attributes by the temp mount if
> the volume is already mounted (has clients).
>
>
>
> They are in version 3.12.2 so, patch [1] is present there. But since they
> are using the force option it should not have any problem even if they have
> any brick down. The error message they are getting is also different, so it
> is not because of any brick being down I guess.
>
> Patch [2] is not present in 3.12.2 and it is not the conversion from plain
> distribute to replicate volume. So the scenario is different here.
>
> It seems like they are hitting some other issue.
>
>
>
> @Boris,
>
> Can you attach the add-brick's temp mount log. The file name should look
> something like "dockervols-add-brick-mount.log". Can you also provide all
> the brick logs of that volume during that time.
>
>
>
> [1] https://review.gluster.org/#/c/glusterfs/+/16330/
>
> [2] https://review.gluster.org/#/c/glusterfs/+/21791/
>
>
>
> Regards,
>
> Karthik
>
>
>
> Boris - which gluster version are you using?
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky 
> wrote:
>
> Atin, thank you for the reply.  Here are all of those pieces of
> information:
>
>
>
> [bgoldowsky@webserver9 ~]$ gluster --version
>
> glusterfs 3.12.2
>
> (same on all nodes)
>
>
>
> [bgoldowsky@webserver9 ~]$ sudo gluster peer status
>
> Number of Peers: 3
>
>
>
> Hostname: webserver11.cast.org
>
> Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
>
> State: Peer in Cluster (Connected)
>
>
>
> Hostname: webserver1.cast.org
>
> Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c
>
> State: Peer in Cluster (Connected)
>
> Other names:
>
> 192.168.200.131
>
> webserver1
>
>
>
> Hostname: webserver8.cast.org
>
> Uuid: be2f568b-61c5-4016-9264-083e4e6453a2
>
> State: Peer in Cluster (Connected)
>
> Other names:
>
> webserver8
>
>
>
> [bgoldowsky@webserver1 ~]$ sudo gluster v info
>
> Volume Name: dockervols
>
> Type: Replicate
>
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>
> Status: Started
>
> Snapshot 

Re: [Gluster-users] Volume stuck unable to add a brick

2019-04-16 Thread Boris Goldowsky
OK, log files attached.

Boris


From: Karthik Subrahmanya 
Date: Tuesday, April 16, 2019 at 2:52 AM
To: Atin Mukherjee , Boris Goldowsky 

Cc: Gluster-users 
Subject: Re: [Gluster-users] Volume stuck unable to add a brick



On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee wrote:
+Karthik Subrahmanya

Didn't we fix this problem recently? "Failed to set extended attribute"
indicates that the temp mount is failing and we don't have a quorum number of
bricks up.

We had two fixes which handles two kind of add-brick scenarios.
[1] Fails add-brick when increasing the replica count if any of the brick is 
down to avoid data loss. This can be overridden by using the force option.
[2] Allow add-brick to set the extended attributes by the temp mount if the 
volume is already mounted (has clients).

They are in version 3.12.2 so, patch [1] is present there. But since they are 
using the force option it should not have any problem even if they have any 
brick down. The error message they are getting is also different, so it is not 
because of any brick being down I guess.
Patch [2] is not present in 3.12.2 and it is not the conversion from plain 
distribute to replicate volume. So the scenario is different here.
It seems like they are hitting some other issue.

@Boris,
Can you attach the add-brick's temp mount log. The file name should look 
something like "dockervols-add-brick-mount.log". Can you also provide all the 
brick logs of that volume during that time.

[1] https://review.gluster.org/#/c/glusterfs/+/16330/
[2] https://review.gluster.org/#/c/glusterfs/+/21791/

Regards,
Karthik

Boris - which gluster version are you using?



On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky wrote:
Atin, thank you for the reply.  Here are all of those pieces of information:


[bgoldowsky@webserver9 ~]$ gluster --version

glusterfs 3.12.2
(same on all nodes)


[bgoldowsky@webserver9 ~]$ sudo gluster peer status

Number of Peers: 3



Hostname: webserver11.cast.org

Uuid: c2b147fd-cab4-4859-9922-db5730f8549d

State: Peer in Cluster (Connected)



Hostname: webserver1.cast.org

Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c

State: Peer in Cluster (Connected)

Other names:

192.168.200.131

webserver1



Hostname: webserver8.cast.org

Uuid: be2f568b-61c5-4016-9264-083e4e6453a2

State: Peer in Cluster (Connected)

Other names:

webserver8


[bgoldowsky@webserver1 ~]$ sudo gluster v info

Volume Name: dockervols

Type: Replicate

Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: webserver1:/data/gluster/dockervols

Brick2: webserver11:/data/gluster/dockervols

Brick3: webserver9:/data/gluster/dockervols

Options Reconfigured:

nfs.disable: on

transport.address-family: inet

auth.allow: 127.0.0.1



Volume Name: testvol

Type: Replicate

Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 4 = 4

Transport-type: tcp

Bricks:

Brick1: webserver1:/data/gluster/testvol

Brick2: webserver9:/data/gluster/testvol

Brick3: webserver11:/data/gluster/testvol

Brick4: webserver8:/data/gluster/testvol

Options Reconfigured:

transport.address-family: inet

nfs.disable: on


[bgoldowsky@webserver8 ~]$ sudo gluster v info

Volume Name: dockervols

Type: Replicate

Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: webserver1:/data/gluster/dockervols

Brick2: webserver11:/data/gluster/dockervols

Brick3: webserver9:/data/gluster/dockervols

Options Reconfigured:

nfs.disable: on

transport.address-family: inet

auth.allow: 127.0.0.1



Volume Name: testvol

Type: Replicate

Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 4 = 4

Transport-type: tcp

Bricks:

Brick1: webserver1:/data/gluster/testvol

Brick2: webserver9:/data/gluster/testvol

Brick3: webserver11:/data/gluster/testvol

Brick4: webserver8:/data/gluster/testvol

Options Reconfigured:

nfs.disable: on

transport.address-family: inet


[bgoldowsky@webserver9 ~]$ sudo gluster v info

Volume Name: dockervols

Type: Replicate

Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: webserver1:/data/gluster/dockervols

Brick2: webserver11:/data/gluster/dockervols

Brick3: webserver9:/data/gluster/dockervols

Options Reconfigured:

nfs.disable: on

transport.address-family: inet

auth.allow: 127.0.0.1



Volume Name: testvol

Type: Replicate

Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x 4 = 4

Transport-type: tcp

Bricks:

Brick1: 

Re: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-16 Thread Hu Bert
In my first test on my testing setup the traffic was on a normal
level, so i thought i was "safe". But on my live system the network
traffic was a multiple of the traffic one would expect.
performance.quick-read was enabled in both, the only difference in the
volume options between live and testing are:

performance.read-ahead: testing on, live off
performance.io-cache: testing on, live off

I ran another test on my testing setup, deactivated both and copied 9
GB of data. Now the traffic went up as well, from before ~9-10 MBit/s
up to 100 MBit/s with both options off. Does performance.quick-read
require one of those options set to 'on'?

I'll start another test shortly and activate one of those two options;
maybe there's a connection between those three options?
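One way to compare the two setups would be to query the three options directly on each volume (<volname> is a placeholder):

# gluster volume get <volname> performance.quick-read
# gluster volume get <volname> performance.read-ahead
# gluster volume get <volname> performance.io-cache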


Best Regards,
Hubert

On Tue, 16 Apr 2019 at 08:57, Poornima Gurusiddaiah wrote:
>
> Thank you for reporting this. I had done testing on my local setup and the 
> issue was resolved even with quick-read enabled. Let me test it again.
>
> Regards,
> Poornima
>
> On Mon, Apr 15, 2019 at 12:25 PM Hu Bert  wrote:
>>
>> fyi: after setting performance.quick-read to off network traffic
>> dropped to normal levels, client load/iowait back to normal as well.
>>
>> client: https://abload.de/img/network-client-afterihjqi.png
>> server: https://abload.de/img/network-server-afterwdkrl.png
>>
>> On Mon, 15 Apr 2019 at 08:33, Hu Bert wrote:
>> >
>> > Good Morning,
>> >
>> > today i updated my replica 3 setup (debian stretch) from version 5.5
>> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and
>> > i could re-activate 'performance.quick-read' again. See release notes:
>> >
>> > https://review.gluster.org/#/c/glusterfs/+/22538/
>> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795
>> >
>> > Upgrade went fine, and then i was watching iowait and network traffic.
>> > It seems that the network traffic went up after upgrade and
>> > reactivation of performance.quick-read. Here are some graphs:
>> >
>> > network client1: https://abload.de/img/network-clientfwj1m.png
>> > network client2: https://abload.de/img/network-client2trkow.png
>> > network server: https://abload.de/img/network-serverv3jjr.png
>> >
>> > gluster volume info: https://pastebin.com/ZMuJYXRZ
>> >
>> > Just wondering if the network traffic bug really got fixed or if this
>> > is a new problem. I'll wait a couple of minutes and then deactivate
>> > performance.quick-read again, just to see if network traffic goes down
>> > to normal levels.
>> >
>> >
>> > Best regards,
>> > Hubert
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-16 Thread Poornima Gurusiddaiah
Thank you for reporting this. I had done testing on my local setup and the
issue was resolved even with quick-read enabled. Let me test it again.

Regards,
Poornima

On Mon, Apr 15, 2019 at 12:25 PM Hu Bert  wrote:

> fyi: after setting performance.quick-read to off network traffic
> dropped to normal levels, client load/iowait back to normal as well.
>
> client: https://abload.de/img/network-client-afterihjqi.png
> server: https://abload.de/img/network-server-afterwdkrl.png
>
> On Mon, 15 Apr 2019 at 08:33, Hu Bert wrote:
> >
> > Good Morning,
> >
> > today i updated my replica 3 setup (debian stretch) from version 5.5
> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and
> > i could re-activate 'performance.quick-read' again. See release notes:
> >
> > https://review.gluster.org/#/c/glusterfs/+/22538/
> >
> http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795
> >
> > Upgrade went fine, and then i was watching iowait and network traffic.
> > It seems that the network traffic went up after upgrade and
> > reactivation of performance.quick-read. Here are some graphs:
> >
> > network client1: https://abload.de/img/network-clientfwj1m.png
> > network client2: https://abload.de/img/network-client2trkow.png
> > network server: https://abload.de/img/network-serverv3jjr.png
> >
> > gluster volume info: https://pastebin.com/ZMuJYXRZ
> >
> > Just wondering if the network traffic bug really got fixed or if this
> > is a new problem. I'll wait a couple of minutes and then deactivate
> > performance.quick-read again, just to see if network traffic goes down
> > to normal levels.
> >
> >
> > Best regards,
> > Hubert
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Difference between processes: shrinking volume and replacing faulty brick

2019-04-16 Thread Poornima Gurusiddaiah
Do you have a plain distributed volume without any replication? If so,
replace-brick should copy the data on the faulty brick to the new brick,
unless there is some old data which would also need a rebalance.

Doing add-brick followed by remove-brick and a rebalance is inefficient;
I think we should have just the old brick's data copied to the new brick,
and rebalance the whole volume when necessary. Adding the distribute
experts to the thread.

If you are OK with downtime, trying xfsdump and restore of the faulty
brick and re-forming the volume may be faster.
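For reference, the shrink path described here is the usual remove-brick sequence, roughly (volume name and brick are placeholders):

# gluster volume remove-brick <volname> <host>:<brickpath> start
# gluster volume remove-brick <volname> <host>:<brickpath> status
# gluster volume remove-brick <volname> <host>:<brickpath> commit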

Regards,
Poornima

On Mon, Apr 15, 2019, 6:40 PM Greene, Tami McFarlin 
wrote:

> We need to remove a server node from our configuration (distributed
> volume).  There is more than enough space on the remaining bricks to
> accept the data attached to the failing server; we didn't know if one
> process or the other would be significantly faster.  We know shrinking the
> volume (remove-brick) rebalances as it moves the data, so moving 506G
> resulted in the rebalancing of 1.8T and took considerable time.
>
>
>
> Reading the documentation, it seems that replacing a brick is simply
> introducing an empty brick to accept the displaced data, but it is the
> exact same process: remove-brick.
>
>
>
> Is there any way to migrate the data without rebalancing at the same time,
> and then rebalance once all the data has been moved?  I know that is not
> ideal, but it would allow us to remove the problem server much more quickly
> and resume production while rebalancing.
>
>
>
> Tami
>
>
>
> Tami McFarlin Greene
>
> Lab Technician
>
> RF, Communications, and Intelligent Systems Group
>
> Electrical and Electronics System Research Division
>
> Oak Ridge National Laboratory
>
> Bldg. 3500, Rm. A15
>
> gree...@ornl.gov   (865)
> 643-0401
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Volume stuck unable to add a brick

2019-04-16 Thread Karthik Subrahmanya
On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee 
wrote:

> +Karthik Subrahmanya 
>
> Didn't we fix this problem recently? "Failed to set extended attribute"
> indicates that the temp mount is failing and we don't have a quorum number of
> bricks up.
>

We had two fixes which handle two kinds of add-brick scenarios.
[1] Fails add-brick when increasing the replica count if any of the bricks
is down, to avoid data loss. This can be overridden by using the force
option.
[2] Allows add-brick to set the extended attributes via the temp mount if the
volume is already mounted (has clients).

They are on version 3.12.2, so patch [1] is present there. But since they
are using the force option it should not be a problem even if they have
a brick down. The error message they are getting is also different, so it
is not because of any brick being down, I guess.
Patch [2] is not present in 3.12.2, and this is not a conversion from a plain
distribute to a replicate volume. So the scenario is different here.
It seems like they are hitting some other issue.
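For reference, the add-brick case covered by fix [1] below is the replica-count increase, which looks roughly like this (volume name, new replica count and brick are placeholders):

# gluster volume add-brick <volname> replica <N> <newhost>:<brickpath> force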

@Boris,
Can you attach the add-brick's temp mount log. The file name should look
something like "dockervols-add-brick-mount.log". Can you also provide all
the brick logs of that volume during that time.

[1] https://review.gluster.org/#/c/glusterfs/+/16330/
[2] https://review.gluster.org/#/c/glusterfs/+/21791/

Regards,
Karthik

>
> Boris - which gluster version are you using?
>
>
>
> On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky 
> wrote:
>
>> Atin, thank you for the reply.  Here are all of those pieces of
>> information:
>>
>>
>>
>> [bgoldowsky@webserver9 ~]$ gluster --version
>>
>> glusterfs 3.12.2
>>
>> (same on all nodes)
>>
>>
>>
>> [bgoldowsky@webserver9 ~]$ sudo gluster peer status
>>
>> Number of Peers: 3
>>
>>
>>
>> Hostname: webserver11.cast.org
>>
>> Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
>>
>> State: Peer in Cluster (Connected)
>>
>>
>>
>> Hostname: webserver1.cast.org
>>
>> Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> 192.168.200.131
>>
>> webserver1
>>
>>
>>
>> Hostname: webserver8.cast.org
>>
>> Uuid: be2f568b-61c5-4016-9264-083e4e6453a2
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> webserver8
>>
>>
>>
>> [bgoldowsky@webserver1 ~]$ sudo gluster v info
>>
>> Volume Name: dockervols
>>
>> Type: Replicate
>>
>> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/dockervols
>>
>> Brick2: webserver11:/data/gluster/dockervols
>>
>> Brick3: webserver9:/data/gluster/dockervols
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>> auth.allow: 127.0.0.1
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Replicate
>>
>> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 4 = 4
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/testvol
>>
>> Brick2: webserver9:/data/gluster/testvol
>>
>> Brick3: webserver11:/data/gluster/testvol
>>
>> Brick4: webserver8:/data/gluster/testvol
>>
>> Options Reconfigured:
>>
>> transport.address-family: inet
>>
>> nfs.disable: on
>>
>>
>>
>> [bgoldowsky@webserver8 ~]$ sudo gluster v info
>>
>> Volume Name: dockervols
>>
>> Type: Replicate
>>
>> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/dockervols
>>
>> Brick2: webserver11:/data/gluster/dockervols
>>
>> Brick3: webserver9:/data/gluster/dockervols
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>> auth.allow: 127.0.0.1
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Replicate
>>
>> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 4 = 4
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/testvol
>>
>> Brick2: webserver9:/data/gluster/testvol
>>
>> Brick3: webserver11:/data/gluster/testvol
>>
>> Brick4: webserver8:/data/gluster/testvol
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>>
>>
>> [bgoldowsky@webserver9 ~]$ sudo gluster v info
>>
>> Volume Name: dockervols
>>
>> Type: Replicate
>>
>> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/dockervols
>>
>> Brick2: webserver11:/data/gluster/dockervols
>>
>> Brick3: webserver9:/data/gluster/dockervols
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>> auth.allow: 127.0.0.1
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Replicate
>>
>>