Re: [Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
On Thu, Jul 6, 2017 at 3:47 AM, Gianluca Cecchi 
wrote:

> On Wed, Jul 5, 2017 at 6:39 PM, Atin Mukherjee 
> wrote:
>
>> OK, so the log just hints to the following:
>>
>> [2017-07-05 15:04:07.178204] E [MSGID: 106123]
>> [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit
>> failed for operation Reset Brick on local node
>> [2017-07-05 15:04:07.178214] E [MSGID: 106123]
>> [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
>> 0-management: Commit Op Failed
>>
>> While going through the code, glusterd_op_reset_brick () failed, resulting
>> in these logs. Now I don't see any error logs generated from
>> glusterd_op_reset_brick (), which makes me think that we have failed from a
>> place where we log the failure in debug mode. Would you be able to restart
>> the glusterd service in debug log mode, rerun this test and share the log?
>>
>>
> Do you mean to run the reset-brick command for another volume or for the
> same? Can I run it against this "now broken" volume?
>
> Or perhaps can I modify /usr/lib/systemd/system/glusterd.service and
> change, in the [Service] section,
>
> from
> Environment="LOG_LEVEL=INFO"
>
> to
> Environment="LOG_LEVEL=DEBUG"
>
> and then
> systemctl daemon-reload
> systemctl restart glusterd
>

Yes, that's how you can run glusterd in debug log mode.
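
For reference, instead of editing the packaged unit file in place, a systemd
drop-in override can be used (a minimal sketch; the override directory path
is the standard systemd convention, not anything gluster-specific):

# create a drop-in that overrides only the log level
mkdir -p /etc/systemd/system/glusterd.service.d
cat > /etc/systemd/system/glusterd.service.d/log-level.conf <<'EOF'
[Service]
Environment="LOG_LEVEL=DEBUG"
EOF
systemctl daemon-reload
systemctl restart glusterd

# to revert, remove the drop-in and restart
rm /etc/systemd/system/glusterd.service.d/log-level.conf
systemctl daemon-reload
systemctl restart glusterd

This keeps the packaged unit file untouched, so it will not be overwritten
on the next upgrade.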

>
> I think it would be better to keep gluster in debug mode for as little time
> as possible, as there are other volumes active right now, and I want to
> avoid filling the file system that holds the log files.
> Best to put only some components in debug mode if possible, as in the
> example commands above.
>

You can switch back to info mode the moment this is hit one more time with
the debug log enabled. What I'd need here is the glusterd log (with debug
mode) to figure out the exact cause of the failure.


>
> Let me know,
> thanks
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] [New Release] GlusterD2 v4.0dev-7

2017-07-05 Thread Prashanth Pai
On Wednesday, July 5, 2017, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> On 5 Jul 2017 11:31 AM, "Kaushal M" wrote:
>
> - Preliminary support for volume expansion has been added. (Note that
> rebalancing is not available yet)
>
>
> What do you mean with this?
> Any differences in volume expansion from the current architecture?
>

No. It's still the same.
Glusterd2 hasn't implemented volume rebalancing yet. It will be there,
eventually.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-07-05 Thread Krutika Dhananjay
Could you disable eager lock, run your test again on the sharded
configuration, and share the profile output?

# gluster volume set <VOLNAME> cluster.eager-lock off
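
In case it is useful, the profile output mentioned above can be collected
with the standard volume profiling commands (a quick sketch, where <VOLNAME>
is a placeholder for your volume name):

# gluster volume profile <VOLNAME> start
  ... run the dd test ...
# gluster volume profile <VOLNAME> info
# gluster volume profile <VOLNAME> stop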

-Krutika

On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay 
wrote:

> Thanks. I think reusing the same volume was the cause of the lack of IO
> distribution.
> The latest profile output looks much more realistic and in line with what I
> would expect.
>
> Let me analyse the numbers a bit and get back.
>
> -Krutika
>
> On Tue, Jul 4, 2017 at 12:55 PM,  wrote:
>
>> Hi Krutika,
>>
>>
>>
>> Thank you so much for your reply. Let me answer all:
>>
>>
>>
>>1. I have no idea why it did not get distributed over all bricks.
>>2. Hm.. This is really weird.
>>
>>
>>
>> And others;
>>
>>
>>
>> No. I use only one volume. When I tested sharded and striped volumes, I
>> manually stopped the volume, deleted it, purged the data (inside the
>> bricks/disks) and re-created it using this command:
>>
>>
>>
>> sudo gluster volume create testvol replica 2
>> sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1
>> sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2
>> sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3
>> sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4
>> sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5
>> sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6
>> sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7
>> sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8
>> sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9
>> sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10
>> force
>>
>>
>>
>> and of course after that I executed volume start. If sharding is enabled, I
>> enable that feature BEFORE I start the sharded volume and then mount it.
>>
>>
>>
>> I tried converting from one to the other, but then I saw the documentation
>> says a clean volume should be better. So I tried the clean method. Still the
>> same performance.
>>
>>
>>
>> The test file grows from 1GB to 5GB, and the tests are dd runs. See this example:
>>
>>
>>
>> dd if=/dev/zero of=/mnt/testfile bs=1G count=5
>>
>> 5+0 records in
>>
>> 5+0 records out
>>
>> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
>>
>>
>>
>>
>>
>> >> dd if=/dev/zero of=/mnt/testfile bs=5G count=1
>>
>> This also gives the same result. (bs and count reversed)
>>
>>
>>
>>
>>
>> And this run generated a profile, which I have also attached to this
>> e-mail.
>>
>>
>>
>> Is there anything that I can try? I am open to all kind of suggestions.
>>
>>
>>
>> Thanks,
>>
>> Gencer.
>>
>>
>>
>> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
>> *Sent:* Tuesday, July 4, 2017 9:39 AM
>>
>> *To:* gen...@gencgiyen.com
>> *Cc:* gluster-user 
>> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>>
>>
>>
>> Hi Gencer,
>>
>> I just checked the volume-profile attachments.
>>
>> Things that seem really odd to me as far as the sharded volume is
>> concerned:
>>
>> 1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10
>> seems to have witnessed all the IO. No other bricks witnessed any write
>> operations. This is unacceptable for a volume that has 8 other replica
>> sets. Why didn't the shards get distributed across all of these sets?
>>
>>
>>
>> 2. For the replica set consisting of bricks 5 and 6 of node 09, I see that
>> brick 5 is spending 99% of its time in the FINODELK fop, whereas the fop that
>> should have dominated its profile is in fact WRITE.
>>
>> Could you throw some more light on your setup from gluster standpoint?
>> * For instance, are you using two different gluster volumes to gather
>> these numbers - one distributed-replicated-striped and another
>> distributed-replicated-sharded? Or are you merely converting a single
>> volume from one type to another?
>>
>>
>>
>> * And if there are indeed two volumes, could you share both their `volume
>> info` outputs to eliminate any confusion?
>>
>> * If there's just one volume, are you taking care to remove all data from
>> the mount point of this volume before converting it?
>>
>> * What is the size the test file grew to?
>>
>> * These attached profiles are against dd runs? Or the file download test?
>>
>>
>>
>> -Krutika
>>
>>
>>
>>
>>
>> On Mon, Jul 3, 2017 at 8:42 PM,  wrote:
>>
>> Hi Krutika,
>>
>>
>>
>> Have you been able to look at my profiles? Do you have any clue, idea or
>> suggestion?
>>
>>
>>
>> Thanks,
>>
>> -Gencer
>>
>>
>>
>> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
>> *Sent:* Friday, June 30, 2017 3:50 PM
>>
>>
>> *To:* gen...@gencgiyen.com
>> *Cc:* gluster-user 
>> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>>
>>
>>
>> Just noticed that the way you have configured your brick order during
>> volume-create makes 

Re: [Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Gianluca Cecchi
On Wed, Jul 5, 2017 at 6:39 PM, Atin Mukherjee  wrote:

> OK, so the log just hints to the following:
>
> [2017-07-05 15:04:07.178204] E [MSGID: 106123] 
> [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit]
> 0-management: Commit failed for operation Reset Brick on local node
> [2017-07-05 15:04:07.178214] E [MSGID: 106123]
> [glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
> 0-management: Commit Op Failed
>
> While going through the code, glusterd_op_reset_brick () failed, resulting
> in these logs. Now I don't see any error logs generated from
> glusterd_op_reset_brick (), which makes me think that we have failed from a
> place where we log the failure in debug mode. Would you be able to restart
> the glusterd service in debug log mode, rerun this test and share the log?
>
>
What's the best way to set glusterd in debug mode?
Can I set it on this volume, and work on it even though it is now compromised?

I ask because I have tried this:

[root@ovirt01 ~]# gluster volume get export diagnostics.brick-log-level
Option                                  Value
------                                  -----
diagnostics.brick-log-level             INFO


[root@ovirt01 ~]# gluster volume set export diagnostics.brick-log-level
DEBUG
volume set: failed: Error, Validation Failed
[root@ovirt01 ~]#

While on another volume that is in good state, I can run

[root@ovirt01 ~]# gluster volume set iso diagnostics.brick-log-level DEBUG
volume set: success
[root@ovirt01 ~]#

[root@ovirt01 ~]# gluster volume get iso diagnostics.brick-log-level
Option                                  Value
------                                  -----
diagnostics.brick-log-level             DEBUG

[root@ovirt01 ~]# gluster volume set iso diagnostics.brick-log-level INFO
volume set: success
[root@ovirt01 ~]#

 [root@ovirt01 ~]# gluster volume get iso diagnostics.brick-log-level
Option                                  Value
------                                  -----
diagnostics.brick-log-level             INFO
[root@ovirt01 ~]#

Do you mean to run the reset-brick command for another volume or for the
same? Can I run it against this "now broken" volume?

Or perhaps can I modify /usr/lib/systemd/system/glusterd.service and change,
in the [Service] section,

from
Environment="LOG_LEVEL=INFO"

to
Environment="LOG_LEVEL=DEBUG"

and then
systemctl daemon-reload
systemctl restart glusterd

I think it would be better to keep gluster in debug mode for as little time
as possible, as there are other volumes active right now, and I want to
avoid filling the file system that holds the log files.
Best to put only some components in debug mode if possible, as in the
example commands above.
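
As a sketch of that per-component approach (these are standard volume
options, shown with a placeholder volume name; note they do not change the
glusterd daemon log level itself):

gluster volume set <VOLNAME> diagnostics.brick-log-level DEBUG
gluster volume set <VOLNAME> diagnostics.client-log-level DEBUG

and back to INFO once the failure has been reproduced:

gluster volume set <VOLNAME> diagnostics.brick-log-level INFO
gluster volume set <VOLNAME> diagnostics.client-log-level INFO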

Let me know,
thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
OK, so the log just hints to the following:

[2017-07-05 15:04:07.178204] E [MSGID: 106123]
[glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed
for operation Reset Brick on local node
[2017-07-05 15:04:07.178214] E [MSGID: 106123]
[glusterd-replace-brick.c:649:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
0-management: Commit Op Failed

While going through the code, glusterd_op_reset_brick () failed, resulting
in these logs. Now I don't see any error logs generated from
glusterd_op_reset_brick (), which makes me think that we have failed from a
place where we log the failure in debug mode. Would you be able to restart
the glusterd service in debug log mode, rerun this test and share the log?


On Wed, Jul 5, 2017 at 9:12 PM, Gianluca Cecchi 
wrote:

>
>
> On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee 
> wrote:
>
>> And what does glusterd log indicate for these failures?
>>
>
>
> See here in gzip format
>
> https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/
> view?usp=sharing
>
> It seems that on each host the peer files have been updated with a new
> entry "hostname2":
>
> [root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt01 ~]#
>
> [root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
> state=3
> hostname1=ovirt03.localdomain.local
> hostname2=10.10.2.104
> [root@ovirt02 ~]#
>
> [root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
> uuid=b89311fe-257f-4e44-8e15-9bff6245d689
> state=3
> hostname1=ovirt02.localdomain.local
> hostname2=10.10.2.103
> uuid=e9717281-a356-42aa-a579-a4647a29a0bc
> state=3
> hostname1=ovirt01.localdomain.local
> hostname2=10.10.2.102
> [root@ovirt03 ~]#
>
>
> But not the gluster volume info on the second and third nodes, which have
> lost the ovirt01/gl01 host brick information...
>
> Eg on ovirt02
>
>
> [root@ovirt02 peers]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 2
> Transport-type: tcp
> Bricks:
> Brick1: ovirt02.localdomain.local:/gluster/brick3/export
> Brick2: ovirt03.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> [root@ovirt02 peers]#
>
> And on ovirt03
>
> [root@ovirt03 ~]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 2
> Transport-type: tcp
> Bricks:
> Brick1: ovirt02.localdomain.local:/gluster/brick3/export
> Brick2: ovirt03.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> [root@ovirt03 ~]#
>
> While on ovirt01 it seems isolated...
>
> [root@ovirt01 ~]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 0 x (2 + 1) = 1
> Transport-type: tcp
> Bricks:
> Brick1: gl01.localdomain.local:/gluster/brick3/export
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> 

Re: [Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Gianluca Cecchi
On Wed, Jul 5, 2017 at 5:22 PM, Atin Mukherjee  wrote:

> And what does glusterd log indicate for these failures?
>


See here in gzip format

https://drive.google.com/file/d/0BwoPbcrMv8mvYmlRLUgyV0pFN0k/view?usp=sharing


It seems that on each host the peer files have been updated with a new
entry "hostname2":

[root@ovirt01 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt01 ~]#

[root@ovirt02 ~]# cat /var/lib/glusterd/peers/*
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
uuid=ec81a04c-a19c-4d31-9d82-7543cefe79f3
state=3
hostname1=ovirt03.localdomain.local
hostname2=10.10.2.104
[root@ovirt02 ~]#

[root@ovirt03 ~]# cat /var/lib/glusterd/peers/*
uuid=b89311fe-257f-4e44-8e15-9bff6245d689
state=3
hostname1=ovirt02.localdomain.local
hostname2=10.10.2.103
uuid=e9717281-a356-42aa-a579-a4647a29a0bc
state=3
hostname1=ovirt01.localdomain.local
hostname2=10.10.2.102
[root@ovirt03 ~]#


But not the gluster volume info on the second and third nodes, which have
lost the ovirt01/gl01 host brick information...

Eg on ovirt02


[root@ovirt02 peers]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt02 peers]#

And on ovirt03

[root@ovirt03 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 2
Transport-type: tcp
Bricks:
Brick1: ovirt02.localdomain.local:/gluster/brick3/export
Brick2: ovirt03.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt03 ~]#

While on ovirt01 it seems isolated...

[root@ovirt01 ~]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 0 x (2 + 1) = 1
Transport-type: tcp
Bricks:
Brick1: gl01.localdomain.local:/gluster/brick3/export
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt01 ~]#
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow write times to gluster disk

2017-07-05 Thread Pat Haley


Hi Soumya,

(1) In http://mseas.mit.edu/download/phaley/GlusterUsers/TestNFSmount/ 
I've placed the following 2 log files


etc-glusterfs-glusterd.vol.log
gdata.log

The first has repeated messages about nfs disconnects.  The second had 
the .log name (but not much information).


(2) About the gluster-NFS native server:  do you know where we can find 
documentation on how to use/install it?  We haven't had success in our 
searches.


Thanks

Pat


On 07/04/2017 05:01 AM, Soumya Koduri wrote:



On 07/03/2017 09:01 PM, Pat Haley wrote:


Hi Soumya,

When I originally did the tests I ran tcpdump on the client.

I have rerun the tests, doing tcpdump on the server

tcpdump -i any -nnSs 0 host 172.16.1.121 -w /root/capture_nfsfail.pcap

The results are in the same place

http://mseas.mit.edu/download/phaley/GlusterUsers/TestNFSmount/

capture_nfsfail.pcap   has the results from the failed touch experiment
capture_nfssucceed.pcap  has the results from the successful touch
experiment

The brick log files are there too.


Thanks for sharing. Looks like the error is not generated on the
gluster-server side. The permission denied error was caused either by
kNFS or by the fuse-mnt process, or probably by the combination.


To check fuse-mnt logs, please look at
/var/log/glusterfs/<hyphenated-mount-path>.log


For eg.: if you have fuse mounted the gluster volume at /mnt/fuse-mnt 
and exported it via kNFS, the log location for that fuse_mnt shall be 
at /var/log/glusterfs/mnt-fuse-mnt.log



Also why not switch to either gluster-NFS native server or NFS-Ganesha 
instead of using kNFS, as they are recommended NFS servers to use with 
gluster?
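
For what it's worth, gluster-NFS is the NFS server built into glusterfs
itself; on the volumes shown elsewhere in this thread it is turned off via
nfs.disable. A minimal sketch of enabling it on a volume (the volume name
and server are placeholders; NFS-Ganesha setup is a separate, more involved
topic):

gluster volume set <VOLNAME> nfs.disable off
showmount -e <server>

The volume should then show up in the exports served by gluster-NFS.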


Thanks,
Soumya



I believe we are using kernel-NFS exporting a fuse mounted gluster
volume.  I am having Steve confirm this.  I tried to find the fuse-mnt
logs but failed.  Where should I look for them?

Thanks

Pat



On 07/03/2017 07:58 AM, Soumya Koduri wrote:



On 06/30/2017 07:56 PM, Pat Haley wrote:


Hi,

I was wondering if there were any additional test we could perform to
help debug the group write-permissions issue?


Sorry for the delay. Please find response inline --



Thanks

Pat


On 06/27/2017 12:29 PM, Pat Haley wrote:


Hi Soumya,

One example, we have a common working directory dri_fleat in the
gluster volume

drwxrwsr-x 22 root dri_fleat 4.0K May  1 15:14 dri_fleat

my user (phaley) does not own that directory but is a member of the
group  dri_fleat and should have write permissions.  When I go to the
nfs-mounted version and try to use the touch command I get the
following

ibfdr-compute-0-4(dri_fleat)% touch dum
touch: cannot touch `dum': Permission denied

One of the sub-directories under dri_fleat is "test" which phaley 
owns


drwxrwsr-x  2 phaley   dri_fleat 4.0K May  1 15:16 test

Under this directory (mounted via nfs) user phaley can write

ibfdr-compute-0-4(test)% touch dum
ibfdr-compute-0-4(test)%

I have put the packet captures in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestNFSmount/

capture_nfsfail.pcap   has the results from the failed touch 
experiment

capture_nfssucceed.pcap  has the results from the successful touch
experiment

The command I used for these was

tcpdump -i ib0 -nnSs 0 host 172.16.1.119 -w 
/root/capture_nfstest.pcap


I hope these pkts were captured on the node where NFS server is
running. Could you please use '-i any' as I do not see glusterfs
traffic in the tcpdump.

Also looks like NFS v4 is used between client & nfs server. Are you
using kernel-NFS here (i.e, kernel-NFS exporting fuse mounted gluster
volume)?
If that is the case please capture fuse-mnt logs as well. This error
may well be coming from kernel-NFS itself before the request is sent
to fuse-mnt process.

FWIW, we have below option -

Option: server.manage-gids
Default Value: off
Description: Resolve groups on the server-side.

I haven't looked into what this option exactly does. But it may be worth
testing with this option on.
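
A one-line sketch of trying that (the volume name is a placeholder); it can
be reverted by setting the option back to off:

gluster volume set <VOLNAME> server.manage-gids on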

Thanks,
Soumya




The brick log files are also in the above link.  If I read them
correctly they both show odd times.  Specifically I see entries from
around 2017-06-27 14:02:37.404865  even though the system time was
2017-06-27 12:00:00.

One final item: another reply to my post had a link about possible
problems that could arise from users belonging to too many groups. We
have seen the above problem even with a user belonging to only 4
groups.

Let me know what additional information I can provide.




Thanks

Pat


On 06/27/2017 02:45 AM, Soumya Koduri wrote:



On 06/27/2017 10:17 AM, Pranith Kumar Karampuri wrote:

The only problem with using gluster mounted via NFS is that it
does not
respect the group write permissions which we need.

We have an exercise coming up in a couple of weeks. It seems
to me
that in order to improve our write times before then, it would be
good
to solve the group write permissions for gluster mounted via NFS 
now.

We can then revisit gluster mounted via FUSE afterwards.

What information would you need to help us force 

Re: [Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
And what does glusterd log indicate for these failures?

On Wed, Jul 5, 2017 at 8:43 PM, Gianluca Cecchi 
wrote:

>
>
> On Wed, Jul 5, 2017 at 5:02 PM, Sahina Bose  wrote:
>
>>
>>
>> On Wed, Jul 5, 2017 at 8:16 PM, Gianluca Cecchi <
>> gianluca.cec...@gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Jul 5, 2017 at 7:42 AM, Sahina Bose  wrote:
>>>


> ...
>
> then the commands I need to run would be:
>
> gluster volume reset-brick export 
> ovirt01.localdomain.local:/gluster/brick3/export
> start
> gluster volume reset-brick export 
> ovirt01.localdomain.local:/gluster/brick3/export
> gl01.localdomain.local:/gluster/brick3/export commit force
>
> Correct?
>

 Yes, correct. gl01.localdomain.local should resolve correctly on all 3
 nodes.

>>>
>>>
>>> It fails at first step:
>>>
>>>  [root@ovirt01 ~]# gluster volume reset-brick export
>>> ovirt01.localdomain.local:/gluster/brick3/export start
>>> volume reset-brick: failed: Cannot execute command. The cluster is
>>> operating at version 30712. reset-brick command reset-brick start is
>>> unavailable in this version.
>>> [root@ovirt01 ~]#
>>>
>>> It seems somehow related to this upgrade procedure for the commercial
>>> Red Hat Gluster Storage solution:
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Installation_Guide/chap-Upgrading_Red_Hat_Storage.html
>>>
>>> So it seems I have to run some command of the form:
>>>
>>> gluster volume set all cluster.op-version X
>>>
>>> with X > 30712
>>>
>>> It seems that the latest version of the commercial Red Hat Gluster Storage
>>> is 3.1 and its op-version is indeed 30712...
>>>
>>> So the question is which particular op-version I have to set and if the
>>> command can be set online without generating disruption
>>>
>>
>> It should have worked with the glusterfs 3.10 version from Centos repo.
>> Adding gluster-users for help on the op-version
>>
>>
>>>
>>> Thanks,
>>> Gianluca
>>>
>>
>>
>
> It seems op-version is not updated automatically by default, so that it
> can manage mixed versions while you update one by one...
>
> I followed what described here:
> https://gluster.readthedocs.io/en/latest/Upgrade-Guide/op_version/
>
>
> - Get current version:
>
> [root@ovirt01 ~]# gluster volume get all cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      30712
> [root@ovirt01 ~]#
>
>
> - Get maximum version I can set for current setup:
>
> [root@ovirt01 ~]# gluster volume get all cluster.max-op-version
> Option                                  Value
> ------                                  -----
> cluster.max-op-version                  31000
> [root@ovirt01 ~]#
>
>
> - Get op version information for all the connected clients:
>
> [root@ovirt01 ~]# gluster volume status all clients | grep ":49" | awk
> '{print $4}' | sort | uniq -c
>  72 31000
> [root@ovirt01 ~]#
>
> --> ok
>
>
> - Update op-version
>
> [root@ovirt01 ~]# gluster volume set all cluster.op-version 31000
> volume set: success
> [root@ovirt01 ~]#
>
>
> - Verify:
> [root@ovirt01 ~]# gluster volume get all cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      31000
> [root@ovirt01 ~]#
>
> --> ok
>
> [root@ovirt01 ~]# gluster volume reset-brick export
> ovirt01.localdomain.local:/gluster/brick3/export start
> volume reset-brick: success: reset-brick start operation successful
>
> [root@ovirt01 ~]# gluster volume reset-brick export
> ovirt01.localdomain.local:/gluster/brick3/export 
> gl01.localdomain.local:/gluster/brick3/export
> commit force
> volume reset-brick: failed: Commit failed on ovirt02.localdomain.local.
> Please check log file for details.
> Commit failed on ovirt03.localdomain.local. Please check log file for
> details.
> [root@ovirt01 ~]#
>
> [root@ovirt01 bricks]# gluster volume info export
>
> Volume Name: export
> Type: Replicate
> Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gl01.localdomain.local:/gluster/brick3/export
> Brick2: ovirt02.localdomain.local:/gluster/brick3/export
> Brick3: ovirt03.localdomain.local:/gluster/brick3/export (arbiter)
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> 

Re: [Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Gianluca Cecchi
On Wed, Jul 5, 2017 at 5:02 PM, Sahina Bose  wrote:

>
>
> On Wed, Jul 5, 2017 at 8:16 PM, Gianluca Cecchi  > wrote:
>
>>
>>
>> On Wed, Jul 5, 2017 at 7:42 AM, Sahina Bose  wrote:
>>
>>>
>>>
 ...

 then the commands I need to run would be:

 gluster volume reset-brick export 
 ovirt01.localdomain.local:/gluster/brick3/export
 start
 gluster volume reset-brick export 
 ovirt01.localdomain.local:/gluster/brick3/export
 gl01.localdomain.local:/gluster/brick3/export commit force

 Correct?

>>>
>>> Yes, correct. gl01.localdomain.local should resolve correctly on all 3
>>> nodes.
>>>
>>
>>
>> It fails at first step:
>>
>>  [root@ovirt01 ~]# gluster volume reset-brick export
>> ovirt01.localdomain.local:/gluster/brick3/export start
>> volume reset-brick: failed: Cannot execute command. The cluster is
>> operating at version 30712. reset-brick command reset-brick start is
>> unavailable in this version.
>> [root@ovirt01 ~]#
>>
>> It seems somehow related to this upgrade procedure for the commercial
>> Red Hat Gluster Storage solution:
>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Installation_Guide/chap-Upgrading_Red_Hat_Storage.html
>>
>> So it seems I have to run some command of the form:
>>
>> gluster volume set all cluster.op-version X
>>
>> with X > 30712
>>
>> It seems that the latest version of the commercial Red Hat Gluster Storage
>> is 3.1 and its op-version is indeed 30712...
>>
>> So the question is which particular op-version I have to set and if the
>> command can be set online without generating disruption
>>
>
> It should have worked with the glusterfs 3.10 version from Centos repo.
> Adding gluster-users for help on the op-version
>
>
>>
>> Thanks,
>> Gianluca
>>
>
>

It seems op-version is not updated automatically by default, so that it can
manage mixed versions while you update one by one...

I followed what described here:
https://gluster.readthedocs.io/en/latest/Upgrade-Guide/op_version/


- Get current version:

[root@ovirt01 ~]# gluster volume get all cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      30712
[root@ovirt01 ~]#


- Get maximum version I can set for current setup:

[root@ovirt01 ~]# gluster volume get all cluster.max-op-version
Option                                  Value
------                                  -----
cluster.max-op-version                  31000
[root@ovirt01 ~]#


- Get op version information for all the connected clients:

[root@ovirt01 ~]# gluster volume status all clients | grep ":49" | awk
'{print $4}' | sort | uniq -c
 72 31000
[root@ovirt01 ~]#

--> ok


- Update op-version

[root@ovirt01 ~]# gluster volume set all cluster.op-version 31000
volume set: success
[root@ovirt01 ~]#


- Verify:
[root@ovirt01 ~]# gluster volume get all cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      31000
[root@ovirt01 ~]#

--> ok

[root@ovirt01 ~]# gluster volume reset-brick export
ovirt01.localdomain.local:/gluster/brick3/export start
volume reset-brick: success: reset-brick start operation successful

[root@ovirt01 ~]# gluster volume reset-brick export
ovirt01.localdomain.local:/gluster/brick3/export
gl01.localdomain.local:/gluster/brick3/export commit force
volume reset-brick: failed: Commit failed on ovirt02.localdomain.local.
Please check log file for details.
Commit failed on ovirt03.localdomain.local. Please check log file for
details.
[root@ovirt01 ~]#

[root@ovirt01 bricks]# gluster volume info export

Volume Name: export
Type: Replicate
Volume ID: b00e5839-becb-47e7-844f-6ce6ce1b7153
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gl01.localdomain.local:/gluster/brick3/export
Brick2: ovirt02.localdomain.local:/gluster/brick3/export
Brick3: ovirt03.localdomain.local:/gluster/brick3/export (arbiter)
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on
[root@ovirt01 bricks]# gluster volume reset-brick export
ovirt02.localdomain.local:/gluster/brick3/export start
volume reset-brick: success: reset-brick start operation successful
[root@ovirt01 bricks]# gluster volume 

Re: [Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Atin Mukherjee
On Wed, Jul 5, 2017 at 8:32 PM, Sahina Bose  wrote:

>
>
> On Wed, Jul 5, 2017 at 8:16 PM, Gianluca Cecchi  > wrote:
>
>>
>>
>> On Wed, Jul 5, 2017 at 7:42 AM, Sahina Bose  wrote:
>>
>>>
>>>
 ...

 then the commands I need to run would be:

 gluster volume reset-brick export 
 ovirt01.localdomain.local:/gluster/brick3/export
 start
 gluster volume reset-brick export 
 ovirt01.localdomain.local:/gluster/brick3/export
 gl01.localdomain.local:/gluster/brick3/export commit force

 Correct?

>>>
>>> Yes, correct. gl01.localdomain.local should resolve correctly on all 3
>>> nodes.
>>>
>>
>>
>> It fails at first step:
>>
>>  [root@ovirt01 ~]# gluster volume reset-brick export
>> ovirt01.localdomain.local:/gluster/brick3/export start
>> volume reset-brick: failed: Cannot execute command. The cluster is
>> operating at version 30712. reset-brick command reset-brick start is
>> unavailable in this version.
>> [root@ovirt01 ~]#
>>
>> It seems somehow related to this upgrade procedure for the commercial
>> Red Hat Gluster Storage solution:
>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Installation_Guide/chap-Upgrading_Red_Hat_Storage.html
>>
>> So it seems I have to run some command of the form:
>>
>> gluster volume set all cluster.op-version X
>>
>> with X > 30712
>>
>> It seems that the latest version of the commercial Red Hat Gluster Storage
>> is 3.1 and its op-version is indeed 30712...
>>
>> So the question is which particular op-version I have to set and if the
>> command can be set online without generating disruption
>>
>
> It should have worked with the glusterfs 3.10 version from Centos repo.
> Adding gluster-users for help on the op-version
>

This definitely means your cluster is running at an op-version < 3.9.0:

if (conf->op_version < GD_OP_VERSION_3_9_0 &&
    strcmp (cli_op, "GF_REPLACE_OP_COMMIT_FORCE")) {
        snprintf (msg, sizeof (msg), "Cannot execute command. The "
                  "cluster is operating at version %d. reset-brick "
                  "command %s is unavailable in this version.",
                  conf->op_version,
                  gd_rb_op_to_str (cli_op));
        ret = -1;
        goto out;
}

What version of the gluster bits are you running across the gluster
cluster? Please note cluster.op-version is not exactly the same as the rpm
version, and with every upgrade it's recommended to bump up the op-version.


>
>>
>> Thanks,
>> Gianluca
>>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Mess code when encryption

2017-07-05 Thread Liu, Dan
Hi everyone,

I have a question about using encryption in Gluster FS.

1. Created a file (smaller than 1k) at the volume's mount point.
2. Read the file, and got back garbled content.
I found that the content I got comes from the cache and no decryption is
performed on it, so garbled content is returned.

If I set the following properties to off, then everything is OK (see the example commands after the list):

performance.quick-read

performance.write-behind

performance.open-behind
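
For reference, a sketch of the commands corresponding to that workaround
(the volume name is a placeholder):

gluster volume set <VOLNAME> performance.quick-read off
gluster volume set <VOLNAME> performance.write-behind off
gluster volume set <VOLNAME> performance.open-behind off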



My questions are:

   Is the above-mentioned behavior expected?

   Do I have a wrong configuration?

OS : CentOS 7.1
Gluster FS:  3.10.3

Configuration:
   Following the quick start on the official website, I set features.encryption
to on and configured encryption.master-key, then started and mounted the volume.


Looking forward to your answers. Thanks.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] op-version for reset-brick (Was: Re: [ovirt-users] Upgrading HC from 4.0 to 4.1)

2017-07-05 Thread Sahina Bose
On Wed, Jul 5, 2017 at 8:16 PM, Gianluca Cecchi 
wrote:

>
>
> On Wed, Jul 5, 2017 at 7:42 AM, Sahina Bose  wrote:
>
>>
>>
>>> ...
>>>
>>> then the commands I need to run would be:
>>>
>>> gluster volume reset-brick export 
>>> ovirt01.localdomain.local:/gluster/brick3/export
>>> start
>>> gluster volume reset-brick export 
>>> ovirt01.localdomain.local:/gluster/brick3/export
>>> gl01.localdomain.local:/gluster/brick3/export commit force
>>>
>>> Correct?
>>>
>>
>> Yes, correct. gl01.localdomain.local should resolve correctly on all 3
>> nodes.
>>
>
>
> It fails at first step:
>
>  [root@ovirt01 ~]# gluster volume reset-brick export
> ovirt01.localdomain.local:/gluster/brick3/export start
> volume reset-brick: failed: Cannot execute command. The cluster is
> operating at version 30712. reset-brick command reset-brick start is
> unavailable in this version.
> [root@ovirt01 ~]#
>
> It seems somehow related to this upgrade procedure for the commercial
> Red Hat Gluster Storage solution:
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Installation_Guide/chap-Upgrading_Red_Hat_Storage.html
>
> So it seems I have to run some command of the form:
>
> gluster volume set all cluster.op-version X
>
> with X > 30712
>
> It seems that the latest version of the commercial Red Hat Gluster Storage
> is 3.1 and its op-version is indeed 30712...
>
> So the question is which particular op-version I have to set and if the
> command can be set online without generating disruption
>

It should have worked with the glusterfs 3.10 version from Centos repo.
Adding gluster-users for help on the op-version


>
> Thanks,
> Gianluca
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] I need a sanity check.

2017-07-05 Thread Krist van Besien
You are confusing volume with brick.

You do not have a "Replicate Brick"; you have one 1x3 volume, composed of 3
bricks, and one 1x2 volume made up of 2 bricks. You do need to understand
the difference between volume and brick.

Also, you need to be aware of the difference between server quorum and
client quorum. For client quorum you need three bricks; for the third brick,
however, you can use an arbiter brick (see the sketch below).
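
As an illustrative sketch only (host names and brick paths are placeholders),
a replica 3 volume with an arbiter third brick can be created with:

gluster volume create <VOLNAME> replica 3 arbiter 1 \
    server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/arb1

and an existing 1x2 replica volume can be converted by adding an arbiter
brick to it:

gluster volume add-brick <VOLNAME> replica 3 arbiter 1 server3:/bricks/arb1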

Krist




On 4 July 2017 at 19:28, Ernie Dunbar  wrote:

> Hi everyone!
>
> I need a sanity check on our Server Quorum Ratio settings to ensure the
> maximum uptime for our virtual machines. I'd like to modify them slightly,
> but I'm not really interested in experimenting with live servers to see if
> what I'm doing is going to work, but I think that the theory is sound.
>
> We have a Gluster array of 3 servers containing two Replicate bricks.
>
> Brick 1 is a 1x3 arrangement where this brick is replicated on all three
> servers. The quorum ratio is set to 51%, so that if any one Gluster server
> goes down, the brick is still in Read/Write mode and the broken server will
> update itself when it comes back online. The clients won't notice a thing,
> while still ensuring that a split-brain condition doesn't occur.
>
> Brick 2 is a 1x2 arrangement where this brick is replicated across only
> two servers. The quorum ratio is currently also set to 51%, but my
> understanding is that if one of the servers that hosts this brick goes
> down, it will go into read-only mode, which would probably be disruptive to
> the VMs we host on this brick.
>
> My understanding is that since there are three servers in the array, I
> should be able to set the quorum ratio on Brick2 to 50% and the array will
> still be able to prevent a split-brain from occurring, because the other
> two servers will know which one is offline.
>
> The alternative of course, is to simply flesh out Brick2 with a third
> disk. However, I've heard that 1x2 replication is faster than 1x3, and we'd
> prefer that extra speed for this task.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--

Krist van Besien

senior architect, RHCE, RHCSA Open Stack

Red Hat Red Hat Switzerland S.A. 

kr...@redhat.com    M: +41-79-5936260

TRIED. TESTED. TRUSTED. 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [New Release] GlusterD2 v4.0dev-7

2017-07-05 Thread Gandalf Corvotempesta
On 5 Jul 2017 11:31 AM, "Kaushal M" wrote:

- Preliminary support for volume expansion has been added. (Note that
rebalancing is not available yet)


What do you mean with this?
Any differences in volume expansion from the current architecture?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] [New Release] GlusterD2 v4.0dev-7

2017-07-05 Thread Kaushal M
After nearly 3 months, we have another preview release for GlusterD-2.0.

The highlights for this release are,
- GD2 now uses an auto scaling etcd cluster, which automatically
selects and maintains the required number of etcd servers in the
cluster.
- Preliminary support for volume expansion has been added. (Note that
rebalancing is not available yet)
- An end to end functional testing framework is now available
- And RPMs are available for Fedora >= 25 and EL7.

This release still doesn't provide a CLI. The HTTP ReST API is the
only access method right now.

Prebuilt binaries are available from [1]. RPMs have been built in
Fedora Copr and are available at [2]. A Docker image is also available
from [3].
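
For example, the Copr repository and the Docker image referenced above can
be pulled with the usual tooling (a sketch; the repository and image names
are taken from links [2] and [3], and the package name is assumed to match
the project name):

dnf copr enable kshlm/glusterd2
dnf install glusterd2
docker pull gluster/glusterd2-test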

Try this release out and let us know if you face any problems at [4].

The GD2 development team is re-organizing and kicking off development
again, so regular updates can be expected.

Cheers,
Kaushal and the GD2 developers.

[1]: https://github.com/gluster/glusterd2/releases/tag/v4.0dev-7
[2]: https://copr.fedorainfracloud.org/coprs/kshlm/glusterd2/
[3]: https://hub.docker.com/r/gluster/glusterd2-test/
[4]: https://github.com/gluster/glusterd2/issues
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users