Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-12-02 Thread Дмитрий Глушенок
Hi,

It may be that the LIO service starts before /mnt gets mounted. In the absence
of the backend file, LIO created a new one on the root filesystem (in the /mnt
directory). The gluster volume was then mounted over it, but because the backend
file was kept open by LIO, that file was still used instead of the correct one on
the gluster volume. Then, when you turn off the first node, the active path for
the iSCSI disk switches to the second node (with an empty file, placed on its
root filesystem).
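
A minimal way to guard against that ordering problem (a sketch, assuming the
LIO configuration is restored by target.service as on CentOS 7, and that the
volume is mounted at /mnt through /etc/fstab):

localhost:/gv0  /mnt  glusterfs  defaults,_netdev  0 0

# mkdir -p /etc/systemd/system/target.service.d
# cat > /etc/systemd/system/target.service.d/wait-for-gluster.conf <<'EOF'
[Unit]
RequiresMountsFor=/mnt
EOF
# systemctl daemon-reload

With RequiresMountsFor in place, systemd will not start the iSCSI target until
mnt.mount has succeeded, so a missing gluster mount shows up as a failed
target instead of a backend file silently recreated on the root filesystem.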

> On 18 Nov 2016, at 19:21, Olivier Lambert 
> wrote:
> 
> After Node 1 is DOWN, LIO on Node2 (iSCSI target) is no longer writing
> to the local Gluster mount, but to the root partition.
>
> Even though "df -h" shows the Gluster brick mounted:
> 
> /dev/mapper/centos-root   3,1G3,1G   20K 100% /
> ...
> /dev/xvdb  61G 61G  956M  99% /bricks/brick1
> localhost:/gv0 61G 61G  956M  99% /mnt
> 
> If I unmount it, I still see the "block.img" in /mnt, which is filling
> the root space. So it's like FUSE is messing with the local Gluster
> mount, which could lead to data corruption at the client level.
>
> It doesn't make sense to me... What am I missing?
> 
> On Fri, Nov 18, 2016 at 5:00 PM, Olivier Lambert
>  wrote:
>> Yes, I did it only once I had that result from heal info ("Number
>> of entries: 0"). But same result: as soon as the second node is
>> offline (after they were both working/back online), everything is
>> corrupted.
>> 
>> To recap:
>> 
>> * Node 1 UP Node 2 UP -> OK
>> * Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
>> the path down and change if necessary)
>> * Node 1 UP Node 2 UP -> OK (and waiting to have no entries displayed
>> in heal command)
>> * Node 1 DOWN Node 2 UP -> NOT OK (data corruption)
>> 
>> On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
>>  wrote:
>>> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert 
>>> wrote:
 
 Hi David,
 
 What are the exact commands to be sure it's fine?
 
 Right now I got:
 
 # gluster volume heal gv0 info
 Brick 10.0.0.1:/bricks/brick1/gv0
 Status: Connected
 Number of entries: 0
 
 Brick 10.0.0.2:/bricks/brick1/gv0
 Status: Connected
 Number of entries: 0
 
 Brick 10.0.0.3:/bricks/brick1/gv0
 Status: Connected
 Number of entries: 0
 
 
>>> Did you run this before taking down 2nd node to see if any heals were
>>> ongoing?
>>> 
>>> Also I see you have sharding enabled.  Are your files being served sharded
>>> already as well?
>>> 
 
 Everything is online and working, but this command gives a strange output:
 
 # gluster volume heal gv0 info heal-failed
 Gathering list of heal failed entries on volume gv0 has been
 unsuccessful on bricks that are down. Please check if all brick
 processes are running.
 
 Is it normal?
>>> 
>>> 
>>> I don't think that is a valid command anymore, as when I run it I get the
>>> same message, and this is in the logs:
>>> [2016-11-18 14:35:02.260503] I [MSGID: 106533]
>>> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
>>> Received heal vol req for volume GLUSTER1
>>> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
>>> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
>>> not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
>>> find the heal information.
>>> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
>>> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
>>> operation 'Volume Heal' failed on localhost : Command not supported. Please
>>> use "gluster volume heal GLUSTER1 info" and logs to find the heal
>>> information.
>>> 
 
 On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
  wrote:
> 
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
> 
> wrote:
>> 
>> Okay, used the exact same config you provided, and adding an arbiter
>> node (node3)
>> 
>> After halting node2, VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>> 
>> Then, after waiting a few minutes, halting node1. And **just** at this
>> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> etc.)
>> 
> Other than waiting a few minutes did you make sure heals had completed?
> 
>> 
>> dmesg of the VM:
>> 
>> [ 1645.852905] EXT4-fs error (device xvda1):
>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> inode=0, rec_len=0, name_len=0
>> [ 1645.854509] Aborting journal on device xvda1-8.
>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> 
>> And got a lot of " comm bash: bad entry in directory" messages 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Joe Julian
If it's writing to the root partition then the mount went away. Any 
clues in the gluster client log?
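
A couple of quick checks along those lines (a sketch, using the paths from
this thread: the volume mounted at /mnt and the backing file /mnt/block.img):

# findmnt -T /mnt
# lsof /mnt/block.img
# tail -n 100 /var/log/glusterfs/mnt.log

findmnt shows whether /mnt is currently a fuse.glusterfs mount or has quietly
fallen back to the root filesystem; lsof shows which copy of block.img LIO is
actually holding open; and the FUSE client log (named after the mount path)
should contain any disconnect or unmount messages from around the failover.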


On 11/18/2016 08:21 AM, Olivier Lambert wrote:

After Node 1 is DOWN, LIO on Node2 (iSCSI target) is no longer writing
to the local Gluster mount, but to the root partition.

Even though "df -h" shows the Gluster brick mounted:

/dev/mapper/centos-root   3,1G3,1G   20K 100% /
...
/dev/xvdb  61G 61G  956M  99% /bricks/brick1
localhost:/gv0 61G 61G  956M  99% /mnt

If I unmount it, I still see the "block.img" in /mnt, which is filling
the root space. So it's like FUSE is messing with the local Gluster
mount, which could lead to data corruption at the client level.

It doesn't make sense to me... What am I missing?

On Fri, Nov 18, 2016 at 5:00 PM, Olivier Lambert
 wrote:

Yes, I did it only once I had that result from heal info ("Number
of entries: 0"). But same result: as soon as the second node is
offline (after they were both working/back online), everything is
corrupted.

To recap:

* Node 1 UP Node 2 UP -> OK
* Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
the path down and change if necessary)
* Node 1 UP Node 2 UP -> OK (and waiting to have no entries displayed
in heal command)
* Node 1 DOWN Node 2 UP -> NOT OK (data corruption)

On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
 wrote:

On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert 
wrote:

Hi David,

What are the exact commands to be sure it's fine?

Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0



Did you run this before taking down 2nd node to see if any heals were
ongoing?

Also I see you have sharding enabled.  Are your files being served sharded
already as well?


Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

Is it normal?


I don't think that is a valid command anymore, as when I run it I get the
same message, and this is in the logs:
  [2016-11-18 14:35:02.260503] I [MSGID: 106533]
[glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume GLUSTER1
[2016-11-18 14:35:02.263341] W [MSGID: 106530]
[glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
find the heal information.
[2016-11-18 14:35:02.263365] E [MSGID: 106301]
[glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
operation 'Volume Heal' failed on localhost : Command not supported. Please
use "gluster volume heal GLUSTER1 info" and logs to find the heal
information.


On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
 wrote:

On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert

wrote:

Okay, used the exact same config you provided, and adding an arbiter
node (node3)

After halting node2, VM continues to work after a small "lag"/freeze.
I restarted node2 and it was back online: OK

Then, after waiting a few minutes, halting node1. And **just** at this
moment, the VM is corrupted (segmentation fault, /var/log folder empty
etc.)


Other than waiting a few minutes did you make sure heals had completed?


dmesg of the VM:

[ 1645.852905] EXT4-fs error (device xvda1):
htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
entry in directory: rec_len is smaller than minimal - offset=0(0),
inode=0, rec_len=0, name_len=0
[ 1645.854509] Aborting journal on device xvda1-8.
[ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only

And got a lot of " comm bash: bad entry in directory" messages then...

Here is the current config with all Node back online:

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/bricks/brick1/gv0
Brick2: 10.0.0.2:/bricks/brick1/gv0
Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.shard: on
features.shard-block-size: 16MB
network.remote-dio: enable
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.stat-prefetch: on
performance.strict-write-ordering: off
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.data-self-heal: on


# gluster volume status
Status of volume: gv0
Gluster process TCP 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
After Node 1 is DOWN, LIO on Node2 (iSCSI target) is no longer writing
to the local Gluster mount, but to the root partition.

Even though "df -h" shows the Gluster brick mounted:

/dev/mapper/centos-root   3,1G3,1G   20K 100% /
...
/dev/xvdb  61G 61G  956M  99% /bricks/brick1
localhost:/gv0 61G 61G  956M  99% /mnt

If I unmount it, I still see the "block.img" in /mnt, which is filling
the root space. So it's like FUSE is messing with the local Gluster
mount, which could lead to data corruption at the client level.

It doesn't make sense to me... What am I missing?
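
(One non-disruptive way to check what is sitting underneath the mount point is
to bind-mount the root filesystem somewhere else and look inside; a minimal
sketch:)

# mkdir -p /tmp/rootonly
# mount --bind / /tmp/rootonly
# ls -lh /tmp/rootonly/mnt
# umount /tmp/rootonly

A plain bind mount of / does not carry submounts with it, so anything listed
under /tmp/rootonly/mnt lives on the root partition itself, hidden beneath the
gluster mount.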

On Fri, Nov 18, 2016 at 5:00 PM, Olivier Lambert
 wrote:
> Yes, I did it only once I had that result from heal info ("Number
> of entries: 0"). But same result: as soon as the second node is
> offline (after they were both working/back online), everything is
> corrupted.
>
> To recap:
>
> * Node 1 UP Node 2 UP -> OK
> * Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
> the path down and change if necessary)
> * Node 1 UP Node 2 UP -> OK (and waiting to have no entries displayed
> in heal command)
> * Node 1 DOWN Node 2 UP -> NOT OK (data corruption)
>
> On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
>  wrote:
>> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert 
>> wrote:
>>>
>>> Hi David,
>>>
>>> What are the exact commands to be sure it's fine?
>>>
>>> Right now I got:
>>>
>>> # gluster volume heal gv0 info
>>> Brick 10.0.0.1:/bricks/brick1/gv0
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Brick 10.0.0.2:/bricks/brick1/gv0
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Brick 10.0.0.3:/bricks/brick1/gv0
>>> Status: Connected
>>> Number of entries: 0
>>>
>>>
>> Did you run this before taking down 2nd node to see if any heals were
>> ongoing?
>>
>> Also I see you have sharding enabled.  Are your files being served sharded
>> already as well?
>>
>>>
>>> Everything is online and working, but this command gives a strange output:
>>>
>>> # gluster volume heal gv0 info heal-failed
>>> Gathering list of heal failed entries on volume gv0 has been
>>> unsuccessful on bricks that are down. Please check if all brick
>>> processes are running.
>>>
>>> Is it normal?
>>
>>
>> I don't think that is a valid command anymore, as when I run it I get the
>> same message, and this is in the logs:
>>  [2016-11-18 14:35:02.260503] I [MSGID: 106533]
>> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
>> Received heal vol req for volume GLUSTER1
>> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
>> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
>> not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
>> find the heal information.
>> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
>> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
>> operation 'Volume Heal' failed on localhost : Command not supported. Please
>> use "gluster volume heal GLUSTER1 info" and logs to find the heal
>> information.
>>
>>>
>>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
>>>  wrote:
>>> >
>>> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
>>> > 
>>> > wrote:
>>> >>
>>> >> Okay, used the exact same config you provided, and adding an arbiter
>>> >> node (node3)
>>> >>
>>> >> After halting node2, VM continues to work after a small "lag"/freeze.
>>> >> I restarted node2 and it was back online: OK
>>> >>
>>> >> Then, after waiting a few minutes, halting node1. And **just** at this
>>> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>>> >> etc.)
>>> >>
>>> > Other than waiting a few minutes did you make sure heals had completed?
>>> >
>>> >>
>>> >> dmesg of the VM:
>>> >>
>>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>>> >> inode=0, rec_len=0, name_len=0
>>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>> >>
>>> >> And got a lot of " comm bash: bad entry in directory" messages then...
>>> >>
>>> >> Here is the current config with all Node back online:
>>> >>
>>> >> # gluster volume info
>>> >>
>>> >> Volume Name: gv0
>>> >> Type: Replicate
>>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>>> >> Status: Started
>>> >> Snapshot Count: 0
>>> >> Number of Bricks: 1 x (2 + 1) = 3
>>> >> Transport-type: tcp
>>> >> Bricks:
>>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>>> >> Options Reconfigured:
>>> >> nfs.disable: on
>>> >> performance.readdir-ahead: on
>>> >> transport.address-family: inet
>>> >> features.shard: on
>>> >> features.shard-block-size: 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
Yes, I did it only once I had that result from heal info ("Number
of entries: 0"). But same result: as soon as the second node is
offline (after they were both working/back online), everything is
corrupted.

To recap:

* Node 1 UP Node 2 UP -> OK
* Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
the path down and change if necessary)
* Node 1 UP Node 2 UP -> OK (and waiting until the heal command shows no
entries; see the sketch below)
* Node 1 DOWN Node 2 UP -> NOT OK (data corruption)
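
For the record, a minimal way to script that wait instead of eyeballing it
(a sketch, assuming the volume name gv0 used throughout this thread):

# gluster volume heal gv0 info | grep 'Number of entries'
# watch -n 10 "gluster volume heal gv0 info | grep 'Number of entries'"

The first command just filters out the per-brick counters; the second re-runs
it every 10 seconds. The next node should only be taken down once every brick
reports 0.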

On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
 wrote:
> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert 
> wrote:
>>
>> Hi David,
>>
>> What are the exact commands to be sure it's fine?
>>
>> Right now I got:
>>
>> # gluster volume heal gv0 info
>> Brick 10.0.0.1:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.2:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.3:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>>
> Did you run this before taking down 2nd node to see if any heals were
> ongoing?
>
> Also I see you have sharding enabled.  Are your files being served sharded
> already as well?
>
>>
>> Everything is online and working, but this command gives a strange output:
>>
>> # gluster volume heal gv0 info heal-failed
>> Gathering list of heal failed entries on volume gv0 has been
>> unsuccessful on bricks that are down. Please check if all brick
>> processes are running.
>>
>> Is it normal?
>
>
> I don't think that is a valid command anymore, as when I run it I get the
> same message, and this is in the logs:
>  [2016-11-18 14:35:02.260503] I [MSGID: 106533]
> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume GLUSTER1
> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
> not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
> find the heal information.
> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Heal' failed on localhost : Command not supported. Please
> use "gluster volume heal GLUSTER1 info" and logs to find the heal
> information.
>
>>
>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
>>  wrote:
>> >
>> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
>> > 
>> > wrote:
>> >>
>> >> Okay, used the exact same config you provided, and adding an arbiter
>> >> node (node3)
>> >>
>> >> After halting node2, VM continues to work after a small "lag"/freeze.
>> >> I restarted node2 and it was back online: OK
>> >>
>> >> Then, after waiting a few minutes, halting node1. And **just** at this
>> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> >> etc.)
>> >>
>> > Other than waiting a few minutes did you make sure heals had completed?
>> >
>> >>
>> >> dmesg of the VM:
>> >>
>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> >> inode=0, rec_len=0, name_len=0
>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> >>
>> >> And got a lot of " comm bash: bad entry in directory" messages then...
>> >>
>> >> Here is the current config with all Node back online:
>> >>
>> >> # gluster volume info
>> >>
>> >> Volume Name: gv0
>> >> Type: Replicate
>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> >> Status: Started
>> >> Snapshot Count: 0
>> >> Number of Bricks: 1 x (2 + 1) = 3
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >> performance.readdir-ahead: on
>> >> transport.address-family: inet
>> >> features.shard: on
>> >> features.shard-block-size: 16MB
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >>
>> >> # gluster volume status
>> >> Status of volume: gv0
>> >> Gluster process TCP Port  RDMA Port  Online
>> >> Pid
>> >>
>> >>
>> >> --
>> >> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>> >> 1331
>> >> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>> >> 2274
>> >> Brick 10.0.0.3:/bricks/brick1/gv0   49152 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
Okay, got it attached :)

On Fri, Nov 18, 2016 at 11:00 AM, Krutika Dhananjay  wrote:
> Assuming you're using FUSE, if your gluster volume is mounted at /some/dir,
> for example,
> then its corresponding logs will be at /var/log/glusterfs/some-dir.log
>
> -Krutika
>
> On Fri, Nov 18, 2016 at 7:13 AM, Olivier Lambert 
> wrote:
>>
>> Attached, bricks log. Where could I find the fuse client log?
>>
>> On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay 
>> wrote:
>> > Could you attach the fuse client and brick logs?
>> >
>> > -Krutika
>> >
>> > On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert
>> > 
>> > wrote:
>> >>
>> >> Okay, used the exact same config you provided, and adding an arbiter
>> >> node (node3)
>> >>
>> >> After halting node2, VM continues to work after a small "lag"/freeze.
>> >> I restarted node2 and it was back online: OK
>> >>
>> >> Then, after waiting a few minutes, halting node1. And **just** at this
>> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> >> etc.)
>> >>
>> >> dmesg of the VM:
>> >>
>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> >> inode=0, rec_len=0, name_len=0
>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> >>
>> >> And got a lot of " comm bash: bad entry in directory" messages then...
>> >>
>> >> Here is the current config with all Node back online:
>> >>
>> >> # gluster volume info
>> >>
>> >> Volume Name: gv0
>> >> Type: Replicate
>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> >> Status: Started
>> >> Snapshot Count: 0
>> >> Number of Bricks: 1 x (2 + 1) = 3
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >> performance.readdir-ahead: on
>> >> transport.address-family: inet
>> >> features.shard: on
>> >> features.shard-block-size: 16MB
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >>
>> >> # gluster volume status
>> >> Status of volume: gv0
>> >> Gluster process TCP Port  RDMA Port  Online
>> >> Pid
>> >>
>> >>
>> >> --
>> >> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>> >> 1331
>> >> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>> >> 2274
>> >> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>> >> 2355
>> >> Self-heal Daemon on localhost   N/A   N/AY
>> >> 2300
>> >> Self-heal Daemon on 10.0.0.3N/A   N/AY
>> >> 10530
>> >> Self-heal Daemon on 10.0.0.2N/A   N/AY
>> >> 2425
>> >>
>> >> Task Status of Volume gv0
>> >>
>> >>
>> >> --
>> >> There are no active volume tasks
>> >>
>> >>
>> >>
>> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>> >>  wrote:
>> >> > It's planned to have an arbiter soon :) It was just preliminary
>> >> > tests.
>> >> >
>> >> > Thanks for the settings, I'll test this soon and I'll come back to
>> >> > you!
>> >> >
>> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >> >  wrote:
>> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >> >>>
>> >> >>> gluster volume info gv0
>> >> >>>
>> >> >>> Volume Name: gv0
>> >> >>> Type: Replicate
>> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >> >>> Status: Started
>> >> >>> Snapshot Count: 0
>> >> >>> Number of Bricks: 1 x 2 = 2
>> >> >>> Transport-type: tcp
>> >> >>> Bricks:
>> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> >>> Options Reconfigured:
>> >> >>> nfs.disable: on
>> >> >>> performance.readdir-ahead: on
>> >> >>> transport.address-family: inet
>> >> >>> features.shard: on
>> >> >>> features.shard-block-size: 16MB
>> >> >>
>> >> >>
>> >> >>
>> >> When hosting VMs it's essential to set these options:
>> >> >>
>> >> >> network.remote-dio: enable
>> >> >> cluster.eager-lock: enable
>> >> >> performance.io-cache: off
>> >> >> performance.read-ahead: off
>> >> >> performance.quick-read: off
>> >> >> performance.stat-prefetch: on
>> >> >> performance.strict-write-ordering: off
>> >> >> 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Krutika Dhananjay
Assuming you're using FUSE, if your gluster volume is mounted at /some/dir,
for example,
then its corresponding logs will be at /var/log/glusterfs/some-dir.log
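
For the mount used in this thread (/mnt) that would be /var/log/glusterfs/mnt.log,
so, for example, something like this would show whether the client lost its
bricks around the time of the failover:

# grep -iE 'disconnect|unmount' /var/log/glusterfs/mnt.log | tail -n 20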

-Krutika

On Fri, Nov 18, 2016 at 7:13 AM, Olivier Lambert 
wrote:

> Attached, bricks log. Where could I find the fuse client log?
>
> On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay 
> wrote:
> > Could you attach the fuse client and brick logs?
> >
> > -Krutika
> >
> > On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert <
> lambert.oliv...@gmail.com>
> > wrote:
> >>
> >> Okay, used the exact same config you provided, and adding an arbiter
> >> node (node3)
> >>
> >> After halting node2, VM continues to work after a small "lag"/freeze.
> >> I restarted node2 and it was back online: OK
> >>
> >> Then, after waiting a few minutes, halting node1. And **just** at this
> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
> >> etc.)
> >>
> >> dmesg of the VM:
> >>
> >> [ 1645.852905] EXT4-fs error (device xvda1):
> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
> >> inode=0, rec_len=0, name_len=0
> >> [ 1645.854509] Aborting journal on device xvda1-8.
> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
> >>
> >> And got a lot of " comm bash: bad entry in directory" messages then...
> >>
> >> Here is the current config with all Node back online:
> >>
> >> # gluster volume info
> >>
> >> Volume Name: gv0
> >> Type: Replicate
> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> >> Status: Started
> >> Snapshot Count: 0
> >> Number of Bricks: 1 x (2 + 1) = 3
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> >> Options Reconfigured:
> >> nfs.disable: on
> >> performance.readdir-ahead: on
> >> transport.address-family: inet
> >> features.shard: on
> >> features.shard-block-size: 16MB
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >>
> >> # gluster volume status
> >> Status of volume: gv0
> >> Gluster process TCP Port  RDMA Port  Online
> >> Pid
> >>
> >> 
> --
> >> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
> >> 1331
> >> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
> >> 2274
> >> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
> >> 2355
> >> Self-heal Daemon on localhost   N/A   N/AY
> >> 2300
> >> Self-heal Daemon on 10.0.0.3N/A   N/AY
> >> 10530
> >> Self-heal Daemon on 10.0.0.2N/A   N/AY
> >> 2425
> >>
> >> Task Status of Volume gv0
> >>
> >> 
> --
> >> There are no active volume tasks
> >>
> >>
> >>
> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
> >>  wrote:
> >> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >> >
> >> > Thanks for the settings, I'll test this soon and I'll come back to
> you!
> >> >
> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> >> >  wrote:
> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >> >>>
> >> >>> gluster volume info gv0
> >> >>>
> >> >>> Volume Name: gv0
> >> >>> Type: Replicate
> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >> >>> Status: Started
> >> >>> Snapshot Count: 0
> >> >>> Number of Bricks: 1 x 2 = 2
> >> >>> Transport-type: tcp
> >> >>> Bricks:
> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >> >>> Options Reconfigured:
> >> >>> nfs.disable: on
> >> >>> performance.readdir-ahead: on
> >> >>> transport.address-family: inet
> >> >>> features.shard: on
> >> >>> features.shard-block-size: 16MB
> >> >>
> >> >>
> >> >>
> >> >> When hosting VMs it's essential to set these options:
> >> >>
> >> >> network.remote-dio: enable
> >> >> cluster.eager-lock: enable
> >> >> performance.io-cache: off
> >> >> performance.read-ahead: off
> >> >> performance.quick-read: off
> >> >> performance.stat-prefetch: on
> >> >> performance.strict-write-ordering: off
> >> >> cluster.server-quorum-type: server
> >> >> cluster.quorum-type: auto
> >> >> cluster.data-self-heal: on
> >> >>
> >> >> Also with replica two and quorum on (required) your volume will
> become
> >> >> read-only when one node goes down to prevent the possibility of
> >> >> split-brain
> >> >> - 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
Hi David,

What are the exact commands to be sure it's fine?

Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0


Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

Is it normal?

On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
 wrote:
>
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert 
> wrote:
>>
>> Okay, used the exact same config you provided, and adding an arbiter
>> node (node3)
>>
>> After halting node2, VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>>
>> Then, after waiting a few minutes, halting node1. And **just** at this
>> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> etc.)
>>
> Other than waiting a few minutes did you make sure heals had completed?
>
>>
>> dmesg of the VM:
>>
>> [ 1645.852905] EXT4-fs error (device xvda1):
>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> inode=0, rec_len=0, name_len=0
>> [ 1645.854509] Aborting journal on device xvda1-8.
>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>
>> And got a lot of " comm bash: bad entry in directory" messages then...
>>
>> Here is the current config with all Node back online:
>>
>> # gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.stat-prefetch: on
>> performance.strict-write-ordering: off
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.data-self-heal: on
>>
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process TCP Port  RDMA Port  Online
>> Pid
>>
>> --
>> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>> 1331
>> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>> 2274
>> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>> 2355
>> Self-heal Daemon on localhost   N/A   N/AY
>> 2300
>> Self-heal Daemon on 10.0.0.3N/A   N/AY
>> 10530
>> Self-heal Daemon on 10.0.0.2N/A   N/AY
>> 2425
>>
>> Task Status of Volume gv0
>>
>> --
>> There are no active volume tasks
>>
>>
>>
>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>>  wrote:
>> > It's planned to have an arbiter soon :) It was just preliminary tests.
>> >
>> > Thanks for the settings, I'll test this soon and I'll come back to you!
>> >
>> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >  wrote:
>> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >>>
>> >>> gluster volume info gv0
>> >>>
>> >>> Volume Name: gv0
>> >>> Type: Replicate
>> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >>> Status: Started
>> >>> Snapshot Count: 0
>> >>> Number of Bricks: 1 x 2 = 2
>> >>> Transport-type: tcp
>> >>> Bricks:
>> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >>> Options Reconfigured:
>> >>> nfs.disable: on
>> >>> performance.readdir-ahead: on
>> >>> transport.address-family: inet
>> >>> features.shard: on
>> >>> features.shard-block-size: 16MB
>> >>
>> >>
>> >>
>> >> When hosting VMs it's essential to set these options:
>> >>
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >> Also with replica two and quorum on (required) your volume will become
>> >> read-only when one node goes down to prevent the possibility of

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-17 Thread David Gossage
On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert 
wrote:

> Okay, used the exact same config you provided, and adding an arbiter
> node (node3)
>
> After halting node2, VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting a few minutes, halting node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty
> etc.)
>
> Other than waiting a few minutes did you make sure heals had completed?


> dmesg of the VM:
>
> [ 1645.852905] EXT4-fs error (device xvda1):
> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> entry in directory: rec_len is smaller than minimal - offset=0(0),
> inode=0, rec_len=0, name_len=0
> [ 1645.854509] Aborting journal on device xvda1-8.
> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>
> And got a lot of " comm bash: bad entry in directory" messages then...
>
> Here is the current config with all Node back online:
>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.0.1:/bricks/brick1/gv0
> Brick2: 10.0.0.2:/bricks/brick1/gv0
> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.shard: on
> features.shard-block-size: 16MB
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
>
> # gluster volume status
> Status of volume: gv0
> Gluster process TCP Port  RDMA Port  Online
> Pid
> 
> --
> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>  1331
> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>  2274
> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>  2355
> Self-heal Daemon on localhost   N/A   N/AY
>  2300
> Self-heal Daemon on 10.0.0.3N/A   N/AY
>  10530
> Self-heal Daemon on 10.0.0.2N/A   N/AY
>  2425
>
> Task Status of Volume gv0
> 
> --
> There are no active volume tasks
>
>
>
> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>  wrote:
> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >
> > Thanks for the settings, I'll test this soon and I'll come back to you!
> >
> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> >  wrote:
> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >>>
> >>> gluster volume info gv0
> >>>
> >>> Volume Name: gv0
> >>> Type: Replicate
> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> performance.readdir-ahead: on
> >>> transport.address-family: inet
> >>> features.shard: on
> >>> features.shard-block-size: 16MB
> >>
> >>
> >>
> >> When hosting VMs it's essential to set these options:
> >>
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >> Also with replica two and quorum on (required) your volume will become
> >> read-only when one node goes down to prevent the possibility of
> split-brain
> >> - you *really* want to avoid that :)
> >>
> >> I'd recommend a replica 3 volume, that way 1 node can go down, but the
> other
> >> two still form a quorum and will remain r/w.
> >>
> >> If the extra disks are not possible, then an arbiter volume can be set up
> -
> >> basically dummy files on the third node.
> >>
> >>
> >>
> >> --
> >> Lindsay Mathieson
> >>
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-17 Thread Olivier Lambert
Attached, bricks log. Where could I find the fuse client log?

On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay  wrote:
> Could you attach the fuse client and brick logs?
>
> -Krutika
>
> On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert 
> wrote:
>>
>> Okay, used the exact same config you provided, and adding an arbiter
>> node (node3)
>>
>> After halting node2, VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>>
>> Then, after waiting a few minutes, halting node1. And **just** at this
>> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> etc.)
>>
>> dmesg of the VM:
>>
>> [ 1645.852905] EXT4-fs error (device xvda1):
>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> inode=0, rec_len=0, name_len=0
>> [ 1645.854509] Aborting journal on device xvda1-8.
>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>
>> And got a lot of " comm bash: bad entry in directory" messages then...
>>
>> Here is the current config with all Node back online:
>>
>> # gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.stat-prefetch: on
>> performance.strict-write-ordering: off
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.data-self-heal: on
>>
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process TCP Port  RDMA Port  Online
>> Pid
>>
>> --
>> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>> 1331
>> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>> 2274
>> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>> 2355
>> Self-heal Daemon on localhost   N/A   N/AY
>> 2300
>> Self-heal Daemon on 10.0.0.3N/A   N/AY
>> 10530
>> Self-heal Daemon on 10.0.0.2N/A   N/AY
>> 2425
>>
>> Task Status of Volume gv0
>>
>> --
>> There are no active volume tasks
>>
>>
>>
>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>>  wrote:
>> > It's planned to have an arbiter soon :) It was just preliminary tests.
>> >
>> > Thanks for the settings, I'll test this soon and I'll come back to you!
>> >
>> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >  wrote:
>> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >>>
>> >>> gluster volume info gv0
>> >>>
>> >>> Volume Name: gv0
>> >>> Type: Replicate
>> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >>> Status: Started
>> >>> Snapshot Count: 0
>> >>> Number of Bricks: 1 x 2 = 2
>> >>> Transport-type: tcp
>> >>> Bricks:
>> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >>> Options Reconfigured:
>> >>> nfs.disable: on
>> >>> performance.readdir-ahead: on
>> >>> transport.address-family: inet
>> >>> features.shard: on
>> >>> features.shard-block-size: 16MB
>> >>
>> >>
>> >>
>> >> When hosting VMs it's essential to set these options:
>> >>
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >> Also with replica two and quorum on (required) your volume will become
>> >> read-only when one node goes down to prevent the possibility of
>> >> split-brain
>> >> - you *really* want to avoid that :)
>> >>
>> >> I'd recommend a replica 3 volume, that way 1 node can go down, but the
>> >> other
>> >> two still form a quorum and will remain r/w.
>> >>
>> >> If the extra disks are not possible, then an arbiter volume can be set up
>> >> -
>> >> basically dummy files on the third node.
>> >>
>> >>
>> >>
>> >> --
>> >> Lindsay Mathieson
>> >>
>> >> ___
>> >> Gluster-users mailing list
>> >> Gluster-users@gluster.org
>> >> 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-17 Thread Krutika Dhananjay
Could you attach the fuse client and brick logs?

-Krutika

On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert 
wrote:

> Okay, used the exact same config you provided, and adding an arbiter
> node (node3)
>
> After halting node2, VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting a few minutes, halting node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty
> etc.)
>
> dmesg of the VM:
>
> [ 1645.852905] EXT4-fs error (device xvda1):
> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> entry in directory: rec_len is smaller than minimal - offset=0(0),
> inode=0, rec_len=0, name_len=0
> [ 1645.854509] Aborting journal on device xvda1-8.
> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>
> And got a lot of " comm bash: bad entry in directory" messages then...
>
> Here is the current config with all Node back online:
>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.0.1:/bricks/brick1/gv0
> Brick2: 10.0.0.2:/bricks/brick1/gv0
> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.shard: on
> features.shard-block-size: 16MB
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
>
> # gluster volume status
> Status of volume: gv0
> Gluster process TCP Port  RDMA Port  Online
> Pid
> 
> --
> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>  1331
> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>  2274
> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>  2355
> Self-heal Daemon on localhost   N/A   N/AY
>  2300
> Self-heal Daemon on 10.0.0.3N/A   N/AY
>  10530
> Self-heal Daemon on 10.0.0.2N/A   N/AY
>  2425
>
> Task Status of Volume gv0
> 
> --
> There are no active volume tasks
>
>
>
> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>  wrote:
> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >
> > Thanks for the settings, I'll test this soon and I'll come back to you!
> >
> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> >  wrote:
> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >>>
> >>> gluster volume info gv0
> >>>
> >>> Volume Name: gv0
> >>> Type: Replicate
> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> performance.readdir-ahead: on
> >>> transport.address-family: inet
> >>> features.shard: on
> >>> features.shard-block-size: 16MB
> >>
> >>
> >>
> >> When hosting VMs it's essential to set these options:
> >>
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >> Also with replica two and quorum on (required) your volume will become
> >> read-only when one node goes down to prevent the possibility of
> split-brain
> >> - you *really* want to avoid that :)
> >>
> >> I'd recommend a replica 3 volume, that way 1 node can go down, but the
> other
> >> two still form a quorum and will remain r/w.
> >>
> >> If the extra disks are not possible, then an arbiter volume can be set up
> -
> >> basically dummy files on the third node.
> >>
> >>
> >>
> >> --
> >> Lindsay Mathieson
> >>
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-17 Thread Olivier Lambert
It's planned to have an arbiter soon :) It was just preliminary tests.

Thanks for the settings, I'll test this soon and I'll come back to you!
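
For reference, applying those settings and adding the arbiter could look
roughly like this (a sketch using the volume and brick names from this thread;
converting a replica 2 volume to "replica 3 arbiter 1" this way needs a
reasonably recent GlusterFS release):

# gluster volume set gv0 network.remote-dio enable
# gluster volume set gv0 cluster.eager-lock enable
# gluster volume set gv0 performance.io-cache off
# gluster volume set gv0 performance.read-ahead off
# gluster volume set gv0 performance.quick-read off
# gluster volume set gv0 performance.stat-prefetch on
# gluster volume set gv0 performance.strict-write-ordering off
# gluster volume set gv0 cluster.server-quorum-type server
# gluster volume set gv0 cluster.quorum-type auto
# gluster volume set gv0 cluster.data-self-heal on

# gluster peer probe 10.0.0.3
# gluster volume add-brick gv0 replica 3 arbiter 1 10.0.0.3:/bricks/brick1/gv0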

On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
 wrote:
> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>>
>> gluster volume info gv0
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>
>
>
> When hosting VMs it's essential to set these options:
>
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
> Also with replica two and quorum on (required) your volume will become
> read-only when one node goes down to prevent the possibility of split-brain
> - you *really* want to avoid that :)
>
> I'd recommend a replica 3 volume, that way 1 node can go down, but the other
> two still form a quorum and will remain r/w.
>
> If the extra disks are not possible, then an arbiter volume can be set up -
> basically dummy files on the third node.
>
>
>
> --
> Lindsay Mathieson
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-17 Thread Olivier Lambert
Sure:

# gluster volume info gv0

Volume Name: gv0
Type: Replicate
Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/bricks/brick1/gv0
Brick2: 10.0.0.2:/bricks/brick1/gv0
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.shard: on
features.shard-block-size: 16MB

# gluster volume status gv0
Status of volume: gv0
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick 10.0.0.1:/bricks/brick1/gv0   49153 0  Y   1246
Brick 10.0.0.2:/bricks/brick1/gv0   49154 0  Y   1866
Self-heal Daemon on localhost   N/A   N/AY   1241
Self-heal Daemon on 10.0.0.2N/A   N/AY   2440

Task Status of Volume gv0
--
There are no active volume tasks

On Thu, Nov 17, 2016 at 11:03 PM, Lindsay Mathieson
 wrote:
> On 18/11/2016 6:00 AM, Olivier Lambert wrote:
>>
>> First off, thanks for this great product:)
>>
>> I have a corruption issue when using Glusterfs with LIO iSCSI target:
>
>
> Could you post the results of:
>
> gluster volume info 
>
> gluster volume status 
>
>
> thanks
>
> --
> Lindsay Mathieson
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-17 Thread Lindsay Mathieson

On 18/11/2016 6:00 AM, Olivier Lambert wrote:

First off, thanks for this great product:)

I have a corruption issue when using Glusterfs with LIO iSCSI target:


Could you post the results of:

gluster volume info 

gluster volume status 


thanks

--
Lindsay Mathieson

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users