[Gluster-users] pacemaker VIP routing latency to gluster node.

2016-09-22 Thread Dung Le
Hello,

I have a pretty straight forward configuration as below:

3 storage nodes running version 3.7.11 with a replica count of 3, using
native Gluster NFS.
Corosync version 1.4.7 and Pacemaker version 1.1.12.
I have DNS round-robin across 3 VIPs living on the 3 storage nodes.

Here is how I configure my corosync:

SN1 with x.x.x.001
SN2 with x.x.x.002
SN3 with x.x.x.003
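
For reference, VIP resources and preferences like the ones in the pcs config
output below can be created with commands roughly as follows (a sketch only,
shown for SN1; the equivalent commands apply to SN2/SN3, and exact syntax may
vary a little between pcs versions):

pcs resource create SN1-ClusterIP ocf:heartbeat:IPaddr2 ip=x.x.x.001 cidr_netmask=32 op monitor interval=10s
pcs constraint location SN1-ClusterIP prefers SN1=3000 SN2=2000 SN3=1000
pcs constraint order start Gluster-clone then start SN1-ClusterIP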


**
Below is pcs config output:

Cluster Name: dfs_cluster
Corosync Nodes:
 SN1 SN2 SN3 
Pacemaker Nodes:
 SN1 SN2 SN3 

Resources: 
 Clone: Gluster-clone
  Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=false 
  Resource: Gluster (class=ocf provider=glusterfs type=glusterd)
   Operations: start interval=0s timeout=20 (Gluster-start-interval-0s)
   stop interval=0s timeout=20 (Gluster-stop-interval-0s)
   monitor interval=10s (Gluster-monitor-interval-10s)
 Resource: SN1-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.001 cidr_netmask=32 
  Operations: start interval=0s timeout=20s (SN1-ClusterIP-start-interval-0s)
  stop interval=0s timeout=20s (SN1-ClusterIP-stop-interval-0s)
  monitor interval=10s (SN1-ClusterIP-monitor-interval-10s)
 Resource: SN2-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.002 cidr_netmask=32 
  Operations: start interval=0s timeout=20s (SN2-ClusterIP-start-interval-0s)
  stop interval=0s timeout=20s (SN2-ClusterIP-stop-interval-0s)
  monitor interval=10s (SN2-ClusterIP-monitor-interval-10s)
 Resource: SN3-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=x.x.x.003 cidr_netmask=32 
  Operations: start interval=0s timeout=20s (SN3-ClusterIP-start-interval-0s)
  stop interval=0s timeout=20s (SN3-ClusterIP-stop-interval-0s)
  monitor interval=10s (SN3-ClusterIP-monitor-interval-10s)

Stonith Devices: 
Fencing Levels: 

Location Constraints:
  Resource: SN1-ClusterIP
Enabled on: SN1 (score:3000) (id:location-SN1-ClusterIP-SN1-3000)
Enabled on: SN2 (score:2000) (id:location-SN1-ClusterIP-SN2-2000)
Enabled on: SN3 (score:1000) (id:location-SN1-ClusterIP-SN3-1000)
  Resource: SN2-ClusterIP
Enabled on: SN2 (score:3000) (id:location-SN2-ClusterIP-SN2-3000)
Enabled on: SN3 (score:2000) (id:location-SN2-ClusterIP-SN3-2000)
Enabled on: SN1 (score:1000) (id:location-SN2-ClusterIP-SN1-1000)
  Resource: SN3-ClusterIP
Enabled on: SN3 (score:3000) (id:location-SN3-ClusterIP-SN3-3000)
Enabled on: SN1 (score:2000) (id:location-SN3-ClusterIP-SN1-2000)
Enabled on: SN2 (score:1000) (id:location-SN3-ClusterIP-SN2-1000)
Ordering Constraints:
  start Gluster-clone then start SN1-ClusterIP (kind:Mandatory) 
(id:order-Gluster-clone-SN1-ClusterIP-mandatory)
  start Gluster-clone then start SN2-ClusterIP (kind:Mandatory) 
(id:order-Gluster-clone-SN2-ClusterIP-mandatory)
  start Gluster-clone then start SN3-ClusterIP (kind:Mandatory) 
(id:order-Gluster-clone-SN3-ClusterIP-mandatory)
Colocation Constraints:

Resources Defaults:
 is-managed: true
 target-role: Started
 requires: nothing
 multiple-active: stop_start
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 no-quorum-policy: ignore
 stonith-enabled: false

**
pcs status output:

Cluster name: dfs_cluster
Last updated: Thu Sep 22 16:57:35 2016
Last change: Mon Aug 29 18:02:44 2016
Stack: cman
Current DC: SN1 - partition with quorum
Version: 1.1.11-97629de
3 Nodes configured
6 Resources configured


Online: [ SN1 SN2 SN3 ]

Full list of resources:

 Clone Set: Gluster-clone [Gluster]
 Started: [ SN1 SN2 SN3 ]
 SN1-ClusterIP  (ocf::heartbeat:IPaddr2):   Started SN1 
 SN2-ClusterIP  (ocf::heartbeat:IPaddr2):   Started SN2 
 SN3-ClusterIP  (ocf::heartbeat:IPaddr2):   Started SN3 

**


When I mount the gluster volume, I use the VIP name; DNS round-robin picks
one of the storage nodes to establish the NFS session.
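
For context, the clients mount it roughly like this (the VIP DNS name is the
one appearing in the logs below; the volume name myvol is hypothetical, and
Gluster NFS only serves NFSv3):

mount -t nfs -o vers=3,proto=tcp nfsserver001:/myvol /mnt/gluster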

My issue is:

After the gluster volume has been mounted for 1-2 hrs, all the clients report
that df hangs and returns no output. I checked the dmesg log on the client
side and I am getting the following errors:

Sep 20 05:46:45 x kernel: nfs: server nfsserver001 not responding, still 
trying
Sep 20 05:49:45 x kernel: nfs: server nfsserver001 not responding, still 
trying

I tried to mount the gluster volume via the DNS round-robin name on a
different mountpoint, but the mount was not successful. Then I tried to mount
the gluster volume using a storage node's own IP (not the VIP), and I was
able to mount the gluster volume. Afterward, I flipped all the clients 

Re: [Gluster-users] gluster 3.7 healing errors (no data available, buf->ia_gfid is null)

2016-09-22 Thread Pranith Kumar Karampuri
On Thu, Sep 22, 2016 at 8:33 PM, Pasi Kärkkäinen  wrote:

> On Thu, Sep 22, 2016 at 07:20:26PM +0530, Pranith Kumar Karampuri wrote:
> >On Thu, Sep 22, 2016 at 12:51 PM, Ravishankar N
> ><[1]ravishan...@redhat.com> wrote:
> >
> >  On 09/22/2016 12:38 PM, Pasi Kärkkäinen wrote:
> >
> >On Thu, Sep 22, 2016 at 09:58:25AM +0530, Ravishankar N wrote:
> >
> >  On 09/21/2016 10:54 PM, Pasi Kärkkäinen wrote:
> >
> >Let's see.
> >
> ># getfattr -m . -d -e hex /bricks/vol1/brick1/foo
> >getfattr: Removing leading '/' from absolute path names
> ># file: bricks/vol1/brick1/foo
> >security.selinux=0x756e636f6e66696e65645f753a6f
> 626a6563745f723a756e6c6162656c65645f743a733000
> >
> >So hmm.. no trusted.gfid it seems.. is that perhaps because
> this
> >node was down when the file was created?
> >
> >  No, even if that were the case, the gfid should have been set
> while
> >  healing the file to this node.
> >  Can you try doing a setfattr -n trusted.gfid -v
> >  0xc1ca778ed2af4828b981171c0c5bd45e on the file. and launch heal
> >  again?
> >  What about the .glusterfs hardlink- does that exist?
> >
> >It seems there's no hardlink.. nothing in
> >/bricks/vol1/brick1/.glusterfs/c1/ca/ directory.
> >
> >Now I manually set the trusted.gfid value on the file, and
> launched
> >heal again,
> >and now gluster was able to heal it OK! Healing is now fully
> complete,
> >and no out-of-sync files anymore.
> >
> >Any idea what caused the missing trusted.gfid ?
> >
> >Do you want to raise a bug for this? We would love to if you don't
> have
> >the time to make sure we address this.
> >
>
> Sure. Should I file the bug on redhat bugzilla?
>

Yes, here: https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS


>
> -- Pasi
>
>
>


-- 
Pranith

Re: [Gluster-users] Recovering lost node in dispersed volume

2016-09-22 Thread Tony Schreiner
Thanks for that advice. It worked. Setting the UUID in glusterd.info was
the bit I missed.

It seemed to work without the setfattr step in my particular case.

On Thu, Sep 22, 2016 at 11:05 AM, Serkan Çoban 
wrote:

> Here are the steps for replacing a failed node:
>
>
> 1- In one of the other servers run "grep thaila
> /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6" and note the
> UUID
> 2- stop glusterd on failed server and add "UUID=uuid_from_previous
> step" to /var/lib/glusterd/glusterd.info and start glusterd
> 3- run "gluster peer probe calliope"
> 4- restart glusterd
> 5- now gluster peer status should show all the peers. if not probe
> them manually as above.
> 6-for all the bricks run the command "setfattr -n
> trusted.glusterfs.volume-id -v 0x$(grep volume-id
> /var/lib/glusterd/vols/vol_name/info | cut -d= -f2 | sed 's/-//g')
> brick_name"
> 7 restart glusterd and everythimg should be fine.
>
> I think I read the steps from this link:
> https://support.rackspace.com/how-to/recover-from-a-failed-
> server-in-a-glusterfs-array/
> Look to the "keep the ip address" part.
>
>
> On Thu, Sep 22, 2016 at 5:16 PM, Tony Schreiner
>  wrote:
> > I set uo a dispersed volume with 1 x (3 + 1) nodes ( i do know that 3+1
> is
> > not optimal).
> > Originally created in version 3.7 but recently upgraded without issue to
> > 3.8.
> >
> > # gluster vol info
> > Volume Name: rvol
> > Type: Disperse
> > Volume ID: e8f15248-d9de-458e-9896-f1a5782dcf74
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (3 + 1) = 4
> > Transport-type: tcp
> > Bricks:
> > Brick1: calliope:/brick/p1
> > Brick2: euterpe:/brick/p1
> > Brick3: lemans:/brick/p1
> > Brick4: thalia:/brick/p1
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.disable: off
> >
> > I inadvertently allowed one of the nodes (thalia) to be reinstalled;
> which
> > overwrote the system, but not the brick, and I need guidance in getting
> it
> > back into the volume.
> >
> > (on lemans)
> > gluster peer status
> > Number of Peers: 3
> >
> > Hostname: calliope
> > Uuid: 72373eb1-8047-405a-a094-891e559755da
> > State: Peer in Cluster (Connected)
> >
> > Hostname: euterpe
> > Uuid: 9fafa5c4-1541-4aa0-9ea2-923a756cadbb
> > State: Peer in Cluster (Connected)
> >
> > Hostname: thalia
> > Uuid: 843169fa-3937-42de-8fda-9819efc75fe8
> > State: Peer Rejected (Connected)
> >
> > the thalia peer is rejected. If I try to peer probe thalia I am told it
> > already part of the pool. If from thalia, I try to peer probe one of the
> > others, I am told that they are already part of another pool.
> >
> > I have tried removing the thalia brick with
> > gluster vol remove-brick rvol thalia:/brick/p1 start
> > but get the error
> > volume remove-brick start: failed: Remove brick incorrect brick count of
> 1
> > for disperse 4
> >
> > I am not finding much guidance for this particular situation. I could
> use a
> > suggestion on how to recover. It's a lab situation so no biggie if I lose
> > it.
> > Cheers
> >
> > Tony Schreiner
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
>

Re: [Gluster-users] Recovering lost node in dispersed volume

2016-09-22 Thread Serkan Çoban
Here are the steps for replacing a failed node:


1- On one of the other servers run "grep thalia
/var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6" and note the
UUID
2- stop glusterd on the failed server, add "UUID=uuid_from_previous
step" to /var/lib/glusterd/glusterd.info and start glusterd
3- run "gluster peer probe calliope"
4- restart glusterd
5- now gluster peer status should show all the peers. If not, probe
them manually as above.
6- for all the bricks run the command "setfattr -n
trusted.glusterfs.volume-id -v 0x$(grep volume-id
/var/lib/glusterd/vols/vol_name/info | cut -d= -f2 | sed 's/-//g')
brick_name"
7- restart glusterd and everything should be fine.

I think I read the steps from this link:
https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/
Look at the "keep the ip address" part.
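
Putting the steps together for this particular case, a rough transcript might
look like the following. Hostnames thalia/calliope, the volume name rvol and
the brick path /brick/p1 come from this thread; the service commands assume
systemd, and <uuid-from-step-1> is a placeholder:

grep thalia /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6   # on a healthy peer: prints thalia's old UUID
systemctl stop glusterd                                             # the rest runs on thalia
sed -i 's/^UUID=.*/UUID=<uuid-from-step-1>/' /var/lib/glusterd/glusterd.info
systemctl start glusterd
gluster peer probe calliope
systemctl restart glusterd
gluster peer status                                                 # all peers should now show as connected
setfattr -n trusted.glusterfs.volume-id \
  -v 0x$(grep volume-id /var/lib/glusterd/vols/rvol/info | cut -d= -f2 | sed 's/-//g') \
  /brick/p1
systemctl restart glusterd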


On Thu, Sep 22, 2016 at 5:16 PM, Tony Schreiner
 wrote:
> I set uo a dispersed volume with 1 x (3 + 1) nodes ( i do know that 3+1 is
> not optimal).
> Originally created in version 3.7 but recently upgraded without issue to
> 3.8.
>
> # gluster vol info
> Volume Name: rvol
> Type: Disperse
> Volume ID: e8f15248-d9de-458e-9896-f1a5782dcf74
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (3 + 1) = 4
> Transport-type: tcp
> Bricks:
> Brick1: calliope:/brick/p1
> Brick2: euterpe:/brick/p1
> Brick3: lemans:/brick/p1
> Brick4: thalia:/brick/p1
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.disable: off
>
> I inadvertently allowed one of the nodes (thalia) to be reinstalled; which
> overwrote the system, but not the brick, and I need guidance in getting it
> back into the volume.
>
> (on lemans)
> gluster peer status
> Number of Peers: 3
>
> Hostname: calliope
> Uuid: 72373eb1-8047-405a-a094-891e559755da
> State: Peer in Cluster (Connected)
>
> Hostname: euterpe
> Uuid: 9fafa5c4-1541-4aa0-9ea2-923a756cadbb
> State: Peer in Cluster (Connected)
>
> Hostname: thalia
> Uuid: 843169fa-3937-42de-8fda-9819efc75fe8
> State: Peer Rejected (Connected)
>
> the thalia peer is rejected. If I try to peer probe thalia I am told it
> already part of the pool. If from thalia, I try to peer probe one of the
> others, I am told that they are already part of another pool.
>
> I have tried removing the thalia brick with
> gluster vol remove-brick rvol thalia:/brick/p1 start
> but get the error
> volume remove-brick start: failed: Remove brick incorrect brick count of 1
> for disperse 4
>
> I am not finding much guidance for this particular situation. I could use a
> suggestion on how to recover. It's a lab situation so no biggie if I lose
> it.
> Cheers
>
> Tony Schreiner
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster 3.7 healing errors (no data available, buf->ia_gfid is null)

2016-09-22 Thread Pasi Kärkkäinen
On Thu, Sep 22, 2016 at 07:20:26PM +0530, Pranith Kumar Karampuri wrote:
>On Thu, Sep 22, 2016 at 12:51 PM, Ravishankar N
><[1]ravishan...@redhat.com> wrote:
> 
>  On 09/22/2016 12:38 PM, Pasi Kärkkäinen wrote:
> 
>On Thu, Sep 22, 2016 at 09:58:25AM +0530, Ravishankar N wrote:
> 
>  On 09/21/2016 10:54 PM, Pasi Kärkkäinen wrote:
> 
>Let's see.
> 
># getfattr -m . -d -e hex /bricks/vol1/brick1/foo
>getfattr: Removing leading '/' from absolute path names
># file: bricks/vol1/brick1/foo
>
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> 
>So hmm.. no trusted.gfid it seems.. is that perhaps because this
>node was down when the file was created?
> 
>  No, even if that were the case, the gfid should have been set while
>  healing the file to this node.
>  Can you try doing a setfattr -n trusted.gfid -v
>  0xc1ca778ed2af4828b981171c0c5bd45e on the file. and launch heal
>  again?
>  What about the .glusterfs hardlink- does that exist?
> 
>It seems there's no hardlink.. nothing in
>/bricks/vol1/brick1/.glusterfs/c1/ca/ directory.
> 
>Now I manually set the trusted.gfid value on the file, and launched
>heal again,
>and now gluster was able to heal it OK! Healing is now fully complete,
>and no out-of-sync files anymore.
> 
>Any idea what caused the missing trusted.gfid ?
> 
>Do you want to raise a bug for this? We would love to if you don't have
>the time to make sure we address this.
>

Sure. Should I file the bug on redhat bugzilla? 


-- Pasi



[Gluster-users] Recovering lost node in dispersed volume

2016-09-22 Thread Tony Schreiner
I set up a dispersed volume with 1 x (3 + 1) nodes (I do know that 3+1 is
not optimal).
Originally created in version 3.7 but recently upgraded without issue to
3.8.

# gluster vol info
Volume Name: rvol
Type: Disperse
Volume ID: e8f15248-d9de-458e-9896-f1a5782dcf74
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: calliope:/brick/p1
Brick2: euterpe:/brick/p1
Brick3: lemans:/brick/p1
Brick4: thalia:/brick/p1
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off

I inadvertently allowed one of the nodes (thalia) to be reinstalled, which
overwrote the system but not the brick, and I need guidance on getting it
back into the volume.

(on lemans)
gluster peer status
Number of Peers: 3

Hostname: calliope
Uuid: 72373eb1-8047-405a-a094-891e559755da
State: Peer in Cluster (Connected)

Hostname: euterpe
Uuid: 9fafa5c4-1541-4aa0-9ea2-923a756cadbb
State: Peer in Cluster (Connected)

Hostname: thalia
Uuid: 843169fa-3937-42de-8fda-9819efc75fe8
State: Peer Rejected (Connected)

The thalia peer is rejected. If I try to peer probe thalia, I am told it is
already part of the pool. If, from thalia, I try to peer probe one of the
others, I am told that they are already part of another pool.

I have tried removing the thalia brick with
gluster vol remove-brick rvol thalia:/brick/p1 start
but get the error
volume remove-brick start: failed: Remove brick incorrect brick count of 1
for disperse 4

I am not finding much guidance for this particular situation. I could use a
suggestion on how to recover. It's a lab situation so no biggie if I lose
it.
Cheers

Tony Schreiner

Re: [Gluster-users] gluster 3.7 healing errors (no data available, buf->ia_gfid is null)

2016-09-22 Thread Pranith Kumar Karampuri
On Thu, Sep 22, 2016 at 12:51 PM, Ravishankar N 
wrote:

> On 09/22/2016 12:38 PM, Pasi Kärkkäinen wrote:
>
>> On Thu, Sep 22, 2016 at 09:58:25AM +0530, Ravishankar N wrote:
>>
>>> On 09/21/2016 10:54 PM, Pasi Kärkkäinen wrote:
>>>
 Let's see.

 # getfattr -m . -d -e hex /bricks/vol1/brick1/foo
 getfattr: Removing leading '/' from absolute path names
 # file: bricks/vol1/brick1/foo
 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
 23a756e6c6162656c65645f743a733000

 So hmm.. no trusted.gfid it seems.. is that perhaps because this node
 was down when the file was created?

>>> No, even if that were the case, the gfid should have been set while
>>> healing the file to this node.
>>> Can you try doing a setfattr -n trusted.gfid -v
>>> 0xc1ca778ed2af4828b981171c0c5bd45e on the file. and launch heal
>>> again?
>>> What about the .glusterfs hardlink- does that exist?
>>>
>>> It seems there's no hardlink.. nothing in 
>>> /bricks/vol1/brick1/.glusterfs/c1/ca/
>> directory.
>>
>> Now I manually set the trusted.gfid value on the file, and launched heal
>> again,
>> and now gluster was able to heal it OK! Healing is now fully complete,
>> and no out-of-sync files anymore.
>>
>> Any idea what caused the missing trusted.gfid ?
>>
>
Do you want to raise a bug for this? If you don't have the time, we would
love to raise it ourselves to make sure we address this.


> A create FOP is a multi step process on the bricks.
> 1.creating the file on the actual path
> 2. Setting the gluster xattrs including gfid xattr
> 3. Creating the link file inside .glusterfs
>
> I'm guessing your brick went down after step 1 for the files in question.
> Check the brick logs to check for such messages. If the brick was still up,
> check if there are logs for failures related to performing 2 and 3.
>
> By the way, if everything healed successfully, check that the .glusterfs
> hardlink  is now present.
>
> -Ravi
>
>
>
>>
>>
>> Thanks a lot!
>>
>> -- Pasi
>>
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 
Pranith

Re: [Gluster-users] 3.8.3 Bitrot signature process

2016-09-22 Thread Amudhan P
Hi Kotresh,

I have raised the bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1378466

Thanks
Amudhan

On Thu, Sep 22, 2016 at 2:45 PM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi Amudhan,
>
> It's as of now, hard coded based on some testing results. That part is not
> tune-able yet.
> Only scrubber throttling is tune-able. As I have told you, because brick
> process has
> an open fd, bitrot signer process is not picking it up for scrubbing.
> Please raise
> a bug. We will take a look at it.
>
> Thanks and Regards,
> Kotresh H R
>
> - Original Message -
> > From: "Amudhan P" 
> > To: "Kotresh Hiremath Ravishankar" 
> > Cc: "Gluster Users" 
> > Sent: Thursday, September 22, 2016 2:37:25 PM
> > Subject: Re: 3.8.3 Bitrot signature process
> >
> > Hi Kotresh,
> >
> > its same behaviour in replicated volume also, file fd opens after 120
> > seconds in brick pid.
> >
> > for calculating signature for 100MB file it took 15m57s.
> >
> >
> > How can i increase CPU usage?, in your earlier mail you have said "To
> limit
> > the usage of CPU, throttling is done using token bucket algorithm".
> > any possibility of increasing bitrot hash calculation speed ?.
> >
> >
> > Thanks,
> > Amudhan
> >
> >
> > On Thu, Sep 22, 2016 at 11:44 AM, Kotresh Hiremath Ravishankar <
> > khire...@redhat.com> wrote:
> >
> > > Hi Amudhan,
> > >
> > > Thanks for the confirmation. If that's the case please try with
> dist-rep
> > > volume,
> > > and see if you are observing similar behavior.
> > >
> > > In any case please raise a bug for the same with your observations. We
> > > will work
> > > on it.
> > >
> > > Thanks and Regards,
> > > Kotresh H R
> > >
> > > - Original Message -
> > > > From: "Amudhan P" 
> > > > To: "Kotresh Hiremath Ravishankar" 
> > > > Cc: "Gluster Users" 
> > > > Sent: Thursday, September 22, 2016 11:25:28 AM
> > > > Subject: Re: 3.8.3 Bitrot signature process
> > > >
> > > > Hi Kotresh,
> > > >
> > > > 2280 is a brick process, i have not tried with dist-rep volume?
> > > >
> > > > I have not seen any fd in bitd process in any of the node's and bitd
> > > > process usage always 0% CPU and randomly it goes 0.3% CPU.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Amudhan
> > > >
> > > > On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <
> > > > khire...@redhat.com> wrote:
> > > > > Hi Amudhan,
> > > > >
> > > > > No, bitrot signer is a different process by itself and is not part
> of
> > > > brick process.
> > > > > I believe the process 2280 is a brick process ? Did you check with
> > > > dist-rep volume?
> > > > > Is the same behavior being observed there as well? We need to
> figure
> > > out
> > > > why brick
> > > > > process is holding that fd for such a long time.
> > > > >
> > > > > Thanks and Regards,
> > > > > Kotresh H R
> > > > >
> > > > > - Original Message -
> > > > >> From: "Amudhan P" 
> > > > >> To: "Kotresh Hiremath Ravishankar" 
> > > > >> Sent: Wednesday, September 21, 2016 8:15:33 PM
> > > > >> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > > >>
> > > > >> Hi Kotresh,
> > > > >>
> > > > >> As soon as fd closes from brick1 pid, i can see bitrot signature
> for
> > > the
> > > > >> file in brick.
> > > > >>
> > > > >> So, it looks like fd opened by brick process to calculate
> signature.
> > > > >>
> > > > >> output of the file:
> > > > >>
> > > > >> -rw-r--r-- 2 root root 250M Sep 21 18:32
> > > > >> /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > > >>
> > > > >> getfattr: Removing leading '/' from absolute path names
> > > > >> # file: media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > > >> trusted.bit-rot.signature=0x010200e9474e4cc6
> > > > 73c0c227a6e807e04aa4ab1f88d3744243950a290869c53daa65df
> > > > >> trusted.bit-rot.version=0x020057d6af3200012a13
> > > > >> trusted.ec.config=0x080501000200
> > > > >> trusted.ec.size=0x3e80
> > > > >> trusted.ec.version=0x1f401f40
> > > > >> trusted.gfid=0x4c091145429448468fffe358482c63e1
> > > > >>
> > > > >> stat /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > > >>   File: ‘/media/disk1/brick1/data/G/test59-bs10M-c100.nul’
> > > > >>   Size: 262144000   Blocks: 512000 IO Block: 4096
>  regular
> > > file
> > > > >> Device: 811h/2065d  Inode: 402653311   Links: 2
> > > > >> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/
> > > root)
> > > > >> Access: 2016-09-21 18:34:43.722712751 +0530
> > > > >> Modify: 2016-09-21 18:32:41.650712946 +0530
> > > > >> Change: 2016-09-21 19:14:41.698708914 +0530
> > > > >>  Birth: -
> > > > >>
> > > > >>
> > > > >> In other 2 bricks in same set, still signature is not updated for
> the
> > > > same
> > > > >> file.
> > > > >>
> > > > >>
> > > > >> On Wed, Sep 21, 

[Gluster-users] FUSE mounts and Docker integration

2016-09-22 Thread Gandalf Corvotempesta
I would like to use Gluster as shared storage for apps deployed
through a PaaS that we are creating.
Currently I'm able to mount a gluster volume on each "compute" node
and then bind-mount a subdirectory of this shared volume into each
Docker app.
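
Roughly, the current approach looks like this (server, volume and app names
here are made up for illustration):

mount -t glusterfs gluster1:/appvol /mnt/shared              # on each compute node
docker run -d --name app1 -v /mnt/shared/app1:/data myapp    # bind-mount one subdirectory per app container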

Obviously this is not very secure, as also noted in the official Docker
docs: there are cases where users could escape the mount point and be
able to traverse the whole FS.

One solution (I think) would be to use a Docker volume plugin like this:
https://github.com/amarkwalder/docker-volume-glusterfs

but having to create one volume (with replicas and so on) for each app
wastes resources.

Any solution?


[Gluster-users] An Update on GlusterD-2.0

2016-09-22 Thread Kaushal M
The first preview/dev release of GlusterD-2.0 is available now. A
prebuilt binary is available for download from the release-page[1].

This is just a preview of what has been happening in GD2, to give
users a taste of how GD2 is evolving.

GD2 can now form a cluster, list peers, and create, delete, (pseudo)
start/stop and list volumes. Most of these will undergo changes and be
refined as we progress.

More information on how to test this release can be found on the release page.

We'll be providing periodic (hopefully fortnightly) updates on the
changes happening in GD2 from now on.

We'll also be providing periodic dev builds for people to test.
Currently, builds are only available for Linux on amd64. Vagrant and
Docker releases are planned to make it easier to test GD2.

Thanks,
Kaushal


[1] https://github.com/gluster/glusterd2/releases/tag/v4.0dev-1


Re: [Gluster-users] 3.8.3 Bitrot signature process

2016-09-22 Thread Kotresh Hiremath Ravishankar
Hi Amudhan,

It is, as of now, hard-coded based on some testing results; that part is not
tunable yet. Only scrubber throttling is tunable. As I have told you, because
the brick process has an open fd, the bitrot signer process is not picking it
up for scrubbing. Please raise a bug. We will take a look at it.
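
A couple of commands that may help here (a sketch; the brick PID 2280 and the
gfid fragment come from the earlier mails, and <volname> is a placeholder):

gluster volume status <volname>                            # lists the PID of each brick process
ls -l /proc/2280/fd | grep 4c091145                        # check whether the brick still holds the file's gfid fd open
gluster volume bitrot <volname> scrub-throttle aggressive  # scrub throttling, the only tunable bitrot knob today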

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Amudhan P" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Gluster Users" 
> Sent: Thursday, September 22, 2016 2:37:25 PM
> Subject: Re: 3.8.3 Bitrot signature process
> 
> Hi Kotresh,
> 
> its same behaviour in replicated volume also, file fd opens after 120
> seconds in brick pid.
> 
> for calculating signature for 100MB file it took 15m57s.
> 
> 
> How can i increase CPU usage?, in your earlier mail you have said "To limit
> the usage of CPU, throttling is done using token bucket algorithm".
> any possibility of increasing bitrot hash calculation speed ?.
> 
> 
> Thanks,
> Amudhan
> 
> 
> On Thu, Sep 22, 2016 at 11:44 AM, Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
> 
> > Hi Amudhan,
> >
> > Thanks for the confirmation. If that's the case please try with dist-rep
> > volume,
> > and see if you are observing similar behavior.
> >
> > In any case please raise a bug for the same with your observations. We
> > will work
> > on it.
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > - Original Message -
> > > From: "Amudhan P" 
> > > To: "Kotresh Hiremath Ravishankar" 
> > > Cc: "Gluster Users" 
> > > Sent: Thursday, September 22, 2016 11:25:28 AM
> > > Subject: Re: 3.8.3 Bitrot signature process
> > >
> > > Hi Kotresh,
> > >
> > > 2280 is a brick process, i have not tried with dist-rep volume?
> > >
> > > I have not seen any fd in bitd process in any of the node's and bitd
> > > process usage always 0% CPU and randomly it goes 0.3% CPU.
> > >
> > >
> > >
> > > Thanks,
> > > Amudhan
> > >
> > > On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <
> > > khire...@redhat.com> wrote:
> > > > Hi Amudhan,
> > > >
> > > > No, bitrot signer is a different process by itself and is not part of
> > > brick process.
> > > > I believe the process 2280 is a brick process ? Did you check with
> > > dist-rep volume?
> > > > Is the same behavior being observed there as well? We need to figure
> > out
> > > why brick
> > > > process is holding that fd for such a long time.
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > - Original Message -
> > > >> From: "Amudhan P" 
> > > >> To: "Kotresh Hiremath Ravishankar" 
> > > >> Sent: Wednesday, September 21, 2016 8:15:33 PM
> > > >> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >>
> > > >> Hi Kotresh,
> > > >>
> > > >> As soon as fd closes from brick1 pid, i can see bitrot signature for
> > the
> > > >> file in brick.
> > > >>
> > > >> So, it looks like fd opened by brick process to calculate signature.
> > > >>
> > > >> output of the file:
> > > >>
> > > >> -rw-r--r-- 2 root root 250M Sep 21 18:32
> > > >> /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >>
> > > >> getfattr: Removing leading '/' from absolute path names
> > > >> # file: media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >> trusted.bit-rot.signature=0x010200e9474e4cc6
> > > 73c0c227a6e807e04aa4ab1f88d3744243950a290869c53daa65df
> > > >> trusted.bit-rot.version=0x020057d6af3200012a13
> > > >> trusted.ec.config=0x080501000200
> > > >> trusted.ec.size=0x3e80
> > > >> trusted.ec.version=0x1f401f40
> > > >> trusted.gfid=0x4c091145429448468fffe358482c63e1
> > > >>
> > > >> stat /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >>   File: ‘/media/disk1/brick1/data/G/test59-bs10M-c100.nul’
> > > >>   Size: 262144000   Blocks: 512000 IO Block: 4096   regular
> > file
> > > >> Device: 811h/2065d  Inode: 402653311   Links: 2
> > > >> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/
> > root)
> > > >> Access: 2016-09-21 18:34:43.722712751 +0530
> > > >> Modify: 2016-09-21 18:32:41.650712946 +0530
> > > >> Change: 2016-09-21 19:14:41.698708914 +0530
> > > >>  Birth: -
> > > >>
> > > >>
> > > >> In other 2 bricks in same set, still signature is not updated for the
> > > same
> > > >> file.
> > > >>
> > > >>
> > > >> On Wed, Sep 21, 2016 at 6:48 PM, Amudhan P 
> > wrote:
> > > >>
> > > >> > Hi Kotresh,
> > > >> >
> > > >> > I am very sure, No read was going on from mount point.
> > > >> >
> > > >> > Again i did same test but after writing data to mount point. I have
> > > >> > unmounted mount point.
> > > >> >
> > > >> > after 120 seconds i am seeing this file fd entry in brick 1 pid
> > > >> >
> > > >> > getfattr -m. -e hex -d test59-bs10
> > > >> > # file: 

Re: [Gluster-users] 3.8.3 Bitrot signature process

2016-09-22 Thread Amudhan P
Hi Kotresh,

It's the same behaviour in the replicated volume also; the file fd opens
after 120 seconds in the brick pid.

Calculating the signature for a 100MB file took 15m57s.


How can I increase CPU usage? In your earlier mail you said "To limit
the usage of CPU, throttling is done using token bucket algorithm".
Is there any possibility of increasing the bitrot hash calculation speed?


Thanks,
Amudhan


On Thu, Sep 22, 2016 at 11:44 AM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi Amudhan,
>
> Thanks for the confirmation. If that's the case please try with dist-rep
> volume,
> and see if you are observing similar behavior.
>
> In any case please raise a bug for the same with your observations. We
> will work
> on it.
>
> Thanks and Regards,
> Kotresh H R
>
> - Original Message -
> > From: "Amudhan P" 
> > To: "Kotresh Hiremath Ravishankar" 
> > Cc: "Gluster Users" 
> > Sent: Thursday, September 22, 2016 11:25:28 AM
> > Subject: Re: 3.8.3 Bitrot signature process
> >
> > Hi Kotresh,
> >
> > 2280 is a brick process, i have not tried with dist-rep volume?
> >
> > I have not seen any fd in bitd process in any of the node's and bitd
> > process usage always 0% CPU and randomly it goes 0.3% CPU.
> >
> >
> >
> > Thanks,
> > Amudhan
> >
> > On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <
> > khire...@redhat.com> wrote:
> > > Hi Amudhan,
> > >
> > > No, bitrot signer is a different process by itself and is not part of
> > brick process.
> > > I believe the process 2280 is a brick process ? Did you check with
> > dist-rep volume?
> > > Is the same behavior being observed there as well? We need to figure
> out
> > why brick
> > > process is holding that fd for such a long time.
> > >
> > > Thanks and Regards,
> > > Kotresh H R
> > >
> > > - Original Message -
> > >> From: "Amudhan P" 
> > >> To: "Kotresh Hiremath Ravishankar" 
> > >> Sent: Wednesday, September 21, 2016 8:15:33 PM
> > >> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > >>
> > >> Hi Kotresh,
> > >>
> > >> As soon as fd closes from brick1 pid, i can see bitrot signature for
> the
> > >> file in brick.
> > >>
> > >> So, it looks like fd opened by brick process to calculate signature.
> > >>
> > >> output of the file:
> > >>
> > >> -rw-r--r-- 2 root root 250M Sep 21 18:32
> > >> /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > >>
> > >> getfattr: Removing leading '/' from absolute path names
> > >> # file: media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > >> trusted.bit-rot.signature=0x010200e9474e4cc6
> > 73c0c227a6e807e04aa4ab1f88d3744243950a290869c53daa65df
> > >> trusted.bit-rot.version=0x020057d6af3200012a13
> > >> trusted.ec.config=0x080501000200
> > >> trusted.ec.size=0x3e80
> > >> trusted.ec.version=0x1f401f40
> > >> trusted.gfid=0x4c091145429448468fffe358482c63e1
> > >>
> > >> stat /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > >>   File: ‘/media/disk1/brick1/data/G/test59-bs10M-c100.nul’
> > >>   Size: 262144000   Blocks: 512000 IO Block: 4096   regular
> file
> > >> Device: 811h/2065d  Inode: 402653311   Links: 2
> > >> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/
> root)
> > >> Access: 2016-09-21 18:34:43.722712751 +0530
> > >> Modify: 2016-09-21 18:32:41.650712946 +0530
> > >> Change: 2016-09-21 19:14:41.698708914 +0530
> > >>  Birth: -
> > >>
> > >>
> > >> In other 2 bricks in same set, still signature is not updated for the
> > same
> > >> file.
> > >>
> > >>
> > >> On Wed, Sep 21, 2016 at 6:48 PM, Amudhan P 
> wrote:
> > >>
> > >> > Hi Kotresh,
> > >> >
> > >> > I am very sure, No read was going on from mount point.
> > >> >
> > >> > Again i did same test but after writing data to mount point. I have
> > >> > unmounted mount point.
> > >> >
> > >> > after 120 seconds i am seeing this file fd entry in brick 1 pid
> > >> >
> > >> > getfattr -m. -e hex -d test59-bs10
> > >> > # file: test59-bs10M-c100.nul
> > >> > trusted.bit-rot.version=0x020057bed574000ed534
> > >> > trusted.ec.config=0x080501000200
> > >> > trusted.ec.size=0x3e80
> > >> > trusted.ec.version=0x1f401f40
> > >> > trusted.gfid=0x4c091145429448468fffe358482c63e1
> > >> >
> > >> >
> > >> > ls -l /proc/2280/fd
> > >> > lr-x-- 1 root root 64 Sep 21 13:08 19 -> /media/disk1/brick1/.
> > >> > glusterfs/4c/09/4c091145-4294-4846-8fff-e358482c63e1
> > >> >
> > >> > Volume is a EC - 4+1
> > >> >
> > >> > On Wed, Sep 21, 2016 at 6:17 PM, Kotresh Hiremath Ravishankar <
> > >> > khire...@redhat.com> wrote:
> > >> >
> > >> >> Hi Amudhan,
> > >> >>
> > >> >> If you see the ls output, some process has a fd opened in the
> backend.
> > >> >> That is the reason bitrot is not considering for the signing.
> > >> >> Could you please observe, after 120 secs 

Re: [Gluster-users] write performance with NIC bonding

2016-09-22 Thread Дмитрий Глушенок
Hi,

It is because your switch is not performing round-robin distribution when
sending data to the server (it probably can't). Usually it is enough to
configure IP+port LACP hashing to distribute traffic evenly across all ports
in the aggregation, but any single TCP connection will still use only one
interface.
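
If the switch and drivers support it, a minimal sketch of moving the bond to
LACP with layer3+4 hashing via sysfs (the mode can only be changed while the
bond is down and has no slaves; most distributions would set this in their
network configuration files instead):

ip link set bond0 down
echo 802.3ad  > /sys/class/net/bond0/bonding/mode              # LACP
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy  # hash flows on IP address and port
ip link set bond0 up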

--
Dmitry Glushenok
Jet Infosystems

> On 21 Sept 2016, at 23:42, James Ching  wrote:
> 
> Hi,
> 
> I'm using gluster 3.7.5 and I'm trying to get port bonding working properly 
> with the gluster protocol.  I've bonded the NICs using round robin because I 
> also bond it at the switch level with link aggregation.  I've used this type 
> of bonding without a problem with my other applications but for some reason 
> gluster does not want to utilize all 3 NICs for writes but it does for 
> reads... any of you come across this or know why?  Here's the output of the 
> traffic on the NICs you can see that RX is unbalanced but TX is completely 
> balanced across the 3 NICs.  I've tried both mounting via glusterfs or nfs, 
> both result in the same imbalance. Am I missing some configuration?
> 
> 
> root@e-gluster-01:~# ifconfig
> bond0 Link encap:Ethernet
>  inet addr:  Bcast:128.33.23.255 Mask:255.255.248.0
>  inet6 addr: fe80::46a8:42ff:fe43:8817/64 Scope:Link
>  UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500 Metric:1
>  RX packets:160972852 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:122295229 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:0
>  RX bytes:152800624950 (142.3 GiB)  TX bytes:138720356365 (129.1 GiB)
> 
> em1   Link encap:Ethernet
>  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>  RX packets:160793725 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:40763142 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:1000
>  RX bytes:152688146880 (142.2 GiB)  TX bytes:46239971255 (43.0 GiB)
>  Interrupt:41
> 
> em2   Link encap:Ethernet
>  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>  RX packets:92451 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:40750031 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:1000
>  RX bytes:9001370 (8.5 MiB)  TX bytes:46216513162 (43.0 GiB)
>  Interrupt:45
> 
> em3   Link encap:Ethernet
>  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>  RX packets:86676 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:40782056 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:1000
>  RX bytes:103476700 (98.6 MiB)  TX bytes:46263871948 (43.0 GiB)
>  Interrupt:40
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster 3.7 healing errors (no data available, buf->ia_gfid is null)

2016-09-22 Thread Ravishankar N

On 09/22/2016 12:38 PM, Pasi Kärkkäinen wrote:

On Thu, Sep 22, 2016 at 09:58:25AM +0530, Ravishankar N wrote:

On 09/21/2016 10:54 PM, Pasi Kärkkäinen wrote:

Let's see.

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000

So hmm.. no trusted.gfid it seems.. is that perhaps because this node was down 
when the file was created?

No, even if that were the case, the gfid should have been set while
healing the file to this node.
Can you try doing a setfattr -n trusted.gfid -v
0xc1ca778ed2af4828b981171c0c5bd45e on the file. and launch heal
again?
What about the .glusterfs hardlink- does that exist?


It seems there's no hardlink.. nothing in /bricks/vol1/brick1/.glusterfs/c1/ca/ 
directory.

Now I manually set the trusted.gfid value on the file, and launched heal again,
and now gluster was able to heal it OK! Healing is now fully complete, and no 
out-of-sync files anymore.

Any idea what caused the missing trusted.gfid ?

A create FOP is a multi-step process on the bricks:
1. Creating the file on the actual path
2. Setting the gluster xattrs, including the gfid xattr
3. Creating the link file inside .glusterfs

I'm guessing your brick went down after step 1 for the files in
question. Check the brick logs for such messages. If the brick
was still up, check if there are logs for failures related to performing
steps 2 and 3.


By the way, if everything healed successfully, check that the .glusterfs 
hardlink  is now present.
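
For example, something like the following should show the same inode number
with a link count of 2 (paths and gfid taken from earlier in this thread):

getfattr -n trusted.gfid -e hex /bricks/vol1/brick1/foo
ls -li /bricks/vol1/brick1/foo \
       /bricks/vol1/brick1/.glusterfs/c1/ca/c1ca778e-d2af-4828-b981-171c0c5bd45e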


-Ravi





Thanks a lot!

-- Pasi





Re: [Gluster-users] gluster 3.7 healing errors (no data available, buf->ia_gfid is null)

2016-09-22 Thread Pasi Kärkkäinen
On Thu, Sep 22, 2016 at 09:58:25AM +0530, Ravishankar N wrote:
> On 09/21/2016 10:54 PM, Pasi Kärkkäinen wrote:
> >Let's see.
> >
> ># getfattr -m . -d -e hex /bricks/vol1/brick1/foo
> >getfattr: Removing leading '/' from absolute path names
> ># file: bricks/vol1/brick1/foo
> >security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> >
> >So hmm.. no trusted.gfid it seems.. is that perhaps because this node was 
> >down when the file was created?
>
> No, even if that were the case, the gfid should have been set while
> healing the file to this node.
> Can you try doing a setfattr -n trusted.gfid -v
> 0xc1ca778ed2af4828b981171c0c5bd45e on the file. and launch heal
> again?
> What about the .glusterfs hardlink- does that exist?
> 

It seems there's no hardlink.. nothing in /bricks/vol1/brick1/.glusterfs/c1/ca/ 
directory.

Now I manually set the trusted.gfid value on the file, and launched heal again,
and now gluster was able to heal it OK! Healing is now fully complete, and no 
out-of-sync files anymore.

Any idea what caused the missing trusted.gfid ? 



Thanks a lot!

-- Pasi



Re: [Gluster-users] 3.8.3 Bitrot signature process

2016-09-22 Thread Kotresh Hiremath Ravishankar
Hi Amudhan,

Thanks for the confirmation. If that's the case please try with dist-rep volume,
and see if you are observing similar behavior.

In any case please raise a bug for the same with your observations. We will work
on it.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Amudhan P" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Gluster Users" 
> Sent: Thursday, September 22, 2016 11:25:28 AM
> Subject: Re: 3.8.3 Bitrot signature process
> 
> Hi Kotresh,
> 
> 2280 is a brick process, i have not tried with dist-rep volume?
> 
> I have not seen any fd in bitd process in any of the node's and bitd
> process usage always 0% CPU and randomly it goes 0.3% CPU.
> 
> 
> 
> Thanks,
> Amudhan
> 
> On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
> > Hi Amudhan,
> >
> > No, bitrot signer is a different process by itself and is not part of
> brick process.
> > I believe the process 2280 is a brick process ? Did you check with
> dist-rep volume?
> > Is the same behavior being observed there as well? We need to figure out
> why brick
> > process is holding that fd for such a long time.
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > - Original Message -
> >> From: "Amudhan P" 
> >> To: "Kotresh Hiremath Ravishankar" 
> >> Sent: Wednesday, September 21, 2016 8:15:33 PM
> >> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> >>
> >> Hi Kotresh,
> >>
> >> As soon as fd closes from brick1 pid, i can see bitrot signature for the
> >> file in brick.
> >>
> >> So, it looks like fd opened by brick process to calculate signature.
> >>
> >> output of the file:
> >>
> >> -rw-r--r-- 2 root root 250M Sep 21 18:32
> >> /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> >>
> >> getfattr: Removing leading '/' from absolute path names
> >> # file: media/disk1/brick1/data/G/test59-bs10M-c100.nul
> >> trusted.bit-rot.signature=0x010200e9474e4cc6
> 73c0c227a6e807e04aa4ab1f88d3744243950a290869c53daa65df
> >> trusted.bit-rot.version=0x020057d6af3200012a13
> >> trusted.ec.config=0x080501000200
> >> trusted.ec.size=0x3e80
> >> trusted.ec.version=0x1f401f40
> >> trusted.gfid=0x4c091145429448468fffe358482c63e1
> >>
> >> stat /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> >>   File: ‘/media/disk1/brick1/data/G/test59-bs10M-c100.nul’
> >>   Size: 262144000   Blocks: 512000 IO Block: 4096   regular file
> >> Device: 811h/2065d  Inode: 402653311   Links: 2
> >> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> >> Access: 2016-09-21 18:34:43.722712751 +0530
> >> Modify: 2016-09-21 18:32:41.650712946 +0530
> >> Change: 2016-09-21 19:14:41.698708914 +0530
> >>  Birth: -
> >>
> >>
> >> In other 2 bricks in same set, still signature is not updated for the
> same
> >> file.
> >>
> >>
> >> On Wed, Sep 21, 2016 at 6:48 PM, Amudhan P  wrote:
> >>
> >> > Hi Kotresh,
> >> >
> >> > I am very sure, No read was going on from mount point.
> >> >
> >> > Again i did same test but after writing data to mount point. I have
> >> > unmounted mount point.
> >> >
> >> > after 120 seconds i am seeing this file fd entry in brick 1 pid
> >> >
> >> > getfattr -m. -e hex -d test59-bs10
> >> > # file: test59-bs10M-c100.nul
> >> > trusted.bit-rot.version=0x020057bed574000ed534
> >> > trusted.ec.config=0x080501000200
> >> > trusted.ec.size=0x3e80
> >> > trusted.ec.version=0x1f401f40
> >> > trusted.gfid=0x4c091145429448468fffe358482c63e1
> >> >
> >> >
> >> > ls -l /proc/2280/fd
> >> > lr-x-- 1 root root 64 Sep 21 13:08 19 -> /media/disk1/brick1/.
> >> > glusterfs/4c/09/4c091145-4294-4846-8fff-e358482c63e1
> >> >
> >> > Volume is a EC - 4+1
> >> >
> >> > On Wed, Sep 21, 2016 at 6:17 PM, Kotresh Hiremath Ravishankar <
> >> > khire...@redhat.com> wrote:
> >> >
> >> >> Hi Amudhan,
> >> >>
> >> >> If you see the ls output, some process has a fd opened in the backend.
> >> >> That is the reason bitrot is not considering for the signing.
> >> >> Could you please observe, after 120 secs of closure of
> >> >> "/media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-
> >> >> 85bf-f21f99fd8764"
> >> >> the signing happens. If so we need to figure out who holds this fd for
> >> >> such a long time.
> >> >> And also we need to figure is this issue specific to EC volume.
> >> >>
> >> >> Thanks and Regards,
> >> >> Kotresh H R
> >> >>
> >> >> - Original Message -
> >> >> > From: "Amudhan P" 
> >> >> > To: "Kotresh Hiremath Ravishankar" 
> >> >> > Cc: "Gluster Users" 
> >> >> > Sent: Wednesday, September 21, 2016 4:56:40 PM
> >> >> > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> >> >> >
> >> >> > Hi Kotresh,
> >> >> >
> >>