Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Serkan Çoban
Hi,

You can find the output at the link below:
https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0

Thanks,
Serkan

On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez  wrote:
> Can you post the result of 'gluster volume status v0 detail' ?
>
>
> On 05/05/16 06:49, Serkan Çoban wrote:
>>
>> Hi, Can anyone suggest something for this issue? df, du has no issue
>> for the bricks yet one subvolume not being used by gluster..
>>
>> On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
>> wrote:
>>>
>>> Hi,
>>>
>>> I changed cluster.min-free-inodes to "0". Remount the volume on
>>> clients. inode full messages not coming to syslog anymore but I see
>>> disperse-56 subvolume still not being used.
>>> Anything I can do to resolve this issue? Maybe I can destroy and
>>> recreate the volume but I am not sure It will fix this issue...
>>> Maybe the disperse size 16+4 is too big should I change it to 8+2?
>>>
>>> On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
>>> wrote:

 I also checked the df output all 20 bricks are same like below:
 /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20

 On Tue, May 3, 2016 at 1:40 PM, Raghavendra G 
 wrote:
>
>
>
> On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban 
> wrote:
>>
>>
>>> 1. What is the out put of du -hs ? Please get this
>>> information for each of the brick that are part of disperse.
>
>
>
> Sorry. I needed df output of the filesystem containing brick. Not du.
> Sorry
> about that.
>
>>
>> There are 20 bricks in disperse-56 and the du -hs output is like:
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 1.8M /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>>
>> I see that gluster is not writing to this disperse set. All other
>> disperse sets are filled 13GB but this one is empty. I see directory
>> structure created but no files in directories.
>> How can I fix the issue? I will try to rebalance but I don't think it
>> will write to this disperse set...
>>
>>
>>
>> On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G
>> 
>> wrote:
>>>
>>>
>>>
>>> On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban
>>> 
>>> wrote:


 Hi, I cannot get an answer from user list, so asking to devel list.

 I am getting [dht-diskusage.c:277:dht_is_subvol_filled] 0-v0-dht:
 inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
 adding more bricks.

 message on client logs.My cluster is empty there are only a couple
 of
 GB files for testing. Why this message appear in syslog?
>>>
>>>
>>>
>>> dht uses disk usage information from backend export.
>>>
>>> 1. What is the out put of du -hs ? Please get this
>>> information for each of the brick that are part of disperse.
>>> 2. Once you get du information from each brick, the value seen by dht
>>> will
>>> be based on how cluster/disperse aggregates du info (basically statfs
>>> fop).
>>>
>>> The reason for 100% disk usage may be,
>>> In case of 1, backend fs might be shared by data other than brick.
>>> In case of 2, some issues with aggregation.
>>>
 Is is safe to
 ignore it?
>>>
>>>
>>>
>>> dht will try not to have data files on the subvol in question
>>> (v0-disperse-56). Hence lookup cost will be two hops for files
>>> hashing
>>> to
>>> disperse-56 (note that other fops like read/write/open still have the
>>> cost
>>> of single hop and dont suffer from this penalty). Other than that
>>> there
>>> is
>>> no significant harm unless disperse-56 is really running out of
>>> space.
>>>
>>> regards,
>>> Raghavendra
>>>
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Raghavendra G
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
>
> --
> Raghavendra G
>>
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> 

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Serkan Çoban
Hi, can anyone suggest something for this issue? df and du show no problems
for the bricks, yet one subvolume is not being used by gluster...

On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban  wrote:
> Hi,
>
> I changed cluster.min-free-inodes to "0". Remount the volume on
> clients. inode full messages not coming to syslog anymore but I see
> disperse-56 subvolume still not being used.
> Anything I can do to resolve this issue? Maybe I can destroy and
> recreate the volume but I am not sure It will fix this issue...
> Maybe the disperse size 16+4 is too big should I change it to 8+2?
>
> On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban  wrote:
>> I also checked the df output all 20 bricks are same like below:
>> /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20
>>
>> On Tue, May 3, 2016 at 1:40 PM, Raghavendra G  
>> wrote:
>>>
>>>
>>> On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban  wrote:

 >1. What is the out put of du -hs ? Please get this
 > information for each of the brick that are part of disperse.
>>>
>>>
>>> Sorry. I needed df output of the filesystem containing brick. Not du. Sorry
>>> about that.
>>>

 There are 20 bricks in disperse-56 and the du -hs output is like:
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 1.8M /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20
 80K /bricks/20

 I see that gluster is not writing to this disperse set. All other
 disperse sets are filled 13GB but this one is empty. I see directory
 structure created but no files in directories.
 How can I fix the issue? I will try to rebalance but I don't think it
 will write to this disperse set...



 On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G 
 wrote:
 >
 >
 > On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban 
 > wrote:
 >>
 >> Hi, I cannot get an answer from user list, so asking to devel list.
 >>
 >> I am getting [dht-diskusage.c:277:dht_is_subvol_filled] 0-v0-dht:
 >> inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
 >> adding more bricks.
 >>
 >> message on client logs.My cluster is empty there are only a couple of
 >> GB files for testing. Why this message appear in syslog?
 >
 >
 > dht uses disk usage information from backend export.
 >
 > 1. What is the out put of du -hs ? Please get this
 > information for each of the brick that are part of disperse.
 > 2. Once you get du information from each brick, the value seen by dht
 > will
 > be based on how cluster/disperse aggregates du info (basically statfs
 > fop).
 >
 > The reason for 100% disk usage may be,
 > In case of 1, backend fs might be shared by data other than brick.
 > In case of 2, some issues with aggregation.
 >
 >> Is is safe to
 >> ignore it?
 >
 >
 > dht will try not to have data files on the subvol in question
 > (v0-disperse-56). Hence lookup cost will be two hops for files hashing
 > to
 > disperse-56 (note that other fops like read/write/open still have the
 > cost
 > of single hop and dont suffer from this penalty). Other than that there
 > is
 > no significant harm unless disperse-56 is really running out of space.
 >
 > regards,
 > Raghavendra
 >
 >> ___
 >> Gluster-devel mailing list
 >> Gluster-devel@gluster.org
 >> http://www.gluster.org/mailman/listinfo/gluster-devel
 >
 >
 >
 >
 > --
 > Raghavendra G
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>>
>>>
>>> --
>>> Raghavendra G
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Serkan Çoban
Ah, I see. How could I overlook this? My bad; sorry everyone for taking the
time to help me...
>BTW, how large is the volume you have?
9PB usable :)

Serkan

On Thu, May 5, 2016 at 2:07 PM, Xavier Hernandez  wrote:
> On 05/05/16 11:31, Kaushal M wrote:
>>
>> On Thu, May 5, 2016 at 2:36 PM, David Gossage
>>  wrote:
>>>
>>>
>>>
>>>
>>> On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban 
>>> wrote:


 Hi,

 You can find the output below link:
 https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0

 Thanks,
 Serkan
>>>
>>>
>>>
>>> Maybe not issue, but playing one of these things is not like the other I
>>> notice of all the bricks only one seems to be different at a quick glance
>>>
>>> Brick: Brick 1.1.1.235:/bricks/20
>>> TCP Port : 49170
>>> RDMA Port: 0
>>> Online   : Y
>>> Pid  : 26736
>>> File System  : ext4
>>> Device   : /dev/mapper/vol0-vol_root
>>> Mount Options: rw,relatime,data=ordered
>>> Inode Size   : 256
>>> Disk Space Free  : 86.1GB
>>> Total Disk Space : 96.0GB
>>> Inode Count  : 6406144
>>> Free Inodes  : 6381374
>>>
>>> Every other brick seems to be 7TB and xfs but this one.
>>
>>
>> Looks like the brick fs isn't mounted, and the root-fs is being used
>> instead. But that still leaves enough inodes free.
>>
>> What I suspect is that one of the cluster translators is mixing up
>> stats when aggregating from multiple bricks.
>> From the log snippet you gave in the first mail, it seems like the
>> disperse translator is possibly involved.
>
>
> Currently ec takes the number of potential files in the subvolume (f_files)
> as the maximum of all its subvolumes, but it takes the available count
> (f_ffree) as the minumum of all its volumes.
>
> This causes max to be ~781.000.000, but free will be ~6.300.000. This gives
> a ~0.8% available, i.e. almost 100% full.
>
> Given the circumstances I think it's the correct thing to do.
>
> Xavi
>
>
>>
>> BTW, how large is the volume you have? Those are a lot of bricks!
>>
>> ~kaushal
>>
>>
>>>
>>>
>>>


 On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez 
 wrote:
>
> Can you post the result of 'gluster volume status v0 detail' ?
>
>
> On 05/05/16 06:49, Serkan Çoban wrote:
>>
>>
>> Hi, Can anyone suggest something for this issue? df, du has no issue
>> for the bricks yet one subvolume not being used by gluster..
>>
>> On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
>> wrote:
>>>
>>>
>>> Hi,
>>>
>>> I changed cluster.min-free-inodes to "0". Remount the volume on
>>> clients. inode full messages not coming to syslog anymore but I see
>>> disperse-56 subvolume still not being used.
>>> Anything I can do to resolve this issue? Maybe I can destroy and
>>> recreate the volume but I am not sure It will fix this issue...
>>> Maybe the disperse size 16+4 is too big should I change it to 8+2?
>>>
>>> On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
>>> wrote:


 I also checked the df output all 20 bricks are same like below:
 /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20

 On Tue, May 3, 2016 at 1:40 PM, Raghavendra G
 
 wrote:
>
>
>
>
> On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban
> 
> wrote:
>>
>>
>>
>>> 1. What is the out put of du -hs ? Please get
>>> this
>>> information for each of the brick that are part of disperse.
>
>
>
>
> Sorry. I needed df output of the filesystem containing brick. Not
> du.
> Sorry
> about that.
>
>>
>> There are 20 bricks in disperse-56 and the du -hs output is like:
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 1.8M /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>>
>> I see that gluster is not writing to this disperse set. All other
>> disperse sets are filled 13GB but this one is empty. I see
>> directory
>> structure created but no files in directories.
>> How can I fix the issue? I will try to rebalance but I don't think
>> it
>> will write to this 

[Gluster-devel] git-branch-diff: wrapper script for git to visualize backports

2016-05-05 Thread Prasanna Kalever
Hi Team,

Check out the glusterfs script that can show the list of commits
missing (i.e. not yet backported) in other branches (say 3.7.12) w.r.t. master:

http://review.gluster.org/#/c/14230/


This script helps in visualizing backported and missed commits between two
different branches.

While backporting a commit to another branch, only the subject of the patch may
remain unchanged; everything else, such as the commit message, commit Id,
change Id and bug Id, will change. This script therefore uses the commit
subject as the key for comparing two git branches, which can be local or remote.
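
For the curious, below is a minimal sketch of that subject-matching idea in
plain Python. This is not the actual extras/git-branch-diff.py; the branch
names are just examples:

#!/usr/bin/env python
# Minimal sketch of the subject-matching idea -- NOT the actual
# extras/git-branch-diff.py. Lists commit subjects reachable from the
# source branch that are missing from the target branch.
import subprocess

def subjects(branch):
    # Return the set of commit subjects reachable from 'branch'.
    out = subprocess.check_output(['git', 'log', '--pretty=format:%s', branch])
    return set(out.decode('utf-8', 'replace').splitlines())

def branch_diff(source, target):
    src, tgt = subjects(source), subjects(target)
    return sorted(src & tgt), sorted(src - tgt)

if __name__ == '__main__':
    backported, missing = branch_diff('master', 'origin/release-3.8')
    print('[ OK ] Backported patches: %d' % len(backported))
    print('[ !! ] Missing in origin/release-3.8: %d' % len(missing))
    for subject in missing:
        print('  ' + subject)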



Help:

$ ./extras/git-branch-diff.py --help
usage: git-branch-diff.py [-h] [-s SOURCE_BRANCH] -t TARGET_BRANCH
  [-a GIT_AUTHOR] [-p REPO_PATH]

git wrapper to diff local/remote branches

optional arguments:
  -h, --help            show this help message and exit
  -s SOURCE_BRANCH, --source-branch SOURCE_BRANCH
source branch name
  -t TARGET_BRANCH, --target-branch TARGET_BRANCH
target branch name
  -a GIT_AUTHOR, --author GIT_AUTHOR
default: git config name/email
  -p REPO_PATH, --path REPO_PATH
show branches diff specific to path


Sample usages:

  $ ./extras/git-branch-diff.py -t origin/release-3.8
  $ ./extras/git-branch-diff.py -s local_branch -t origin/release-3.7
  $ ./extras/git-branch-diff.py -t origin/release-3.8
--author="us...@redhat.com"
  $ ./extras/git-branch-diff.py -t origin/release-3.8 --path="xlators/"

  $ ./extras/git-branch-diff.py -t origin/release-3.8 --author=""



Example output:

$ ./extras/git-branch-diff.py -t origin/release-3.8 --path=./rpc



[ ✔ ] Successfully Backported changes:
{from: remotes/origin/master  to: origin/release-3.8}

glusterd: try to connect on GF_PMAP_PORT_FOREIGN aswell
rpc: fix gf_process_reserved_ports
rpc: assign port only if it is unreserved
server/protocol: option for dynamic authorization of client permissions
rpc: fix binding brick issue while bind-insecure is enabled
rpc: By default set allow-insecure, bind-insecure to on



[ ✖ ] Missing patches in origin/release-3.8:

glusterd: add defence mechanism to avoid brick port clashes
rpc: define client port range




Note: This script may miss commits whose subjects were altered while
backporting. It also has no intelligence to detect squashed commits.



Thanks,
--
Prasanna
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Xavier Hernandez

On 05/05/16 13:59, Kaushal M wrote:

On Thu, May 5, 2016 at 4:37 PM, Xavier Hernandez  wrote:

On 05/05/16 11:31, Kaushal M wrote:


On Thu, May 5, 2016 at 2:36 PM, David Gossage
 wrote:





On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban 
wrote:



Hi,

You can find the output below link:
https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0

Thanks,
Serkan




Maybe not issue, but playing one of these things is not like the other I
notice of all the bricks only one seems to be different at a quick glance

Brick: Brick 1.1.1.235:/bricks/20
TCP Port : 49170
RDMA Port: 0
Online   : Y
Pid  : 26736
File System  : ext4
Device   : /dev/mapper/vol0-vol_root
Mount Options: rw,relatime,data=ordered
Inode Size   : 256
Disk Space Free  : 86.1GB
Total Disk Space : 96.0GB
Inode Count  : 6406144
Free Inodes  : 6381374

Every other brick seems to be 7TB and xfs but this one.



Looks like the brick fs isn't mounted, and the root-fs is being used
instead. But that still leaves enough inodes free.

What I suspect is that one of the cluster translators is mixing up
stats when aggregating from multiple bricks.
From the log snippet you gave in the first mail, it seems like the
disperse translator is possibly involved.



Currently ec takes the number of potential files in the subvolume (f_files)
as the maximum of all its subvolumes, but it takes the available count
(f_ffree) as the minumum of all its volumes.

This causes max to be ~781.000.000, but free will be ~6.300.000. This gives
a ~0.8% available, i.e. almost 100% full.

Given the circumstances I think it's the correct thing to do.


Thanks for giving the reasoning Xavi.

But why is the number of potential files the maximum?
IIUC, a file (or parts of it) will be written to all subvolumes in the
disperse set.
So wouldn't the smallest subvolume limit the number of files that
could be possibly created?


I'm not very sure why this decision was taken. In theory ec only 
supports identical subvolumes because of the way it works. This means 
that all bricks should report the same maximum.


When this doesn't happen, I suppose the motivation was that this number 
should report the theoretical maximum number of files that the volume can 
contain.




~kaushal



Xavi




BTW, how large is the volume you have? Those are a lot of bricks!

~kaushal









On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez 
wrote:


Can you post the result of 'gluster volume status v0 detail' ?


On 05/05/16 06:49, Serkan Çoban wrote:



Hi, Can anyone suggest something for this issue? df, du has no issue
for the bricks yet one subvolume not being used by gluster..

On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
wrote:



Hi,

I changed cluster.min-free-inodes to "0". Remount the volume on
clients. inode full messages not coming to syslog anymore but I see
disperse-56 subvolume still not being used.
Anything I can do to resolve this issue? Maybe I can destroy and
recreate the volume but I am not sure It will fix this issue...
Maybe the disperse size 16+4 is too big should I change it to 8+2?

On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
wrote:



I also checked the df output all 20 bricks are same like below:
/dev/sdu1 7.3T 34M 7.3T 1% /bricks/20

On Tue, May 3, 2016 at 1:40 PM, Raghavendra G

wrote:





On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban

wrote:





1. What is the out put of du -hs ? Please get
this
information for each of the brick that are part of disperse.





Sorry. I needed df output of the filesystem containing brick. Not
du.
Sorry
about that.



There are 20 bricks in disperse-56 and the du -hs output is like:
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
1.8M /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20

I see that gluster is not writing to this disperse set. All other
disperse sets are filled 13GB but this one is empty. I see
directory
structure created but no files in directories.
How can I fix the issue? I will try to rebalance but I don't think
it
will write to this disperse set...



On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G

wrote:





On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban

wrote:




Hi, I cannot get an answer from user list, so asking to devel
list.

I am getting [dht-diskusage.c:277:dht_is_subvol_filled]
0-v0-dht:
inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
adding more bricks.

message on client logs.My cluster is empty there are only a
couple
of
GB files 

Re: [Gluster-devel] Handling locks in NSR

2016-05-05 Thread Avra Sengupta

Hi,

I have sent a patch (http://review.gluster.org/#/c/14226/1) implementing 
lock/unlock fops in jbr-server in accordance with the discussion we had 
below. Please feel free to review it. Thanks.


Regards,
Avra

On 03/03/2016 12:21 PM, Avra Sengupta wrote:

On 03/03/2016 02:29 AM, Shyam wrote:

On 03/02/2016 03:10 AM, Avra Sengupta wrote:

Hi,

All fops in NSR, follow a specific workflow as described in this
UML(https://docs.google.com/presentation/d/1lxwox72n6ovfOwzmdlNCZBJ5vQcCaONvZva0aLWKUqk/edit?usp=sharing). 


However all locking fops will follow a slightly different workflow as
described below. This is a first proposed draft for handling locks, and
we would like to hear your concerns and queries regarding the same.


This change, to handle locking FOPs differently, is due to what 
limitation/problem? (apologies if I missed an earlier thread on the 
same)


My understanding is that this is due to the fact that the actual FOP 
could fail/block (non-blocking/blocking) as there is an existing lock 
held, and hence just adding a journal entry and meeting quorum, is 
not sufficient for the success of the FOP (it is necessary though to 
handle lock preservation in the event of leadership change), rather 
acquiring the lock is. Is this understanding right?
Yes, that is right; the change in approach for handling locks is to avoid 
getting into a deadlock amongst the followers.


Based on the above understanding of mine, and the discussion below, 
the intention seems to be to place the locking xlator below the 
journal. What if we place this xlator above the journal, but add 
requirements that FOPs handled by this xlator needs to reach the 
journal?


Assuming we adopt this strategy (i.e the locks xlator is above the 
journal xlator), a successful lock acquisition by the locks xlator is 
not enough to guarantee that the lock is preserved across the replica 
group, hence it has to reach the journal and as a result pass through 
other replica members journal and locks xlators as well.


If we do the above, what are the advantages and repercussions of the 
same?
Why would we want to put the locking xlator above the journal. Is 
there a use case for that?
Firstly, we would have to modify the locking xlator to make it pass 
through.
We would also introduce a small window where we perform the lock 
successfully, but have a failure on the journal. We would then have to 
release the lock because we failed to journal it. In the previous 
approach, if we fail to journal it, we wouldn't even go to the locking 
xlator. Logically it makes the locking xlator dependent on the 
journal's output, whereas ideally the journal should be dependent on 
the locking xlator's output.


Some of the points noted here (like conflicting non-blocking locks 
when the previous lock is not yet released) could be handled. Also in 
your scheme, what happens to blocking lock requests, the FOP will 
block, there is no async return to handle the success/failure of the 
same.
Yes the FOP will block on blocking lock requests. I assume that's the 
behaviour today. Please correct me if I am wrong.


The downside is that on reconciliation we need to, potentially, undo 
some of the locks that are held by the locks xlator (in the new 
leader), which is outside the scope of the journal xlator.
Yes we need to do lock cleanup on reconciliation, which is anyways 
outside the scope of the journal xlator. The reconciliation daemon 
will compare the terms on each replica node, and either acquire or 
release locks accordingly.



I also assume we need to do the same for the leases xlator as well, 
right?
Yes, as long as we handle locking properly leases xlators shouldn't be 
a problem.





1. On receiving the lock, the leader will Journal the lock himself, and
then try to actually acquire the lock. At this point in time, if it
fails to acquire the lock, then it will invalidate the journal entry,
and return a -ve ack back to the client. However, if it is 
successful in

acquiring the lock, it will mark the journal entry as complete, and
forward the fop to the followers.

2. The followers on receiving the fop, will journal it, and then try to
actually acquire the lock. If it fails to acquire the lock, then it 
will

invalidate the journal entry, and return a -ve ack back to the leader.
If it is successful in acquiring the lock, it will mark the journal
entry as complete,and send a +ve ack to the leader.

3. The leader on receiving all acks, will perform a quorum check. If
quorum meets, it will send a +ve ack to the client. If the quorum 
fails,

it will send a rollback to the followers.

4. The followers on receiving the rollback, will journal it first, and
then release the acquired lock. It will update the rollback entry in 
the

journal as complete and send an ack to the leader.

5. The leader on receiving the rollback acks, will journal it's own
rollback, and then release the acquired lock. It will update the
rollback entry in the journal, and send a -ve ack to 
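
Putting the steps above together, the leader-side flow could be sketched
roughly as below. This is purely illustrative Python with hypothetical names
(journal entries, ack strings, follower callables), not the jbr-server
implementation:

# Illustrative sketch of the leader-side lock flow in steps 1-5 above.
def leader_handle_lock(journal, try_lock, release_lock, followers, quorum):
    journal.append(['LOCK', 'pending'])          # 1. journal the lock first...
    if not try_lock():                           #    ...then actually acquire it
        journal[-1][1] = 'invalid'
        return 'nack'                            #    -ve ack back to the client
    journal[-1][1] = 'complete'

    acks = [forward() for forward in followers]  # 2. each follower journals and
                                                 #    tries to acquire the lock
    if acks.count('ack') + 1 >= quorum:          # 3. quorum check (leader counts)
        return 'ack'                             #    +ve ack to the client

    journal.append(['ROLLBACK', 'complete'])     # 4/5. quorum failed: followers
    release_lock()                               #      and leader roll back and
    return 'nack'                                #      release the lock

# Tiny demo: a 3-way replica group where one follower fails to acquire.
follower_acks = iter(['ack', 'nack'])
journal = []
print(leader_handle_lock(journal, lambda: True, lambda: None,
                         [lambda: next(follower_acks)] * 2, quorum=2))
# -> 'ack' (leader plus one follower satisfy a quorum of 2)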

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Kaushal M
On Thu, May 5, 2016 at 4:59 PM, Serkan Çoban  wrote:
> Ah I see how can I overlook this, my bad sorry everyone for taking
> time to help me...
>>BTW, how large is the volume you have?
> 9PB usable :)

Happy to help!

>
> Serkan
>
> On Thu, May 5, 2016 at 2:07 PM, Xavier Hernandez  
> wrote:
>> On 05/05/16 11:31, Kaushal M wrote:
>>>
>>> On Thu, May 5, 2016 at 2:36 PM, David Gossage
>>>  wrote:




 On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban 
 wrote:
>
>
> Hi,
>
> You can find the output below link:
> https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0
>
> Thanks,
> Serkan



 Maybe not issue, but playing one of these things is not like the other I
 notice of all the bricks only one seems to be different at a quick glance

 Brick: Brick 1.1.1.235:/bricks/20
 TCP Port : 49170
 RDMA Port: 0
 Online   : Y
 Pid  : 26736
 File System  : ext4
 Device   : /dev/mapper/vol0-vol_root
 Mount Options: rw,relatime,data=ordered
 Inode Size   : 256
 Disk Space Free  : 86.1GB
 Total Disk Space : 96.0GB
 Inode Count  : 6406144
 Free Inodes  : 6381374

 Every other brick seems to be 7TB and xfs but this one.
>>>
>>>
>>> Looks like the brick fs isn't mounted, and the root-fs is being used
>>> instead. But that still leaves enough inodes free.
>>>
>>> What I suspect is that one of the cluster translators is mixing up
>>> stats when aggregating from multiple bricks.
>>> From the log snippet you gave in the first mail, it seems like the
>>> disperse translator is possibly involved.
>>
>>
>> Currently ec takes the number of potential files in the subvolume (f_files)
>> as the maximum of all its subvolumes, but it takes the available count
>> (f_ffree) as the minumum of all its volumes.
>>
>> This causes max to be ~781.000.000, but free will be ~6.300.000. This gives
>> a ~0.8% available, i.e. almost 100% full.
>>
>> Given the circumstances I think it's the correct thing to do.
>>
>> Xavi
>>
>>
>>>
>>> BTW, how large is the volume you have? Those are a lot of bricks!
>>>
>>> ~kaushal
>>>
>>>



>
>
> On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez 
> wrote:
>>
>> Can you post the result of 'gluster volume status v0 detail' ?
>>
>>
>> On 05/05/16 06:49, Serkan Çoban wrote:
>>>
>>>
>>> Hi, Can anyone suggest something for this issue? df, du has no issue
>>> for the bricks yet one subvolume not being used by gluster..
>>>
>>> On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
>>> wrote:


 Hi,

 I changed cluster.min-free-inodes to "0". Remount the volume on
 clients. inode full messages not coming to syslog anymore but I see
 disperse-56 subvolume still not being used.
 Anything I can do to resolve this issue? Maybe I can destroy and
 recreate the volume but I am not sure It will fix this issue...
 Maybe the disperse size 16+4 is too big should I change it to 8+2?

 On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
 wrote:
>
>
> I also checked the df output all 20 bricks are same like below:
> /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20
>
> On Tue, May 3, 2016 at 1:40 PM, Raghavendra G
> 
> wrote:
>>
>>
>>
>>
>> On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban
>> 
>> wrote:
>>>
>>>
>>>
 1. What is the out put of du -hs ? Please get
 this
 information for each of the brick that are part of disperse.
>>
>>
>>
>>
>> Sorry. I needed df output of the filesystem containing brick. Not
>> du.
>> Sorry
>> about that.
>>
>>>
>>> There are 20 bricks in disperse-56 and the du -hs output is like:
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 1.8M /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>> 80K /bricks/20
>>>
>>> I see that gluster is not writing to this disperse set. All other
>>> 

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Kaushal M
On Thu, May 5, 2016 at 4:37 PM, Xavier Hernandez  wrote:
> On 05/05/16 11:31, Kaushal M wrote:
>>
>> On Thu, May 5, 2016 at 2:36 PM, David Gossage
>>  wrote:
>>>
>>>
>>>
>>>
>>> On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban 
>>> wrote:


 Hi,

 You can find the output below link:
 https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0

 Thanks,
 Serkan
>>>
>>>
>>>
>>> Maybe not issue, but playing one of these things is not like the other I
>>> notice of all the bricks only one seems to be different at a quick glance
>>>
>>> Brick: Brick 1.1.1.235:/bricks/20
>>> TCP Port : 49170
>>> RDMA Port: 0
>>> Online   : Y
>>> Pid  : 26736
>>> File System  : ext4
>>> Device   : /dev/mapper/vol0-vol_root
>>> Mount Options: rw,relatime,data=ordered
>>> Inode Size   : 256
>>> Disk Space Free  : 86.1GB
>>> Total Disk Space : 96.0GB
>>> Inode Count  : 6406144
>>> Free Inodes  : 6381374
>>>
>>> Every other brick seems to be 7TB and xfs but this one.
>>
>>
>> Looks like the brick fs isn't mounted, and the root-fs is being used
>> instead. But that still leaves enough inodes free.
>>
>> What I suspect is that one of the cluster translators is mixing up
>> stats when aggregating from multiple bricks.
>> From the log snippet you gave in the first mail, it seems like the
>> disperse translator is possibly involved.
>
>
> Currently ec takes the number of potential files in the subvolume (f_files)
> as the maximum of all its subvolumes, but it takes the available count
> (f_ffree) as the minumum of all its volumes.
>
> This causes max to be ~781.000.000, but free will be ~6.300.000. This gives
> a ~0.8% available, i.e. almost 100% full.
>
> Given the circumstances I think it's the correct thing to do.

Thanks for giving the reasoning Xavi.

But why is the number of potential files the maximum?
IIUC, a file (or parts of it) will be written to all subvolumes in the
disperse set.
So wouldn't the smallest subvolume limit the number of files that
could be possibly created?

~kaushal

>
> Xavi
>
>
>>
>> BTW, how large is the volume you have? Those are a lot of bricks!
>>
>> ~kaushal
>>
>>
>>>
>>>
>>>


 On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez 
 wrote:
>
> Can you post the result of 'gluster volume status v0 detail' ?
>
>
> On 05/05/16 06:49, Serkan Çoban wrote:
>>
>>
>> Hi, Can anyone suggest something for this issue? df, du has no issue
>> for the bricks yet one subvolume not being used by gluster..
>>
>> On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
>> wrote:
>>>
>>>
>>> Hi,
>>>
>>> I changed cluster.min-free-inodes to "0". Remount the volume on
>>> clients. inode full messages not coming to syslog anymore but I see
>>> disperse-56 subvolume still not being used.
>>> Anything I can do to resolve this issue? Maybe I can destroy and
>>> recreate the volume but I am not sure It will fix this issue...
>>> Maybe the disperse size 16+4 is too big should I change it to 8+2?
>>>
>>> On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
>>> wrote:


 I also checked the df output all 20 bricks are same like below:
 /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20

 On Tue, May 3, 2016 at 1:40 PM, Raghavendra G
 
 wrote:
>
>
>
>
> On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban
> 
> wrote:
>>
>>
>>
>>> 1. What is the out put of du -hs ? Please get
>>> this
>>> information for each of the brick that are part of disperse.
>
>
>
>
> Sorry. I needed df output of the filesystem containing brick. Not
> du.
> Sorry
> about that.
>
>>
>> There are 20 bricks in disperse-56 and the du -hs output is like:
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 1.8M /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>> 80K /bricks/20
>>
>> I see that gluster is not writing to this disperse set. All other
>> disperse sets are filled 13GB but this one is empty. I see
>> directory
>> structure created but no files in 

Re: [Gluster-devel] Idea: Alternate Release process

2016-05-05 Thread Aravinda


regards
Aravinda

On 05/05/2016 03:54 PM, Kaushal M wrote:

On Thu, May 5, 2016 at 11:48 AM, Aravinda  wrote:

Hi,

Sharing an idea to manage multiple releases without maintaining
multiple release branches and backports.

This idea is heavily inspired by the Rust release model(you may feel
exactly same except the LTS part). I think Chrome/Firefox also follows
the same model.

http://blog.rust-lang.org/2014/10/30/Stability.html

Feature Flag:
--
Compile time variable to prevent compiling feature-related code when
disabled. (For example, ./configure --disable-geo-replication
or ./configure --disable-xml etc)

Plan
-
- Nightly build with all the features enabled(./build --nightly)

- All new patches will land in Master, if the patch belongs to a
   existing feature then it should be written behind that feature flag.

- If a feature is still work in progress then it will be only enabled in
   nightly build and not enabled in beta or stable builds.
   Once the maintainer thinks the feature is ready for testing then that
   feature will be enabled in beta build.

- Every 6 weeks, beta branch will be created by enabling all the
   features which maintainers thinks it is stable and previous beta
   branch will be promoted as stable.
   All the previous beta features will be enabled in stable unless it
   is marked as unstable during beta testing.

- LTS builds are same as stable builds but without enabling all the
   features. If we decide last stable build will become LTS release,
   then the feature list from last stable build will be saved as
   `features-release-.yaml`, For example:
   features-release-3.9.yaml`
   Same feature list will be used while building minor releases for the
   LTS. For example, `./build --stable --features features-release-3.8.yaml`

- Three branches, nightly/master, testing/beta, stable

To summarize,
- One stable release once in 6 weeks
- One Beta release once in 6 weeks
- Nightly builds every day
- LTS release once in 6 months or 1 year, Minor releases once in 6 weeks.

Advantageous:
-
1. No more backports required to different release branches.(only
exceptional backports, discussed below)
2. Non feature Bugfix will never get missed in releases.
3. Release process can be automated.
4. Bugzilla process can be simplified.

Challenges:

1. Enforcing Feature flag for every patch
2. Tests also should be behind feature flag
3. New release process

Backports, Bug Fixes and Features:
--
- Release bug fix - Patch only to Master, which will be available in
   next beta/stable build.
- Urgent bug fix - Patch to Master and Backport to beta and stable
   branch, and early release stable and beta build.
- Beta bug fix - Patch to Master and Backport to Beta branch if urgent.
- Security fix - Patch to Master, Beta and last stable branch and build
   all LTS releases.
- Features - Patch only to Master, which will be available in
   stable/beta builds once feature becomes stable.

FAQs:
-
- Can a feature development take more than one release cycle(6 weeks)?
Yes, the feature will be enabled only in nightly build and not in
beta/stable builds. Once the feature is complete mark it as
stable so that it will be included in next beta build and stable
build.


---

Do you like the idea? Let me know what you guys think.


This reduces the number of versions that we need to maintain, which I like.
Having official test (beta) releases should help get features out to
testers hand faster,
and get quicker feedback.

One thing that's still not quite clear to is the issue of backwards
compatibility.
I'm still thinking it thorough and don't have a proper answer to this yet.
Would a new release be backwards compatible with the previous release?
Should we be maintaining compatibility with LTS releases with the
latest release?
Each LTS release will have a separate list of features to be enabled. If 
we make any breaking changes (which are not backward compatible), they 
will affect LTS releases as you mentioned. But we should not break 
compatibility unless it is a major version change like 4.0. I have to work 
out how we can handle backward-incompatible changes.



With our current strategy, we at least have a long term release branch,
so we get some guarantees of compatibility with releases on the same branch.

As I understand the proposed approach, we'd be replacing a stable
branch with the beta branch.
So we don't have a long-term release branch (apart from LTS).
The stable branch is common to LTS releases as well; builds will differ by 
using different feature lists.


The example below shows a stable release every 6 weeks, and two LTS releases 
6 months apart (3.8 and 3.12):

LTS 1 :  3.8   3.8.1   3.8.2   3.8.3   3.8.4   3.8.5  ...
LTS 2 :                                3.12    3.12.1 ...
Stable:  3.8   3.9     3.10    3.11    3.12    3.13   ...

A user would be upgrading from one branch to another for every release.
Can we 

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Xavier Hernandez

On 05/05/16 11:31, Kaushal M wrote:

On Thu, May 5, 2016 at 2:36 PM, David Gossage
 wrote:




On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban  wrote:


Hi,

You can find the output below link:
https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0

Thanks,
Serkan



Maybe not issue, but playing one of these things is not like the other I
notice of all the bricks only one seems to be different at a quick glance

Brick: Brick 1.1.1.235:/bricks/20
TCP Port : 49170
RDMA Port: 0
Online   : Y
Pid  : 26736
File System  : ext4
Device   : /dev/mapper/vol0-vol_root
Mount Options: rw,relatime,data=ordered
Inode Size   : 256
Disk Space Free  : 86.1GB
Total Disk Space : 96.0GB
Inode Count  : 6406144
Free Inodes  : 6381374

Every other brick seems to be 7TB and xfs but this one.


Looks like the brick fs isn't mounted, and the root-fs is being used
instead. But that still leaves enough inodes free.

What I suspect is that one of the cluster translators is mixing up
stats when aggregating from multiple bricks.
From the log snippet you gave in the first mail, it seems like the
disperse translator is possibly involved.


Currently ec takes the number of potential files in the subvolume 
(f_files) as the maximum across all its subvolumes, but it takes the 
available count (f_ffree) as the minimum across all its subvolumes.

This causes max to be ~781,000,000, while free will be ~6,300,000. This 
gives ~0.8% available, i.e. almost 100% full.
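
A small sketch of that aggregation, with approximate numbers taken from the
posted status output (illustrative values only, not the actual ec code):

# f_files is taken as the MAX across subvolumes, f_ffree as the MIN.
def ec_aggregate(statfs_list):
    # statfs_list: (f_files, f_ffree) pairs, one per brick.
    f_files = max(s[0] for s in statfs_list)
    f_ffree = min(s[1] for s in statfs_list)
    return f_files, f_ffree

# 19 healthy 7.3T xfs bricks plus the one ext4 root-fs brick:
bricks = [(781000000, 780000000)] * 19 + [(6406144, 6381374)]
f_files, f_ffree = ec_aggregate(bricks)
print('available inodes: %.2f%%' % (100.0 * f_ffree / f_files))   # ~0.82%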


Given the circumstances I think it's the correct thing to do.

Xavi



BTW, how large is the volume you have? Those are a lot of bricks!

~kaushal









On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez 
wrote:

Can you post the result of 'gluster volume status v0 detail' ?


On 05/05/16 06:49, Serkan Çoban wrote:


Hi, Can anyone suggest something for this issue? df, du has no issue
for the bricks yet one subvolume not being used by gluster..

On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
wrote:


Hi,

I changed cluster.min-free-inodes to "0". Remount the volume on
clients. inode full messages not coming to syslog anymore but I see
disperse-56 subvolume still not being used.
Anything I can do to resolve this issue? Maybe I can destroy and
recreate the volume but I am not sure It will fix this issue...
Maybe the disperse size 16+4 is too big should I change it to 8+2?

On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
wrote:


I also checked the df output all 20 bricks are same like below:
/dev/sdu1 7.3T 34M 7.3T 1% /bricks/20

On Tue, May 3, 2016 at 1:40 PM, Raghavendra G

wrote:




On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban

wrote:




1. What is the out put of du -hs ? Please get
this
information for each of the brick that are part of disperse.




Sorry. I needed df output of the filesystem containing brick. Not
du.
Sorry
about that.



There are 20 bricks in disperse-56 and the du -hs output is like:
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
1.8M /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20

I see that gluster is not writing to this disperse set. All other
disperse sets are filled 13GB but this one is empty. I see
directory
structure created but no files in directories.
How can I fix the issue? I will try to rebalance but I don't think
it
will write to this disperse set...



On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G

wrote:




On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban

wrote:



Hi, I cannot get an answer from user list, so asking to devel
list.

I am getting [dht-diskusage.c:277:dht_is_subvol_filled] 0-v0-dht:
inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
adding more bricks.

message on client logs.My cluster is empty there are only a
couple
of
GB files for testing. Why this message appear in syslog?




dht uses disk usage information from backend export.

1. What is the out put of du -hs ? Please get
this
information for each of the brick that are part of disperse.
2. Once you get du information from each brick, the value seen by
dht
will
be based on how cluster/disperse aggregates du info (basically
statfs
fop).

The reason for 100% disk usage may be,
In case of 1, backend fs might be shared by data other than brick.
In case of 2, some issues with aggregation.


Is is safe to
ignore it?




dht will try not to have data files on the subvol in question
(v0-disperse-56). Hence lookup cost will be two hops for files
hashing
to
disperse-56 (note that other fops like read/write/open still have
the
cost
of 

Re: [Gluster-devel] Idea: Alternate Release process

2016-05-05 Thread Kaushal M
On Thu, May 5, 2016 at 11:48 AM, Aravinda  wrote:
> Hi,
>
> Sharing an idea to manage multiple releases without maintaining
> multiple release branches and backports.
>
> This idea is heavily inspired by the Rust release model(you may feel
> exactly same except the LTS part). I think Chrome/Firefox also follows
> the same model.
>
> http://blog.rust-lang.org/2014/10/30/Stability.html
>
> Feature Flag:
> --
> Compile time variable to prevent compiling feature-related code when
> disabled. (For example, ./configure --disable-geo-replication
> or ./configure --disable-xml etc)
>
> Plan
> -
> - Nightly build with all the features enabled(./build --nightly)
>
> - All new patches will land in Master, if the patch belongs to a
>   existing feature then it should be written behind that feature flag.
>
> - If a feature is still work in progress then it will be only enabled in
>   nightly build and not enabled in beta or stable builds.
>   Once the maintainer thinks the feature is ready for testing then that
>   feature will be enabled in beta build.
>
> - Every 6 weeks, beta branch will be created by enabling all the
>   features which maintainers thinks it is stable and previous beta
>   branch will be promoted as stable.
>   All the previous beta features will be enabled in stable unless it
>   is marked as unstable during beta testing.
>
> - LTS builds are same as stable builds but without enabling all the
>   features. If we decide last stable build will become LTS release,
>   then the feature list from last stable build will be saved as
>   `features-release-.yaml`, For example:
>   features-release-3.9.yaml`
>   Same feature list will be used while building minor releases for the
>   LTS. For example, `./build --stable --features features-release-3.8.yaml`
>
> - Three branches, nightly/master, testing/beta, stable
>
> To summarize,
> - One stable release once in 6 weeks
> - One Beta release once in 6 weeks
> - Nightly builds every day
> - LTS release once in 6 months or 1 year, Minor releases once in 6 weeks.
>
> Advantageous:
> -
> 1. No more backports required to different release branches.(only
>exceptional backports, discussed below)
> 2. Non feature Bugfix will never get missed in releases.
> 3. Release process can be automated.
> 4. Bugzilla process can be simplified.
>
> Challenges:
> 
> 1. Enforcing Feature flag for every patch
> 2. Tests also should be behind feature flag
> 3. New release process
>
> Backports, Bug Fixes and Features:
> --
> - Release bug fix - Patch only to Master, which will be available in
>   next beta/stable build.
> - Urgent bug fix - Patch to Master and Backport to beta and stable
>   branch, and early release stable and beta build.
> - Beta bug fix - Patch to Master and Backport to Beta branch if urgent.
> - Security fix - Patch to Master, Beta and last stable branch and build
>   all LTS releases.
> - Features - Patch only to Master, which will be available in
>   stable/beta builds once feature becomes stable.
>
> FAQs:
> -
> - Can a feature development take more than one release cycle(6 weeks)?
> Yes, the feature will be enabled only in nightly build and not in
> beta/stable builds. Once the feature is complete mark it as
> stable so that it will be included in next beta build and stable
> build.
>
>
> ---
>
> Do you like the idea? Let me know what you guys think.
>

This reduces the number of versions that we need to maintain, which I like.
Having official test (beta) releases should help get features into testers'
hands faster, and get quicker feedback.

One thing that's still not quite clear to me is the issue of backwards
compatibility.
I'm still thinking it through and don't have a proper answer to this yet.
Would a new release be backwards compatible with the previous release?
Should we be maintaining compatibility between the latest release and the
LTS releases?
With our current strategy, we at least have a long term release branch,
so we get some guarantees of compatibility with releases on the same branch.

As I understand the proposed approach, we'd be replacing a stable
branch with the beta branch.
So we don't have a long-term release branch (apart from LTS).
A user would be upgrading from one branch to another for every release.
Can we sketch out how compatibility would work in this case?

This approach works well for projects like Chromium and Firefox, single-system
apps which generally don't need to be compatible with the previous release.
I don't understand how the Rust project uses this (I have yet to read
the linked blog post),
as it requires some sort of backwards compatibility. But it too is a
single-system app,
and doesn't have the compatibility problems we face.

Gluster is a distributed system, that can involve multiple different
versions interacting with each other.
This is something we need to think about.

We could work out some sort of a solution for this 

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Kaushal M
On Thu, May 5, 2016 at 2:36 PM, David Gossage
 wrote:
>
>
>
> On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban  wrote:
>>
>> Hi,
>>
>> You can find the output below link:
>> https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0
>>
>> Thanks,
>> Serkan
>
>
> Maybe not issue, but playing one of these things is not like the other I
> notice of all the bricks only one seems to be different at a quick glance
>
> Brick: Brick 1.1.1.235:/bricks/20
> TCP Port : 49170
> RDMA Port: 0
> Online   : Y
> Pid  : 26736
> File System  : ext4
> Device   : /dev/mapper/vol0-vol_root
> Mount Options: rw,relatime,data=ordered
> Inode Size   : 256
> Disk Space Free  : 86.1GB
> Total Disk Space : 96.0GB
> Inode Count  : 6406144
> Free Inodes  : 6381374
>
> Every other brick seems to be 7TB and xfs but this one.

Looks like the brick fs isn't mounted, and the root-fs is being used
instead. But that still leaves enough inodes free.
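
A quick way to confirm that on the affected server (illustrative snippet; the
path is just the example from the status output):

import os
brick = '/bricks/20'
if os.path.ismount(brick):
    print('%s is a separate mount point' % brick)
else:
    print('%s is NOT mounted -- data would land on the root filesystem' % brick)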

What I suspect is that one of the cluster translators is mixing up
stats when aggregating from multiple bricks.
From the log snippet you gave in the first mail, it seems like the
disperse translator is possibly involved.

BTW, how large is the volume you have? Those are a lot of bricks!

~kaushal


>
>
>
>>
>>
>> On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez 
>> wrote:
>> > Can you post the result of 'gluster volume status v0 detail' ?
>> >
>> >
>> > On 05/05/16 06:49, Serkan Çoban wrote:
>> >>
>> >> Hi, Can anyone suggest something for this issue? df, du has no issue
>> >> for the bricks yet one subvolume not being used by gluster..
>> >>
>> >> On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I changed cluster.min-free-inodes to "0". Remount the volume on
>> >>> clients. inode full messages not coming to syslog anymore but I see
>> >>> disperse-56 subvolume still not being used.
>> >>> Anything I can do to resolve this issue? Maybe I can destroy and
>> >>> recreate the volume but I am not sure It will fix this issue...
>> >>> Maybe the disperse size 16+4 is too big should I change it to 8+2?
>> >>>
>> >>> On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
>> >>> wrote:
>> 
>>  I also checked the df output all 20 bricks are same like below:
>>  /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20
>> 
>>  On Tue, May 3, 2016 at 1:40 PM, Raghavendra G
>>  
>>  wrote:
>> >
>> >
>> >
>> > On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban
>> > 
>> > wrote:
>> >>
>> >>
>> >>> 1. What is the out put of du -hs ? Please get
>> >>> this
>> >>> information for each of the brick that are part of disperse.
>> >
>> >
>> >
>> > Sorry. I needed df output of the filesystem containing brick. Not
>> > du.
>> > Sorry
>> > about that.
>> >
>> >>
>> >> There are 20 bricks in disperse-56 and the du -hs output is like:
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 1.8M /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >> 80K /bricks/20
>> >>
>> >> I see that gluster is not writing to this disperse set. All other
>> >> disperse sets are filled 13GB but this one is empty. I see
>> >> directory
>> >> structure created but no files in directories.
>> >> How can I fix the issue? I will try to rebalance but I don't think
>> >> it
>> >> will write to this disperse set...
>> >>
>> >>
>> >>
>> >> On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G
>> >> 
>> >> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban
>> >>> 
>> >>> wrote:
>> 
>> 
>>  Hi, I cannot get an answer from user list, so asking to devel
>>  list.
>> 
>>  I am getting [dht-diskusage.c:277:dht_is_subvol_filled] 0-v0-dht:
>>  inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
>>  adding more bricks.
>> 
>>  message on client logs.My cluster is empty there are only a
>>  couple
>>  of
>>  GB files for testing. Why this message appear in syslog?
>> >>>
>> >>>
>> >>>
>> >>> dht uses disk usage information from backend export.
>> >>>
>> >>> 1. What is the out put of du -hs ? 

Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread David Gossage
On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban  wrote:

> Hi,
>
> You can find the output below link:
> https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0
>
> Thanks,
> Serkan
>

Maybe not the issue, but playing "one of these things is not like the other",
I notice that of all the bricks only one seems different at a quick glance:

Brick: Brick 1.1.1.235:/bricks/20
TCP Port : 49170
RDMA Port: 0
Online   : Y
Pid  : 26736
File System  : ext4
Device   : /dev/mapper/vol0-vol_root
Mount Options: rw,relatime,data=ordered
Inode Size   : 256
Disk Space Free  : 86.1GB
Total Disk Space : 96.0GB
Inode Count  : 6406144
Free Inodes  : 6381374

Every other brick seems to be 7TB and xfs but this one.




>
> On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez 
> wrote:
> > Can you post the result of 'gluster volume status v0 detail' ?
> >
> >
> > On 05/05/16 06:49, Serkan Çoban wrote:
> >>
> >> Hi, Can anyone suggest something for this issue? df, du has no issue
> >> for the bricks yet one subvolume not being used by gluster..
> >>
> >> On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I changed cluster.min-free-inodes to "0". Remount the volume on
> >>> clients. inode full messages not coming to syslog anymore but I see
> >>> disperse-56 subvolume still not being used.
> >>> Anything I can do to resolve this issue? Maybe I can destroy and
> >>> recreate the volume but I am not sure It will fix this issue...
> >>> Maybe the disperse size 16+4 is too big should I change it to 8+2?
> >>>
> >>> On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban 
> >>> wrote:
> 
>  I also checked the df output all 20 bricks are same like below:
>  /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20
> 
>  On Tue, May 3, 2016 at 1:40 PM, Raghavendra G <
> raghaven...@gluster.com>
>  wrote:
> >
> >
> >
> > On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban  >
> > wrote:
> >>
> >>
> >>> 1. What is the out put of du -hs ? Please get this
> >>> information for each of the brick that are part of disperse.
> >
> >
> >
> > Sorry. I needed df output of the filesystem containing brick. Not du.
> > Sorry
> > about that.
> >
> >>
> >> There are 20 bricks in disperse-56 and the du -hs output is like:
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 1.8M /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >> 80K /bricks/20
> >>
> >> I see that gluster is not writing to this disperse set. All other
> >> disperse sets are filled 13GB but this one is empty. I see directory
> >> structure created but no files in directories.
> >> How can I fix the issue? I will try to rebalance but I don't think
> it
> >> will write to this disperse set...
> >>
> >>
> >>
> >> On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G
> >> 
> >> wrote:
> >>>
> >>>
> >>>
> >>> On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban
> >>> 
> >>> wrote:
> 
> 
>  Hi, I cannot get an answer from user list, so asking to devel
> list.
> 
>  I am getting [dht-diskusage.c:277:dht_is_subvol_filled] 0-v0-dht:
>  inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
>  adding more bricks.
> 
>  message on client logs.My cluster is empty there are only a couple
>  of
>  GB files for testing. Why this message appear in syslog?
> >>>
> >>>
> >>>
> >>> dht uses disk usage information from backend export.
> >>>
> >>> 1. What is the out put of du -hs ? Please get this
> >>> information for each of the brick that are part of disperse.
> >>> 2. Once you get du information from each brick, the value seen by
> dht
> >>> will
> >>> be based on how cluster/disperse aggregates du info (basically
> statfs
> >>> fop).
> >>>
> >>> The reason for 100% disk usage may be,
> >>> In case of 1, backend fs might be shared by data other than brick.
> >>> In case of 2, some issues with aggregation.
> >>>
>  Is is safe to
>  ignore it?
> >>>
> >>>
> >>>
> >>> dht will try not to have data files on the subvol in question
> >>> (v0-disperse-56). Hence lookup cost will be two hops for files
> >>> hashing
> >>> to
> 

[Gluster-devel] 3.8: Centos Regression Failure

2016-05-05 Thread Kotresh Hiremath Ravishankar
Hi

./tests/bugs/replicate/bug-977797.t fails in the following run.

https://build.gluster.org/job/rackspace-regression-2GB-triggered/20473/console

It succeeds on my local machine, so it could be spurious.

Could someone from the replication team look into it?


Thanks and Regards,
Kotresh H R

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

2016-05-05 Thread Xavier Hernandez

Can you post the result of 'gluster volume status v0 detail' ?

On 05/05/16 06:49, Serkan Çoban wrote:

Hi, Can anyone suggest something for this issue? df, du has no issue
for the bricks yet one subvolume not being used by gluster..

On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban  wrote:

Hi,

I changed cluster.min-free-inodes to "0". Remount the volume on
clients. inode full messages not coming to syslog anymore but I see
disperse-56 subvolume still not being used.
Anything I can do to resolve this issue? Maybe I can destroy and
recreate the volume but I am not sure It will fix this issue...
Maybe the disperse size 16+4 is too big should I change it to 8+2?

On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban  wrote:

I also checked the df output all 20 bricks are same like below:
/dev/sdu1 7.3T 34M 7.3T 1% /bricks/20

On Tue, May 3, 2016 at 1:40 PM, Raghavendra G  wrote:



On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban  wrote:



1. What is the out put of du -hs ? Please get this
information for each of the brick that are part of disperse.



Sorry. I needed df output of the filesystem containing brick. Not du. Sorry
about that.



There are 20 bricks in disperse-56 and the du -hs output is like:
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
1.8M /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20
80K /bricks/20

I see that gluster is not writing to this disperse set. All other
disperse sets are filled 13GB but this one is empty. I see directory
structure created but no files in directories.
How can I fix the issue? I will try to rebalance but I don't think it
will write to this disperse set...



On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G 
wrote:



On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban 
wrote:


Hi, I cannot get an answer from user list, so asking to devel list.

I am getting [dht-diskusage.c:277:dht_is_subvol_filled] 0-v0-dht:
inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
adding more bricks.

message on client logs.My cluster is empty there are only a couple of
GB files for testing. Why this message appear in syslog?



dht uses disk usage information from backend export.

1. What is the out put of du -hs ? Please get this
information for each of the brick that are part of disperse.
2. Once you get du information from each brick, the value seen by dht
will
be based on how cluster/disperse aggregates du info (basically statfs
fop).

The reason for 100% disk usage may be,
In case of 1, backend fs might be shared by data other than brick.
In case of 2, some issues with aggregation.
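
To make the arithmetic concrete, here is a rough illustration in Python (not
the actual dht-diskusage.c logic) of how a skewed statfs result can read as
almost 100% inode usage once compared against cluster.min-free-inodes:

def subvol_inodes_used_pct(f_files, f_ffree):
    return 100.0 * (f_files - f_ffree) / f_files

used = subvol_inodes_used_pct(f_files=781000000, f_ffree=6381374)
min_free_inodes = 10.0            # example threshold, in percent
print('%.2f%% used -> filled: %s' % (used, used > 100.0 - min_free_inodes))
# -> 99.18% used -> filled: True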


Is is safe to
ignore it?



dht will try not to have data files on the subvol in question
(v0-disperse-56). Hence lookup cost will be two hops for files hashing
to
disperse-56 (note that other fops like read/write/open still have the
cost
of single hop and dont suffer from this penalty). Other than that there
is
no significant harm unless disperse-56 is really running out of space.

regards,
Raghavendra


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel





--
Raghavendra G

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel





--
Raghavendra G

___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Idea: Alternate Release process

2016-05-05 Thread Aravinda

Hi,

Sharing an idea to manage multiple releases without maintaining
multiple release branches and backports.

This idea is heavily inspired by the Rust release model(you may feel
exactly same except the LTS part). I think Chrome/Firefox also follows
the same model.

http://blog.rust-lang.org/2014/10/30/Stability.html

Feature Flag:
--
A compile-time variable to prevent compiling feature-related code when
disabled. (For example, ./configure --disable-geo-replication
or ./configure --disable-xml etc.)

Plan
-
- Nightly build with all the features enabled (./build --nightly)

- All new patches will land in Master, if the patch belongs to a
  existing feature then it should be written behind that feature flag.

- If a feature is still work in progress then it will be only enabled in
  nightly build and not enabled in beta or stable builds.
  Once the maintainer thinks the feature is ready for testing then that
  feature will be enabled in beta build.

- Every 6 weeks, a beta branch will be created by enabling all the
  features which the maintainers think are stable, and the previous beta
  branch will be promoted to stable.
  All the previous beta features will be enabled in stable unless a
  feature was marked as unstable during beta testing.

- LTS builds are the same as stable builds but without enabling all the
  features. If we decide the last stable build will become an LTS release,
  then the feature list from that stable build will be saved as
  `features-release-.yaml`, for example:
  `features-release-3.9.yaml`
  The same feature list will be used while building minor releases for the
  LTS, for example `./build --stable --features features-release-3.8.yaml`
  (a rough sketch of such a wrapper follows this list).

- Three branches, nightly/master, testing/beta, stable
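
To make the build/feature-list idea concrete, here is a hypothetical sketch of
what such a wrapper could look like (the script, flags and feature names are
all made up for illustration; nothing like this exists in the tree today):

import subprocess
import yaml                                   # assumes PyYAML is available

ALL_FEATURES = ['geo-replication', 'xml']     # example feature names

def build(channel, features_file=None):
    if channel == 'nightly':
        enabled = set(ALL_FEATURES)           # nightly: everything enabled
    else:
        with open(features_file) as f:
            enabled = set(yaml.safe_load(f))  # e.g. ['xml']
    flags = ['--disable-' + feat for feat in ALL_FEATURES if feat not in enabled]
    subprocess.check_call(['./configure'] + flags)
    subprocess.check_call(['make'])

# An LTS minor release would reuse the feature list frozen for that release:
# build('stable', features_file='features-release-3.8.yaml')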

To summarize,
- One stable release once in 6 weeks
- One Beta release once in 6 weeks
- Nightly builds every day
- LTS release once in 6 months or 1 year, Minor releases once in 6 weeks.

Advantages:
-
1. No more backports required to different release branches (only
   exceptional backports, discussed below).
2. Non-feature bug fixes will never get missed in releases.
3. Release process can be automated.
4. Bugzilla process can be simplified.

Challenges:

1. Enforcing Feature flag for every patch
2. Tests also should be behind feature flag
3. New release process

Backports, Bug Fixes and Features:
--
- Release bug fix - Patch only to Master, which will be available in
  next beta/stable build.
- Urgent bug fix - Patch to Master and Backport to beta and stable
  branch, and early release stable and beta build.
- Beta bug fix - Patch to Master and Backport to Beta branch if urgent.
- Security fix - Patch to Master, Beta and last stable branch and build
  all LTS releases.
- Features - Patch only to Master, which will be available in
  stable/beta builds once feature becomes stable.

FAQs:
-
- Can feature development take more than one release cycle (6 weeks)?
Yes, the feature will be enabled only in the nightly build and not in
beta/stable builds. Once the feature is complete, mark it as stable so
that it will be included in the next beta and stable builds.


---

Do you like the idea? Let me know what you guys think.

--
regards
Aravinda

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel