Re: [Gluster-devel] CentOS Regression generated core by .tests/basic/tier/tier-file-create.t

2016-03-09 Thread Pranith Kumar Karampuri



On 03/08/2016 08:09 PM, Pranith Kumar Karampuri wrote:
Sorry for the delay in responding. I am looking at this core. Will 
update with my findings/patches.


I think this is happening because the dict's data is not guaranteed to 
hold refs at the time we access it, merely because we hold a ref on the 
dict itself. I still need to find the exact sequence of steps that 
leads to the crash/hang.


Pranith


Pranith

On 03/08/2016 12:29 PM, Kotresh Hiremath Ravishankar wrote:

Hi All,

The regression run generated a core for the patch below.

https://build.gluster.org/job/rackspace-regression-2GB-triggered/18859/console 



 From the initial analysis, it is a tiered setup where an ec sub-volume 
is the cold tier and afr is the hot tier.
The crash happened during lookup: the lookup was wound to the cold 
tier, and since the entry is not present there, dht issued a discover 
onto the hot tier. While serializing the dictionary, it found that the 
'data' for the key 'trusted.ec.size' had already been freed.


(gdb) bt
#0  0x7fe059df9772 in memcpy () from ./lib64/libc.so.6
#1  0x7fe05b209902 in dict_serialize_lk (this=0x7fe04809f7dc, 
buf=0x7fe0480a2b7c "") at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/dict.c:2533
#2  0x7fe05b20a182 in dict_allocate_and_serialize 
(this=0x7fe04809f7dc, buf=0x7fe04ef6bb08, length=0x7fe04ef6bb00) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/dict.c:2780
#3  0x7fe04e3492de in client3_3_lookup (frame=0x7fe0480a22dc, 
this=0x7fe048008c00, data=0x7fe04ef6bbe0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client-rpc-fops.c:3368
#4  0x7fe04e32c8c8 in client_lookup (frame=0x7fe0480a22dc, 
this=0x7fe048008c00, loc=0x7fe0480a4354, xdata=0x7fe04809f7dc) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client.c:417
#5  0x7fe04dbdaf5f in afr_lookup_do (frame=0x7fe04809f6dc, 
this=0x7fe048029e00, err=0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:2422
#6  0x7fe04dbdb4bb in afr_lookup (frame=0x7fe04809f6dc, 
this=0x7fe048029e00, loc=0x7fe03c0082f4, xattr_req=0x7fe03c00810c) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:2532
#7  0x7fe04de3c2b8 in dht_lookup (frame=0x7fe0480a0a3c, 
this=0x7fe04802c580, loc=0x7fe03c0082f4, xattr_req=0x7fe03c00810c) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:2429
#8  0x7fe04d91f07e in dht_lookup_everywhere 
(frame=0x7fe03c0081ec, this=0x7fe04802d450, loc=0x7fe03c0082f4) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:1803
#9  0x7fe04d920953 in dht_lookup_cbk (frame=0x7fe03c0081ec, 
cookie=0x7fe03c00902c, this=0x7fe04802d450, op_ret=-1, op_errno=2, 
inode=0x0, stbuf=0x0, xattr=0x0, postparent=0x0)
 at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:2056
#10 0x7fe04de35b94 in dht_lookup_everywhere_done 
(frame=0x7fe03c00902c, this=0x7fe0480288a0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:1338
#11 0x7fe04de38281 in dht_lookup_everywhere_cbk 
(frame=0x7fe03c00902c, cookie=0x7fe04809ed2c, this=0x7fe0480288a0, 
op_ret=-1, op_errno=2, inode=0x0, buf=0x0, xattr=0x0, postparent=0x0)
 at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:1768
#12 0x7fe05b27 in default_lookup_cbk (frame=0x7fe04809ed2c, 
cookie=0x7fe048099ddc, this=0x7fe048027590, op_ret=-1, op_errno=2, 
inode=0x0, buf=0x0, xdata=0x0, postparent=0x0) at defaults.c:1188
#13 0x7fe04e0a4861 in ec_manager_lookup (fop=0x7fe048099ddc, 
state=-5) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-generic.c:864
#14 0x7fe04e0a0b3a in __ec_manager (fop=0x7fe048099ddc, error=2) 
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-common.c:2098
#15 0x7fe04e09c912 in ec_resume (fop=0x7fe048099ddc, error=0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-common.c:289
#16 0x7fe04e09caf8 in ec_complete (fop=0x7fe048099ddc) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-common.c:362
#17 0x7fe04e0a41a8 in ec_lookup_cbk (frame=0x7fe04800107c, 
cookie=0x5, this=0x7fe048027590, op_ret=-1, op_errno=2, 
inode=0x7fe03c00152c, buf=0x7fe04ef6c860, xdata=0x0, 
postparent=0x7fe04ef6c7f0)
 at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-generic.c:758
#18 0x7fe04e348239 in client3_3_lookup_cbk (req=0x7fe04809dd4c, 
iov=0x7fe04809dd8c, count=1, myframe=0x7fe04809964c)
 at 
/home/je

Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

2016-03-13 Thread Pranith Kumar Karampuri



On 03/11/2016 10:16 PM, Jeff Darcy wrote:

Tier does send lookups serially, which fail on the hashed subvolumes of
the dhts. Both of them trigger lookup_everywhere, which is executed in
epoll threads, and thus they end up running in parallel.

According to your earlier description, items are being deleted by EC
(i.e. the cold tier) while AFR (i.e. the hot tier) is trying to access
the same dictionary.  That sounds pretty parallel across the two.  It
doesn't matter, though, because I think we agree that this solution is
too messy anyway.


(3) Enhance dict_t with a gf_lock_t that can be used to serialize
access.  We don't have to use the lock in every invocation of
dict_foreach (though we should probably investigate that).  For
now, we can just use it in the code paths we know are contending.

dict already has a lock.

Yes, we have a lock which is used in get/set/add/delete - but not in
dict_foreach for the reasons you mention.  I should have been clearer
that I was suggesting a *different* lock that's only used in this
case.  Manually locking with the lock we already have might not work
due to recursive locking, but the lock ordering with a separate
higher-level lock is pretty simple and it won't affect any other uses.

I didn't quite get it. Could you elaborate, please? The race is 
between 1) dict_set() and 2) dict_foreach().


Pranith



Xavi was mentioning that dict_copy_with_ref is too costly, which is
true; if we make this change it will be even more costly :-(.

There are probably MVCC-ish approaches that could be both safe and
performant, but they'd be quite complicated to implement.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] sub-directory geo-replication, snapshot features

2016-03-08 Thread Pranith Kumar Karampuri



On 03/09/2016 10:40 AM, Kaushal M wrote:

On Tue, Mar 8, 2016 at 11:58 PM, Atin Mukherjee <amukh...@redhat.com> wrote:


On 03/08/2016 07:32 PM, Pranith Kumar Karampuri wrote:

hi,
  Late last week I sent a solution for how to achieve
subdirectory-mount support with access-controls
(http://www.gluster.org/pipermail/gluster-devel/2016-March/048537.html).
What follows here is a short description of how other features of
gluster volumes are implemented for sub-directories.

Please note that the sub-directories are not allowed to be accessed by
normal mounts i.e. top-level volume mounts. All access to the
sub-directories goes only through sub-directory mounts.

Is this acceptable? If I have a, b, c sub-directories in the volume and
I mount the same volume at /mnt, do you mean I won't be able to access
/mnt/a or /mnt/b, and can only access them using sub-directory mounts?
Or are you talking about some specific case here?

1) Geo-replication:
The direction we are taking is to allow geo-replicating just some
sub-directories, and not the whole volume, based on options. When these
options are set, the server xlators populate extra information in the
frames/xdata so that a changelog is written for the fops coming from
their sub-directory mounts. The changelog xlator, on seeing this, will
geo-replicate only the files/directories that are in the changelog;
thus only the sub-directories are geo-replicated. There is also a
suggestion from Vijay and Aravinda to have separate changelog domains
for operations inside sub-directories.

2) Sub-directory snapshots using lvm
Our idea is that every time a sub-directory needs to be created, the
admin executes a subvolume-creation command which creates a mount to an
empty snapshot at the given sub-directory name. All these directories
can be modified in parallel, and we can take individual snapshots of
each of them. We will provide a detailed list of commands for this once
they are fleshed out. These are the directions we are taking at the
moment to increase granularity from volume to sub-directory for the
main features.

We use hardlinks to the `.glusterfs` directory on bricks. So wouldn't
having multiple filesystems inside a brick break the brick?


You are right. I think we will have to do full separation, where it will 
be more like multiple tenants :-/.




Also, I'd prefer if sub-directory mounts and sub-directory snapshots
remained separate, and not tied with each other. This mail gives the
feeling that they will be tied together.


With the above point, I don't think it will be.




Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

2016-03-11 Thread Pranith Kumar Karampuri

hi,
  I think this is the RCA for the issue:
Basically, with distributed ec as the cold tier and distributed 
replicate as the hot tier, tier sends a lookup which fails on ec (by 
this time the dict already contains ec xattrs). After this, the 
lookup_everywhere code path is hit in tier, which triggers a lookup on 
each of distribute's hashed subvolumes; these fail, which leads to the 
cold and hot dht's lookup_everywhere running in two parallel epoll 
threads. In ec's thread, when it tries to set 
trusted.ec.version/dirty/size in the dictionary, the older values 
against the same keys get erased. While this erasing is going on, if 
the thread doing the lookup on afr's subvolume accesses these members, 
either in dict_copy_with_ref or in the client xlator while serializing, 
that can lead to either a crash or a hang, depending on when the 
spin/mutex lock is called on invalid memory.


For now I have sent http://review.gluster.org/13680 (I am pressed for 
time because I need to provide a build for our customer with a fix), 
which avoids parallel accesses to elements that step on each other.


Raghavendra G and I discussed this problem, and the right way to fix it 
is for dict_foreach to take a copy of the dictionary (without using 
dict_foreach for the copy) inside a lock and then loop over the local 
copy. I am worried about the performance implications of this, so I am 
wondering if anyone has a better idea.
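
To make the idea concrete, here is a minimal sketch of the 
copy-under-lock approach, assuming we walk the members list directly 
under the dict's own lock (the field names are from my reading of 
dict.h; this is an illustration only, not a tested patch):

#include "dict.h"
#include "locking.h"

/* Sketch only: iterate over a private copy taken under the dict's lock,
 * so a concurrent dict_set() cannot free a data_t while we read it. */
int
dict_foreach_safe (dict_t *dict,
                   int (*fn)(dict_t *d, char *key, data_t *value, void *data),
                   void *data)
{
        dict_t      *copy = dict_new ();
        data_pair_t *pair = NULL;
        int          ret  = -1;

        if (!copy)
                goto out;

        LOCK (&dict->lock);
        {
                /* dict_set() on the copy takes a ref on each value, so the
                 * values stay alive even if the original pairs are replaced
                 * by a concurrent dict_set() on the source later on */
                for (pair = dict->members_list; pair; pair = pair->next)
                        dict_set (copy, pair->key, pair->value);
        }
        UNLOCK (&dict->lock);

        /* loop over the local copy without holding any lock */
        ret = dict_foreach (copy, fn, data);
out:
        if (copy)
                dict_unref (copy);
        return ret;
}

The cost is the allocations and extra refs taken while holding the 
lock, which is exactly the performance worry above.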


I have also included Xavi, who earlier said we need to change dict.c 
but that it is a bigger change. Maybe the time has come? I would love 
to gather all your inputs and implement a better version of dict if we 
need one.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] regarding write-behind default values for o-direct & flush behind

2016-03-16 Thread Pranith Kumar Karampuri

hi Raghavendra,
   Krutika showed me the code in write-behind that does not honor 
O_DIRECT. 1) Is there any reason why we enable flush-behind by default, 
and 2) why we do not honor O_DIRECT in write-behind by default (the 
strict-O_DIRECT option)?
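
For anyone who wants to experiment while we discuss this, these are the 
volume options I am referring to (assuming the usual performance.* 
keys):

# make FLUSH wait for pending writes instead of returning early
gluster volume set <volname> performance.flush-behind off

# make write-behind honor O_DIRECT opens
gluster volume set <volname> performance.strict-o-direct on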


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] regarding write-behind default values for o-direct & flush behind

2016-03-16 Thread Pranith Kumar Karampuri
The flush-behind default is probably fine, as without an fsync there is 
no guarantee of file contents syncing to disk.


Pranith
On 03/16/2016 04:16 PM, Pranith Kumar Karampuri wrote:

hi Raghavendra,
   Krutika showed me the code in write-behind that does not honor 
O_DIRECT. 1) Is there any reason why we enable flush-behind by default, 
and 2) why we do not honor O_DIRECT in write-behind by default (the 
strict-O_DIRECT option)?


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] conflicting keys for option eager-lock

2016-04-13 Thread Pranith Kumar Karampuri



On 04/13/2016 11:58 AM, Pranith Kumar Karampuri wrote:



On 04/13/2016 09:15 AM, Vijay Bellur wrote:

On 04/08/2016 10:25 PM, Vijay Bellur wrote:

Hey Pranith, Ashish -

We have broken support for group virt after the following commit in
release-3.7:

commit 46920e3bd38d9ae7c1910d0bd83eff309ab20c66
Author: Ashish Pandey <aspan...@redhat.com>
Date:   Fri Mar 4 13:05:09 2016 +0530

 cluster/ec: Provide an option to enable/disable eager lock




Thinking more - do we need two different options to control eager 
lock behavior for afr and ec? cluster.eager-lock can be applicable 
for ec too as ec and afr are normally used in a mutually exclusive 
manner. Are we resorting to a different key for glusterd's op-version 
compatibility?


Tiering breaks all our assumptions: at the moment, for tiering, we want 
afr to use eager-lock whereas disperse should not, so it is better this 
way.


This is only until we fix the eager-locking behavior in disperse for 
the cases where it does not learn that other clients are competing for 
locks. We already have a solution in place; after that, both can be 
enabled and things will work smoothly.
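
In practice that means a tiered volume carries two independent 
settings, roughly like this (volume name is illustrative):

# afr (hot tier) keeps eager locking enabled
gluster volume set <volname> cluster.eager-lock on

# disperse (cold tier) disables it until the lock-contention detection is fixed
gluster volume set <volname> disperse.eager-lock off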


Pranith


Pranith



Thanks,
Vijay


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] conflicting keys for option eager-lock

2016-04-13 Thread Pranith Kumar Karampuri



On 04/13/2016 05:10 PM, Vijay Bellur wrote:

On 04/13/2016 03:20 AM, Pranith Kumar Karampuri wrote:



On 04/13/2016 11:58 AM, Pranith Kumar Karampuri wrote:



On 04/13/2016 09:15 AM, Vijay Bellur wrote:

On 04/08/2016 10:25 PM, Vijay Bellur wrote:

Hey Pranith, Ashish -

We have broken support for group virt after the following commit in
release-3.7:

commit 46920e3bd38d9ae7c1910d0bd83eff309ab20c66
Author: Ashish Pandey <aspan...@redhat.com>
Date:   Fri Mar 4 13:05:09 2016 +0530

 cluster/ec: Provide an option to enable/disable eager lock




Thinking more - do we need two different options to control eager
lock behavior for afr and ec? cluster.eager-lock can be applicable
for ec too as ec and afr are normally used in a mutually exclusive
manner. Are we resorting to a different key for glusterd's op-version
compatibility?


Tiering breaks all our assumptions: at the moment, for tiering, we want
afr to use eager-lock whereas disperse should not, so it is better this
way.


This is only until we fix the eager-locking behavior in disperse for the
cases where it does not learn that other clients are competing for
locks. We already have a solution in place; after that, both can be
enabled and things will work smoothly.



Thanks, Pranith. In the interim can we also please fix the volume set 
help description for disperse.eager-lock?

Yes, that will be done.

Pranith


-Vijay



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Test-case "./tests/basic/tier/tier-file-create.t" hung

2016-04-12 Thread Pranith Kumar Karampuri



On 04/12/2016 05:54 PM, Jeff Darcy wrote:

tier can lead to parallel lookups in two different epoll threads on the
hot/cold tiers. The race window for the use-after-free on the common
dictionary in lookup is too small to hit without dict_copy_with_ref()
in either ec or afr: on the afr or ec side, one thread has to be
executing dict serialization in the client while the other thread is
doing a dict_set(). With dict_copy_with_ref() in ec, the probability of
hitting the issue is higher. Once the patch in afr is also merged,
there is no race anymore. We still need a neat way to fix this problem
though, I mean at the dict infra level.

Thanks for the explanation, Pranith.  What kind of new dict API do you
think would solve this?
dict_copy_with_ref()/dict_foreach() are not race proof, but taking a 
lock while we go through the elements is slow. So we need a better way 
to do this without introducing any inconsistencies.
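
To spell out the window we keep hitting, a simplified sketch (the 
callback and buffer names are placeholders, not the actual code paths):

/* Thread A: client xlator serializing xdata for a lookup */
dict_foreach (xdata, serialize_pair, &buf);       /* walks the members list
                                                     with no lock on xdata */

/* Thread B: ec updating its xattrs in the same dict at the same time */
dict_set_uint64 (xdata, "trusted.ec.size", size); /* replaces the pair's
                                                     data_t and unrefs the
                                                     old one, which can be
                                                     freed while thread A is
                                                     still reading it */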


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] CentOS Regression generated core by .tests/basic/tier/tier-file-create.t

2016-03-08 Thread Pranith Kumar Karampuri
Sorry for the delay in responding. I am looking at this core. Will 
update with my findings/patches.


Pranith

On 03/08/2016 12:29 PM, Kotresh Hiremath Ravishankar wrote:

Hi All,

The regression run generated a core for the patch below.

https://build.gluster.org/job/rackspace-regression-2GB-triggered/18859/console

 From the initial analysis, it is a tiered setup where an ec sub-volume is the 
cold tier and afr is the hot tier.
The crash happened during lookup: the lookup was wound to the cold tier, and 
since the entry is not present there, dht issued a discover onto the hot tier. 
While serializing the dictionary, it found that the 'data' for the key 
'trusted.ec.size' had already been freed.

(gdb) bt
#0  0x7fe059df9772 in memcpy () from ./lib64/libc.so.6
#1  0x7fe05b209902 in dict_serialize_lk (this=0x7fe04809f7dc, buf=0x7fe0480a2b7c 
"") at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/dict.c:2533
#2  0x7fe05b20a182 in dict_allocate_and_serialize (this=0x7fe04809f7dc, 
buf=0x7fe04ef6bb08, length=0x7fe04ef6bb00) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/dict.c:2780
#3  0x7fe04e3492de in client3_3_lookup (frame=0x7fe0480a22dc, 
this=0x7fe048008c00, data=0x7fe04ef6bbe0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client-rpc-fops.c:3368
#4  0x7fe04e32c8c8 in client_lookup (frame=0x7fe0480a22dc, 
this=0x7fe048008c00, loc=0x7fe0480a4354, xdata=0x7fe04809f7dc) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client.c:417
#5  0x7fe04dbdaf5f in afr_lookup_do (frame=0x7fe04809f6dc, 
this=0x7fe048029e00, err=0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:2422
#6  0x7fe04dbdb4bb in afr_lookup (frame=0x7fe04809f6dc, 
this=0x7fe048029e00, loc=0x7fe03c0082f4, xattr_req=0x7fe03c00810c) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:2532
#7  0x7fe04de3c2b8 in dht_lookup (frame=0x7fe0480a0a3c, 
this=0x7fe04802c580, loc=0x7fe03c0082f4, xattr_req=0x7fe03c00810c) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:2429
#8  0x7fe04d91f07e in dht_lookup_everywhere (frame=0x7fe03c0081ec, 
this=0x7fe04802d450, loc=0x7fe03c0082f4) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:1803
#9  0x7fe04d920953 in dht_lookup_cbk (frame=0x7fe03c0081ec, 
cookie=0x7fe03c00902c, this=0x7fe04802d450, op_ret=-1, op_errno=2, inode=0x0, 
stbuf=0x0, xattr=0x0, postparent=0x0)
 at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:2056
#10 0x7fe04de35b94 in dht_lookup_everywhere_done (frame=0x7fe03c00902c, 
this=0x7fe0480288a0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:1338
#11 0x7fe04de38281 in dht_lookup_everywhere_cbk (frame=0x7fe03c00902c, 
cookie=0x7fe04809ed2c, this=0x7fe0480288a0, op_ret=-1, op_errno=2, inode=0x0, 
buf=0x0, xattr=0x0, postparent=0x0)
 at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/dht/src/dht-common.c:1768
#12 0x7fe05b27 in default_lookup_cbk (frame=0x7fe04809ed2c, 
cookie=0x7fe048099ddc, this=0x7fe048027590, op_ret=-1, op_errno=2, inode=0x0, 
buf=0x0, xdata=0x0, postparent=0x0) at defaults.c:1188
#13 0x7fe04e0a4861 in ec_manager_lookup (fop=0x7fe048099ddc, state=-5) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-generic.c:864
#14 0x7fe04e0a0b3a in __ec_manager (fop=0x7fe048099ddc, error=2) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-common.c:2098
#15 0x7fe04e09c912 in ec_resume (fop=0x7fe048099ddc, error=0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-common.c:289
#16 0x7fe04e09caf8 in ec_complete (fop=0x7fe048099ddc) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-common.c:362
#17 0x7fe04e0a41a8 in ec_lookup_cbk (frame=0x7fe04800107c, cookie=0x5, 
this=0x7fe048027590, op_ret=-1, op_errno=2, inode=0x7fe03c00152c, 
buf=0x7fe04ef6c860, xdata=0x0, postparent=0x7fe04ef6c7f0)
 at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/ec/src/ec-generic.c:758
#18 0x7fe04e348239 in client3_3_lookup_cbk (req=0x7fe04809dd4c, 
iov=0x7fe04809dd8c, count=1, myframe=0x7fe04809964c)
 at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/client/src/client-rpc-fops.c:3028
#19 0x7fe05afd83e6 in rpc_clnt_handle_reply (clnt=0x7fe048066350, 
pollin=0x7fe0480018f0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:759
#20 

[Gluster-devel] sub-directory geo-replication, snapshot features

2016-03-08 Thread Pranith Kumar Karampuri

hi,
 Late last week I sent a solution for how to achieve 
subdirectory-mount support with access-controls 
(http://www.gluster.org/pipermail/gluster-devel/2016-March/048537.html). 
What follows here is a short description of how other features of 
gluster volumes are implemented for sub-directories.


Please note that the sub-directories are not allowed to be accessed by 
normal mounts i.e. top-level volume mounts. All access to the 
sub-directories goes only through sub-directory mounts.


1) Geo-replication:
The direction we are taking is to allow geo-replicating just some 
sub-directories, and not the whole volume, based on options. When these 
options are set, the server xlators populate extra information in the 
frames/xdata so that a changelog is written for the fops coming from 
their sub-directory mounts. The changelog xlator, on seeing this, will 
geo-replicate only the files/directories that are in the changelog; 
thus only the sub-directories are geo-replicated. There is also a 
suggestion from Vijay and Aravinda to have separate changelog domains 
for operations inside sub-directories.


2) Sub-directory snapshots using lvm
Our idea is that every time a sub-directory needs to be created, the 
admin executes a subvolume-creation command which creates a mount to an 
empty snapshot at the given sub-directory name. All these directories 
can be modified in parallel, and we can take individual snapshots of 
each of them. We will provide a detailed list of commands for this once 
they are fleshed out. These are the directions we are taking at the 
moment to increase granularity from volume to sub-directory for the 
main features.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Default quorum for 2 way replication

2016-03-04 Thread Pranith Kumar Karampuri

hi,
 So far the default quorum for 2-way replication is 'none' (i.e. 
files/directories may go into split-brain), while for 3-way replication 
and arbiter-based replication it is 'auto' (files/directories won't go 
into split-brain). There are requests to make the default 'auto' for 
2-way replication as well. The line of reasoning is that people value 
data integrity (files not going into split-brain) more than HA 
(operation of the mount even when bricks go down), and admins should 
explicitly change it to 'none' when they are fine with split-brains in 
2-way replication. We were wondering if you have any inputs about what 
a sane default for 2-way replication is.
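
For reference, the option in question is cluster.quorum-type (values 
none/auto/fixed), so today an admin who wants the stricter behaviour on 
a replica-2 volume opts in explicitly:

# enforce client quorum on a 2-way replica volume (current default is 'none')
gluster volume set <volname> cluster.quorum-type auto

# revert to the current default behaviour
gluster volume set <volname> cluster.quorum-type none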


I would like the default to be 'none'. Reason: if we have 'auto' as the 
quorum for 2-way replication and the first brick dies, there is no HA. 
If users are fine with that, it is better to use a plain distribute 
volume rather than replication with quorum set to 'auto'. What are your 
thoughts on the matter? Please guide us in the right direction.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Default quorum for 2 way replication

2016-03-04 Thread Pranith Kumar Karampuri



On 03/04/2016 05:47 PM, Bipin Kunal wrote:

HI Pranith,

Thanks for starting this mail thread.

Looking at it from a user perspective, the most important thing is to
get a "good copy" of the data. I agree that people use replication for
HA, but having stale data with HA will not have any value.
So I would suggest making auto quorum the default configuration even
for 2-way replication.

If a user is willing to lose data for the sake of HA, he always has the
option to disable it. But the default preference should be data and its
integrity.


That is the point. There is an illusion of a choice between data 
integrity and HA, but we are not *really* giving HA, are we? HA will be 
there only if the second brick in the replica pair goes down, and in a 
typical deployment we can't really give any guarantees about which 
brick will go down when. So I am not sure we can consider it HA. But I 
would love to hear what others have to say about this as well. If the 
majority of users say they need it to be 'auto', you will definitely 
see a patch :-).


Pranith


Thanks,
Bipin Kunal

On Fri, Mar 4, 2016 at 5:43 PM, Ravishankar N <ravishan...@redhat.com> wrote:

On 03/04/2016 05:26 PM, Pranith Kumar Karampuri wrote:

hi,
  So far default quorum for 2-way replication is 'none' (i.e.
files/directories may go into split-brain) and for 3-way replication and
arbiter based replication it is 'auto' (files/directories won't go into
split-brain). There are requests to make default as 'auto' for 2-way
replication as well. The line of reasoning is that people value data
integrity (files not going into split-brain) more than HA (operation of
mount even when bricks go down). And admins should explicitly change it to
'none' when they are fine with split-brains in 2-way replication. We were
wondering if you have any inputs about what is a sane default for 2-way
replication.

I like the default to be 'none'. Reason: If we have 'auto' as quorum for
2-way replication and first brick dies, there is no HA.



+1.  Quorum does not make sense when there are only 2 parties. There is no
majority voting. Arbiter volumes are a better option.
If someone wants some background, please see 'Client quorum' and 'Replica 2
and Replica 3 volumes' section of
http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

-Ravi


If users are fine with it, it is better to use plain distribute volume
rather than replication with quorum as 'auto'. What are your thoughts on the
matter? Please guide us in the right direction.

Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Fuse Subdirectory mounts, access-control

2016-03-02 Thread Pranith Kumar Karampuri

Hi,
  This mail explains the initial design about how this will happen.

The administrator creates a directory on the volume using a normal 
fuse mount (or any other mount); let's call it 'subdir1'.
The administrator then sets auth-allow/reject options with the 
IPs/addresses of the machines that should be granted access.
The mount command for volume 'vol', sub-directory 'subdir1', is:

mount -t glusterfs server1:/vol/subdir1 /mnt

When this command is executed, the volfile is requested with volfile-id 
'/vol/subdir1'.
Glusterd, on seeing this volfile-id, generates the client xlator with 
'/subdir1' appended to the remote-subvolume.


When graph initialization on the fuse mount happens, the client xlator 
sends setvolume with the remote-subvolume that has the extra '/subdir1' 
at the end. The server xlator does the access-control checks based on 
whether this IP has access to subdir1 according to the configuration. 
If setvolume is successful, the server xlator sends the gfid of 
'/subdir1' in the setvolume response. The client xlator passes this 
gfid along in the CHILD_UP notification. The fuse mount sets it as the 
root_gfid and resolves it by sending a lookup fop.
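
Putting the admin-side steps together, the workflow would look roughly 
like this (the exact per-subdirectory auth-allow syntax is part of the 
open questions below, so it is only indicated as a comment):

# 1. create the sub-directory using a normal, full-volume mount
mount -t glusterfs server1:/vol /mnt-admin
mkdir /mnt-admin/subdir1
umount /mnt-admin

# 2. grant access to the chosen client addresses with auth-allow/reject
#    (per-subdirectory option syntax is still an open item)

# 3. clients mount only the sub-directory
mount -t glusterfs server1:/vol/subdir1 /mnt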


Some of the things we are not clear about:
1) Should ACLs be set based on the paths or the gfids of the directories?
2) If the answer to 1) is paths, what should happen when the 
directories are renamed?


Pranith & Kaushal
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Default quorum for 2 way replication

2016-03-04 Thread Pranith Kumar Karampuri



On 03/04/2016 10:05 PM, Diego Remolina wrote:

I run a few two node glusterfs instances, but always have a third
machine acting as an arbiter. I am with Jeff on this one, better safe
than sorry.

Setting up a 3rd system without bricks to achieve quorum is very easy.


That is server-side quorum, which is good. But what we are discussing 
here is what the default should be for just 2 nodes.


Pranith


Diego



On Fri, Mar 4, 2016 at 10:40 AM, Jeff Darcy  wrote:

I like the default to be 'none'. Reason: If we have 'auto' as quorum for
2-way replication and first brick dies, there is no HA. If users are
fine with it, it is better to use plain distribute volume

"Availability" is a tricky word.  Does it mean access to data now, or
later despite failure?  Taking a volume down due to loss of quorum might
be equivalent to having no replication in the first sense, but certainly
not in the second.  When the possibility (likelihood?) of split brain is
considered, enforcing quorum actually does a *better* job of preserving
availability in the second sense.  I believe this second sense is most
often what users care about, and therefore quorum enforcement should be
the default.

I think we all agree that quorum is a bit slippery when N=2.  That's
where there really is a tradeoff between (immediate) availability and
(highest levels of) data integrity.  That's why arbiters showed up first
in the NSR specs, and later in AFR.  We should definitely try to push
people toward N>=3 as much as we can.  However, the ability to "scale
down" is one of the things that differentiate us vs. both our Ceph
cousins and our true competitors.  Many of our users will stop at N=2 no
matter what we say.  However unwise that might be, we must still do what
we can to minimize harm when things go awry.
___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Default quorum for 2 way replication

2016-03-05 Thread Pranith Kumar Karampuri



On 03/04/2016 08:36 PM, Shyam wrote:

On 03/04/2016 07:30 AM, Pranith Kumar Karampuri wrote:



On 03/04/2016 05:47 PM, Bipin Kunal wrote:

HI Pranith,

Thanks for starting this mail thread.

Looking from a user perspective most important is to get a "good copy"
of data.  I agree that people use replication for HA but having stale
data with HA will not have any value.
So I will suggest to make auto quorum as default configuration even
for 2-way replication.

If user is willing to lose data at the cost of HA, he always have
option disable it. But default preference should be data and its
integrity.


I think we need to consider *maintenance* activities on the volume, 
like replacing a brick in a replica pair, or upgrading one half of the 
replica and then the other. At those times the replica group would 
function read-only if we choose 'auto' in a 2-way replicated state, is 
this correct?


Yes.



Having said the above, we already have the option in place, right? I.e 
admins can already choose 'auto', it is just the default that we are 
discussing. This could also be tackled via documentation/best 
practices ("yeah right! who reads those again?" is a valid comment here).


Yes. I just sent a reply to Jeff where I said it would be better to 
have an interactive question at the time of creating a 2-way replica 
volume that gives this information :-).




I guess we need to be clear (in documentation or otherwise) what they 
get when they choose one over the other (like the HA point below and 
also upgrade concerns etc.), irrespective of how this discussion ends 
(just my 2 c's).


Totally agree. We will add the interactive question mentioned above, 
along with a link that gives a detailed explanation.






That is the point. There is an illusion of choice between Data integrity
and HA. But we are not *really* giving HA, are we? HA will be there only
if second brick in the replica pair goes down. In your typical


@Pranith, can you elaborate on this? I am not so AFR savvy, so I am 
unable to comprehend why HA is available only when the second brick 
goes down and not when the first does. It just helps in understanding 
the issue at hand.


Because it is client-side replication, there is a fixed *leader*, i.e. 
the 1st brick.


As a side note: we recently had a discussion with the NSR team (Jeff, 
Avra). We will be using some of the NSR infra to implement server-side 
afr as well, with leader election etc.


Pranith



deployment, we can't really give any guarantees about what brick will go
down when. So I am not sure if we can consider it as HA. But I would
love to hear what others have to say about this as well. If majority of
users say they need it to be auto, you will definitely see a patch :-).

Pranith


Thanks,
Bipin Kunal

On Fri, Mar 4, 2016 at 5:43 PM, Ravishankar N <ravishan...@redhat.com>
wrote:

On 03/04/2016 05:26 PM, Pranith Kumar Karampuri wrote:

hi,
  So far the default quorum for 2-way replication is 'none' (i.e.
files/directories may go into split-brain), while for 3-way replication
and arbiter-based replication it is 'auto' (files/directories won't go
into split-brain). There are requests to make the default 'auto' for
2-way replication as well. The line of reasoning is that people value
data integrity (files not going into split-brain) more than HA
(operation of the mount even when bricks go down), and admins should
explicitly change it to 'none' when they are fine with split-brains in
2-way replication. We were wondering if you have any inputs about what
a sane default for 2-way replication is.

I like the default to be 'none'. Reason: If we have 'auto' as quorum
for 2-way replication and first brick dies, there is no HA.



+1.  Quorum does not make sense when there are only 2 parties. There is
no majority voting. Arbiter volumes are a better option.
If someone wants some background, please see the 'Client quorum' and
'Replica 2 and Replica 3 volumes' section of
http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/ 




-Ravi

If users are fine with it, it is better to use plain distribute volume
rather than replication with quorum as 'auto'. What are your thoughts
on the matter? Please guide us in the right direction.

Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Default quorum for 2 way replication

2016-03-05 Thread Pranith Kumar Karampuri



On 03/04/2016 09:10 PM, Jeff Darcy wrote:

I like the default to be 'none'. Reason: If we have 'auto' as quorum for
2-way replication and first brick dies, there is no HA. If users are
fine with it, it is better to use plain distribute volume

"Availability" is a tricky word.  Does it mean access to data now, or
later despite failure?  Taking a volume down due to loss of quorum might
be equivalent to having no replication in the first sense, but certainly
not in the second.  When the possibility (likelihood?) of split brain is
considered, enforcing quorum actually does a *better* job of preserving
availability in the second sense.  I believe this second sense is most
often what users care about, and therefore quorum enforcement should be
the default.

I think we all agree that quorum is a bit slippery when N=2.  That's
where there really is a tradeoff between (immediate) availability and
(highest levels of) data integrity.  That's why arbiters showed up first
in the NSR specs, and later in AFR.  We should definitely try to push
people toward N>=3 as much as we can.  However, the ability to "scale
down" is one of the things that differentiate us vs. both our Ceph
cousins and our true competitors.  Many of our users will stop at N=2 no
matter what we say.  However unwise that might be, we must still do what
we can to minimize harm when things go awry.
I always felt that 2-way vs. 3-way replication is analogous to 
2-wheelers (motorbikes) vs. 4-wheelers (cars). You have more fatal 
accidents with 2-wheelers than with 4-wheelers, but they have their 
place. Arbiter volumes are like a 3-wheeler (auto rickshaw) :-). I feel 
users should be given the power to choose what they want based on what 
they are looking for and how much hardware they can afford. We should 
educate them about the risks, but the final decision should be theirs. 
So in that sense I don't like to *push* them to N>=3.


   "Many of our users will stop at N=2 no matter what we say". That 
right there is what I had to realize some years back. I naively thought 
that people would rush to replica-3 with client quorum, but it didn't 
happen. That is the reason for investing time in arbiter volumes as a 
solution: we wanted to reduce the cost. People didn't want to spend so 
much money for consistency (based on what we are still seeing). The 
fact of the matter is, even after arbiter volumes I am sure some people 
will stick with replica-2 plus the unsplit-brain patch from Facebook 
(for people who don't know: it resolves split-brain automatically, 
based on policies, without human intervention; it will be available 
soon in gluster). You do have a very good point though. I think it 
makes sense to make more people aware of what they are getting into 
with 2-way replication. So maybe an interactive question at the time of 
2-way replica volume creation, about the possibility of split-brains 
and the availability of other options (like arbiter/unsplit-brain in 
2-way replication), could be helpful, keeping the default still as 
'none'. I think it would be better if we educate users about the value 
of arbiter volumes, so that they naturally progress towards them and 
embrace them. We are seeing more and more questions on IRC and the 
mailing list about arbiter volumes, so there is a +ve trend.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Does we know about this crash in tier test case?

2016-04-05 Thread Pranith Kumar Karampuri
On tests/bugs/tier/bug-1286974.t, I see the following crash for the run: 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/19421/consoleFull


#1  0x7f41e15ad2bb in syncenv_task (proc=0xca5540) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:607

env = 0xca5540
task = 0x0
sleep_till = {tv_sec = 1459417079, tv_nsec = 0}
ret = 0
#2  0x7f41e15ad562 in syncenv_processor (thdata=0xca5540) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:699

env = 0xca5540
proc = 0xca5540
task = 0x0
#3  0x7f41e0835aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x7f41e019e93d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 8 (Thread 0x7f41d52b3700 (LWP 15648)):
#0  0x7f41e019ef33 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f41e15ceda8 in event_dispatch_epoll_worker (data=0xcca510) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:668
event = {events = 1073741827, data = {ptr = 0x10008, fd = 
8, u32 = 8, u64 = 4294967304}}

ret = 0
ev_data = 0xcca510
event_pool = 0xc8bc90
myindex = 1
timetodie = 0
__FUNCTION__ = "event_dispatch_epoll_worker"
#2  0x7f41e0835aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x7f41e019e93d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 7 (Thread 0x7f41d74ec700 (LWP 15641)):
#0  0x7f41e0839a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0

No symbol table info available.
#1  0x7f41e15ad2bb in syncenv_task (proc=0xca5900) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:607

env = 0xca5540
task = 0x0
sleep_till = {tv_sec = 1459417079, tv_nsec = 0}
ret = 0
#2  0x7f41e15ad562 in syncenv_processor (thdata=0xca5900) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:699

env = 0xca5540
proc = 0xca5900
task = 0x0
#3  0x7f41e0835aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x7f41e019e93d in clone () from /lib64/libc.so.6
No symbol table info available.

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Does we know about this crash in tier test case?

2016-04-05 Thread Pranith Kumar Karampuri

I meant *Do* we. Sorry for the typo!

On 04/05/2016 11:33 AM, Pranith Kumar Karampuri wrote:
On tests/bugs/tier/bug-1286974.t, I see the following crash for the 
run: 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/19421/consoleFull


#1  0x7f41e15ad2bb in syncenv_task (proc=0xca5540) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:607

env = 0xca5540
task = 0x0
sleep_till = {tv_sec = 1459417079, tv_nsec = 0}
ret = 0
#2  0x7f41e15ad562 in syncenv_processor (thdata=0xca5540) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:699

env = 0xca5540
proc = 0xca5540
task = 0x0
#3  0x7f41e0835aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x7f41e019e93d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 8 (Thread 0x7f41d52b3700 (LWP 15648)):
#0  0x7f41e019ef33 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f41e15ceda8 in event_dispatch_epoll_worker (data=0xcca510) 
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:668
event = {events = 1073741827, data = {ptr = 0x10008, fd = 
8, u32 = 8, u64 = 4294967304}}

ret = 0
ev_data = 0xcca510
event_pool = 0xc8bc90
myindex = 1
timetodie = 0
__FUNCTION__ = "event_dispatch_epoll_worker"
#2  0x7f41e0835aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x7f41e019e93d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 7 (Thread 0x7f41d74ec700 (LWP 15641)):
#0  0x7f41e0839a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0

No symbol table info available.
#1  0x7f41e15ad2bb in syncenv_task (proc=0xca5900) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:607

env = 0xca5540
task = 0x0
sleep_till = {tv_sec = 1459417079, tv_nsec = 0}
ret = 0
#2  0x7f41e15ad562 in syncenv_processor (thdata=0xca5900) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/syncop.c:699

env = 0xca5540
proc = 0xca5900
task = 0x0
#3  0x7f41e0835aa1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x7f41e019e93d in clone () from /lib64/libc.so.6
No symbol table info available.

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Core generated by trash.t

2016-04-22 Thread Pranith Kumar Karampuri
+Krutika

- Original Message -
> From: "Anoop C S" <anoo...@redhat.com>
> To: "Atin Mukherjee" <amukh...@redhat.com>
> Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Ravishankar N" 
> <ravishan...@redhat.com>, "Anuradha Talur"
> <ata...@redhat.com>, gluster-devel@gluster.org
> Sent: Friday, April 22, 2016 2:14:28 PM
> Subject: Re: [Gluster-devel] Core generated by trash.t
> 
> On Wed, 2016-04-20 at 16:24 +0530, Atin Mukherjee wrote:
> > I should have said the regression link is irrelevant here. Try
> > running
> > this test on your local setup multiple times on mainline. I do
> > believe
> > you should see the crash.
> > 
> 
> I could see coredump on running trash.t multiple times in a while loop.
> Info from coredump:
> 
> Core was generated by `/usr/local/sbin/glusterfs -s localhost --
> volfile-id gluster/glustershd -p /var/'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0040bd31 in glusterfs_handle_translator_op
> (req=0x7feab8001dec) at glusterfsd-mgmt.c:590
> 590   any = active->first;
> [Current thread is 1 (Thread 0x7feac1657700 (LWP 12050))]
> (gdb) l
> 585   goto out;
> 586   }
> 587
> 588   ctx = glusterfsd_ctx;
> 589   active = ctx->active;
> 590   any = active->first;
> 591   input = dict_new ();
> 592   ret = dict_unserialize (xlator_req.input.input_val,
> 593   xlator_req.input.input_len,
> 594   );
> (gdb) p ctx
> $1 = (glusterfs_ctx_t *) 0x7fa010
> (gdb) p ctx->active
> $2 = (glusterfs_graph_t *) 0x0

I think this is because the request came to shd even before the graph 
was initialized? Thanks for the test case. I will take a look at this.
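
The fix probably just needs a guard for a not-yet-initialized graph; a 
rough, untested sketch of what I have in mind around that spot in 
glusterfsd-mgmt.c (illustration only, not a reviewed patch):

        ctx = glusterfsd_ctx;
        active = ctx->active;
        if (!active) {
                /* graph not initialized yet (e.g. shd received the
                 * brick-op before init finished); fail the op instead
                 * of dereferencing a NULL graph */
                ret = -1;
                goto out;
        }
        any = active->first;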

Pranith
> (gdb) p *req
> $1 = {trans = 0x7feab8000e20, svc = 0x83ca50, prog = 0x874810, xid = 1,
> prognum = 4867634, progver = 2, procnum = 3, type = 0, uid = 0, gid =
> 0, pid = 0, lk_owner = {len = 4,
> data = '\000' }, gfs_id = 0, auxgids =
> 0x7feab800223c, auxgidsmall = {0 }, auxgidlarge =
> 0x0, auxgidcount = 0, msg = {{iov_base = 0x7feacc253840,
>   iov_len = 488}, {iov_base = 0x0, iov_len = 0}  times>}, count = 1, iobref = 0x7feab8000c40, rpc_status = 0, rpc_err =
> 0, auth_err = 0, txlist = {next = 0x7feab800256c,
> prev = 0x7feab800256c}, payloadsize = 0, cred = {flavour = 390039,
> datalen = 24, authdata = '\000' , "\004", '\000'
> }, verf = {flavour = 0,
> datalen = 0, authdata = '\000' }, synctask =
> _gf_true, private = 0x0, trans_private = 0x0, hdr_iobuf = 0x82b038,
> reply = 0x0}
> (gdb) p req->procnum
> $3 = 3 <== GLUSTERD_BRICK_XLATOR_OP
> (gdb) t a a bt
> 
> Thread 6 (Thread 0x7feabf178700 (LWP 12055)):
> #0  0x7feaca522043 in epoll_wait () at ../sysdeps/unix/syscall-
> template.S:84
> #1  0x7feacbe5076f in event_dispatch_epoll_worker (data=0x878130)
> at event-epoll.c:664
> #2  0x7feacac4560a in start_thread (arg=0x7feabf178700) at
> pthread_create.c:334
> #3  0x7feaca521a4d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> Thread 5 (Thread 0x7feac2659700 (LWP 12048)):
> #0  do_sigwait (sig=0x7feac2658e3c, set=) at
> ../sysdeps/unix/sysv/linux/sigwait.c:64
> #1  __sigwait (set=, sig=0x7feac2658e3c) at
> ../sysdeps/unix/sysv/linux/sigwait.c:96
> #2  0x00409895 in glusterfs_sigwaiter (arg=0x7ffe3debbf00) at
> glusterfsd.c:2032
> #3  0x7feacac4560a in start_thread (arg=0x7feac2659700) at
> pthread_create.c:334
> #4  0x7feaca521a4d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> Thread 4 (Thread 0x7feacc2b4780 (LWP 12046)):
> #0  0x7feacac466ad in pthread_join (threadid=140646205064960,
> thread_return=0x0) at pthread_join.c:90
> #1  0x7feacbe509bb in event_dispatch_epoll (event_pool=0x830b80) at
> event-epoll.c:758
> #2  0x7feacbe17a91 in event_dispatch (event_pool=0x830b80) at
> event.c:124
> #3  0x0040a3c8 in main (argc=13, argv=0x7ffe3debd0f8) at
> glusterfsd.c:2376
> 
> Thread 3 (Thread 0x7feac2e5a700 (LWP 12047)):
> #0  0x7feacac4e27d in nanosleep () at ../sysdeps/unix/syscall-
> template.S:84
> #1  0x7feacbdfc152 in gf_timer_proc (ctx=0x7fa010) at timer.c:188
> #2  0x7feacac4560a in start_thread (arg=0x7feac2e5a700) at
> pthread_create.c:334
> #3  0x7feaca521a4d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> Thread 2 (Thread 0x7feac1e58700 (LWP 12049)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #

Re: [Gluster-devel] Need inputs in multi-threaded self-heal option name change

2016-05-11 Thread Pranith Kumar Karampuri
That sounds better. I will wait till evening for more suggestions and
change the name :-).

Pranith

On Thu, May 12, 2016 at 8:38 AM, Paul Cuzner <pcuz...@redhat.com> wrote:

> cluster.shd-max-heals  ... would work for me :)
>
> On Thu, May 12, 2016 at 3:04 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> hi
>>    For multi-threaded self-heal, we have introduced a new option called
>> cluster.shd-max-threads, which is confusing people who think that many new
>> threads are going to be launched to perform heals, whereas all it does is
>> increase the number of heals run in parallel by the syncop multi-tasking
>> framework. So I am thinking a better name could be
>> 'cluster.shd-num-parallel-heals', which is a bit lengthy. I am wondering if
>> anyone has better suggestions.
>>
>> Pranith
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Need inputs in multi-threaded self-heal option name change

2016-05-11 Thread Pranith Kumar Karampuri
hi
   For multi-threaded self-heal, we have introduced a new option called
cluster.shd-max-threads, which is confusing people who think that many
new threads are going to be launched to perform heals, whereas all it
does is increase the number of heals run in parallel by the syncop
multi-tasking framework. So I am thinking a better name could be
'cluster.shd-num-parallel-heals', which is a bit lengthy. I am wondering
if anyone has better suggestions.
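
For reference, whatever we end up calling it, the usage stays the same; 
today it looks like this (the value only caps how many heals the syncop 
framework runs in parallel):

# allow up to 4 heals to progress in parallel per self-heal daemon
gluster volume set <volname> cluster.shd-max-threads 4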

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Bitrot Review Request

2016-05-02 Thread Pranith Kumar Karampuri
On Fri, Apr 29, 2016 at 12:37 PM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi Pranith,
>
> You had a concern about consuming I/O threads when bit-rot uses the
> rchecksum interface for signing, normal scrubbing and on-demand
> scrubbing with tiering.
>
>
> http://review.gluster.org/#/c/13833/5/xlators/storage/posix/src/posix.c
>
> As discussed over comments, the concern is valid and the above patch is
> not being
> taken in and would be abandoned.
>
> I have the following patch where the signing and normal scrubbing would not
> consume io-threads. Only the on-demand scrubbing consumes io-threads. I
> think
> this should be fine as tiering is single threaded and only consumes
> one I/O thread (as told by Joseph on PatchSet 6).
>
>   http://review.gluster.org/#/c/13969/


I have a feeling that even this will become multi-threaded, just like
rebalance/self-heal have become. How do we future-proof it?

Pranith


>
>
> Since on-demand scrubbing is disabled by default, there is a size cap, and
> we document increasing the default number of I/O threads, consuming one
> I/O thread for scrubbing would be fine, I guess.
>
> Let me know your thoughts.
>
> Thanks and Regards,
> Kotresh H R
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Pranith Kumar Karampuri
On Fri, Jul 22, 2016 at 7:39 PM, Nithya Balachandran 
wrote:

>
>
> On Fri, Jul 22, 2016 at 7:31 PM, Jeff Darcy  wrote:
>
>> > I attempted to get us more space on NetBSD by creating a new partition
>> called
>> > /data and putting /build as a symlink to /data/build. This has caused
>> > problems
>> > with tests/basic/quota.t. It's marked as bad for master, but not for
>> > release-3.7. This is possibly because we have a hard-coded grep for
>> > /build/install against df -h.
>>
>> For the benefit of anyone else looking at this, the grep actually seems
>> to be
>> in volume.rc and not in the test itself.
>>
>
> That's right -  it appears to have been done to exclude the install path
> components from the df output which is what is being done to find the aux
> mount. Is there a better way to figure out if the aux mount is running?
>
>>
>> > Nithya has spent the last 2 days debugging
>> > without much success. What's a good way forward here? Mark the test as
>> > failing for 3.7?
>>
>
> Right. Something went wrong with the system and it refused to run the
> tests after a while.
>
>
>>
>> I don't think so.  There are 13 tests that use the affected function
>> (get_aux).  Do we want to disable 13 tests?  I think we actually need
>> to fix the function instead.  It seems to me that the check we're
>> making is very hacky in two ways:
>>
>>Checking for both /run and /var/run instead of using GLUSTERD_WORKDIR
>>
>>Excluding /build/install for no obvious reason at all
>>
>
> This looks like it was done to remove the /build/install components from
> the df -h outputs. Changing the path to /data/build/install broke this as
> it did not strip the "/data" from the paths.
> It did work when I changed the sed to act on /data/build/install but
> hardcoded paths are not a good approach.
>

Give me some time, I can send out a patch to print out the default run
directory if that helps?
something similar to 'gluster --print-logdir'. What shall we call this?
'gluster --print-rundir'? it will


>
>> These auxiliary mounts should be in a much more specific place, and we
>> should check for that instead of looking for any that might exist.  Who
>> knows where that place is?  I've copied Raghavendra G as the quota
>> maintainer, since that seems like our best bet.
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Pranith Kumar Karampuri
On Fri, Jul 22, 2016 at 4:46 PM, Nigel Babu  wrote:

> Hello,
>
> I attempted to get us more space on NetBSD by creating a new partition
> called
> /data and putting /build as a symlink to /data/build. This has caused
> problems
> with tests/basic/quota.t. It's marked as bad for master, but not for
> release-3.7. This is possibly because we have a hard-coded grep for
> /build/install against df -h. Nithya has spent the last 2 days debugging
> without much success. What's a good way forward here? Mark the test as
> failing
> for 3.7?
>

Dude, she spent only 2 hours on this. Yesterday was another netbsd problem.
She is not that bad you know :-P.


>
> We're going to have to accept that some of our tests might be in a
> different
> drive as we try to get more disk space for our machines. How can we make
> our
> tests more resilient from breakage due to regular expressions?
>
> --
> nigelb
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Pranith Kumar Karampuri
On Fri, Jul 22, 2016 at 8:12 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> I am playing with the following diff, let me see.
>
> diff --git a/tests/volume.rc b/tests/volume.rc
> index 331a802..b288508 100644
> --- a/tests/volume.rc
> +++ b/tests/volume.rc
> @@ -579,7 +579,9 @@ function num_graphs
>  function get_aux()
>  {
>  ##Check if a auxiliary mount is there
> -df -h 2>&1 | sed 's#/build/install##' | grep -e
> "[[:space:]]/run/gluster/${V0}$" -e "[[:space:]]/var/run/gluster/${V0}$" -
> +local rundir=$(gluster --print-statedumpdir)
> +local pid=$(cat ${rundir}/${V0}.pid)
> +pidof glusterfs 2>&1 | grep -w $pid
>
>  if [ $? -eq 0 ]
>  then
>

Based on what I saw in code, this seems to get the job done. Comments
welcome:
http://review.gluster.org/14988


>
> On Fri, Jul 22, 2016 at 7:44 PM, Nithya Balachandran <nbala...@redhat.com>
> wrote:
>
>>
>>
>> On Fri, Jul 22, 2016 at 7:42 PM, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Jul 22, 2016 at 7:39 PM, Nithya Balachandran <
>>> nbala...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jul 22, 2016 at 7:31 PM, Jeff Darcy <jda...@redhat.com> wrote:
>>>>
>>>>> > I attempted to get us more space on NetBSD by creating a new
>>>>> partition called
>>>>> > /data and putting /build as a symlink to /data/build. This has caused
>>>>> > problems
>>>>> > with tests/basic/quota.t. It's marked as bad for master, but not for
>>>>> > release-3.7. This is possibly because we have a hard-coded grep for
>>>>> > /build/install against df -h.
>>>>>
>>>>> For the benefit of anyone else looking at this, the grep actually
>>>>> seems to be
>>>>> in volume.rc and not in the test itself.
>>>>>
>>>>
>>>> That's right -  it appears to have been done to exclude the install
>>>> path components from the df output which is what is being done to find the
>>>> aux mount. Is there a better way to figure out if the aux mount is running?
>>>>
>>>>>
>>>>> > Nithya has spent the last 2 days debugging
>>>>> > without much success. What's a good way forward here? Mark the test
>>>>> as
>>>>> > failing for 3.7?
>>>>>
>>>>
>>>> Right. Something went wrong with the system and it refused to run the
>>>> tests after a while.
>>>>
>>>>
>>>>>
>>>>> I don't think so.  There are 13 tests that use the affected function
>>>>> (get_aux).  Do we want to disable 13 tests?  I think we actually need
>>>>> to fix the function instead.  It seems to me that the check we're
>>>>> making is very hacky in two ways:
>>>>>
>>>>>Checking for both /run and /var/run instead of using
>>>>> GLUSTERD_WORKDIR
>>>>>
>>>>>Excluding /build/install for no obvious reason at all
>>>>>
>>>>
>>>> This looks like it was done to remove the /build/install components
>>>> from the df -h outputs. Changing the path to /data/build/install broke this
>>>> as it did not strip the "/data" from the paths.
>>>> It did work when I changed the sed to act on /data/build/install but
>>>> hardcoded paths are not a good approach.
>>>>
>>>
>>> Give me some time, I can send out a patch to print out the default run
>>> directory if that helps?
>>> something similar to 'gluster --print-logdir'. What shall we call this?
>>> 'gluster --print-rundir'? it will
>>>
>>>
>>
>> This path might be available as an env variable - but is there a better
>> way to figure out the aux mount without bothering with df -h?
>>
>>>
>>>>> These auxiliary mounts should be in a much more specific place, and we
>>>>> should check for that instead of looking for any that might exist.  Who
>>>>> knows where that place is?  I've copied Raghavendra G as the quota
>>>>> maintainer, since that seems like our best bet.
>>>>> ___
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel@gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>>
>>>>
>>>> ___
>>>> Gluster-devel mailing list
>>>> Gluster-devel@gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>
>>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Pranith Kumar Karampuri
On Fri, Jul 22, 2016 at 7:42 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Fri, Jul 22, 2016 at 7:39 PM, Nithya Balachandran <nbala...@redhat.com>
> wrote:
>
>>
>>
>> On Fri, Jul 22, 2016 at 7:31 PM, Jeff Darcy <jda...@redhat.com> wrote:
>>
>>> > I attempted to get us more space on NetBSD by creating a new partition
>>> called
>>> > /data and putting /build as a symlink to /data/build. This has caused
>>> > problems
>>> > with tests/basic/quota.t. It's marked as bad for master, but not for
>>> > release-3.7. This is possibly because we have a hard-coded grep for
>>> > /build/install against df -h.
>>>
>>> For the benefit of anyone else looking at this, the grep actually seems
>>> to be
>>> in volume.rc and not in the test itself.
>>>
>>
>> That's right -  it appears to have been done to exclude the install path
>> components from the df output which is what is being done to find the aux
>> mount. Is there a better way to figure out if the aux mount is running?
>>
>>>
>>> > Nithya has spent the last 2 days debugging
>>> > without much success. What's a good way forward here? Mark the test as
>>> > failing for 3.7?
>>>
>>
>> Right. Something went wrong with the system and it refused to run the
>> tests after a while.
>>
>>
>>>
>>> I don't think so.  There are 13 tests that use the affected function
>>> (get_aux).  Do we want to disable 13 tests?  I think we actually need
>>> to fix the function instead.  It seems to me that the check we're
>>> making is very hacky in two ways:
>>>
>>>Checking for both /run and /var/run instead of using GLUSTERD_WORKDIR
>>>
>>>Excluding /build/install for no obvious reason at all
>>>
>>
>> This looks like it was done to remove the /build/install components from
>> the df -h outputs. Changing the path to /data/build/install broke this as
>> it did not strip the "/data" from the paths.
>> It did work when I changed the sed to act on /data/build/install but
>> hardcoded paths are not a good approach.
>>
>
> Give me some time, I can send out a patch to print out the default run
> directory if that helps?
> something similar to 'gluster --print-logdir'. What shall we call this?
> 'gluster --print-rundir'? it will
>

Wait, it seems like 'gluster --print-statedumpdir' already prints
'/var/run/gluster'. Is this the directory we want?


>
>
>>
>>> These auxiliary mounts should be in a much more specific place, and we
>>> should check for that instead of looking for any that might exist.  Who
>>> knows where that place is?  I've copied Raghavendra G as the quota
>>> maintainer, since that seems like our best bet.
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>>
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Pranith Kumar Karampuri
I am playing with the following diff, let me see.

diff --git a/tests/volume.rc b/tests/volume.rc
index 331a802..b288508 100644
--- a/tests/volume.rc
+++ b/tests/volume.rc
@@ -579,7 +579,9 @@ function num_graphs
 function get_aux()
 {
 ##Check if a auxiliary mount is there
-df -h 2>&1 | sed 's#/build/install##' | grep -e
"[[:space:]]/run/gluster/${V0}$" -e "[[:space:]]/var/run/gluster/${V0}$" -
+local rundir=$(gluster --print-statedumpdir)
+local pid=$(cat ${rundir}/${V0}.pid)
+pidof glusterfs 2>&1 | grep -w $pid

 if [ $? -eq 0 ]
 then


On Fri, Jul 22, 2016 at 7:44 PM, Nithya Balachandran <nbala...@redhat.com>
wrote:

>
>
> On Fri, Jul 22, 2016 at 7:42 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Fri, Jul 22, 2016 at 7:39 PM, Nithya Balachandran <nbala...@redhat.com
>> > wrote:
>>
>>>
>>>
>>> On Fri, Jul 22, 2016 at 7:31 PM, Jeff Darcy <jda...@redhat.com> wrote:
>>>
>>>> > I attempted to get us more space on NetBSD by creating a new
>>>> partition called
>>>> > /data and putting /build as a symlink to /data/build. This has caused
>>>> > problems
>>>> > with tests/basic/quota.t. It's marked as bad for master, but not for
>>>> > release-3.7. This is possibly because we have a hard-coded grep for
>>>> > /build/install against df -h.
>>>>
>>>> For the benefit of anyone else looking at this, the grep actually seems
>>>> to be
>>>> in volume.rc and not in the test itself.
>>>>
>>>
>>> That's right -  it appears to have been done to exclude the install path
>>> components from the df output which is what is being done to find the aux
>>> mount. Is there a better way to figure out if the aux mount is running?
>>>
>>>>
>>>> > Nithya has spent the last 2 days debugging
>>>> > without much success. What's a good way forward here? Mark the test as
>>>> > failing for 3.7?
>>>>
>>>
>>> Right. Something went wrong with the system and it refused to run the
>>> tests after a while.
>>>
>>>
>>>>
>>>> I don't think so.  There are 13 tests that use the affected function
>>>> (get_aux).  Do we want to disable 13 tests?  I think we actually need
>>>> to fix the function instead.  It seems to me that the check we're
>>>> making is very hacky in two ways:
>>>>
>>>>Checking for both /run and /var/run instead of using GLUSTERD_WORKDIR
>>>>
>>>>Excluding /build/install for no obvious reason at all
>>>>
>>>
>>> This looks like it was done to remove the /build/install components from
>>> the df -h outputs. Changing the path to /data/build/install broke this as
>>> it did not strip the "/data" from the paths.
>>> It did work when I changed the sed to act on /data/build/install but
>>> hardcoded paths are not a good approach.
>>>
>>
>> Give me some time, I can send out a patch to print out the default run
>> directory if that helps?
>> something similar to 'gluster --print-logdir'. What shall we call this?
>> 'gluster --print-rundir'? it will
>>
>>
>
> This path might be available as an env variable - but is there a better
> way to figure out the aux mount without bothering with df -h?
>
>>
>>>> These auxiliary mounts should be in a much more specific place, and we
>>>> should check for that instead of looking for any that might exist.  Who
>>>> knows where that place is?  I've copied Raghavendra G as the quota
>>>> maintainer, since that seems like our best bet.
>>>> ___
>>>> Gluster-devel mailing list
>>>> Gluster-devel@gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>>
>>
>>
>> --
>> Pranith
>>
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
On Fri, Jul 22, 2016 at 7:07 PM, Jeff Darcy  wrote:

> > Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So
> xavi
> > and I were wondering why cleanup_and_exit() is not sending GF_PARENT_DOWN
> > event.
>
> OK, then that grinding sound you hear is my brain shifting gears.  ;)  It
> seems that cleanup_and_exit will call xlator.fini in some few cases, but
> it doesn't do anything that would send notify events.  I'll bet the answer
> to "why" is just that nobody thought of it or got around to it.  The next
> question I'd ask is: can you do what you need to do from ec.fini instead?
> That would require enabling it in should_call_fini as well, but otherwise
> seems pretty straightforward.
>
> If the answer to that question is no, then things get more complicated.
> Can we do one loop that sends GF_EVENT_PARENT_DOWN events, then another
> that calls fini?  Can we just do a basic list traversal (as we do now for
> fini) or do we need to do something more complicated to deal with cluster
> translators?  I think a separate loop doing basic list traversal would
> work, even with brick multiplexing, so it's probably worth just coding it
> up as an experiment.
>

I don't think we need any list traversal because notify sends the event down
the graph. I guess I will start the experiment then.
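
A minimal sketch of what that experiment might look like (the
ctx->active->top access and the call site are assumptions, not the eventual
patch):

/* Hypothetical sketch, not the actual cleanup_and_exit() change.  It notifies
 * only the top xlator and relies on each xlator's notify() forwarding
 * GF_EVENT_PARENT_DOWN to its children, so no explicit graph traversal is
 * needed.  ctx->active->top as the graph root is an assumption here, and
 * error handling is omitted. */
static void
send_parent_down_before_fini (glusterfs_ctx_t *ctx)
{
        xlator_t *top = NULL;

        if (!ctx || !ctx->active)
                return;

        top = ctx->active->top;

        xlator_notify (top, GF_EVENT_PARENT_DOWN, top);
}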


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] fail-over taking too long when a node reboots

2016-07-27 Thread Pranith Kumar Karampuri
hi,
     Does anyone have a complete understanding of the keepalive timeout vs. TCP
User Timeout (UTO) options? For both afr and EC, when the server reboots it
takes 42 seconds for the fops to fail with ENOTCONN (saved_frames_unwind()).
I am wondering if there is any way to reduce this time by playing with these
two options. As per our earlier research on this (I think it was kp who did
that), keepalive was not getting triggered when there were fops in progress,
and he saw quite a few game-dev forums talk about this problem too. It seems
there is a newer timeout, TCP User Timeout, which addresses this. I am
wondering if any of you have experience with it and can suggest more
meaningful defaults for these timeouts. I think at the moment the default is
42 seconds.
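
For reference, a bare-bones sketch of how the two knobs are set on a plain
Linux socket; the function name and the values below are illustrative only,
not proposed gluster defaults:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Illustrative only: TCP_USER_TIMEOUT caps how long unacknowledged data may
 * sit in the send queue before the connection is errored out, so it also
 * fires while fops are in flight; keepalive probes only run on an otherwise
 * idle connection. */
static int
set_conn_timeouts (int sock)
{
        int          on        = 1;
        int          keepidle  = 10;    /* seconds idle before first probe */
        int          keepintvl = 5;     /* seconds between probes */
        int          keepcnt   = 3;     /* failed probes before reset */
        unsigned int uto_ms    = 10000; /* TCP_USER_TIMEOUT, milliseconds */

        if (setsockopt (sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof (on)))
                return -1;
        if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPIDLE, &keepidle,
                        sizeof (keepidle)))
                return -1;
        if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPINTVL, &keepintvl,
                        sizeof (keepintvl)))
                return -1;
        if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPCNT, &keepcnt,
                        sizeof (keepcnt)))
                return -1;

        /* Linux >= 2.6.37; needs a reasonably recent libc for the macro. */
        return setsockopt (sock, IPPROTO_TCP, TCP_USER_TIMEOUT, &uto_ms,
                           sizeof (uto_ms));
}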

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] volfile init/reconfigure have been working by accident?

2016-07-14 Thread Pranith Kumar Karampuri
The problem with that approach is that if a reconfigure partially succeeds or
fails, we don't know which keys to update and which ones not to update.
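
To make the mismatch concrete, here is a toy sketch; dict_new(),
get_new_dict() and dict_unref() are the real libglusterfs calls, while the
include path and the surrounding scaffolding are assumptions:

/* Toy illustration of the ref-count mismatch described in the quoted mail
 * below; not the proposed fix. */
#include "dict.h"   /* assumed include path */

static void
dict_ref_demo (void)
{
        /* Current pairing for xl->options: get_new_dict() starts with no
         * ref, so dict_unref() takes the count to -1 and the dictionary is
         * never destroyed, i.e. it leaks. */
        dict_t *leaky = get_new_dict ();
        dict_unref (leaky);

        /* Consistent pairing: dict_new() takes the initial ref, and the
         * matching dict_unref() really destroys the dictionary. */
        dict_t *ok = dict_new ();
        dict_unref (ok);
}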

On Thu, Jul 14, 2016 at 5:42 PM, Mohammed Rafi K C <rkavu...@redhat.com>
wrote:

> How about storing the same data variable(from new_xl options dict) with a
> ref in the  options dictionary of old xlator.
>
> Regards
> Rafi KC
>
>
> On 07/14/2016 08:28 AM, Pranith Kumar Karampuri wrote:
>
> hi,
> I wanted to remove 'get_new_dict()', 'dict_destroy()' usage
> throughout the code base to prevent people from using it wrong. Regression
> for that patch http://review.gluster.org/13183 kept failing and I found
> that the 'xl->options' dictionary is created using get_new_dict() i.e. it
> doesn't have any refs. And in xlator_members_free() we try to destroy it
> using dict_unref() i.e. ref count becomes '-1' and the dictionary doesn't
> get destroyed. so every reconfigure is leaking dictionaries. So all the
> options which use string options actually point to the values in these
> dictionaries. Initially I thought we can have latest reconfigured options
> dictionary also stored in new member 'xl->reconfigured_options' but the
> problem is reconfigure can partially succeed leading to dilemma about which
> options succeeded/failed and which dictionary to keep around. Failing in
> reconfigure doesn't stop the brick. At the moment the only way out I see is
> to perform [de]allocation of the string, bool(we can prevent for bool)
> options, may be there are more, I need to check. But this becomes one more
> big patch('fini' should GF_FREE all these options), so wondering if anyone
> has any other thoughts on fixing this properly without a lot of code
> changes.
>
> --
> Pranith
>
>
> ___
> Gluster-devel mailing 
> listGluster-devel@gluster.orghttp://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Reducing merge conflicts

2016-07-14 Thread Pranith Kumar Karampuri
On Fri, Jul 8, 2016 at 7:27 PM, Jeff Darcy  wrote:

> (combining replies to multiple people)
>
> Pranith:
> > I agree about encouraging specific kind of review. At the same time we
> need
> > to make reviewing, helping users in the community as important as sending
> > patches in the eyes of everyone. It is very important to know these
> > statistics to move in the right direction. My main problem with this is,
> > everyone knows that reviews are important, then why are they not
> happening?
> > Is it really laziness?
>
> "Laziness" was clearly a bad choice of words, for which I apologize.  I
> should have said "lack of diligence" or something to reflect that it's an
> *organizational* rather than personal problem.  We *as a group* have not
> been keeping up with the review workload.  Whatever the reasons are, to
> change the outcome we need to change behavior, and to change behavior we
> need to change the incentives.
>
>
> Raghavendra G:
> > Personally I've found a genuine -1 to be more valuable than a +1. Since
> we
> > are discussing about measuring, how does one measure the issues that are
> > prevented (through a good design, thoughtful coding/review) than the
> issues
> > that are _fixed_?
>
> Another excellent point.  It's easier to see the failures than the
> successes.
> It's a bit like traffic accidents.  Everyone sees when you cause one, but
> not
> when you avoid one.  If a regression occurs, everyone can look back to see
> who the author and reviewers were.  If there's no regression ... what then?
> Pranith has suggested some mechanism to give credit/karma in cases where it
> can't be done automatically.  Meta-review (review of reviews) is another
> possibility.  I've seen it work in other contexts, but I'm not sure how to
> apply it here.
>
> > Measuring -1s and -2s along with +1s and +2s can be a good
> > place to start with (though as with many measurements, they may not
> reflect
> > the underlying value accurately).
>
> The danger here is that we'll incentivize giving a -1 for superficial
> reasons.  We don't need more patches blocked because a reviewer doesn't
> like a file/variable name, or wants to play "I know a better way" games.
> Unfortunately, it's hard to distinguish those from enforcing standards
> that really matter, or avoiding technical debt.  I guess that brings us
> back to manual overrides and/or meta-review.
>
>
> Poornima:
> > Below are the few things that we can do to reduce our review backlog:
> > - No time for maintainers to review is not a good enough reason to bitrot
> > patches in review for months, it clearly means we need additional
> > maintainers for that component?
> > - Add maintainers for every component that is in Gluster(atleast the ones
> > which have incoming patches)
> > - For every patch we submit we add 'component(s)' label, and evaluate if
> > gerrit can automatically add maintainers as reviewers, and have another
> > label 'Maintainers ack' which needs to be present for any patch to be
> > merged.
>
> Excellent points.  Not much to add here, except that we also need a way to
> deal with patches that cross many components (as many of yours and mine
> do).
> If getting approval from one maintainer is a problem, getting approval from
> several will be worse.  Maybe it's enough to say that approval by one of
> those several maintainers is sufficient, and to rely on maintainers talking
> to one another.
>
>
> Atin:
> > How about having "review marathon" once a week by every team? In past
> this
> > has worked well and I don't see any reason why can't we spend 3-4 hours
> in a
> > meeting on weekly basis to review incoming patches on the component that
> the
> > team owns.
>
> I love this idea.  If I may add to it, I suggest that such "marathons" are
> a
> good way not only to reduce the backlog but also to teach people how to
> review well.  Reviewing's a skill, learnable like any other.  In addition
> to
> improving review quantity, getting reviews more focused on real bugs and
> technical debt would be great.
>
>
> Pranith (again):
> > Everyone in the team started reviewing the patches and giving +1 and I am
> > reviewing only after a +1.
>
> In the past I've done this myself (as a project-level maintainer), so I
> totally understand the motivation, but I'm still ambivalent about whether
> it's a good idea.  On the one hand, it seems like projects bigger than
> ours essentially work this way.  For example, how often does Linus review
> something that hasn't already been reviewed by one of his lieutenants?
> Not often, it seems.  On the other hand, reinforcing such hierarchies in
> the review process is counter to our goal of breaking them down in a more
> general sense.  I hope some day we can get to the point where people are
> actively seeking out things to review, instead of actively filtering the
> list they already have.
>

The feedback I got is, "it is not motivating to review patches that are
already merged by the maintainer."

Re: [Gluster-devel] Reducing merge conflicts

2016-07-14 Thread Pranith Kumar Karampuri
On Thu, Jul 14, 2016 at 11:29 PM, Joe Julian <j...@julianfamily.org> wrote:

> On 07/07/2016 08:58 PM, Pranith Kumar Karampuri wrote:
>
>
>
> On Fri, Jul 8, 2016 at 8:40 AM, Jeff Darcy <jda...@redhat.com> wrote:
>
>> > What gets measured gets managed.
>>
>> Exactly.  Reviewing is part of everyone's job, but reviews aren't tracked
>> in any way that matters.  Contrast that with the *enormous* pressure most
>> of us are under to get our own patches in, and it's pretty predictable
>> what will happen.  We need to change that calculation.
>>
>>
>> > What I have seen at least is that it is easy to find
>> > people who sent patches, how many patches someone sent in a month etc.
>> There
>> > is no easy way to get these numbers for reviews. 'Reviewed-by' tag in
>> commit
>> > only includes the people who did +1/+2 on the final revision of the
>> patch,
>> > which is bad.
>>
>> That's a very good point.  I think people people who comment also get
>> Reviewed-by: lines, but it doesn't matter because there's still a whole
>> world of things completely outside of Gerrit.  Reviews done by email won't
>> get counted, nor will consultations in the hallway or on IRC.  I have some
>> ideas who's most active in those ways.  Some (such as yourself) show up in
>> the Reviewed-by: statistics.  Others do not.  In terms of making sure
>> people get all the credit they deserve, those things need to be counted
>> too.  However, in terms of *getting the review queue unstuck* I'm not so
>> sure.  What matters for that is the reviews that Gerrit uses to determine
>> merge eligibility, so I think encouraging that specific kind of review
>> still moves us in a positive direction.
>>
>
> In my experience at least it was only adding 'reviewied-by' for the people
> who gave +1/+2 on the final version of the patch
>
> I agree about encouraging specific kind of review. At the same time we
> need to make reviewing, helping users in the community as important as
> sending patches in the eyes of everyone. It is very important to know these
> statistics to move in the right direction. My main problem with this is,
> everyone knows that reviews are important, then why are they not happening?
> Is it really laziness? Are we sure if there are people in the team who are
> not sharing the burden because of which it is becoming too much for 1 or 2
> people to handle the total load? All these things become very easy to
> reason about if we have this data. Then I am sure we can easily find how
> best to solve this issue. Same goes for spurious failures. These are not
> problems that are not faced by others in the world either. I remember
> watching a video where someone shared (I think it was in google) that they
> started putting giant TVs in the hall-way in all the offices and the people
> who don't attend to spurious-build-failure problems would show up on the
> screen for everyone in the world to see. Apparently the guy with the
> biggest picture(the one who was not attending to any build failures at all
> I guess) came to these folks and asked how should he get his picture
> removed from the screen, and it was solved in a day or two. We don't have
> to go to those lengths, but we do need data to nudge people in the right
> direction.
>
>
>
> Perhaps it's imposter syndrome. I know that even when I do leave comments
> on a patch, I don't add a +-1 because I don't think that my vote counts. I
> know I'm not part of the core developers so maybe I'm right, I don't know.
> Maybe some sort of published guidelines or mentorship could help?
>

Well, it does count. I agree, some sort of published guidelines would
definitely help. I absolutely hate what '-1' means, though: it says 'I would
prefer you didn't submit this'. Somebody who doesn't know what he/she is doing
still goes ahead and sends his/her first patch, and we say 'I would prefer you
didn't submit this'. It is like the tool is working against more
contributions. It could just as well say 'Thanks for your contribution, I feel
we can improve the patch further together' on a -1, you know.


>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Reducing merge conflicts

2016-07-14 Thread Pranith Kumar Karampuri
On Fri, Jul 15, 2016 at 1:09 AM, Jeff Darcy  wrote:

> > The feedback I got is, "it is not motivating to review patches that are
> > already merged by maintainer."
>
> I can totally understand that.  I've been pretty active reviewing lately,
> and it's an *awful* demotivating grind.  On the other hand, it's also
> pretty demotivating to see one's own hard work "rot" as the lack of
> reviews forces rebase after rebase.  Haven't we all seen that?  I'm
> sure the magnitude of that effect varies across teams and across parts
> of the code, but I'm equally sure that it affects all of us to some
> degree.
>
>
> > Do you suggest they should change that
> > behaviour in that case?
>
> Maybe.  The fact is that all of our maintainers have plenty of other
> responsibilities, and not all of them prioritize the same way.  I know I
> wouldn't be reviewing so many patches myself otherwise.  If reviews are
> being missed under the current rules, maybe we do need new rules.
>
> > let us give equal recognition for:
> > patches sent
> > patches reviewed - this one is missing.
> > helping users on gluster-users
> > helping users on #gluster/#gluster-dev
> >
> > Feel free to add anything more I might have missed out. May be new
> > ideas/design/big-refactor?
>
> Also doc, infrastructure work, blog/meetup/conference outreach, etc.
>
> > let people do what they like more among these and let us also recognize
> them
> > for all their contributions. Let us celebrate their work in each monthly
> > news letter.
>
> Good idea.
>

I think maybe we should summarize and come up with action items so that we
can take proper steps to improve things. Or are there any other things we
need to discuss?

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Reducing merge conflicts

2016-07-14 Thread Pranith Kumar Karampuri
On Fri, Jul 15, 2016 at 1:25 AM, Jeff Darcy  wrote:

> > I absolutely hate what '-1' means though, it says 'I would prefer you
> > didn't submit this'. Somebody who doesn't know what he/she is doing still
> > goes ahead and sends his/her first patch and we say 'I would prefer you
> > didn't submit this'. It is like the tool is working against more
> > contributions. It could also say 'Thanks for your contribution, I feel we
> > can improve the patch further together' on -1 too you know.
>
> When it comes to what -1 means, I've noticed quite a bit of variation
> across the group.  Sometimes it means the person doesn't want it merged
> *yet* because of minor issues (including style).  Sometimes it means they
> think the whole idea or approach is fundamentally misguided and they'll
> need significant convincing before they'll even look at the details.  (I
> tend to use -2 for that, but that's just me.)  It's definitely bad the
> way the message is worded to imply that mere *submission* is unwelcome.
> If Gerrit supports it - sadly I don't think it does - I think we could
> have a much more constructive set of -1 reasons:
>
>  * Needs style or packaging fixes (e.g. missing bug ID).
>
>  * Needs a test.
>
>  * Needs fixes for real bugs found in review.
>
>  * Needs answers/explanations/comments.
>
>  * Needs coordination with other patch .
>
> Alternatively, we could adopt an official set of such reasons as a
> matter of convention, much like we do with including the component
> in the one-line summary.  Would that help?
>

Yes, that will help. Are you saying we should add it in the comments when we
give a '-1'?


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Pranith Kumar Karampuri
On Fri, Jul 22, 2016 at 10:25 PM, Jeff Darcy  wrote:

> > Based on what I saw in code, this seems to get the job done. Comments
> > welcome:
> > http://review.gluster.org/14988
>
> Good thinking.  Thanks, Pranith!
>

Nithya clarified my doubts on IRC as well :-).


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] readdir() harmful in threaded code

2016-07-23 Thread Pranith Kumar Karampuri
On Sat, Jul 23, 2016 at 8:02 PM, Emmanuel Dreyfus <m...@netbsd.org> wrote:

> Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>
> > So should we do readdir() with external locks for everything instead?
>
> readdir() with a per-directory lock is safe. However, it may come with a
> performance hit in some scenarios, since two threads cannot read the
> same directory at once. But I am not sure it can happen in GlusterFS.
>
> I am a bit disturbed by readdir_r() being planned for deprecation. The
> Open Group does not say that, or I missed it:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html


I will wait for more people to comment on this. Let us see what they think
as well.


>
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-23 Thread Pranith Kumar Karampuri
Thanks Atin

On Sat, Jul 23, 2016 at 7:29 PM, Atin Mukherjee <amukh...@redhat.com> wrote:

>
>
> On Saturday 23 July 2016, Pranith Kumar Karampuri <pkara...@redhat.com>
> wrote:
>
>> If someone could give +1 on 3.7 backport
>> http://review.gluster.org/#/c/14991, I can merge the patch. Then we can
>> start rebasing may be?
>>
>
> Merged!
>
>
>>
>> On Sat, Jul 23, 2016 at 12:23 PM, Atin Mukherjee <amukh...@redhat.com>
>> wrote:
>>
>>> AFAIK, an explicit rebase is required.
>>>
>>>
>>> On Saturday 23 July 2016, Pranith Kumar Karampuri <pkara...@redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Jul 23, 2016 at 10:17 AM, Nithya Balachandran <
>>>> nbala...@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Sat, Jul 23, 2016 at 9:45 AM, Nithya Balachandran <
>>>>> nbala...@redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 22, 2016 at 9:07 PM, Pranith Kumar Karampuri <
>>>>>> pkara...@redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 22, 2016 at 8:12 PM, Pranith Kumar Karampuri <
>>>>>>> pkara...@redhat.com> wrote:
>>>>>>>
>>>>>>>> I am playing with the following diff, let me see.
>>>>>>>>
>>>>>>>> diff --git a/tests/volume.rc b/tests/volume.rc
>>>>>>>> index 331a802..b288508 100644
>>>>>>>> --- a/tests/volume.rc
>>>>>>>> +++ b/tests/volume.rc
>>>>>>>> @@ -579,7 +579,9 @@ function num_graphs
>>>>>>>>  function get_aux()
>>>>>>>>  {
>>>>>>>>  ##Check if a auxiliary mount is there
>>>>>>>> -df -h 2>&1 | sed 's#/build/install##' | grep -e
>>>>>>>> "[[:space:]]/run/gluster/${V0}$" -e 
>>>>>>>> "[[:space:]]/var/run/gluster/${V0}$" -
>>>>>>>> +local rundir=$(gluster --print-statedumpdir)
>>>>>>>> +local pid=$(cat ${rundir}/${V0}.pid)
>>>>>>>> +pidof glusterfs 2>&1 | grep -w $pid
>>>>>>>>
>>>>>>>>  if [ $? -eq 0 ]
>>>>>>>>  then
>>>>>>>>
>>>>>>>
>>>>>>> Based on what I saw in code, this seems to get the job done.
>>>>>>> Comments welcome:
>>>>>>> http://review.gluster.org/14988
>>>>>>>
>>>>>>>
>>>>>> Nice work Pranith :)
>>>>>> All, once this is backported to release-3.7, any patches on
>>>>>> release-3.7 patches will need to be rebased so they will pass the NetBSD
>>>>>> regression.
>>>>>>
>>>>>
>>>>> I am suddenly confused about this - will the patches need to be
>>>>> rebased or with the next run automatically include the changes once
>>>>> Pranith's fix is merged?
>>>>>
>>>>
>>>> May be someone more knowledgeable about this should confirm this, but
>>>> at least from the build-log, I don't see any rebase command being executed
>>>> with origin/master:
>>>>
>>>> *04:07:36* Triggered by Gerrit: http://review.gluster.org/13762*04:07:36* 
>>>> Building remotely on slave26.cloud.gluster.org 
>>>> <https://build.gluster.org/computer/slave26.cloud.gluster.org> 
>>>> (smoke_tests rackspace_regression_2gb glusterfs-devrpms) in workspace 
>>>> /home/jenkins/root/workspace/rackspace-regression-2GB-triggered*04:07:36*  
>>>> > git rev-parse --is-inside-work-tree # timeout=10*04:07:36* Fetching 
>>>> changes from the remote Git repository*04:07:36*  > git config 
>>>> remote.origin.url git://review.gluster.org/glusterfs.git # 
>>>> timeout=10*04:07:36* Fetching upstream changes from 
>>>> git://review.gluster.org/glusterfs.git*04:07:36*  > git --version # 
>>>> timeout=10*04:07:36*  > git -c core.askpass=true fetch --tags --progress 
>>>> git://review.gluster.org/glusterfs.git refs/changes/62/13762/4*04:07:44*  
>>>> > git rev-parse 838b5c34127edd0450b0449e38f075f56056f2c

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-23 Thread Pranith Kumar Karampuri
On Sat, Jul 23, 2016 at 10:17 AM, Nithya Balachandran <nbala...@redhat.com>
wrote:

>
>
> On Sat, Jul 23, 2016 at 9:45 AM, Nithya Balachandran <nbala...@redhat.com>
> wrote:
>
>>
>>
>> On Fri, Jul 22, 2016 at 9:07 PM, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Jul 22, 2016 at 8:12 PM, Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>
>>>> I am playing with the following diff, let me see.
>>>>
>>>> diff --git a/tests/volume.rc b/tests/volume.rc
>>>> index 331a802..b288508 100644
>>>> --- a/tests/volume.rc
>>>> +++ b/tests/volume.rc
>>>> @@ -579,7 +579,9 @@ function num_graphs
>>>>  function get_aux()
>>>>  {
>>>>  ##Check if a auxiliary mount is there
>>>> -df -h 2>&1 | sed 's#/build/install##' | grep -e
>>>> "[[:space:]]/run/gluster/${V0}$" -e "[[:space:]]/var/run/gluster/${V0}$" -
>>>> +local rundir=$(gluster --print-statedumpdir)
>>>> +local pid=$(cat ${rundir}/${V0}.pid)
>>>> +pidof glusterfs 2>&1 | grep -w $pid
>>>>
>>>>  if [ $? -eq 0 ]
>>>>  then
>>>>
>>>
>>> Based on what I saw in code, this seems to get the job done. Comments
>>> welcome:
>>> http://review.gluster.org/14988
>>>
>>>
>> Nice work Pranith :)
>> All, once this is backported to release-3.7, any patches on release-3.7
>> patches will need to be rebased so they will pass the NetBSD regression.
>>
>
> I am suddenly confused about this - will the patches need to be rebased or
> with the next run automatically include the changes once Pranith's fix is
> merged?
>

May be someone more knowledgeable about this should confirm this, but at
least from the build-log, I don't see any rebase command being executed
with origin/master:

*04:07:36* Triggered by Gerrit:
http://review.gluster.org/13762*04:07:36* Building remotely on
slave26.cloud.gluster.org
<https://build.gluster.org/computer/slave26.cloud.gluster.org>
(smoke_tests rackspace_regression_2gb glusterfs-devrpms) in workspace
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered*04:07:36*
 > git rev-parse --is-inside-work-tree # timeout=10*04:07:36* Fetching
changes from the remote Git repository*04:07:36*  > git config
remote.origin.url git://review.gluster.org/glusterfs.git #
timeout=10*04:07:36* Fetching upstream changes from
git://review.gluster.org/glusterfs.git*04:07:36*  > git --version #
timeout=10*04:07:36*  > git -c core.askpass=true fetch --tags
--progress git://review.gluster.org/glusterfs.git
refs/changes/62/13762/4*04:07:44*  > git rev-parse
838b5c34127edd0450b0449e38f075f56056f2c7^{commit} #
timeout=10*04:07:44* Checking out Revision
838b5c34127edd0450b0449e38f075f56056f2c7 (master)*04:07:44*  > git
config core.sparsecheckout # timeout=10*04:07:44*  > git checkout -f
838b5c34127edd0450b0449e38f075f56056f2c7*04:07:45*  > git rev-parse
FETCH_HEAD^{commit} # timeout=10*04:07:45*  > git rev-list
8cbee639520bf4631ce658e2da9b4bc3010d2eaa # timeout=10*04:07:45*  > git
tag -a -f -m Jenkins Build #22315
jenkins-rackspace-regression-2GB-triggered-22315 # timeout=10




>
>>>>
>>>> On Fri, Jul 22, 2016 at 7:44 PM, Nithya Balachandran <
>>>> nbala...@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 22, 2016 at 7:42 PM, Pranith Kumar Karampuri <
>>>>> pkara...@redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 22, 2016 at 7:39 PM, Nithya Balachandran <
>>>>>> nbala...@redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 22, 2016 at 7:31 PM, Jeff Darcy <jda...@redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > I attempted to get us more space on NetBSD by creating a new
>>>>>>>> partition called
>>>>>>>> > /data and putting /build as a symlink to /data/build. This has
>>>>>>>> caused
>>>>>>>> > problems
>>>>>>>> > with tests/basic/quota.t. It's marked as bad for master, but not
>>>>>>>> for
>>>>>>>> > release-3.7. This is possibly because we have a hard-coded grep
>>>>>>>> for
>>>>>>>> > /build/install against df -h.
>>>

Re: [Gluster-devel] readdir() harmful in threaded code

2016-07-23 Thread Pranith Kumar Karampuri
Emmanuel,
   I procrastinated too long on this :-/; it is July already :-(. I just
looked at the man page on Linux and it is a bit confusing, so I am not sure
how to go ahead.

For readdir_r(), I see:

DESCRIPTION
       This function is deprecated; use readdir(3) instead.

       The readdir_r() function was invented as a reentrant version of
       readdir(3). It reads the next directory entry from the directory
       stream dirp, and returns it in the caller-allocated buffer pointed to
       by entry. For details of the dirent structure, see readdir(3).

For readdir(3) I see:
ATTRIBUTES
       For an explanation of the terms used in this section, see attributes(7).

       ┌───────────┬───────────────┬──────────────────────────┐
       │ Interface │ Attribute     │ Value                    │
       ├───────────┼───────────────┼──────────────────────────┤
       │ readdir() │ Thread safety │ MT-Unsafe race:dirstream │
       └───────────┴───────────────┴──────────────────────────┘

       In the current POSIX.1 specification (POSIX.1-2008), readdir() is not
       required to be thread-safe. However, in modern implementations
       (including the glibc implementation), concurrent calls to readdir()
       that specify different directory streams are thread-safe. In cases
       where multiple threads must read from the same directory stream, using
       readdir() with external synchronization is still preferable to the use
       of the deprecated readdir_r(3) function. It is expected that a future
       version of POSIX.1 will require that readdir() be thread-safe when
       concurrently employed on different directory streams.


So should we do readdir() with external locks for everything instead?
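
If we go that way, a minimal sketch of the external-lock idea could look like
this (an illustration only, not the sys_readdir() wrapper itself): one mutex
per DIR stream, with the entry copied out while the lock is held.

#include <dirent.h>
#include <pthread.h>
#include <string.h>

/* Concurrent readers of the same directory serialize on the per-stream lock;
 * readers of different directories do not interfere.  The entry is copied
 * before unlocking because the pointer returned by readdir() may be reused
 * by the next call on the same stream. */
struct locked_dir {
        DIR             *dirp;
        pthread_mutex_t  lock;
};

static int
locked_readdir (struct locked_dir *ld, struct dirent *out)
{
        struct dirent *entry = NULL;

        pthread_mutex_lock (&ld->lock);
        entry = readdir (ld->dirp);
        if (entry)
                memcpy (out, entry, sizeof (*out));
        pthread_mutex_unlock (&ld->lock);

        return entry ? 0 : -1;  /* -1: end of stream or error (check errno) */
}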


On Thu, Feb 11, 2016 at 2:35 PM, Emmanuel Dreyfus  wrote:

> Juste to make sure there is no misunderstanding here: unfortunately I
> do not have time right now to submit a fix. It would be nice if someone
> else coule look at it.
>
> On Wed, Feb 10, 2016 at 01:48:52PM +, Emmanuel Dreyfus wrote:
> > Hi
> >
> > After obtaining a core in a regression, I noticed there are a few
> readdir()
> > use in threaded code. This is begging for a crash, as readdir() maintains
> > an internal state that will be trashed on concurent use. readdir_r()
> > should be used instead.
> >
> > A quick search shows readdir(à usage here:
> > contrib/fuse-util/mount_util.c:30
> > extras/test/ld-preload-test/ld-preload-test.c:310
> > extras/test/test-ffop.c:550
> > libglusterfs/src/compat.c:256
> > libglusterfs/src/compat.c:315
> > libglusterfs/src/syscall.c:97
> > tests/basic/fops-sanity.c:662
> > tests/utils/arequal-checksum.c:331
> >
> > Occurences in contrib, extra and tests are probably harmless are there
> > are usage in standalone programs that are not threaded. We are left with
> > three groups of problems:
> >
> > 1) libglusterfs/src/compat.c:256 and libglusterfs/src/compat.c:315
> > This is Solaris compatibility code. Is it used at all?
> >
> > 2)  libglusterfs/src/syscall.c:97 This is the sys_readdir() wrapper,
> > which is in turn used in:
> > libglusterfs/src/run.c:284
> > xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c:582
> > xlators/features/changelog/lib/src/gf-history-changelog.c:854
> > xlators/features/index/src/index.c:471
> > xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c
> > xlators/storage/posix/src/posix.c:3700
> > xlators/storage/posix/src/posix.c:5896
> >
> > 3) We also find sys_readdir() in libglusterfs/src/common-utils.h for
> > GF_FOR_EACH_ENTRY_IN_DIR() which in turn appears in:
> > libglusterfs/src/common-utils.c:3979
> > libglusterfs/src/common-utils.c:4002
> > xlators/mgmt/glusterd/src/glusterd-hooks.c:365
> > xlators/mgmt/glusterd/src/glusterd-hooks.c:379
> > xlators/mgmt/glusterd/src/glusterd-store.c:651
> > xlators/mgmt/glusterd/src/glusterd-store.c:661
> > xlators/mgmt/glusterd/src/glusterd-store.c:1781
> > xlators/mgmt/glusterd/src/glusterd-store.c:1806
> > xlators/mgmt/glusterd/src/glusterd-store.c:3044
> > xlators/mgmt/glusterd/src/glusterd-store.c:3072
> > xlators/mgmt/glusterd/src/glusterd-store.c:3593
> > xlators/mgmt/glusterd/src/glusterd-store.c:3606
> > xlators/mgmt/glusterd/src/glusterd-store.c:4032
> > xlators/mgmt/glusterd/src/glusterd-store.c:4111
> >
> > There a hive of sprious bugs to squash here.
> >
> > --
> > Emmanuel Dreyfus
> > m...@netbsd.org
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
>
> --
> Emmanuel Dreyfus
> m...@netbsd.org
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] spurious failure in tests/basic/gfapi/libgfapi-fini-hang.t

2016-07-22 Thread Pranith Kumar Karampuri
I see both of your names in the git blame output.
https://build.gluster.org/job/rackspace-regression-2GB-triggered/22439/console
has more information about the failure. This failure happened on
http://review.gluster.org/#/c/14985/, which changes only .t files, so I
believe the reason for the failure is something else. Could you please take a
look and find out whether there is a problem with the test or whether it
caught a race in some recent code change?

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] I missed a patch in earlier releases of 3.7.x which is breaking virt usecase

2016-07-29 Thread Pranith Kumar Karampuri
On Sat, Jul 30, 2016 at 7:05 AM, Kaushal Madappa <kaus...@redhat.com> wrote:

> On 29 Jul 2016 23:16, "Pranith Kumar Karampuri" <pkara...@redhat.com>
> wrote:
> >
> > Krutika RC'd that I missed a patch which broke virt usecase.
> http://review.gluster.org/15050 is posted for this bug. Please don't
> release without this one. I will be available until this patch is merged in
> the morning...
>
> If someone reviews with a +1 I'll merge.
>

It is a straight backport. I merged it.


> >
> > --
> > Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Events API: Adding support for Client Events

2016-08-02 Thread Pranith Kumar Karampuri
On Tue, Aug 2, 2016 at 4:57 PM, Aravinda  wrote:

> Hi,
>
> As many of you aware, Gluster Eventing feature is available in Master. To
> add support to listen to the Events from GlusterFS Clients following
> changes are identified
>
> - Change in Eventsd to listen to tcp socket instead of Unix domain socket.
> This enables Client to send message to Eventsd running in Storage node.
> - On Client connection, share Port and Token details with Xdata
>

This is sent as part of the response to GETSPEC.


> - Client gf_event will connect to this port and pushes the event(Includes
> Token)
> - Eventsd validates Token, publishes events only if Token is valid.
>
>
> Kaushal, Pranith, Atin Please add if I missed anything.
>
> Ref:
> Events API Design: http://review.gluster.org/13115
> Events API Intro & Demo:
> http://aravindavk.in/blog/10-mins-intro-to-gluster-eventing/ (CLI name
> changed from "gluster-eventing" to "gluster-eventsapi")
>
> --
> regards
> Aravinda
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] rebase + regression run voting

2016-08-02 Thread Pranith Kumar Karampuri
hi Nigel,
   When we rebase a patch by changing just the commit message/description, it
rightly doesn't re-trigger the regression, but the regression results still
apply to the patchset from before the rebase, so the +1s are not appearing.
You can see http://review.gluster.org/#/c/15070/ as an example. Is there a way
to apply the +1s to the new patchset now, or shall I retrigger the runs?

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Events API: Adding support for Client Events

2016-08-02 Thread Pranith Kumar Karampuri
On Tue, Aug 2, 2016 at 8:21 PM, Vijay Bellur  wrote:

> On 08/02/2016 07:27 AM, Aravinda wrote:
>
>> Hi,
>>
>> As many of you aware, Gluster Eventing feature is available in Master.
>> To add support to listen to the Events from GlusterFS Clients following
>> changes are identified
>>
>> - Change in Eventsd to listen to tcp socket instead of Unix domain
>> socket. This enables Client to send message to Eventsd running in
>> Storage node.
>> - On Client connection, share Port and Token details with Xdata
>> - Client gf_event will connect to this port and pushes the
>> event(Includes Token)
>> - Eventsd validates Token, publishes events only if Token is valid.
>>
>>
> Is there a lifetime/renewal associated with this token? Are there more
> details on how token management is being done? Sorry if these are repeat
> questions as I might have missed something along the review trail!
>

At least in the discussion, it didn't seem like we needed any new tokens once
one is generated. Do you have a use case in mind?


>
> -Vijay
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] GlusterFS-3.7.14 released

2016-08-11 Thread Pranith Kumar Karampuri
On Thu, Aug 11, 2016 at 4:29 PM, Serkan Çoban <cobanser...@gmail.com> wrote:

> I can wait for the patch to complete, please inform me when you ready.
> If it will take too much time to solve the crawl issue I can test
> without it too...
>

I don't know the root cause of the problem, so I am not sure when it will be
ready. Let me build the RPMs; I have a meeting now for around an hour, and I
will start building the RPMs after that.


>
> Serkan
>
> On Thu, Aug 11, 2016 at 5:52 AM, Pranith Kumar Karampuri
> <pkara...@redhat.com> wrote:
> >
> >
> > On Wed, Aug 10, 2016 at 1:58 PM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
> >>
> >> Hi,
> >>
> >> Any progress about the patch?
> >
> >
> > hi Serkan,
> >While testing the patch by myself, I am seeing that it is taking
> more
> > than one crawl to complete heals even when there are no  directory
> > hierarchies. It is faster than before but it shouldn't take more than 1
> > crawl to complete the heal because all the files exist already. I am
> > investigating why that is the case now. If you want to test things out
> > without this patch I will give you rpms today. Otherwise we need to find
> > until we find RCA for this crawl problem. Let me know your decision. If
> you
> > are okay with testing progressive versions of this feature, that would be
> > great. We can compare how each patch improved the performance.
> >
> > Pranith
> >
> >>
> >>
> >> On Thu, Aug 4, 2016 at 10:16 AM, Pranith Kumar Karampuri
> >> <pkara...@redhat.com> wrote:
> >> >
> >> >
> >> > On Thu, Aug 4, 2016 at 11:30 AM, Serkan Çoban <cobanser...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Thanks Pranith,
> >> >> I am waiting for RPMs to show, I will do the tests as soon as
> possible
> >> >> and inform you.
> >> >
> >> >
> >> > I guess on 3.7.x the RPMs are not automatically built. Let me find how
> >> > it
> >> > can be done. I will inform you after finding that out. Give me a day.
> >> >
> >> >>
> >> >>
> >> >> On Wed, Aug 3, 2016 at 11:19 PM, Pranith Kumar Karampuri
> >> >> <pkara...@redhat.com> wrote:
> >> >> >
> >> >> >
> >> >> > On Thu, Aug 4, 2016 at 1:47 AM, Pranith Kumar Karampuri
> >> >> > <pkara...@redhat.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Thu, Aug 4, 2016 at 12:51 AM, Serkan Çoban
> >> >> >> <cobanser...@gmail.com>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> I use rpms for installation. Redhat/Centos 6.8.
> >> >> >>
> >> >> >>
> >> >> >> http://review.gluster.org/#/c/15084 is the patch. In some time
> the
> >> >> >> rpms
> >> >> >> will be built actually.
> >> >> >
> >> >> >
> >> >> > In the same URL above it will actually post the rpms for
> >> >> > fedora/el6/el7
> >> >> > at
> >> >> > the end of the page.
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> Use gluster volume set  disperse.shd-max-threads
> >> >> >>  >> >> >> (range: 1-64)>
> >> >> >>
> >> >> >> While testing this I thought of ways to decrease the number of
> >> >> >> crawls
> >> >> >> as
> >> >> >> well. But they are a bit involved. Try to create same set of data
> >> >> >> and
> >> >> >> see
> >> >> >> what is the time it takes to complete heals using number of
> threads
> >> >> >> as
> >> >> >> you
> >> >> >> increase the number of parallel heals from 1 to 64.
> >> >> >>
> >> >> >>>
> >> >> >>> On Wed, Aug 3, 2016 at 10:16 PM, Pranith Kumar Karampuri
> >> >> >>> <pkara...@redhat.com> wrote:
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > On Thu, Aug 4, 2016 at 12:45 AM, Serkan Çoban
> >> >> >

Re: [Gluster-devel] [Gluster-users] GlusterFS-3.7.14 released

2016-08-04 Thread Pranith Kumar Karampuri
On Thu, Aug 4, 2016 at 11:30 AM, Serkan Çoban <cobanser...@gmail.com> wrote:

> Thanks Pranith,
> I am waiting for RPMs to show, I will do the tests as soon as possible
> and inform you.
>

I guess on 3.7.x the RPMs are not automatically built. Let me find out how it
can be done. I will inform you after finding that out. Give me a day.


>
> On Wed, Aug 3, 2016 at 11:19 PM, Pranith Kumar Karampuri
> <pkara...@redhat.com> wrote:
> >
> >
> > On Thu, Aug 4, 2016 at 1:47 AM, Pranith Kumar Karampuri
> > <pkara...@redhat.com> wrote:
> >>
> >>
> >>
> >> On Thu, Aug 4, 2016 at 12:51 AM, Serkan Çoban <cobanser...@gmail.com>
> >> wrote:
> >>>
> >>> I use rpms for installation. Redhat/Centos 6.8.
> >>
> >>
> >> http://review.gluster.org/#/c/15084 is the patch. In some time the rpms
> >> will be built actually.
> >
> >
> > In the same URL above it will actually post the rpms for fedora/el6/el7
> at
> > the end of the page.
> >
> >>
> >>
> >> Use gluster volume set  disperse.shd-max-threads  >> (range: 1-64)>
> >>
> >> While testing this I thought of ways to decrease the number of crawls as
> >> well. But they are a bit involved. Try to create same set of data and
> see
> >> what is the time it takes to complete heals using number of threads as
> you
> >> increase the number of parallel heals from 1 to 64.
> >>
> >>>
> >>> On Wed, Aug 3, 2016 at 10:16 PM, Pranith Kumar Karampuri
> >>> <pkara...@redhat.com> wrote:
> >>> >
> >>> >
> >>> > On Thu, Aug 4, 2016 at 12:45 AM, Serkan Çoban <cobanser...@gmail.com
> >
> >>> > wrote:
> >>> >>
> >>> >> I prefer 3.7 if it is ok for you. Can you also provide build
> >>> >> instructions?
> >>> >
> >>> >
> >>> > 3.7 should be fine. Do you use rpms/debs/anything-else?
> >>> >
> >>> >>
> >>> >>
> >>> >> On Wed, Aug 3, 2016 at 10:12 PM, Pranith Kumar Karampuri
> >>> >> <pkara...@redhat.com> wrote:
> >>> >> >
> >>> >> >
> >>> >> > On Thu, Aug 4, 2016 at 12:37 AM, Serkan Çoban
> >>> >> > <cobanser...@gmail.com>
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> Yes, but I can create 2+1(or 8+2) ec using two servers right? I
> >>> >> >> have
> >>> >> >> 26 disks on each server.
> >>> >> >
> >>> >> >
> >>> >> > On which release-branch do you want the patch? I am testing it on
> >>> >> > master-branch now.
> >>> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >> On Wed, Aug 3, 2016 at 9:59 PM, Pranith Kumar Karampuri
> >>> >> >> <pkara...@redhat.com> wrote:
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > On Thu, Aug 4, 2016 at 12:23 AM, Serkan Çoban
> >>> >> >> > <cobanser...@gmail.com>
> >>> >> >> > wrote:
> >>> >> >> >>
> >>> >> >> >> I have two of my storage servers free, I think I can use them
> >>> >> >> >> for
> >>> >> >> >> testing. Is two server testing environment ok for you?
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > I think it would be better if you have at least 3. You can test
> >>> >> >> > it
> >>> >> >> > with
> >>> >> >> > 2+1
> >>> >> >> > ec configuration.
> >>> >> >> >
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> On Wed, Aug 3, 2016 at 9:44 PM, Pranith Kumar Karampuri
> >>> >> >> >> <pkara...@redhat.com> wrote:
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> > On Wed, Aug 3, 2016 at 6:01 PM, Serkan Çoban
> >>> >> >> >> > <cobanser...@gmail.com>
> >>> >&

Re: [Gluster-devel] [Gluster-users] GlusterFS-3.7.14 released

2016-08-03 Thread Pranith Kumar Karampuri
On Thu, Aug 4, 2016 at 12:23 AM, Serkan Çoban <cobanser...@gmail.com> wrote:

> I have two of my storage servers free, I think I can use them for
> testing. Is two server testing environment ok for you?
>

I think it would be better if you have at least 3. You can test it with a 2+1
ec configuration.


>
> On Wed, Aug 3, 2016 at 9:44 PM, Pranith Kumar Karampuri
> <pkara...@redhat.com> wrote:
> >
> >
> > On Wed, Aug 3, 2016 at 6:01 PM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
> >>
> >> Hi,
> >>
> >> May I ask if multi-threaded self heal for distributed disperse volumes
> >> implemented in this release?
> >
> >
> > Serkan,
> > At the moment I am a bit busy with different work, Is it possible
> > for you to help test the feature if I provide a patch? Actually the patch
> > should be small. Testing is where lot of time will be spent on.
> >
> >>
> >>
> >> Thanks,
> >> Serkan
> >>
> >> On Tue, Aug 2, 2016 at 5:30 PM, David Gossage
> >> <dgoss...@carouselchecks.com> wrote:
> >> > On Tue, Aug 2, 2016 at 6:01 AM, Lindsay Mathieson
> >> > <lindsay.mathie...@gmail.com> wrote:
> >> >>
> >> >> On 2/08/2016 5:07 PM, Kaushal M wrote:
> >> >>>
> >> >>> GlusterFS-3.7.14 has been released. This is a regular minor release.
> >> >>> The release-notes are available at
> >> >>>
> >> >>>
> >> >>>
> https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.14.md
> >> >>
> >> >>
> >> >> Thanks Kaushal, I'll check it out
> >> >>
> >> >
> >> > So far on my test box its working as expected.  At least the issues
> that
> >> > prevented it from running as before have disappeared.  Will need to
> see
> >> > how
> >> > my test VM behaves after a few days.
> >> >
> >> >
> >> >
> >> >> --
> >> >> Lindsay Mathieson
> >> >>
> >> >> ___
> >> >> Gluster-users mailing list
> >> >> gluster-us...@gluster.org
> >> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >> >
> >> >
> >> >
> >> > ___
> >> > Gluster-users mailing list
> >> > gluster-us...@gluster.org
> >> > http://www.gluster.org/mailman/listinfo/gluster-users
> >> ___
> >> Gluster-users mailing list
> >> gluster-us...@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
> >
> > --
> > Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] GlusterFS-3.7.14 released

2016-08-03 Thread Pranith Kumar Karampuri
On Wed, Aug 3, 2016 at 6:01 PM, Serkan Çoban  wrote:

> Hi,
>
> May I ask if multi-threaded self heal for distributed disperse volumes
> implemented in this release?
>

Serkan,
At the moment I am a bit busy with other work. Is it possible
for you to help test the feature if I provide a patch? The patch itself
should be small; testing is where a lot of the time will be spent.


>
> Thanks,
> Serkan
>
> On Tue, Aug 2, 2016 at 5:30 PM, David Gossage
>  wrote:
> > On Tue, Aug 2, 2016 at 6:01 AM, Lindsay Mathieson
> >  wrote:
> >>
> >> On 2/08/2016 5:07 PM, Kaushal M wrote:
> >>>
> >>> GlusterFS-3.7.14 has been released. This is a regular minor release.
> >>> The release-notes are available at
> >>>
> >>>
> https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.14.md
> >>
> >>
> >> Thanks Kaushal, I'll check it out
> >>
> >
> > So far on my test box its working as expected.  At least the issues that
> > prevented it from running as before have disappeared.  Will need to see
> how
> > my test VM behaves after a few days.
> >
> >
> >
> >> --
> >> Lindsay Mathieson
> >>
> >> ___
> >> Gluster-users mailing list
> >> gluster-us...@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
> > ___
> > Gluster-users mailing list
> > gluster-us...@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] GlusterFS-3.7.14 released

2016-08-03 Thread Pranith Kumar Karampuri
On Thu, Aug 4, 2016 at 12:37 AM, Serkan Çoban <cobanser...@gmail.com> wrote:

> Yes, but I can create 2+1(or 8+2) ec using two servers right? I have
> 26 disks on each server.
>

On which release branch do you want the patch? I am testing it on the
master branch now.


>
> On Wed, Aug 3, 2016 at 9:59 PM, Pranith Kumar Karampuri
> <pkara...@redhat.com> wrote:
> >
> >
> > On Thu, Aug 4, 2016 at 12:23 AM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
> >>
> >> I have two of my storage servers free, I think I can use them for
> >> testing. Is two server testing environment ok for you?
> >
> >
> > I think it would be better if you have at least 3. You can test it with
> 2+1
> > ec configuration.
> >
> >>
> >>
> >> On Wed, Aug 3, 2016 at 9:44 PM, Pranith Kumar Karampuri
> >> <pkara...@redhat.com> wrote:
> >> >
> >> >
> >> > On Wed, Aug 3, 2016 at 6:01 PM, Serkan Çoban <cobanser...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> May I ask if multi-threaded self heal for distributed disperse
> volumes
> >> >> implemented in this release?
> >> >
> >> >
> >> > Serkan,
> >> > At the moment I am a bit busy with different work, Is it
> >> > possible
> >> > for you to help test the feature if I provide a patch? Actually the
> >> > patch
> >> > should be small. Testing is where lot of time will be spent on.
> >> >
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Serkan
> >> >>
> >> >> On Tue, Aug 2, 2016 at 5:30 PM, David Gossage
> >> >> <dgoss...@carouselchecks.com> wrote:
> >> >> > On Tue, Aug 2, 2016 at 6:01 AM, Lindsay Mathieson
> >> >> > <lindsay.mathie...@gmail.com> wrote:
> >> >> >>
> >> >> >> On 2/08/2016 5:07 PM, Kaushal M wrote:
> >> >> >>>
> >> >> >>> GlusterFS-3.7.14 has been released. This is a regular minor
> >> >> >>> release.
> >> >> >>> The release-notes are available at
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>>
> https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.14.md
> >> >> >>
> >> >> >>
> >> >> >> Thanks Kaushal, I'll check it out
> >> >> >>
> >> >> >
> >> >> > So far on my test box its working as expected.  At least the issues
> >> >> > that
> >> >> > prevented it from running as before have disappeared.  Will need to
> >> >> > see
> >> >> > how
> >> >> > my test VM behaves after a few days.
> >> >> >
> >> >> >
> >> >> >
> >> >> >> --
> >> >> >> Lindsay Mathieson
> >> >> >>
> >> >> >> ___
> >> >> >> Gluster-users mailing list
> >> >> >> gluster-us...@gluster.org
> >> >> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >> >> >
> >> >> >
> >> >> >
> >> >> > ___
> >> >> > Gluster-users mailing list
> >> >> > gluster-us...@gluster.org
> >> >> > http://www.gluster.org/mailman/listinfo/gluster-users
> >> >> ___
> >> >> Gluster-users mailing list
> >> >> gluster-us...@gluster.org
> >> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Pranith
> >
> >
> >
> >
> > --
> > Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] why ec fallocate is not supported

2016-08-10 Thread Pranith Kumar Karampuri
We would definitely love patches. I think you should follow how the ftruncate
fop is implemented, since fallocate would be a very similar fop.
Take a look at:
1) ec_gf_ftruncate
2) ec_ftruncate
3) ec_wind_ftruncate
4) ec_manager_truncate (both truncate and ftruncate reuse this function)

Feel free to send any more questions you may have.
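
For illustration only, here is a rough sketch of what the top-level entry point
could look like, by analogy with the ftruncate path. The parameter list, the
EC_MINIMUM_MIN constant and default_fallocate_cbk are assumptions copied from
how the ftruncate fop looks to me and must be verified against ec-fops.h and
xlator.h in your tree; ec_fallocate()/ec_wind_fallocate()/ec_manager_fallocate()
do not exist yet and are exactly the pieces the patch would have to add.

/* Sketch only: dispatcher by analogy with ec_gf_ftruncate().  The real
 * work is a new ec_manager_fallocate() state machine plus an
 * ec_wind_fallocate() that winds the fop to each healthy brick. */
int32_t
ec_gf_fallocate (call_frame_t *frame, xlator_t *this, fd_t *fd,
                 int32_t keep_size, off_t offset, size_t len,
                 dict_t *xdata)
{
        /* Hand everything to the fop wrapper, which takes the inode
         * lock, fetches size/version and runs the state machine. */
        ec_fallocate (frame, this, -1, EC_MINIMUM_MIN,
                      default_fallocate_cbk, NULL, fd, keep_size,
                      offset, len, xdata);
        return 0;
}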

On Tue, Aug 9, 2016 at 12:42 PM, 李迪  wrote:

>  If I want to implement it, is there any suggestions?
>
>
>
> On 2016-08-09 14:48, 李迪 wrote:
>
>> Hi Xavier,
>>
>> I want to use fallocate to reduce file system fragmentations, but ec
>> can not support it now.
>>
>> Why not support it?
>>
>>
>>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] GlusterFS-3.7.14 released

2016-08-10 Thread Pranith Kumar Karampuri
On Wed, Aug 10, 2016 at 1:58 PM, Serkan Çoban <cobanser...@gmail.com> wrote:

> Hi,
>
> Any progress about the patch?
>

hi Serkan,
   While testing the patch myself, I am seeing that it takes more than one
crawl to complete heals even when there are no directory hierarchies. It is
faster than before, but it shouldn't take more than one crawl to complete the
heal because all the files already exist. I am investigating why that is the
case now. If you want to test things out without waiting for that fix, I will
give you RPMs today. Otherwise we will need to wait until we find the RCA for
this crawl problem. Let me know your decision. If you are okay with testing
progressive versions of this feature, that would be great. We can compare how
each patch improves the performance.

Pranith


>
> On Thu, Aug 4, 2016 at 10:16 AM, Pranith Kumar Karampuri
> <pkara...@redhat.com> wrote:
> >
> >
> > On Thu, Aug 4, 2016 at 11:30 AM, Serkan Çoban <cobanser...@gmail.com>
> wrote:
> >>
> >> Thanks Pranith,
> >> I am waiting for RPMs to show, I will do the tests as soon as possible
> >> and inform you.
> >
> >
> > I guess on 3.7.x the RPMs are not automatically built. Let me find how it
> > can be done. I will inform you after finding that out. Give me a day.
> >
> >>
> >>
> >> On Wed, Aug 3, 2016 at 11:19 PM, Pranith Kumar Karampuri
> >> <pkara...@redhat.com> wrote:
> >> >
> >> >
> >> > On Thu, Aug 4, 2016 at 1:47 AM, Pranith Kumar Karampuri
> >> > <pkara...@redhat.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Aug 4, 2016 at 12:51 AM, Serkan Çoban <cobanser...@gmail.com
> >
> >> >> wrote:
> >> >>>
> >> >>> I use rpms for installation. Redhat/Centos 6.8.
> >> >>
> >> >>
> >> >> http://review.gluster.org/#/c/15084 is the patch. In some time the
> rpms
> >> >> will be built actually.
> >> >
> >> >
> >> > In the same URL above it will actually post the rpms for
> fedora/el6/el7
> >> > at
> >> > the end of the page.
> >> >
> >> >>
> >> >>
> >> >> Use gluster volume set <volname> disperse.shd-max-threads <num-of-threads
> >> >> (range: 1-64)>
> >> >>
> >> >> While testing this I thought of ways to decrease the number of crawls
> >> >> as
> >> >> well. But they are a bit involved. Try to create same set of data and
> >> >> see
> >> >> what is the time it takes to complete heals using number of threads
> as
> >> >> you
> >> >> increase the number of parallel heals from 1 to 64.
> >> >>
> >> >>>
> >> >>> On Wed, Aug 3, 2016 at 10:16 PM, Pranith Kumar Karampuri
> >> >>> <pkara...@redhat.com> wrote:
> >> >>> >
> >> >>> >
> >> >>> > On Thu, Aug 4, 2016 at 12:45 AM, Serkan Çoban
> >> >>> > <cobanser...@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> I prefer 3.7 if it is ok for you. Can you also provide build
> >> >>> >> instructions?
> >> >>> >
> >> >>> >
> >> >>> > 3.7 should be fine. Do you use rpms/debs/anything-else?
> >> >>> >
> >> >>> >>
> >> >>> >>
> >> >>> >> On Wed, Aug 3, 2016 at 10:12 PM, Pranith Kumar Karampuri
> >> >>> >> <pkara...@redhat.com> wrote:
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > On Thu, Aug 4, 2016 at 12:37 AM, Serkan Çoban
> >> >>> >> > <cobanser...@gmail.com>
> >> >>> >> > wrote:
> >> >>> >> >>
> >> >>> >> >> Yes, but I can create 2+1(or 8+2) ec using two servers right?
> I
> >> >>> >> >> have
> >> >>> >> >> 26 disks on each server.
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > On which release-branch do you want the patch? I am testing it
> on
> >> >>> >> > master-branch now.
> >> >>> >> >
> >> >>> >> >>

[Gluster-devel] volfile init/reconfigure have been working by accident?

2016-07-13 Thread Pranith Kumar Karampuri
hi,
    I wanted to remove the 'get_new_dict()'/'dict_destroy()' usage throughout
the code base to prevent people from using them incorrectly. The regression for
that patch, http://review.gluster.org/13183, kept failing, and I found that the
'xl->options' dictionary is created using get_new_dict(), i.e. it doesn't hold
any refs. In xlator_members_free() we then try to destroy it using
dict_unref(), so the ref count becomes '-1' and the dictionary never gets
destroyed. So every reconfigure leaks dictionaries, and all the options that
take string values actually point into these leaked dictionaries.

Initially I thought we could also store the latest reconfigured options
dictionary in a new member 'xl->reconfigured_options', but the problem is that
reconfigure can partially succeed, leading to a dilemma about which options
succeeded/failed and which dictionary to keep around. Failing in reconfigure
doesn't stop the brick. At the moment the only way out I see is to perform
[de]allocation of the string options (and bool options, though we can avoid it
for bool; there may be more, I need to check). But that becomes one more big
patch ('fini' would have to GF_FREE all these options), so I am wondering if
anyone has other thoughts on fixing this properly without a lot of code
changes.
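
To make the ref-count mismatch easier to see, here is a minimal illustration,
assuming the libglusterfs dict API behaves as described above (the header path
and option key are placeholders):

#include "dict.h"   /* libglusterfs/src/dict.h in-tree; adjust as needed */

/* Illustration only -- mirrors the mismatch described above. */
static void
dict_refcount_example (void)
{
        dict_t *opts = NULL;

        /* get_new_dict() hands back a dict with refcount 0. */
        opts = get_new_dict ();
        dict_set_str (opts, "read-subvolume", "patchy-client-0");

        /* dict_unref() drops the count to -1, the destroy path is never
         * taken, and the dict plus its data leak -- this is what happens
         * to xl->options on every reconfigure. */
        dict_unref (opts);

        /* The paired APIs keep the count consistent. */
        opts = dict_new ();                       /* refcount == 1 */
        dict_set_str (opts, "read-subvolume", "patchy-client-0");
        dict_unref (opts);                        /* refcount == 0 -> freed */
}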

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./tests/basic/afr/split-brain-favorite-child-policy.t regressin failure on NetBSD

2016-07-20 Thread Pranith Kumar Karampuri
On Mon, Jul 18, 2016 at 4:18 PM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi,
>
> The above mentioned test has failed for the patch
> http://review.gluster.org/#/c/14927/1
> and is not related to my patch. Can someone from AFR team look into it?
>
>
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/18132/console


The logs are removed now. But at least from the log of the run, one
possibility is that the option didn't take effect in shd by the time
"gluster volume heal" was executed. I need to discuss some things with Ravi
about this; I will send a patch for this tomorrow. Thanks for the
notification, Kotresh.


>
> Thanks and Regards,
> Kotresh H R
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Progress on brick multiplexing

2016-07-15 Thread Pranith Kumar Karampuri
I just went through the commit message. I think that, similar to attaching, if
we also had detaching, then we could simulate killing of bricks in AFR tests
using this approach, maybe? Even remove-brick could use the same mechanism, I
guess.

On Sat, Jul 16, 2016 at 12:09 AM, Jeff Darcy  wrote:

> For those who don't know, "brick multiplexing" is a term some of us have
> been using to mean running multiple brick "stacks" inside a single process
> with a single protocol/server instance.  Discussion from a month or so ago
> is here:
>
>   http://www.gluster.org/pipermail/gluster-devel/2016-June/049801.html
>
> Yes, I know I need to turn that into a real feature page.  Multiplexing
> was originally scoped as a 4.0 feature, but has gained higher priority
> because many of the issues it addresses have turned out to be limiting
> factors in how many bricks or volumes we can support and people running
> container/hyperconverged systems are already chafing under those limits.
> In response, I've been working on this feature recently.  I've just pushed
> a patch, which is far enough along to pass our smoke test.
>
>   http://review.gluster.org/#/c/14763/
>
> While it does pass smoke, I know it would fail spectacularly in a full
> regression test - especially tests that involve killing bricks.  There's
> still a *ton* of work to be done on this.  However, having this much of the
> low-level infrastructure working gives me hope that work on the
> higher-level parts can proceed more swiftly.  Interested parties are
> invited to check out the patch and suggest improvements.  Thanks!
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Progress on brick multiplexing

2016-07-15 Thread Pranith Kumar Karampuri
Cool

On Sat, Jul 16, 2016 at 8:13 AM, Jeff Darcy  wrote:

> > Just went through the commit message. I think similar to attaching if we
> also
> > have detaching, then we can simulate killing of bricks in afr using this
> > approach may be?
>
> Yes, that's pretty much the plan.  Some work to add the new RPC and
> handler,
> a bit more to make the test libraries use it instead of killing processes,
> and voila.  Nothing to it.  ;)
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Pranith Kumar Karampuri
On Mon, Jun 27, 2016 at 2:38 PM, Manoj Pillai <mpil...@redhat.com> wrote:

>
>
> - Original Message -
> > From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Monday, June 27, 2016 12:48:49 PM
> > Subject: Re: [Gluster-devel] performance issues Manoj found in EC testing
> >
> >
> >
> > - Original Message -
> > > From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > > To: "Xavier Hernandez" <xhernan...@datalab.es>
> > > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > > Sent: Monday, June 27, 2016 12:42:35 PM
> > > Subject: Re: [Gluster-devel] performance issues Manoj found in EC
> testing
> > >
> > >
> > >
> > > On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez <
> xhernan...@datalab.es
> > > >
> > > wrote:
> > >
> > >
> > > Hi Manoj,
> > >
> > > I always enable client-io-threads option for disperse volumes. It
> improves
> > > performance sensibly, most probably because of the problem you have
> > > detected.
> > >
> > > I don't see any other way to solve that problem.
> > >
> > > I agree. Updated the bug with same info.
> > >
> > >
> > >
> > > I think it would be a lot better to have a true thread pool (and maybe
> an
> > > I/O
> > > thread pool shared by fuse, client and server xlators) in libglusterfs
> > > instead of the io-threads xlator. This would allow each xlator to
> decide
> > > when and what should be parallelized in a more intelligent way, since
> > > basing
> > > the decision solely on the fop type seems too simplistic to me.
> > >
> > > In the specific case of EC, there are a lot of operations to perform
> for a
> > > single high level fop, and not all of them require the same priority.
> Also
> > > some of them could be executed in parallel instead of sequentially.
> > >
> > > I think it is high time we actually schedule(for which release) to get
> this
> > > in gluster. May be you should send out a doc where we can work out
> details?
> > > I will be happy to explore options to integrate io-threads,
> syncop/barrier
> > > with this infra based on the design may be.
> >
> > +1. I can volunteer too.
>
> Thanks, folks! As a quick update, throughput on a single client test jumped
> from ~180 MB/s to 700+MB/s after enabling client-io-threads. Throughput is
> now more in line with what is expected for this workload based on
> back-of-the-envelope calculations.
>
> Are there any reservations about recommending client-io-threads=on as
> "default" tuning, until the enhancement discussed above becomes reality?
>

The only thing I can think of is possible races we may have to address after
enabling this option. So maybe we should let it bake on master for a while with
this as the default?


> -- Manoj
>
> >
> > >
> > >
> > >
> > > Xavi
> > >
> > >
> > > On 25/06/16 19:42, Manoj Pillai wrote:
> > >
> > >
> > >
> > > - Original Message -
> > >
> > >
> > > From: "Pranith Kumar Karampuri" < pkara...@redhat.com >
> > > To: "Xavier Hernandez" < xhernan...@datalab.es >
> > > Cc: "Manoj Pillai" < mpil...@redhat.com >, "Gluster Devel" <
> > > gluster-devel@gluster.org >
> > > Sent: Thursday, June 23, 2016 8:50:44 PM
> > > Subject: performance issues Manoj found in EC testing
> > >
> > > hi Xavi,
> > > Meet Manoj from performance team Redhat. He has been testing EC
> > > performance in his stretch clusters. He found some interesting things
> we
> > > would like to share with you.
> > >
> > > 1) When we perform multiple streams of big file writes(12 parallel dds
> I
> > > think) he found one thread to be always hot (99%CPU always). He was
> asking
> > > me if fuse_reader thread does any extra processing in EC compared to
> > > replicate. Initially I thought it would just lock and epoll threads
> will
> > > perform the encoding but later realized that once we have the lock and
> > > version details, next writes on the file would be encoded in the same
> > > thread that comes to EC. wri

Re: [Gluster-devel] How to solve the FSYNC() ERR

2016-07-10 Thread Pranith Kumar Karampuri
Is it possible to share the test you are running? As per your volume info,
O_DIRECT is not enabled on the volume, i.e. the file shouldn't be opened with
O_DIRECT, yet the logs show 'Invalid argument', as if something is wrong with
the arguments the way it would be for an O_DIRECT write of the wrong size. So I
would like to test out why exactly it is giving this problem. Please note that
for an O_DIRECT write to succeed, both the offset and the size should be
properly aligned; checking that each is a multiple of 512 is one common way to
verify it.
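
In case it helps to reproduce the alignment behaviour outside the original
test, here is a minimal sketch. The mount path and the 512-byte alignment are
assumptions (the real requirement depends on the backend device/filesystem);
it simply shows an aligned O_DIRECT write succeeding and an unaligned one
failing with EINVAL, which is what shows up as "Invalid argument" above.

#define _GNU_SOURCE           /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main (void)
{
        void *buf = NULL;
        int   fd  = open ("/mnt/test/direct-io-check",
                          O_CREAT | O_WRONLY | O_DIRECT, 0644);
        if (fd < 0) { perror ("open"); return 1; }

        /* The buffer address must be aligned too, hence posix_memalign(). */
        if (posix_memalign (&buf, 512, 4096)) { perror ("posix_memalign"); return 1; }
        memset (buf, 'a', 4096);

        /* Aligned offset (0) and size (4096): expected to succeed. */
        if (write (fd, buf, 4096) < 0)
                perror ("aligned write");

        /* Unaligned size (100 bytes): typically fails with EINVAL. */
        if (write (fd, buf, 100) < 0)
                perror ("unaligned write");

        free (buf);
        close (fd);
        return 0;
}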

On Sun, Jul 10, 2016 at 5:19 PM, Keiviw  wrote:

> My volume info:
>
> Volume Name: test
> Type: Distribute
> Volume ID: 9294b122-d81e-4b12-9b5c-46e89ee0e40b
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: compute2:/home/brick1
> Brick2: compute2:/home/brick2
> Options Reconfigured:
> performance.flush-behind: off
> storage.linux-aio: off
> My brick logs(I have cleaned up the history log):
> [2016-07-10 11:42:50.577683] E [posix.c:2128:posix_writev]
> 0-test-posix: write failed: offset 0, Invalid argument
> [2016-07-10 11:42:50.577735] I
> [server3_1-fops.c:1414:server_writev_cbk] 0-test-server: 8569840: WRITEV 5
> (526a3118-9994-429e-afc0-4aa063606bde) ==> -1 (Invalid argument)
> [2016-07-10 11:42:54.583038] E [posix.c:2128:posix_writev]
> 0-test-posix: write failed: offset 0, Invalid argument
> [2016-07-10 11:42:54.583080] I
> [server3_1-fops.c:1414:server_writev_cbk] 0-test-server: 8569870: WRITEV 5
> (c3d28f34-8f43-446d-8d0b-80841ae8ec5b) ==> -1 (Invalid argument)
> My mnt-test-.logs:
> [2016-07-10 11:42:50.577816] W
> [client3_1-fops.c:876:client3_1_writev_cbk] 0-test-client-1: remote
> operation failed: Invalid argument
> [2016-07-10 11:42:50.578508] W [fuse-bridge.c:968:fuse_err_cbk]
> 0-glusterfs-fuse: 12398282: FSYNC() ERR => -1 (Invalid argument)
> [2016-07-10 11:42:54.583156] W
> [client3_1-fops.c:876:client3_1_writev_cbk] 0-test-client-1: remote
> operation failed: Invalid argument
> [2016-07-10 11:42:54.583762] W [fuse-bridge.c:968:fuse_err_cbk]
> 0-glusterfs-fuse: 12398317: FSYNC() ERR => -1 (Invalid argument)
>
>
>
>
>
>
> On 2016-07-10 19:18:18, "Krutika Dhananjay" wrote:
>
>
> To me it looks like a case of a flush triggering a write() that was cached
> by write-behind and because the write buffer
> did not meet the page alignment requirement with o-direct write, it was
> failed with EINVAL and the trigger fop - i.e., flush() was failed with the
> 'Invalid argument' error code.
>
> Could you attach the brick logs as well, so that we can confirm the theory?
>
> -Krutika
>
> On Sat, Jul 9, 2016 at 9:31 PM, Atin Mukherjee 
> wrote:
>
>> Pranith/Krutika,
>>
>> Your inputs please, IIRC we'd need to turn on some o_direct option here?
>>
>>
>> On Saturday 9 July 2016, Keiviw  wrote:
>>
>>> The errors also occured in GlusterFS 3.6.7,I just add the O_DIRECT flag
>>> in client protocol open() and create()! How to explain and solve the
>>> problem?
>>>
>>> Sent from 网易邮箱大师 (NetEase Mail Master)
>>> On 07/09/2016 17:58, Atin Mukherjee wrote:
>>>
>>> Any specific reason of using 3.3 given that its really quite old? We are
>>> at 3.6, 3.7 & 3.8 supportability matrix now.
>>>
>>>
>>> On Saturday 9 July 2016, Keiviw  wrote:
>>>
 hi,
 I have installed GlusterFS 3.3.0, and now I get Fsync failures when
 saving files with the O_DIRECT flag in open() and create().
 1, I tried to save a flie in vi and got this error:
 "test" E667: Fsync failed
 2, I see this in the logs:
 [2016-07-07 14:20:10.325400] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 102: FSYNC() ERR => -1 (Invalid argument)
 [2016-07-07 14:20:13.930384] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 137: FSYNC() ERR => -1 (Invalid argument)
 [2016-07-07 14:20:51.199448] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 174: FLUSH() ERR => -1 (Invalid argument)
 [2016-07-07 14:21:32.804738] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 206: FLUSH() ERR => -1 (Invalid argument)
 [2016-07-07 14:21:43.702146] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 276: FSYNC() ERR => -1 (Invalid argument)
 [2016-07-07 14:21:51.296809] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 314: FSYNC() ERR => -1 (Invalid argument)
 [2016-07-07 14:21:54.062687] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 349: FSYNC() ERR => -1 (Invalid argument)
 [2016-07-07 14:22:54.678960] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 429: FSYNC() ERR => -1 (Invalid argument)
 [2016-07-07 14:24:35.546980] W [fuse-bridge.c:968:fuse_err_cbk]
 0-glusterfs-fuse: 505: 

Re: [Gluster-devel] Reducing merge conflicts

2016-07-07 Thread Pranith Kumar Karampuri
+Nigel

On Fri, Jul 8, 2016 at 7:42 AM, Pranith Kumar Karampuri <pkara...@redhat.com
> wrote:

> What gets measured gets managed. It is good that you started this thread.
> Problem is two fold. We need a way to first find people who are reviewing a
> lot and give them more karma points in the community by encouraging that
> behaviour(making these stats known to public lets say in monthly news
> letter is one way). It is equally important to review patches when you
> compare it to sending patches. What I have seen at least is that it is easy
> to find people who sent patches, how many patches someone sent in a month
> etc. There is no easy way to get these numbers for reviews. 'Reviewed-by'
> tag in commit only includes the people who did +1/+2 on the final revision
> of the patch, which is bad. So I feel that is the first problem to be
> solved if we have to get better at this. Once I know how I am doing on a
> regular basis in this aspect I am sure I will change my ways to contribute
> better in this aspect. I would love to know what others think about this
> too.
>

Would it be possible for you to get this data using some script, maybe? I
think we do have APIs for this?


>
> On Fri, Jul 8, 2016 at 2:02 AM, Jeff Darcy <jda...@redhat.com> wrote:
>
>> I'm sure a lot of you are pretty frustrated with how long it can take to
>> get even a trivial patch through our Gerrit/Jenkins pipeline.  I know I
>> am.  Slow tests, spurious failures, and bikeshedding over style issues are
>> all contributing factors.  I'm not here to talk about those today.  What I
>> am here to talk about is the difficulty of getting somebody - anybody - to
>> look at a patch and (possibly) give it the votes it needs to be merged.  To
>> put it bluntly, laziness here is *killing* us.  The more patches we have in
>> flight, the more merge conflicts and rebases we have to endure for each
>> one.  It's a quadratic effect.  That's why I personally have been trying
>> really hard to get patches that have passed all regression tests and
>> haven't gotten any other review attention "across the finish line" so they
>> can be merged and removed from conflict with every other patch still in
>> flight.  The search I use for this, every day, is as follows:
>>
>>
>> http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+label:CentOS-regression%253E0+label:NetBSD-regression%253E0+-label:Code-Review%253C0
>>
>> That is:
>>
>> open patches on glusterfs master (change project/branch as
>> appropriate to your role)
>>
>> CentOS and NetBSD regression tests complete
>>
>> no -1 or -2 votes which might represent legitimate cause for delay
>>
>> If other people - especially team leads and release managers - could make
>> a similar habit of checking the queue and helping to get such "low hanging
>> fruit" out of the way, we might see an appreciable increase in our overall
>> pace of development.  If not, we might have to start talking about
>> mandatory reviews with deadlines and penalties for non-compliance.  I'm
>> sure nobody wants to see their own patches blocked and their own deadlines
>> missed because they weren't doing their part to review peers' work, but
>> that's a distinct possibility.  Let's all try to get this train unstuck and
>> back on track before extreme measures become necessary.
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Reducing merge conflicts

2016-07-07 Thread Pranith Kumar Karampuri
What gets measured gets managed. It is good that you started this thread. The
problem is two-fold. First, we need a way to find the people who are reviewing
a lot and give them more karma points in the community, encouraging that
behaviour (making these stats public, say in a monthly newsletter, is one way).
Second, reviewing patches is just as important as sending patches. What I have
seen is that it is easy to find who sent patches, how many patches someone sent
in a month, etc. There is no easy way to get those numbers for reviews. The
'Reviewed-by' tag in a commit only includes the people who gave +1/+2 on the
final revision of the patch, which is bad. So I feel that is the first problem
to be solved if we want to get better at this. Once I know how I am doing on a
regular basis in this respect, I am sure I will change my ways and contribute
better. I would love to know what others think about this too.

On Fri, Jul 8, 2016 at 2:02 AM, Jeff Darcy  wrote:

> I'm sure a lot of you are pretty frustrated with how long it can take to
> get even a trivial patch through our Gerrit/Jenkins pipeline.  I know I
> am.  Slow tests, spurious failures, and bikeshedding over style issues are
> all contributing factors.  I'm not here to talk about those today.  What I
> am here to talk about is the difficulty of getting somebody - anybody - to
> look at a patch and (possibly) give it the votes it needs to be merged.  To
> put it bluntly, laziness here is *killing* us.  The more patches we have in
> flight, the more merge conflicts and rebases we have to endure for each
> one.  It's a quadratic effect.  That's why I personally have been trying
> really hard to get patches that have passed all regression tests and
> haven't gotten any other review attention "across the finish line" so they
> can be merged and removed from conflict with every other patch still in
> flight.  The search I use for this, every day, is as follows:
>
>
> http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+label:CentOS-regression%253E0+label:NetBSD-regression%253E0+-label:Code-Review%253C0
>
> That is:
>
> open patches on glusterfs master (change project/branch as appropriate
> to your role)
>
> CentOS and NetBSD regression tests complete
>
> no -1 or -2 votes which might represent legitimate cause for delay
>
> If other people - especially team leads and release managers - could make
> a similar habit of checking the queue and helping to get such "low hanging
> fruit" out of the way, we might see an appreciable increase in our overall
> pace of development.  If not, we might have to start talking about
> mandatory reviews with deadlines and penalties for non-compliance.  I'm
> sure nobody wants to see their own patches blocked and their own deadlines
> missed because they weren't doing their part to review peers' work, but
> that's a distinct possibility.  Let's all try to get this train unstuck and
> back on track before extreme measures become necessary.
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Reducing merge conflicts

2016-07-07 Thread Pranith Kumar Karampuri
On Fri, Jul 8, 2016 at 8:40 AM, Jeff Darcy  wrote:

> > What gets measured gets managed.
>
> Exactly.  Reviewing is part of everyone's job, but reviews aren't tracked
> in any way that matters.  Contrast that with the *enormous* pressure most
> of us are under to get our own patches in, and it's pretty predictable
> what will happen.  We need to change that calculation.
>
>
> > What I have seen at least is that it is easy to find
> > people who sent patches, how many patches someone sent in a month etc.
> There
> > is no easy way to get these numbers for reviews. 'Reviewed-by' tag in
> commit
> > only includes the people who did +1/+2 on the final revision of the
> patch,
> > which is bad.
>
> That's a very good point.  I think people people who comment also get
> Reviewed-by: lines, but it doesn't matter because there's still a whole
> world of things completely outside of Gerrit.  Reviews done by email won't
> get counted, nor will consultations in the hallway or on IRC.  I have some
> ideas who's most active in those ways.  Some (such as yourself) show up in
> the Reviewed-by: statistics.  Others do not.  In terms of making sure
> people get all the credit they deserve, those things need to be counted
> too.  However, in terms of *getting the review queue unstuck* I'm not so
> sure.  What matters for that is the reviews that Gerrit uses to determine
> merge eligibility, so I think encouraging that specific kind of review
> still moves us in a positive direction.
>

In my experience, at least, it was only adding 'Reviewed-by' for the people
who gave +1/+2 on the final version of the patch.

I agree about encouraging that specific kind of review. At the same time we
need to make reviewing and helping users in the community as important as
sending patches in everyone's eyes. It is very important to know these
statistics to move in the right direction. My main problem with this is that
everyone knows reviews are important, so why are they not happening? Is it
really laziness? Are there people on the team who are not sharing the burden,
so that the total load becomes too much for one or two people to handle? All
these things become very easy to reason about if we have this data, and then I
am sure we can find how best to solve this issue. The same goes for spurious
failures. These are problems other projects face too. I remember watching a
video where someone shared (I think it was at Google) that they started putting
giant TVs in the hallways of all the offices, and the people who didn't attend
to spurious-build-failure problems would show up on the screen for everyone to
see. Apparently the person with the biggest picture (the one who was not
attending to any build failures at all, I guess) came to these folks and asked
how he could get his picture removed from the screen, and it was solved in a
day or two. We don't have to go to those lengths, but we do need data to nudge
people in the right direction.


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Glusterfs-3.7.13 release plans

2016-07-07 Thread Pranith Kumar Karampuri
Could you take in http://review.gluster.org/#/c/14598/ as well? It is ready
for merge.

On Thu, Jul 7, 2016 at 3:02 PM, Atin Mukherjee  wrote:

> Can you take in http://review.gluster.org/#/c/14861 ?
>
>
> On Thursday 7 July 2016, Kaushal M  wrote:
>
>> On Thu, Jun 30, 2016 at 11:08 AM, Kaushal M  wrote:
>> > Hi all,
>> >
>> > I'm (or was) planning to do a 3.7.13 release on schedule today. 3.7.12
>> > has a huge issue with libgfapi, solved by [1].
>> > I'm not sure if this fixes the other issues with libgfapi noticed by
>> > Lindsay on gluster-users.
>> >
>> > This patch has been included in the packages 3.7.12 built for CentOS,
>> > Fedora, Ubuntu, Debian and SUSE. I guess Lindsay is using one of these
>> > packages, so it might be that the issue seen is new. So I'd like to do
>> > a quick release once we have a fix.
>> >
>> > Maintainers can merge changes into release-3.7 that follow the
>> > criteria given in [2]. Please make sure to add the bugs for patches
>> > you are merging are added as dependencies for the 3.7.13 tracker bug
>> > [3].
>> >
>>
>> I've just merged the fix for the gfapi breakage into release-3.7, and
>> hope to tag 3.7.13 soon.
>>
>> The current head for release-3.7 is commit bddf6f8. 18 patches have
>> been merged since 3.7.12 for the following components,
>>  - gfapi
>>  - nfs (includes ganesha related changes)
>>  - glusterd/cli
>>  - libglusterfs
>>  - fuse
>>  - build
>>  - geo-rep
>>  - afr
>>
>> I need and acknowledgement from the maintainers of the above
>> components that they are ready.
>> If any maintainers know of any other issues, please reply here. We'll
>> decide how to address them for this release here.
>>
>> Also, please don't merge anymore changes into release-3.7. If you need
>> to get something merged, please inform me.
>>
>> Thanks,
>> Kaushal
>>
>> > Thanks,
>> > Kaushal
>> >
>> > [1]: https://review.gluster.org/14822
>> > [2]: https://public.pad.fsfe.org/p/glusterfs-release-process-201606
>> > under the GlusterFS minor release heading
>> > [3]: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.7.13
>> ___
>> maintainers mailing list
>> maintain...@gluster.org
>> http://www.gluster.org/mailman/listinfo/maintainers
>>
>
>
> --
> Atin
> Sent from iPhone
>
> ___
> maintainers mailing list
> maintain...@gluster.org
> http://www.gluster.org/mailman/listinfo/maintainers
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Reducing merge conflicts

2016-07-08 Thread Pranith Kumar Karampuri
On Fri, Jul 8, 2016 at 11:23 AM, Poornima Gurusiddaiah 
wrote:

>
> Completely agree with your concern here. Keeping aside the regression
> part, few observations and suggestions:
> As per the Maintainers guidelines (
> https://gluster.readthedocs.io/en/latest/Contributors-Guide/Guidelines-For-Maintainers/
> ):
>
> a> Merge patches of owned components only.
> b> Seek approvals from all maintainers before merging a patchset
> spanning multiple components.
> c> Ensure that regression tests pass for all patches before merging.
> d> Ensure that regression tests accompany all patch submissions.
> e> Ensure that documentation is updated for a noticeable change in
> user perceivable behavior or design.
> f> Encourage code unit tests from patch submitters to improve the
> overall quality of the codebase.
> g> Not merge patches written by themselves until there is a +2 Code
> Review vote by other reviewers.
>
> Clearly a, b, are not being strictly followed, because of multiple reasons.
> - Not every component in Gluster has a Maintainer
>

We need to fix this. Do you want to take up the task of coming up with a
list?


> - Its getting difficult to get review time from maintainers as they are
> maintainers for several component, and they are also active developers.
>

Is it your experience that the patch is not at all getting a single review
or there are no other people who can review? In my experience even when
there were other people who could do the reviews people want to lean on
maintainers to do the initial reviews because they would find most of the
problems in the first review. I am guilty of leaning on the main
maintainers too :-(. If this happens others in the team won't improve in
finding issues in reviewing others'/their own patches. Did you guys already
solve this problem in the components you are working on? What are you guys
doing for improving in reviews/get more participation? In our team both
Krutika and Ravi frequent top-10 people who send patches per month, so it
was too much for 1 maintainer to take this kind of load. Everyone in the
team started reviewing the patches and giving +1 and I am reviewing only
after a +1. It still feels a bit skewed though.

> - What is enforced by mere documentation of procedure is hard to implement.
>
> Below are the few things that we can do to reduce our review backlog:
> - No time for maintainers to review is not a good enough reason to bitrot
> patches in review for months, it clearly means we need additional
> maintainers for that component?

> - Add maintainers for every component that is in Gluster (at least the ones
> which have incoming patches)
> - For every patch we submit we add a 'component(s)' label, and evaluate if
> gerrit can automatically add maintainers as reviewers, and
> have another label 'Maintainers ack' which needs to be present for any
> patch to be merged.

In our team both Krutika and Ravi are frequently among the top 10 people who
send patches per month, so it was too much for one maintainer to take that kind
of load. Everyone in the team started reviewing the patches and giving +1, and
I review only after a +1. My hope is this will lead to faster patch acceptance
over time.

> - Before every major(or minor also?) release, any patch that is not making
> to the release should have a '-1' by the maintainer or the developer
> themselves stating the reason(preferably not no time to review).
>   The release manager should ensure that there are no patches in below
> gerrit search link provided by Jeff.
>
> Any thoughts?
>

I am in favour of more people knowing more components in the stack (and
preferably more projects). What I have seen from my experience is that you can
come up with solutions faster because you have seen the problem solved in
different ways in these different components/projects. Reviewing is one way to
gain more knowledge about a different component. Ashish surprises me with his
reviews sometimes, even when he doesn't know much about the component he is
reviewing. So how can we encourage more people to pick up new components? Do
you have any ideas? Getting more reviews will be a very small problem if we
have more knowledgeable people per component.


> Regards,
> Poornima
>
> - Original Message -
> > From: "Jeff Darcy" 
> > To: "Gluster Devel" 
> > Sent: Friday, July 8, 2016 2:02:27 AM
> > Subject: [Gluster-devel] Reducing merge conflicts
> >
> > I'm sure a lot of you are pretty frustrated with how long it can take to
> get
> > even a trivial patch through our Gerrit/Jenkins pipeline.  I know I am.
> > Slow tests, spurious failures, and bikeshedding over style issues are all
> > contributing factors.  I'm not here to talk about those today.  What I am
> > here to talk about is the difficulty of getting somebody - anybody - to
> look
> > at a patch and (possibly) give it the votes it needs to be merged.  To
> put
> > it 

Re: [Gluster-devel] Non Shared Persistent Gluster Storage with Kubernetes

2016-07-06 Thread Pranith Kumar Karampuri
On Wed, Jul 6, 2016 at 12:24 AM, Shyam  wrote:

> On 07/01/2016 01:45 AM, B.K.Raghuram wrote:
>
>> I have not gone through this implementation nor the new iscsi
>> implementation being worked on for 3.9 but I thought I'd share the
>> design behind a distributed iscsi implementation that we'd worked on
>> some time back based on the istgt code with a libgfapi hook.
>>
>> The implementation used the idea of using one file to represent one
>> block (of a chosen size) thus allowing us to use gluster as the backend
>> to store these files while presenting a single block device of possibly
>> infinite size. We used a fixed file naming convention based on the block
>> number which allows the system to determine which file(s) needs to be
>> operated on for the requested byte offset. This gave us the advantage of
>> automatically accessing all of gluster's file based functionality
>> underneath to provide a fully distributed iscsi implementation.
>>
>> Would this be similar to the new iscsi implementation thats being worked
>> on for 3.9?
>>
>
> 
>
> Ultimately the idea would be to use sharding, as a part of the gluster
> volume graph, to distribute the blocks (or rather shard the blocks), rather
> than having the disk image on one distribute subvolume and hence scale disk
> sizes to the size of the cluster. Further, sharding should work well here,
> as this is a single client access case (or are we past that hurdle
> already?).
>

Not yet; we need the common transaction framework in place to reduce the
latency of synchronization.


>
> What this achieves is similar to the iSCSI implementation that you talk
> about, but gluster doing the block splitting and hence distribution, rather
> than the iSCSI implementation (istgt) doing the same.
>
> < I did a cursory check on the blog post, but did not find a shard
> reference, so maybe others could pitch in here, if they know about the
> direction>
>

There are two directions which will eventually converge:
1) A granular data self-heal implementation, so that taking a snapshot becomes
as simple as a reflink.
2) Bringing in snapshots of files with shards - this is a bit more involved
than the solution above.

Once 2) is also complete we will have both 1) + 2) combined, so that data
self-heal will heal the exact blocks inside each shard.

If the users are not worried about snapshots, 2) is the best option.


> Further, in your original proposal, how do you maintain device properties,
> such as size of the device and used/free blocks? I ask about used and free,
> as that is an overhead to compute, if each block is maintained as a
> separate file by itself, or difficult to achieve consistency of the size
> and block update (as they are separate operations). Just curious.
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-23 Thread Pranith Kumar Karampuri
If someone could give a +1 on the 3.7 backport http://review.gluster.org/#/c/14991,
I can merge the patch. Then we can start rebasing, maybe?

On Sat, Jul 23, 2016 at 12:23 PM, Atin Mukherjee <amukh...@redhat.com>
wrote:

> AFAIK, an explicit rebase is required.
>
>
> On Saturday 23 July 2016, Pranith Kumar Karampuri <pkara...@redhat.com>
> wrote:
>
>>
>>
>> On Sat, Jul 23, 2016 at 10:17 AM, Nithya Balachandran <
>> nbala...@redhat.com> wrote:
>>
>>>
>>>
>>> On Sat, Jul 23, 2016 at 9:45 AM, Nithya Balachandran <
>>> nbala...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jul 22, 2016 at 9:07 PM, Pranith Kumar Karampuri <
>>>> pkara...@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 22, 2016 at 8:12 PM, Pranith Kumar Karampuri <
>>>>> pkara...@redhat.com> wrote:
>>>>>
>>>>>> I am playing with the following diff, let me see.
>>>>>>
>>>>>> diff --git a/tests/volume.rc b/tests/volume.rc
>>>>>> index 331a802..b288508 100644
>>>>>> --- a/tests/volume.rc
>>>>>> +++ b/tests/volume.rc
>>>>>> @@ -579,7 +579,9 @@ function num_graphs
>>>>>>  function get_aux()
>>>>>>  {
>>>>>>  ##Check if a auxiliary mount is there
>>>>>> -df -h 2>&1 | sed 's#/build/install##' | grep -e
>>>>>> "[[:space:]]/run/gluster/${V0}$" -e "[[:space:]]/var/run/gluster/${V0}$" 
>>>>>> -
>>>>>> +local rundir=$(gluster --print-statedumpdir)
>>>>>> +local pid=$(cat ${rundir}/${V0}.pid)
>>>>>> +pidof glusterfs 2>&1 | grep -w $pid
>>>>>>
>>>>>>  if [ $? -eq 0 ]
>>>>>>  then
>>>>>>
>>>>>
>>>>> Based on what I saw in code, this seems to get the job done. Comments
>>>>> welcome:
>>>>> http://review.gluster.org/14988
>>>>>
>>>>>
>>>> Nice work Pranith :)
>>>> All, once this is backported to release-3.7, any patches on release-3.7
>>>> patches will need to be rebased so they will pass the NetBSD regression.
>>>>
>>>
>>> I am suddenly confused about this - will the patches need to be rebased
>>> or with the next run automatically include the changes once Pranith's fix
>>> is merged?
>>>
>>
>> May be someone more knowledgeable about this should confirm this, but at
>> least from the build-log, I don't see any rebase command being executed
>> with origin/master:
>>
>> *04:07:36* Triggered by Gerrit: http://review.gluster.org/13762*04:07:36* 
>> Building remotely on slave26.cloud.gluster.org 
>> <https://build.gluster.org/computer/slave26.cloud.gluster.org> (smoke_tests 
>> rackspace_regression_2gb glusterfs-devrpms) in workspace 
>> /home/jenkins/root/workspace/rackspace-regression-2GB-triggered*04:07:36*  > 
>> git rev-parse --is-inside-work-tree # timeout=10*04:07:36* Fetching changes 
>> from the remote Git repository*04:07:36*  > git config remote.origin.url 
>> git://review.gluster.org/glusterfs.git # timeout=10*04:07:36* Fetching 
>> upstream changes from git://review.gluster.org/glusterfs.git*04:07:36*  > 
>> git --version # timeout=10*04:07:36*  > git -c core.askpass=true fetch 
>> --tags --progress git://review.gluster.org/glusterfs.git 
>> refs/changes/62/13762/4*04:07:44*  > git rev-parse 
>> 838b5c34127edd0450b0449e38f075f56056f2c7^{commit} # timeout=10*04:07:44* 
>> Checking out Revision 838b5c34127edd0450b0449e38f075f56056f2c7 
>> (master)*04:07:44*  > git config core.sparsecheckout # timeout=10*04:07:44*  
>> > git checkout -f 838b5c34127edd0450b0449e38f075f56056f2c7*04:07:45*  > git 
>> rev-parse FETCH_HEAD^{commit} # timeout=10*04:07:45*  > git rev-list 
>> 8cbee639520bf4631ce658e2da9b4bc3010d2eaa # timeout=10*04:07:45*  > git tag 
>> -a -f -m Jenkins Build #22315 
>> jenkins-rackspace-regression-2GB-triggered-22315 # timeout=10
>>
>>
>>
>>
>>>
>>>>>>
>>>>>> On Fri, Jul 22, 2016 at 7:44 PM, Nithya Balachandran <
>>>>>> nbala...@redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 22, 2016 at 7:42 PM, Pranith Kumar Karampuri <

[Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It will
give a chance for xlators to do any cleanup they need to do. For example ec
can complete the delayed xattrops.

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
Gah! Sorry, I meant to say SIGTERM, not SIGKILL. So Xavi and I were wondering
why cleanup_and_exit() is not sending the GF_PARENT_DOWN event.

On Fri, Jul 22, 2016 at 6:24 PM, Jeff Darcy  wrote:

> > Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It will
> give
> > a chance for xlators to do any cleanup they need to do. For example ec
> can
> > complete the delayed xattrops.
>
> Nothing is triggered on SIGKILL.  SIGKILL is explicitly defined to
> terminate a
> process *immediately*.  Among other things, this means it can not be
> ignored or
> caught, to preclude handlers doing something that might delay termination.
>
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>
> Since at least 4.2BSD and SVr2 (the first version of UNIX that I worked on)
> there have even been distinct kernel code paths to ensure special handling
> of
> SIGKILL.  There's nothing we can do about SIGKILL except be prepared to
> deal
> with it the same way we'd deal with the entire machine crashing.
>
> If you mean why is there nothing we can do on a *server* in response to
> SIGKILL on a *client*, that's a slightly more interesting question.  It's
> possible that the unique nature of SIGKILL puts connections into a
> different state than either system failure (on the more abrupt side) or
> clean shutdown (less abrupt).  If so, we probably need to take a look at
> the socket/RPC code or perhaps even protocol/server to see why these
> connections are not being cleaned up and shut down in a timely fashion.
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
It is only calling fini(); apart from that, not much.

On Fri, Jul 22, 2016 at 6:36 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So
> xavi and I were wondering why cleanup_and_exit() is not sending
> GF_PARENT_DOWN event.
>
> On Fri, Jul 22, 2016 at 6:24 PM, Jeff Darcy <jda...@redhat.com> wrote:
>
>> > Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It
>> will give
>> > a chance for xlators to do any cleanup they need to do. For example ec
>> can
>> > complete the delayed xattrops.
>>
>> Nothing is triggered on SIGKILL.  SIGKILL is explicitly defined to
>> terminate a
>> process *immediately*.  Among other things, this means it can not be
>> ignored or
>> caught, to preclude handlers doing something that might delay termination.
>>
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>>
>> Since at least 4.2BSD and SVr2 (the first version of UNIX that I worked
>> on)
>> there have even been distinct kernel code paths to ensure special
>> handling of
>> SIGKILL.  There's nothing we can do about SIGKILL except be prepared to
>> deal
>> with it the same way we'd deal with the entire machine crashing.
>>
>> If you mean why is there nothing we can do on a *server* in response to
>> SIGKILL on a *client*, that's a slightly more interesting question.  It's
>> possible that the unique nature of SIGKILL puts connections into a
>> different state than either system failure (on the more abrupt side) or
>> clean shutdown (less abrupt).  If so, we probably need to take a look at
>> the socket/RPC code or perhaps even protocol/server to see why these
>> connections are not being cleaned up and shut down in a timely fashion.
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
http://review.gluster.org/14980 is where we have all the context on why I sent
out this mail. Basically the test was failing because umount races with the
version-update xattrop. While I fixed the test to handle that race, Xavi was
wondering why the GF_PARENT_DOWN event didn't arrive. I found that in
cleanup_and_exit() we don't send this event; we only call 'fini()'. So I am
wondering if anyone knows why this is so.
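
For what it's worth, a minimal sketch of what sending the event before fini()
might look like is below. The field names (ctx->active->top as the top xlator
of the active graph) are assumptions from my reading and would need to be
checked against glusterfsd.c; this is not a tested patch, just the shape of the
idea.

/* Sketch only: let the graph see PARENT_DOWN before it is torn down,
 * so translators like EC get a chance to flush delayed xattrops. */
static void
send_parent_down_before_fini (glusterfs_ctx_t *ctx)
{
        xlator_t *top = NULL;

        if (ctx && ctx->active)
                top = ctx->active->top;

        if (top)
                /* default_notify() propagates GF_EVENT_PARENT_DOWN down
                 * to the children, so the whole graph receives it. */
                xlator_notify (top, GF_EVENT_PARENT_DOWN, top);
}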

On Fri, Jul 22, 2016 at 6:37 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> It is only calling fini() apart from that not much.
>
> On Fri, Jul 22, 2016 at 6:36 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So
>> xavi and I were wondering why cleanup_and_exit() is not sending
>> GF_PARENT_DOWN event.
>>
>> On Fri, Jul 22, 2016 at 6:24 PM, Jeff Darcy <jda...@redhat.com> wrote:
>>
>>> > Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It
>>> will give
>>> > a chance for xlators to do any cleanup they need to do. For example ec
>>> can
>>> > complete the delayed xattrops.
>>>
>>> Nothing is triggered on SIGKILL.  SIGKILL is explicitly defined to
>>> terminate a
>>> process *immediately*.  Among other things, this means it can not be
>>> ignored or
>>> caught, to preclude handlers doing something that might delay
>>> termination.
>>>
>>>
>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>>>
>>> Since at least 4.2BSD and SVr2 (the first version of UNIX that I worked
>>> on)
>>> there have even been distinct kernel code paths to ensure special
>>> handling of
>>> SIGKILL.  There's nothing we can do about SIGKILL except be prepared to
>>> deal
>>> with it the same way we'd deal with the entire machine crashing.
>>>
>>> If you mean why is there nothing we can do on a *server* in response to
>>> SIGKILL on a *client*, that's a slightly more interesting question.  It's
>>> possible that the unique nature of SIGKILL puts connections into a
>>> different state than either system failure (on the more abrupt side) or
>>> clean shutdown (less abrupt).  If so, we probably need to take a look at
>>> the socket/RPC code or perhaps even protocol/server to see why these
>>> connections are not being cleaned up and shut down in a timely fashion.
>>>
>>
>>
>>
>> --
>> Pranith
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Gluster Developer Summit Program Committee

2016-08-16 Thread Pranith Kumar Karampuri
I'm interested in this as well.

On Wed, Aug 17, 2016 at 12:00 AM, Amye Scavarda  wrote:

> Hi all,
> As we get closer to the CfP wrapping up (August 31, per
> http://www.gluster.org/pipermail/gluster-users/2016-August/028002.html) -
> we'll be looking for 3-4 people for the program committee to help arrange
> the schedule.
>
> Go ahead and respond here if you're interested, and I'll work to gather us
> together after September 1st.
> Thanks!
> - amye
>
>
> --
> Amye Scavarda | a...@redhat.com | Gluster Community Lead
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Spurious regression failure? tests/basic/ec/ec-background-heals.t

2017-01-26 Thread Pranith Kumar Karampuri
On Thu, Jan 26, 2017 at 7:45 PM, Ashish Pandey <aspan...@redhat.com> wrote:

>
> Xavi,
>
> shd has been disabled in this test on line number 12 and we have also
> disabled client side heal.
> So, no body is going to try to heal it.
>

Already enqueued heals should be healed. I am taking a look at it. Let's
see.


>
> Ashish
>
> --
> *From: *"Atin Mukherjee" <amukh...@redhat.com>
> *To: *"Ashish Pandey" <aspan...@redhat.com>, "Raghavendra Gowdappa" <
> rgowd...@redhat.com>, "Xavier Hernandez" <xhernan...@datalab.es>
> *Cc: *"Gluster Devel" <gluster-devel@gluster.org>
> *Sent: *Thursday, January 26, 2017 5:50:00 PM
> *Subject: *Re: [Gluster-devel] Spurious regression failure?
> tests/basic/ec/ec-background-heals.t
>
>
> I've +1ed it now.
>
> On Thu, 26 Jan 2017 at 15:05, Xavier Hernandez <xhernan...@datalab.es>
> wrote:
>
>> Hi Atin,
>>
>> I don't clearly see what's the problem. Even if the truncate causes a
>> dirty flag to be set, eventually it should be removed before the
>> $HEAL_TIMEOUT value.
>>
>> For now I've marked the test as bad.
>>
>> Patch is: https://review.gluster.org/16470
>>
>> Xavi
>>
>> On 25/01/17 17:24, Atin Mukherjee wrote:
>> > Can we please address this as early as possible, my patch has hit this
>> > failure 3 out of 4 recheck attempts now. I'm guessing some recent
>> > changes has caused it.
>> >
>> > On Wed, 25 Jan 2017 at 12:10, Ashish Pandey <aspan...@redhat.com
>> > <mailto:aspan...@redhat.com>> wrote:
>> >
>> >
>> > Pranith,
>> >
>> > In this test, tests/basic/ec/ec-background-heals.t, I think line 86 is
>> > actually creating a heal entry instead of helping the data heal quickly.
>> > What if all the data was already healed at that moment, the truncate came
>> > and set the dirty flag in its pre-op, and at the end, as part of the heal,
>> > the dirty flag was unset only on the previously good bricks, while the
>> > brick which acted as heal-sink still has the dirty flag set by the
>> > truncate?
>> > That is why we are seeing only "1" as get_pending_heal_count. If a
>> > file was actually not healed it should be "2".
>> > If the heal on this file completes and the dirty flag is unset
>> > before the truncate, everything will be fine.
>> >
>> > I think we can wait for the file to heal without the truncate?
>> >
>> >  71 #Test that disabling background-heals still drains the queue
>> >  72 TEST $CLI volume set $V0 disperse.background-heals 1
>> >  73 TEST touch $M0/{a,b,c,d}
>> >  74 TEST kill_brick $V0 $H0 $B0/${V0}2
>> >  75 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "1" mount_get_option_value
>> > $M0 $V0-disperse-0 background-heals
>> >  76 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "200"
>> > mount_get_option_value $M0 $V0-disperse-0 heal-wait-qlength
>> >  77 TEST truncate -s 1GB $M0/a
>> >  78 echo abc > $M0/b
>> >  79 echo abc > $M0/c
>> >  80 echo abc > $M0/d
>> >  81 TEST $CLI volume start $V0 force
>> >  82 EXPECT_WITHIN $CHILD_UP_TIMEOUT "3" ec_child_up_count $V0 0
>> >  83 TEST chown root:root $M0/{a,b,c,d}
>> >  84 TEST $CLI volume set $V0 disperse.background-heals 0
>> >  85 EXPECT_NOT "0" mount_get_option_value $M0 $V0-disperse-0
>> > heal-waiters
>> >
>> >  86 TEST truncate -s 0 $M0/a # This completes the heal fast ;-)
>> <<<<<<<
>> >
>> >  87 EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0
>> >
>> > 
>> > Ashish
>> >
>> >
>> >
>> >
>> >
>> > ---
>> -
>> > *From: *"Raghavendra Gowdappa" <rgowd...@redhat.com
>> > <mailto:rgowd...@redhat.com>>
>> >     *To: *"Nithya Balachandran" <nbala...@redhat.com
>> > <mailto:nbala...@redhat.com>>
>> > *Cc: *"Gluster Devel" <gluster-devel@gluster.org
>> > <mailto:gluster-devel@gluster.org>>, "Pranith Kumar Karampuri"
>> > <pkara...@redhat.com <mailto:pkara...@redhat.com>>, "Ashish Pandey"

Re: [Gluster-devel] tests/bitrot/bug-1373520.t is failing multiple times

2017-01-28 Thread Pranith Kumar Karampuri
It is a bug in EC name heal code path. I sent a fix but review.gluster.org
is not accessible now to paste the link here. Will send a mail again once
it is accessible.

On Fri, Jan 27, 2017 at 5:41 PM, Jeff Darcy  wrote:

> > Few of the failure links:
> >
> > https://build.gluster.org/job/centos6-regression/2934/console
> > https://build.gluster.org/job/centos6-regression/2911/console
>
> Looks familiar.  Fix (probably) here:
>
> https://review.gluster.org/#/c/14763/72/tests/bitrot/bug-1373520.t
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] tests/bitrot/bug-1373520.t is failing multiple times

2017-01-29 Thread Pranith Kumar Karampuri
Xavi, Ashish,
https://review.gluster.org/#/c/16468/ is the patch. I found that
ec_need_heal is not considering size/permission changes on the backends,
which is causing spurious failures as well. I will be sending out a patch
to fix them all.

On Sat, Jan 28, 2017 at 3:56 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> It is a bug in EC name heal code path. I sent a fix but review.gluster.org
> is not accessible now to paste the link here. Will send a mail again once
> it is accessible.
>
> On Fri, Jan 27, 2017 at 5:41 PM, Jeff Darcy <jda...@redhat.com> wrote:
>
>> > Few of the failure links:
>> >
>> > https://build.gluster.org/job/centos6-regression/2934/console
>> > https://build.gluster.org/job/centos6-regression/2911/console
>>
>> Looks familiar.  Fix (probably) here:
>>
>> https://review.gluster.org/#/c/14763/72/tests/bitrot/bug-1373520.t
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] IMP: Release 3.10: RC1 Pending bugs (Need fixes by 21st Feb)

2017-02-21 Thread Pranith Kumar Karampuri
On Mon, Feb 20, 2017 at 10:26 PM, Shyam <srang...@redhat.com> wrote:

> On 02/20/2017 11:29 AM, Pranith Kumar Karampuri wrote:
>
>>
>>
>> On Mon, Feb 20, 2017 at 7:57 PM, Pranith Kumar Karampuri
>> <pkara...@redhat.com <mailto:pkara...@redhat.com>> wrote:
>>
>>
>>
>> On Mon, Feb 20, 2017 at 8:25 AM, Shyam <srang...@redhat.com
>> <mailto:srang...@redhat.com>> wrote:
>>
>> Hi,
>>
>> RC1 tagging is *tentatively* scheduled for 21st Feb, 2017
>>
>> The intention is that RC1 becomes the release, hence we would
>> like to chase down all blocker bugs [1] and get them fixed
>> before RC1 is tagged.
>>
>> This mail requests information on the various bugs and to
>> understand if it is possible to get them fixed by the 21st.
>>
>>   3) Bug 1421956 - Disperse: Fallback to pre-compiled code
>> execution when dynamic code generation fails
>> - Status: Awaiting review closure
>> - *Pranith/Ashish*, request one of you to close the review
>> on this one, so that Xavi can backport this to 3.10
>> - master bug:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1421955
>> <https://bugzilla.redhat.com/show_bug.cgi?id=1421955>
>>   - Review: https://review.gluster.org/16614
>> <https://review.gluster.org/16614>
>>
>>   6) Bug 1423385 - Crash in index xlator because of race in
>> inode_ctx_set and inode_ref
>> - Status: Review posted for master, awaiting review closure
>>   - *Du/Pranith*, please close the review of the above
>> - Review: https://review.gluster.org/16622
>> <https://review.gluster.org/16622>
>> - master bug:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1423373
>> <https://bugzilla.redhat.com/show_bug.cgi?id=1423373>
>> - Related note: I was facing the same crash on the client
>> stack as mentioned in bug #1423065, cherry picking this fix and
>> rerunning my tests does not reproduce the crash (as was
>> suggested by Ravi and Poornima).
>>
>>
>> Merged this in the morning. Spending time on 3) above, hope to close
>> it soon.
>>
>>
>> 3) is taking more time than expected. Will complete it tomorrow IST.
>>
>
> Thanks, let's try and chase that down tomorrow, most of the other bugs
> seem to be getting into the code base as of now for the tagging to happen.


Merged the patch on master. Backport is @
https://review.gluster.org/#/c/16697; I will merge it once the regressions
pass while I am online, otherwise feel free to merge it.


>
>
>
>>
>>
>>
>> Thanks,
>> Shyam
>>
>> [1] 3.10 tracker bug:
>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.10.0
>> <https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.10.0>
>>
>> [2] Dynamic tracker list:
>> https://bugzilla.redhat.com/showdependencytree.cgi?id=glusterfs-3.10.0=1_resolved=1
>>
>>
>>
>>
>> --
>> Pranith
>>
>>
>>
>>
>> --
>> Pranith
>>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] IMP: Release 3.10: RC1 Pending bugs (Need fixes by 21st Feb)

2017-02-20 Thread Pranith Kumar Karampuri
On Mon, Feb 20, 2017 at 7:57 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Mon, Feb 20, 2017 at 8:25 AM, Shyam <srang...@redhat.com> wrote:
>
>> Hi,
>>
>> RC1 tagging is *tentatively* scheduled for 21st Feb, 2017
>>
>> The intention is that RC1 becomes the release, hence we would like to
>> chase down all blocker bugs [1] and get them fixed before RC1 is tagged.
>>
>> This mail requests information on the various bugs and to understand if
>> it is possible to get them fixed by the 21st.
>>
>> Bugs pending for RC1 tagging:
>>   1) Bug 1415226 - packaging: python/python2(/python3) cleanup
>> - Status: Review awaiting verification and a backport
>> - master bug: https://bugzilla.redhat.com/show_bug.cgi?id=1414902
>>   - Review: https://review.gluster.org/#/c/16649/
>> - *Niels*, I was not able verify this over the weekend, there is a
>> *chance* I can do this tomorrow. Do you have alternate plans to get this
>> verified?
>>
>>   2) Bug 1421590 - Bricks take up new ports upon volume restart after
>> add-brick op with brick mux enabled
>> - Status: *Atin/Samikshan/Jeff*, any update on this?
>>   - Can we document this as a known issue? What would be the way to
>> get the volume to use the older ports (a glusterd restart?)?
>>
>>   3) Bug 1421956 - Disperse: Fallback to pre-compiled code execution when
>> dynamic code generation fails
>> - Status: Awaiting review closure
>> - *Pranith/Ashish*, request one of you to close the review on this
>> one, so that Xavi can backport this to 3.10
>> - master bug: https://bugzilla.redhat.com/show_bug.cgi?id=1421955
>>   - Review: https://review.gluster.org/16614
>>
>>   4) Bug 1422769 - brick process crashes when glusterd is restarted
>> - Status: As per comment #6, the test case that Jeff developed for
>> this is not reporting a crash
>> - *Atin*, should we defer this from the blocker list for 3.10? Can
>> you take a look at the test case as well?
>>   - Test case: https://review.gluster.org/#/c/16651/
>>
>>   5) Bug 1422781 - Transport endpoint not connected error seen on client
>> when glusterd is restarted
>> - Status: Repro not clean across setups, still debugging the problem
>> - *Atin*, we may need someone from your team to take this up and
>> narrow this down to a fix or determine if this is really a blocker
>>
>>   6) Bug 1423385 - Crash in index xlator because of race in inode_ctx_set
>> and inode_ref
>> - Status: Review posted for master, awaiting review closure
>>   - *Du/Pranith*, please close the review of the above
>> - Review: https://review.gluster.org/16622
>> - master bug: https://bugzilla.redhat.com/show_bug.cgi?id=1423373
>> - Related note: I was facing the same crash on the client stack as
>> mentioned in bug #1423065, cherry picking this fix and rerunning my tests
>> does not reproduce the crash (as was suggested by Ravi and Poornima).
>>
>
> Merged this in the morning. Spending time on 3) above, hope to close it
> soon.
>

3) is taking more time than expected. Will complete it tomorrow IST.

>
>
>>
>> Thanks,
>> Shyam
>>
>> [1] 3.10 tracker bug:
>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.10.0
>>
>> [2] Dynamic tracker list:
>> https://bugzilla.redhat.com/showdependencytree.cgi?id=glusterfs-3.10.0=1_resolved=1
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] IMP: Release 3.10: RC1 Pending bugs (Need fixes by 21st Feb)

2017-02-20 Thread Pranith Kumar Karampuri
On Mon, Feb 20, 2017 at 8:25 AM, Shyam  wrote:

> Hi,
>
> RC1 tagging is *tentatively* scheduled for 21st Feb, 2017
>
> The intention is that RC1 becomes the release, hence we would like to
> chase down all blocker bugs [1] and get them fixed before RC1 is tagged.
>
> This mail requests information on the various bugs and to understand if it
> is possible to get them fixed by the 21st.
>
> Bugs pending for RC1 tagging:
>   1) Bug 1415226 - packaging: python/python2(/python3) cleanup
> - Status: Review awaiting verification and a backport
> - master bug: https://bugzilla.redhat.com/show_bug.cgi?id=1414902
>   - Review: https://review.gluster.org/#/c/16649/
> - *Niels*, I was not able verify this over the weekend, there is a
> *chance* I can do this tomorrow. Do you have alternate plans to get this
> verified?
>
>   2) Bug 1421590 - Bricks take up new ports upon volume restart after
> add-brick op with brick mux enabled
> - Status: *Atin/Samikshan/Jeff*, any update on this?
>   - Can we document this as a known issue? What would be the way to
> get the volume to use the older ports (a glusterd restart?)?
>
>   3) Bug 1421956 - Disperse: Fallback to pre-compiled code execution when
> dynamic code generation fails
> - Status: Awaiting review closure
> - *Pranith/Ashish*, request one of you to close the review on this
> one, so that Xavi can backport this to 3.10
> - master bug: https://bugzilla.redhat.com/show_bug.cgi?id=1421955
>   - Review: https://review.gluster.org/16614
>
>   4) Bug 1422769 - brick process crashes when glusterd is restarted
> - Status: As per comment #6, the test case that Jeff developed for
> this is not reporting a crash
> - *Atin*, should we defer this from the blocker list for 3.10? Can you
> take a look at the test case as well?
>   - Test case: https://review.gluster.org/#/c/16651/
>
>   5) Bug 1422781 - Transport endpoint not connected error seen on client
> when glusterd is restarted
> - Status: Repro not clean across setups, still debugging the problem
> - *Atin*, we may need someone from your team to take this up and
> narrow this down to a fix or determine if this is really a blocker
>
>   6) Bug 1423385 - Crash in index xlator because of race in inode_ctx_set
> and inode_ref
> - Status: Review posted for master, awaiting review closure
>   - *Du/Pranith*, please close the review of the above
> - Review: https://review.gluster.org/16622
> - master bug: https://bugzilla.redhat.com/show_bug.cgi?id=1423373
> - Related note: I was facing the same crash on the client stack as
> mentioned in bug #1423065, cherry picking this fix and rerunning my tests
> does not reproduce the crash (as was suggested by Ravi and Poornima).
>

Merged this in the morning. Spending time on 3) above, hope to close it
soon.


>
> Thanks,
> Shyam
>
> [1] 3.10 tracker bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.10.0
>
> [2] Dynamic tracker list:
> https://bugzilla.redhat.com/showdependencytree.cgi?id=glusterfs-3.10.0=1_resolved=1
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Reviews needed

2017-02-19 Thread Pranith Kumar Karampuri
On Thu, Feb 16, 2017 at 2:13 PM, Xavier Hernandez 
wrote:

> Hi everyone,
>
> I would need some reviews if you have some time:
>
> A memory leak fix in fuse:
> * Patch already merged in master and 3.10
> * Backport to 3.9: https://review.gluster.org/16402
> * Backport to 3.8: https://review.gluster.org/16403
>
> A safe fallback for dynamic code generation in EC:
> * Master: https://review.gluster.org/16614


Will take care of this today.


>
>
> A fix for incompatibilities with FreeBSD:
> * Master: https://review.gluster.org/16417
>
> A fix for FreeBSD's statvfs():
> * Patch already merged in master
> * Backport to 3.10: https://review.gluster.org/16631
> * Backport to 3.9: https://review.gluster.org/16632
> * Backport to 3.8: https://review.gluster.org/16634
>
> I also have two reviews for 3.7 but I think it won't have any new
> releases, right ?
>
> Thank you very much :)
>
> Xavi
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] CFP for Gluster Developer Summit

2016-08-20 Thread Pranith Kumar Karampuri
Proposals:
1) Design of glfstrace tool

It has been difficult to figure out what happens inside a file operation
just by looking at the workload, so we need a tool similar to strace which
shows the fops that are being wound/unwound through the clients and
servers. We can use the eventing infra by Aravinda to get this information
from both clients and servers, which would help in debugging.

May even show a demo if I find time to implement this :-).

2) Reducing negative lookups using a hash of the file names/bloom
filters/cuckoo filters

Poornima and I were discussing how to reduce the number of lookups that
need to fail with ENOENT before a create/mknod/mkdir etc. fop can proceed.
We see that almost 40% of a small-file create workload was negative
lookups. What we came up with is a translator which starts tracking
creation of files in a directory as soon as the directory is created
through that mount. For each creation inside this new directory it marks
the file name as used, using a hashtable or a filter (bloom/cuckoo). Now
if a lookup comes for a name that was never created in that directory, we
can return ENOENT directly from the client without doing a lookup on the
cluster. We will discuss how we will be using leases to make sure this
solution stays accurate. We would also like to present/seek inputs about
how to extend it in the future.

May even show a demo if I find time to implement this :-).
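
To make the filter idea concrete, here is a minimal standalone sketch of a bloom filter used as a negative-lookup cache. The sizes and hash functions are toy choices for illustration only; this is not the proposed translator code.

/* Toy bloom filter for negative-lookup caching (illustration only; the
 * real translator would track this per directory and invalidate it via
 * leases). A "maybe" answer falls back to a normal LOOKUP on the cluster;
 * a "no" answer can be turned into ENOENT directly on the client. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOOM_BITS 8192 /* bits per directory, picked arbitrarily */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* FNV-1a with two different seeds gives us two cheap hash functions. */
static uint32_t
fnv1a(const char *name, uint32_t seed)
{
    uint32_t h = seed;
    while (*name) {
        h ^= (uint8_t)*name++;
        h *= 16777619u;
    }
    return h % BLOOM_BITS;
}

/* Record a name that was created through this mount. */
static void
bloom_add(bloom_t *b, const char *name)
{
    uint32_t h1 = fnv1a(name, 2166136261u);
    uint32_t h2 = fnv1a(name, 0x9e3779b9u);
    b->bits[h1 / 8] |= 1u << (h1 % 8);
    b->bits[h2 / 8] |= 1u << (h2 % 8);
}

/* false => name was definitely never created through this mount,
 * true  => name may exist, do the real lookup. */
static bool
bloom_maybe_has(const bloom_t *b, const char *name)
{
    uint32_t h1 = fnv1a(name, 2166136261u);
    uint32_t h2 = fnv1a(name, 0x9e3779b9u);
    return (b->bits[h1 / 8] & (1u << (h1 % 8))) &&
           (b->bits[h2 / 8] & (1u << (h2 % 8)));
}

int
main(void)
{
    bloom_t dir = {{0}};

    bloom_add(&dir, "file-1"); /* created through this mount */

    printf("file-1: %s\n", bloom_maybe_has(&dir, "file-1")
                               ? "maybe exists, wind LOOKUP"
                               : "ENOENT from client");
    printf("file-2: %s\n", bloom_maybe_has(&dir, "file-2")
                               ? "maybe exists, wind LOOKUP"
                               : "ENOENT from client");
    return 0;
}

A "maybe" answer still winds the normal lookup, so a false positive only costs the lookup we would have done anyway; a "no" answer is safe to turn into ENOENT as long as every create through the mount is recorded and leases invalidate the filter when the directory changes elsewhere.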


On Sat, Aug 13, 2016 at 1:18 AM, Vijay Bellur  wrote:

> Hey All,
>
> Gluster Developer Summit 2016 is fast approaching [1] on us. We are
> looking to have talks and discussions related to the following themes in
> the summit:
>
> 1. Gluster.Next - focusing on features shaping the future of Gluster
>
> 2. Experience - Description of real world experience and feedback from:
>a> Devops and Users deploying Gluster in production
>b> Developers integrating Gluster with other ecosystems
>
> 3. Use cases  - focusing on key use cases that drive Gluster.today and
> Gluster.Next
>
> 4. Stability & Performance - focusing on current improvements to reduce
> our technical debt backlog
>
> 5. Process & infrastructure  - focusing on improving current workflow,
> infrastructure to make life easier for all of us!
>
> If you have a talk/discussion proposal that can be part of these themes,
> please send out your proposal(s) by replying to this thread. Please clearly
> mention the theme for which your proposal is relevant when you do so. We
> will be ending the CFP by 12 midnight PDT on August 31st, 2016.
>
> If you have other topics that do not fit in the themes listed, please feel
> free to propose and we might be able to accommodate some of them as
> lightening talks or something similar.
>
> Please do reach out to me or Amye if you have any questions.
>
> Thanks!
> Vijay
>
> [1] https://www.gluster.org/events/summit2016/
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] CFP for Gluster Developer Summit

2016-08-22 Thread Pranith Kumar Karampuri
On Mon, Aug 22, 2016 at 5:15 PM, Jeff Darcy  wrote:

> Two proposals, both pretty developer-focused.
>
> (1) Gluster: The Ugly Parts
> Like any code base its size and age, Gluster has accumulated its share of
> dead, redundant, or simply inelegant code.  This code makes us more
> vulnerable to bugs, and slows our entire development process for any
> feature.  In this interactive discussion, we'll identify translators or
> other modules that can be removed or significantly streamlined, and develop
> a plan for doing so within the next year or so.  Bring your favorite gripes
> and pet peeves (about the code).
>
> (2) Gluster Debugging
> Every developer has their own "bag of tricks" for debugging Gluster code -
> things to look for in logs, options to turn on, obscure test-script
> features, gdb macros, and so on.  In this session we'll share many of these
> tricks, and hopefully collect more, along with a plan to document them so
> that newcomers can get up to speed more quickly.
>
>
> I could extend #2 to cover more user/support level problem diagnosis, but
> I think I'd need a co-presenter for that because it's not an area in which
> I feel like an expert myself.
>

I can help here. We can chat offline about what exactly you had in mind and
take it from there.


> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./tests/bitrot/bug-1373520.t failure on master

2017-02-28 Thread Pranith Kumar Karampuri
This is being tracked @ https://bugzilla.redhat.com/show_bug.cgi?id=1427404.
Krutika posted a patch to move it to the bad tests until we find out why a
lookup on one file is leading to a lookup on its hardlink instead.

On Tue, Feb 28, 2017 at 2:56 PM, Susant Palai  wrote:

> Hi,
>test case: ./tests/bitrot/bug-1373520.t is seen to be failing on
> different regression runs on master.
> Requesting to look in to it.
>
> Few instances:
> https://build.gluster.org/job/netbsd7-regression/3118/consoleFull
> https://build.gluster.org/job/centos6-regression/3451/
> https://build.gluster.org/job/netbsd7-regression/3122/consoleFull
>
> Thanks,
> Susant
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./tests/bitrot/bug-1373520.t failure on master

2017-03-01 Thread Pranith Kumar Karampuri
Thanks for the work, Atin/Krutika. My analysis is wrong. I also need to see
why a lookup on one file led to a lookup on its hardlink, which I will do
later.

On Wed, Mar 1, 2017 at 12:41 PM, Krutika Dhananjay <kdhan...@redhat.com>
wrote:

> Found the bug. Please see https://review.gluster.org/#/
> c/16462/5/xlators/storage/posix/src/posix-handle.c@977
>
> Will be posting the fix in some time.
>
> -Krutika
>
> On Tue, Feb 28, 2017 at 5:45 PM, Atin Mukherjee <amukh...@redhat.com>
> wrote:
>
>>
>>
>> On Tue, Feb 28, 2017 at 5:10 PM, Atin Mukherjee <amukh...@redhat.com>
>> wrote:
>>
>>> Can this patch be reverted asap, as it has blocked other patches from
>>> getting in? IMO, we shouldn't be marking the test as bad given we know
>>> the patch which introduced the regression.
>>>
>>
>> I've posted https://review.gluster.org/#/c/16787 to revert the change.
>> Can this be merged once it passes regression?
>>
>>
>>>
>>> On Tue, Feb 28, 2017 at 4:16 PM, Atin Mukherjee <amukh...@redhat.com>
>>> wrote:
>>>
>>>> https://review.gluster.org/16462 has caused this regression as per the
>>>> git bisect.
>>>>
>>>> On Tue, Feb 28, 2017 at 3:28 PM, Pranith Kumar Karampuri <
>>>> pkara...@redhat.com> wrote:
>>>>
>>>>> This is being tracked @
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1427404. Krutika posted a
>>>>> patch to move it to the bad tests until we find out why a lookup on one
>>>>> file is leading to a lookup on its hardlink instead.
>>>>>
>>>>> On Tue, Feb 28, 2017 at 2:56 PM, Susant Palai <spa...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>test case: ./tests/bitrot/bug-1373520.t is seen to be failing on
>>>>>> different regression runs on master.
>>>>>> Requesting to look in to it.
>>>>>>
>>>>>> Few instances:
>>>>>> https://build.gluster.org/job/netbsd7-regression/3118/consoleFull
>>>>>> https://build.gluster.org/job/centos6-regression/3451/
>>>>>> https://build.gluster.org/job/netbsd7-regression/3122/consoleFull
>>>>>>
>>>>>> Thanks,
>>>>>> Susant
>>>>>> ___
>>>>>> Gluster-devel mailing list
>>>>>> Gluster-devel@gluster.org
>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pranith
>>>>>
>>>>> ___
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel@gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ~ Atin (atinm)
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> ~ Atin (atinm)
>>>
>>
>>
>>
>> --
>>
>> ~ Atin (atinm)
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] CFP for Gluster Developer Summit

2016-08-23 Thread Pranith Kumar Karampuri
For some reason Ravi's mail is not coming on the lists, not sure why. Here
is his mail:

Hello,

Here is a proposal I'd like to make.

Title: Throttling in gluster (https://github.com/gluster/gl
usterfs-specs/blob/master/accepted/throttling.md)
Theme: Performance and scalability.

The talk/discussion will focus on server-side throttling of FOPs using a
throttling translator. The primary consumer of this would be self-heal
traffic in AFR, but it can be extended to other clients as well.
I'm working on getting it working for AFR for the first cut, so that
multi-threaded self-heal (courtesy of Facebook) can be enabled without
consuming too many system resources and possibly starving clients.
I'm hoping to have some discussions around this to make it more generic and
to see if it can be aligned with the long-term goals for QoS in Gluster.

Thanks.
Ravi
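
For context, this kind of server-side FOP throttling usually reduces to some variant of a token bucket. A minimal standalone sketch of the idea follows; it is illustrative only, not the proposed translator design, and the rate/burst numbers are made up.

/* Toy token-bucket rate limiter (illustration only). Each incoming FOP
 * consumes one token; tokens refill at `rate` per second up to `burst`.
 * A real server-side translator would queue (or unwind with an error)
 * instead of answering a plain yes/no like this. */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct {
    double tokens;          /* currently available tokens     */
    double rate;            /* refill rate, tokens per second */
    double burst;           /* bucket capacity                */
    struct timespec last;   /* last refill time               */
} tbucket_t;

static void
tbucket_init(tbucket_t *tb, double rate, double burst)
{
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = burst;
    clock_gettime(CLOCK_MONOTONIC, &tb->last);
}

/* Returns true if the FOP may proceed now, false if it must wait/queue. */
static bool
tbucket_admit(tbucket_t *tb)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);

    double elapsed = (now.tv_sec - tb->last.tv_sec) +
                     (now.tv_nsec - tb->last.tv_nsec) / 1e9;
    tb->last = now;

    tb->tokens += elapsed * tb->rate;
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;

    if (tb->tokens >= 1.0) {
        tb->tokens -= 1.0;
        return true;
    }
    return false;
}

int
main(void)
{
    tbucket_t heal_bucket;
    tbucket_init(&heal_bucket, 100.0, 16.0); /* 100 heal FOPs/sec, burst 16 */

    int admitted = 0, deferred = 0;
    for (int i = 0; i < 1000; i++) {
        if (tbucket_admit(&heal_bucket))
            admitted++;
        else
            deferred++;
    }

    printf("admitted=%d deferred=%d\n", admitted, deferred);
    return 0;
}

A real translator would additionally need per-client (or per-fop-type) buckets and a queue for deferred FOPs, which is where the generic QoS discussion comes in.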

On Tue, Aug 23, 2016 at 3:55 PM, Aravinda  wrote:

> Title: Events APIs for GlusterFS
> Theme: Gluster.Next
>
> With 3.9 release Gluster will have Events APIs support. Cluster events
> will be pushed to registered Client applications in realtime.
>
> I plan to cover the following,
>
> - Introduction
> - Demo
> - List of supported Events
> - How to consume Events - Example Events Client (CLI client and example
> Web App)
> - Future
>
> regards
> Aravinda
>
> On Saturday 13 August 2016 03:45 AM, Amye Scavarda wrote:
>
>
>
> On Fri, Aug 12, 2016 at 12:48 PM, Vijay Bellur  wrote:
>
>> Hey All,
>>
>> Gluster Developer Summit 2016 is fast approaching [1] on us. We are
>> looking to have talks and discussions related to the following themes in
>> the summit:
>>
>> 1. Gluster.Next - focusing on features shaping the future of Gluster
>>
>> 2. Experience - Description of real world experience and feedback from:
>>a> Devops and Users deploying Gluster in production
>>b> Developers integrating Gluster with other ecosystems
>>
>> 3. Use cases  - focusing on key use cases that drive Gluster.today and
>> Gluster.Next
>>
>> 4. Stability & Performance - focusing on current improvements to reduce
>> our technical debt backlog
>>
>> 5. Process & infrastructure  - focusing on improving current workflow,
>> infrastructure to make life easier for all of us!
>>
>> If you have a talk/discussion proposal that can be part of these themes,
>> please send out your proposal(s) by replying to this thread. Please clearly
>> mention the theme for which your proposal is relevant when you do so. We
>> will be ending the CFP by 12 midnight PDT on August 31st, 2016.
>>
>> If you have other topics that do not fit in the themes listed, please
>> feel free to propose and we might be able to accommodate some of them as
>> lightening talks or something similar.
>>
>> Please do reach out to me or Amye if you have any questions.
>>
>> Thanks!
>> Vijay
>>
>> [1] https://www.gluster.org/events/summit2016/
>
>
>
> Annoyingly enough, the Google Doc form won't let people outside of the
> Google Apps domain view it, which is not going to be super helpful for
> this.
>
> I'll go ahead and close the Google form, send out the talks that have
> already been added, and have the form link back to this mailing list post.
> Thanks!
>
> - amye
>
>
> --
> Amye Scavarda | a...@redhat.com | Gluster Community Lead
>
>
> ___
> Gluster-devel mailing 
> listGluster-devel@gluster.orghttp://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] doubts in using gf_event/gf_msg

2016-08-24 Thread Pranith Kumar Karampuri
Just resending in case you missed this mail.

On Tue, Aug 23, 2016 at 2:31 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hi Aravinda,
>    I was wondering what your opinion is on sending selected logs as
> events instead of treating them specially. Is this something you guys
> considered? Do you think it is a bad idea to do it that way? We could even
> come up with a new API which logs and then sends the message as an event.
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] doubts in using gf_event/gf_msg

2016-08-24 Thread Pranith Kumar Karampuri
On Wed, Aug 24, 2016 at 5:38 PM, Aravinda <avish...@redhat.com> wrote:

> It is possible to overload gf_log/gf_msg to send events to the local
> eventsd, but we would have to do additional parsing if we need to select
> which logs to convert into events. If we introduce a new API, then it is
> not different from the existing approach with respect to adding events
> code in multiple places. But if it helps in sharing the host/port and
> other information (which we are discussing to support client events), I
> can give it a try (move gf_event inside the logging infra).
>

Sounds good to me. I will wait for others' inputs here as well.
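
To make the "one call that both logs and emits an event" idea concrete, here is a rough sketch. The helper names and signatures below are hypothetical stand-ins, not the real gf_msg()/gf_event() interfaces.

/* Rough sketch of the idea being discussed: one call site produces both a
 * log line and (for selected messages) an event, so callers don't have to
 * add events code in multiple places. do_log()/do_event() are hypothetical
 * stand-ins, not the real gf_msg()/gf_event() signatures. */
#include <stdarg.h>
#include <stdio.h>

static void
do_log(const char *domain, const char *msg)
{
    fprintf(stderr, "[%s] %s\n", domain, msg);
}

static void
do_event(int event_id, const char *msg)
{
    /* In the real implementation this would go to the local eventsd. */
    printf("EVENT %d: %s\n", event_id, msg);
}

/* event_id < 0 means "log only"; >= 0 means "log and emit an event". */
static void
log_and_emit(const char *domain, int event_id, const char *fmt, ...)
{
    char msg[1024];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(msg, sizeof(msg), fmt, ap);
    va_end(ap);

    do_log(domain, msg);
    if (event_id >= 0)
        do_event(event_id, msg);
}

int
main(void)
{
    log_and_emit("afr", -1, "selfheal completed on %s", "gfid-1234");
    log_and_emit("afr", 42, "quorum lost on volume %s", "patchy");
    return 0;
}

Whether the selection happens through an extra argument like this or by parsing the existing messages is exactly the trade-off being discussed above.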


>
> regards
> Aravinda
>
> On Wednesday 24 August 2016 05:15 PM, Pranith Kumar Karampuri wrote:
>
> Just resending in case you missed this mail.
>
> On Tue, Aug 23, 2016 at 2:31 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> hi Aravinda,
>>    I was wondering what your opinion is on sending selected logs as
>> events instead of treating them specially. Is this something you guys
>> considered? Do you think it is a bad idea to do it that way? We could even
>> come up with a new API which logs and then sends the message as an event.
>>
>> --
>> Pranith
>>
>
>
>
> --
> Pranith
>
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] NetBSD aborted runs

2016-08-31 Thread Pranith Kumar Karampuri
I am seeing a pause when the .t runs that seems to last close to however
much time we put in EXPECT_WITHIN:

[2016-09-01 03:24:21.852744] I
[common.c:1134:pl_does_monkey_want_stuck_lock] 0-patchy-locks: stuck lock
[2016-09-01 03:24:21.852775] W [inodelk.c:659:pl_inode_setlk]
0-patchy-locks: MONKEY LOCKING (forcing stuck lock)! at 2016-09-01 03:24:21
[2016-09-01 03:24:21.852792] I [server-rpc-fops.c:317:server_finodelk_cbk]
0-patchy-server: replied
[2016-09-01 03:24:21.861937] I [server-rpc-fops.c:5682:server3_3_inodelk]
0-patchy-server: inbound
[2016-09-01 03:24:21.862318] I [server-rpc-fops.c:278:server_inodelk_cbk]
0-patchy-server: replied
[2016-09-01 03:24:21.862627] I [server-rpc-fops.c:5682:server3_3_inodelk]
0-patchy-server: inbound << No I/O after this.
[2016-09-01 03:27:19.6N]:++ G_LOG:tests/features/lock_revocation.t:
TEST: 52 append_to_file /mnt/glusterfs/1/testfile ++
[2016-09-01 03:27:19.871044] I [server-rpc-fops.c:5772:server3_3_finodelk]
0-patchy-server: inbound
[2016-09-01 03:27:19.871280] I [clear.c:219:clrlk_clear_inodelk]
0-patchy-locks: 2
[2016-09-01 03:27:19.871307] I [clear.c:273:clrlk_clear_inodelk]
0-patchy-locks: released_granted
[2016-09-01 03:27:19.871330] I [server-rpc-fops.c:278:server_inodelk_cbk]
0-patchy-server: replied
[2016-09-01 03:27:19.871389] W [inodelk.c:228:__inodelk_prune_stale]
0-patchy-locks: Lock revocation [reason: age; gfid:
3ccca736-ba89-4f8c-ba17-f6cdbcd0e3c3; domain: patchy-replicate-0; age: 178
sec] - Inode lock revoked:  0 granted & 1 blocked locks cleared

We can prevent the hang by adding $CLI volume stop $V0, but the test
would fail. When that happens, the following error is printed on the
console from perfused:

perfused: perfuse_node_inactive: perfuse_node_fsync failed error = 57:
Resource temporarily unavailable <<--- I wonder if this comes because the
INODELK fop fails with EAGAIN.

I am also seeing a weird behaviour where it says it is releasing granted
locks but prints that it released 1 blocked lock.

+Manu
I think there are 2 things going on here. 1) There is a hang; I am still
assuming it is a gluster issue until proven otherwise.
2) I have to figure out why the counters are showing wrong information in
the logs. I kept going through the code and it seems fine: it should have
printed that it released 1 granted lock & 0 blocked locks, but it prints
them in reverse.
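
To illustrate what "prints them in reverse" would look like if this turns out to be a simple logging slip, here is a contrived example (not the actual locks/clear.c code) where the two counters are passed to the format string in the wrong order:

/* Contrived example (not the actual locks/clear.c code): if the counters
 * are passed to the format string in the wrong order, the log claims
 * "0 granted & 1 blocked" even though one granted lock was revoked and no
 * blocked locks were touched. */
#include <stdio.h>

static void
log_revocation(int granted_cleared, int blocked_cleared)
{
    /* BUG: arguments swapped relative to the format string. */
    printf("Inode lock revoked: %d granted & %d blocked locks cleared\n",
           blocked_cleared, granted_cleared);
}

int
main(void)
{
    int granted_cleared = 1; /* what actually happened */
    int blocked_cleared = 0;

    log_revocation(granted_cleared, blocked_cleared);
    /* prints: Inode lock revoked: 0 granted & 1 blocked locks cleared */
    return 0;
}

If the real cause is in the counting itself rather than the logging, this sketch obviously doesn't apply; it only shows how easily the reversed output can come from an argument-order slip.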

If you do git diff on nbslave72.cloud.gluster.org, you can see the changes
I made. Could you please help?


On Sun, Aug 28, 2016 at 7:36 AM, Atin Mukherjee  wrote:

> This is still bothering us a lot, and it looks like there is a genuine
> issue in the code which is making the process hang/deadlock.
>
> Raghavendra T - any more findings?
>
>
> On Friday 19 August 2016, Atin Mukherjee  wrote:
>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1368421
>>
>> NetBSD regressions are getting aborted very frequently. Apart from the
>> infra issue related to connectivity (Nigel has started looking into it),
>> lock_revocation.t is getting hung in such instances, which is causing the
>> run to be aborted after 300 minutes. This has already started blocking
>> patches from getting in, which eventually impacts the upcoming release
>> cycles.
>>
>> I'd request the feature owner/maintainer to have a look at it asap.
>>
>> --Atin
>>
>
>
> --
> --Atin
>
> ___
> maintainers mailing list
> maintain...@gluster.org
> http://www.gluster.org/mailman/listinfo/maintainers
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for perf xlator components for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi Raghavendra,
   I feel running
https://github.com/avati/perf-test/blob/master/perf-test.sh is good enough
for testing these. Do you feel anything more needs to be done before the
release?

I can update it at
https://public.pad.fsfe.org/p/gluster-component-release-checklist

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Checklist for Bitrot component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
On Fri, Sep 2, 2016 at 11:39 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hi,
> Did you get a chance to decide on the tests that need to be done
> before doing a release for the Bitrot component? Could you let me know
> who will be providing the list?
>
> I can update it at https://public.pad.fsfe.org/p/
> gluster-component-release-checklist
>
> --
> Pranith
>
Sorry, forgot to add 'Aravinda & Pranith'
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for FUSE bridge component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
Did you get a chance to decide on the tests that need to be done
before doing a release for the FUSE bridge component? Could you let me know
who will be providing the list?

I can update it at https://public.pad.fsfe.org/p/
gluster-component-release-checklist

-- 
Aravinda & Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for glusterd component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
Did you get a chance to decide on the tests that need to be done
before doing a release for the glusterd component? Could you let me know
who will be providing the list?

I think just the cases that cover the infra part should be good enough.
Component-based commands should be covered in the respective component
testing, like healing commands/rebalance commands/quota
commands/geo-rep/bitrot/tiering etc.

I can update it at https://public.pad.fsfe.org/p/
gluster-component-release-checklist
-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for Quota+Marker component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
Did you get a chance to decide on the tests that need to be done
before doing a release for Quota+Marker component?

I can update it at https://public.pad.fsfe.org/p/
gluster-component-release-checklist

-- 
Aravinda & Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for tier component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
   Did you get a chance to decide on the tests that need to be
done before doing a release for the Tier component? Could you let me know
who will be providing the list?

I can update it at https://public.pad.fsfe.org/p/
gluster-component-release-checklist
-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for georep family of components for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
Did you get a chance to decide on the tests that need to be
done before doing a release for the georep family of components? Could you
let me know who will be providing the list?

I think changelog, marker, georep are the features that should come under
this bucket right? Are there any more?

I can update it at https://public.pad.fsfe.org/p/gluster-component-release-
checklist

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for gfapi for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
I think most of this testing will be covered in the NFSv4 and SMB testing,
but I could be wrong. Could you let me know who will be providing the
list if you think there are more tests that need to be run?

I can update it at https://public.pad.fsfe.org/p/
gluster-component-release-checklist

-- 
Aravinda & Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for Bitrot component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
Did you get a chance to decide on the tests that need to be done
before doing a release for the Bitrot component? Could you let me know who
will be providing the list?

I can update it at https://public.pad.fsfe.org/p/
gluster-component-release-checklist

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Checklist for tier component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
On Fri, Sep 2, 2016 at 11:52 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hi,
>Did you get a chance to decide on the tests that need to be
> done before doing a release for the Tier component? Could you let me know
> who will be providing the list?
>
> I can update it at https://public.pad.fsfe.org/p/
> gluster-component-release-checklist
> --
> Pranith
>
Sorry, forgot to add 'Aravinda & Pranith'
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Checklist for perf xlator components for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
On Fri, Sep 2, 2016 at 11:42 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hi Raghavendra,
>I feel running https://github.com/avati/perf-
> test/blob/master/perf-test.sh is good enough for testing these. Do you
> feel anything more needs to be done before the release?
>
> I can update it at https://public.pad.fsfe.org/p/
> gluster-component-release-checklist
>
> --
> Pranith
>

Sorry, forgot to add 'Aravinda & Pranith'
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Checklist for DHT component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
On Fri, Sep 2, 2016 at 11:31 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hi,
>   Did you get a chance to decide on the tests that need to be done
> before doing a release for the DHT component? Could you let me know who
> will be providing the list?
>
> I can update it at https://public.pad.fsfe.org/p/
> gluster-component-release-checklist
>
> --
> Pranith
>
Sorry, forgot to add 'Aravinda & Pranith'
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Checklist for georep family of components for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
On Fri, Sep 2, 2016 at 11:36 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hi,
> Did you get a chance to decide on the tests that need to be
> done before doing a release for the georep family of components? Could
> you let me know who will be providing the list?
>
> I think changelog, marker, georep are the features that should come under
> this bucket right? Are there any more?
>
> I can update it at https://public.pad.fsfe.org/p/
> gluster-component-release-checklist
>
> --
> Pranith
>
Sorry, forgot to add 'Aravinda & Pranith'
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for Upcall component for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
Did you get a chance to decide on the tests that need to be done
before doing a release for the Upcall component? Could you let me know who
will be providing the list?

I can update it at https://public.pad.fsfe.org/p/
gluster-component-release-checklist

-- 
Aravinda & Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Checklist for RPC for upstream release

2016-09-02 Thread Pranith Kumar Karampuri
hi,
  I think this should be covered as part of other component testing,
but if you think any more tests need to be added, please let us know.

I can update it at
https://public.pad.fsfe.org/p/gluster-component-release-checklist

-- 
Aravinda & Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
