Re: [Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

2016-02-12 Thread Richard Wareing
Hey,

Sorry for the late reply but I missed this e-mail.  With respect to identifying 
locking domains, we use the identical logic that GlusterFS itself uses to 
identify the domains, which is just a simple string comparison if I'm not 
mistaken.  System process (SHD/rebalance) locking domains are treated 
identically to any other; this is especially critical for things like DHT 
healing, as this locking domain is used both in userland and by SHDs (you 
cannot disable DHT healing).  To illustrate this, consider the case where an 
SHD holds a lock to do a DHT heal but can't because of GFID split-brain... a 
user comes along and hammers that directory attempting to get a lock... you can 
pretty much kiss your cluster good-bye after that :).
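
(To make "simple string comparison" concrete, the matching boils down to
roughly the sketch below; the struct and field names are stand-ins, not the
actual features/locks code.)

#include <string.h>

/* Hypothetical stand-in for a held lock's domain entry; the real
 * posix-locks structures differ. */
struct held_lock {
        const char *domain;   /* e.g. an AFR or DHT domain string */
};

/* A requested lock only contends with held locks whose domain string
 * matches exactly; SHD/rebalance domains get no special treatment. */
static int
same_domain (const struct held_lock *held, const char *requested_domain)
{
        return strcmp (held->domain, requested_domain) == 0;
}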

With this in mind, we explicitly choose not to respect system process 
(SHD/rebalance) locks any more than a user lock request, as they can be just as 
likely (if not more so) than a user to cause a system to fall over (see the 
example above).  Although this might seem unwise at first, I'd put forth that 
having clusters fall over catastrophically pushes far worse decisions onto 
operators, such as re-kicking random bricks or entire clusters in desperate 
attempts at freeing locks (the CLI is often unable to free the locks in our 
experience) or at stopping runaway memory consumption due to frames piling up 
on the bricks.  To date, we haven't observed even a single instance of data 
corruption (and we've been looking for it!) due to this feature.

We've even used it on clusters that were on the verge of falling over: we 
enabled revocation and the entire system stabilized almost instantly (it's 
really like magic when you see it :) ).

Hope this helps!

Richard



From: raghavendra...@gmail.com [raghavendra...@gmail.com] on behalf of 
Raghavendra G [raghaven...@gluster.com]
Sent: Tuesday, January 26, 2016 9:49 PM
To: Raghavendra Gowdappa
Cc: Richard Wareing; Gluster Devel
Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for 
features/locks xlator (v3.7.x)



On Mon, Jan 25, 2016 at 10:39 AM, Raghavendra Gowdappa 
<rgowd...@redhat.com> wrote:


- Original Message -
> From: "Richard Wareing" mailto:rware...@fb.com>>
> To: "Pranith Kumar Karampuri" 
> mailto:pkara...@redhat.com>>
> Cc: gluster-devel@gluster.org<mailto:gluster-devel@gluster.org>
> Sent: Monday, January 25, 2016 8:17:11 AM
> Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for 
> features/locks xlator (v3.7.x)
>
> Yup, per-domain would be useful; the patch itself currently honors domains as
> well, so locks in different domains will not be touched during revocation.
>
> In our case we actually prefer to pull the plug on SHD/DHT domains to ensure
> clients do not hang.  This is important for DHT self-heals, which cannot be
> disabled via any option; we've found that in most cases, once we reap the lock,
> another properly behaving client comes along and completes the DHT heal
> properly.

Flushing waiting locks of DHT can affect application continuity too. Though 
locks requested by the rebalance process can be flushed to a certain extent 
without applications noticing any failures, there is no guarantee that locks 
requested in DHT_LAYOUT_HEAL_DOMAIN and DHT_FILE_MIGRATE_DOMAIN are issued only 
by the rebalance process.

I missed this point in my previous mail. Now I remember that we can use 
frame->root->pid (being negative) to identify internal processes. Was this the 
approach you followed to identify locks from the rebalance process?
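
(For illustration, that check amounts to something like the sketch below; the
struct definitions here are minimal stand-ins for the real call frame types,
not GlusterFS headers.)

/* Illustrative only: internal daemons (SHD, rebalance, etc.) issue their
 * fops with a negative pid in the call frame's root, so a brick-side
 * xlator could classify the requester roughly like this. */
struct call_root_stub  { int pid; };
struct call_frame_stub { struct call_root_stub *root; };

static int
is_internal_request (const struct call_frame_stub *frame)
{
        return frame->root->pid < 0;
}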

These two domains are used for locks that synchronize among and between the 
rebalance process(es) and client(s). So, there is an equal probability that 
these locks are requests from clients, and hence applications can see some file 
operations failing.

In the case of pulling the plug on DHT_LAYOUT_HEAL_DOMAIN, dentry operations 
that depend on the layout can fail. These operations include create, link, 
unlink, symlink, mknod, mkdir and rename for files/directories within the 
directory on which the lock request failed.

In the case of pulling the plug on DHT_FILE_MIGRATE_DOMAIN, renames of 
immediate subdirectories/files can fail.


>
> Richard
>
>
> Sent from my iPhone
>
> On Jan 24, 2016, at 6:42 PM, Pranith Kumar Karampuri <pkara...@redhat.com>
> wrote:
>
>
>
>
>
>
> On 01/25/2016 02:17 AM, Richard Wareing wrote:
>
>
>
> Hello all,
>
> Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation
> feature which has had a significant impact on our GFS cluster reliability.
> As such I wanted to share the patch with the community, so here's the
> bugzilla report:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1301401
>
> =
> Summary:
> Mis-beha

Re: [Gluster-devel] Throttling xlator on the bricks

2016-02-12 Thread Richard Wareing
Hey Ravi,

I'll ping Shreyas about this today.  There's also a patch we'll need for 
multi-threaded SHD to fix the least-pri queuing: the PID of the process wasn't 
tagged correctly via the call frame in my original patch.  The patch below 
fixes this (for 3.6.3).  I didn't see multi-threaded self-heal on github/master 
yet, so let me know which branch you need this patch on and I can come up with 
a clean patch.

Richard


=


diff --git a/xlators/cluster/afr/src/afr-self-heald.c b/xlators/cluster/afr/src/afr-self-heald.c
index 028010d..b0f6248 100644
--- a/xlators/cluster/afr/src/afr-self-heald.c
+++ b/xlators/cluster/afr/src/afr-self-heald.c
@@ -532,6 +532,9 @@ afr_mt_process_entries_done (int ret, call_frame_t *sync_frame,
 pthread_cond_signal (&mt_data->task_done);
 }
 pthread_mutex_unlock (&mt_data->lock);
+
+if (task_ctx->frame)
+AFR_STACK_DESTROY (task_ctx->frame);
 GF_FREE (task_ctx);
 return 0;
 }
@@ -787,6 +790,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
 int   ret = -1;
 afr_mt_process_entries_task_ctx_t *task_ctx;
 afr_mt_data_t *mt_data;
+call_frame_t  *frame = NULL;

 mt_data = &healer->mt_data;

@@ -799,6 +803,8 @@ _afr_mt_create_process_entries_task (xlator_t *this,
 if (!task_ctx)
 goto err;

+task_ctx->frame = afr_frame_create (this);
+
 INIT_LIST_HEAD (&task_ctx->list);
 task_ctx->readdir_xl = this;
 task_ctx->healer = healer;
@@ -812,7 +818,7 @@ _afr_mt_create_process_entries_task (xlator_t *this,
 // This returns immediately, and afr_mt_process_entries_done will
 // be called when the task is completed e.g. our queue is empty
 ret = synctask_new (this->ctx->env, afr_mt_process_entries_task,
-afr_mt_process_entries_done, NULL,
+afr_mt_process_entries_done, task_ctx->frame,
 (void *)task_ctx);

 if (!ret) {
diff --git a/xlators/cluster/afr/src/afr-self-heald.h b/xlators/cluster/afr/src/afr-self-heald.h
index 817e712..1588fc8 100644
--- a/xlators/cluster/afr/src/afr-self-heald.h
+++ b/xlators/cluster/afr/src/afr-self-heald.h
@@ -74,6 +74,7 @@ typedef struct afr_mt_process_entries_task_ctx_ {
 subvol_healer_t *healer;
 xlator_t*readdir_xl;
 inode_t *idx_inode;  /* inode ref for xattrop dir */
+call_frame_t*frame;
 unsigned intentries_healed;
 unsigned intentries_processed;
 unsigned intalready_healed;


Richard

From: Ravishankar N [ravishan...@redhat.com]
Sent: Sunday, February 07, 2016 11:15 PM
To: Shreyas Siravara
Cc: Richard Wareing; Vijay Bellur; Gluster Devel
Subject: Re: [Gluster-devel] Throttling xlator on the bricks

Hello,

On 01/29/2016 06:51 AM, Shreyas Siravara wrote:
> So the way our throttling works is (intentionally) very simplistic.
>
> (1) When someone mounts an NFS share, we tag the frame with a 32 bit hash of 
> the export name they were authorized to mount.
> (2) io-stats keeps track of the "current rate" of fops we're seeing for that 
> particular mount, using a sampling of fops and a moving average over a short 
> period of time.
> (3) Based on whether the share violated its allowed rate (which is defined in 
> a config file), we tag the FOP as "least-pri". Of course this makes the 
> assumption that all NFS endpoints are receiving roughly the same # of FOPs. 
> The rate defined in the config file is a *per* NFS endpoint number. So if 
> your cluster has 10 NFS endpoints, and you've pre-computed that it can do 
> roughly 1000 FOPs per second, the rate in the config file would be 100.
> (4) IO-Threads then shoves the FOP into the least-pri queue, rather than its 
> default. The value is honored all the way down to the bricks.
>
> The code is actually complete, and I'll put it up for review after we iron 
> out a few minor issues.
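
(To make sure I'm picturing steps (1)-(3) correctly, here's a rough standalone
sketch of the idea: a 32-bit hash of the export name plus a moving-average
rate check deciding whether to demote a FOP to least-pri.  The hash choice,
smoothing factor and names below are my own placeholders, not necessarily what
the actual patch uses.)

#include <stdint.h>

/* FNV-1a style 32-bit hash of the export name the client mounted
 * (placeholder; the real patch may use a different hash). */
static uint32_t
export_hash (const char *export)
{
        uint32_t h = 2166136261u;
        for (; *export; export++) {
                h ^= (uint8_t) *export;
                h *= 16777619u;
        }
        return h;
}

/* Per-export moving average of the observed FOP rate. */
struct export_rate {
        uint32_t tag;       /* hash of the export name        */
        double   avg_fops;  /* smoothed fops/sec for this tag */
};

/* Update the moving average with the latest sample and decide whether
 * the FOP should be demoted to the least-pri queue.  'allowed' is the
 * per-endpoint rate from the config file (e.g. 1000/10 = 100). */
static int
tag_least_pri (struct export_rate *er, double sampled_fops_per_sec,
               double allowed)
{
        const double alpha = 0.2;   /* smoothing factor (placeholder) */

        er->avg_fops = alpha * sampled_fops_per_sec +
                       (1.0 - alpha) * er->avg_fops;
        return er->avg_fops > allowed;   /* 1 => shove into least-pri */
}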

Did you get a chance to send the patch? Just wanted to run some tests
and see if this is all we need at the moment to regulate shd traffic,
especially with Richard's multi-threaded heal patch
https://urldefense.proofpoint.com/v2/url?u=http-3A__review.gluster.org_-23_c_13329_&d=CwIC-g&c=5VD0RTtNlTh3ycd41b3MUw&r=qJ8Lp7ySfpQklq3QZr44Iw&m=B873EiTlTeUXIjEcoutZ6Py5KL0bwXIVroPbpwaKD8s&s=fo86UTOQWXf0nQZvvauqIIhlwoZHpRlQMNfQd7Ubu7g&e=
  being revived and made ready for 3.8.

-Ravi

>
>> On Jan 27, 2016, at 9:48 PM, Ravishankar N  wrote:
>>
>> On 01/26/2016 08:41 AM, Richard Wareing wrote:
>>> In

Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Richard Wareing
> If there is one bucket per client and one thread per bucket, it would be
> difficult to scale as the number of clients increase. How can we do this
> better?

On this note... consider that tens of thousands of clients are not unrealistic 
in production :).  Using a thread per bucket would also be unwise...

On the idea in general, I'm just wondering if there are specific (real-world) 
cases where this has even been an issue that least-prio queuing hasn't been 
able to handle.  Or is this more of a theoretical concern?  I ask as I've not 
really encountered situations where I wished I could give more FOPs to SHD vs 
rebalance and such.

In any event, it might be worth having Shreyas detail his throttling feature 
(that can throttle any directory hierarchy no less) to illustrate how a simpler 
design can achieve similar results to these more complicated (and, it follows, 
more bug-prone) approaches.

Richard


From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on 
behalf of Vijay Bellur [vbel...@redhat.com]
Sent: Monday, January 25, 2016 6:44 PM
To: Ravishankar N; Gluster Devel
Subject: Re: [Gluster-devel] Throttling xlator on the bricks

On 01/25/2016 12:36 AM, Ravishankar N wrote:
> Hi,
>
> We are planning to introduce a throttling xlator on the server (brick)
> process to regulate FOPS. The main motivation is to solve complaints about
> AFR selfheal taking too much of CPU resources. (due to too many fops for
> entry
> self-heal, rchecksums for data self-heal etc.)


I am wondering if we can re-use the same xlator for throttling
bandwidth, iops etc. in addition to fops. Based on admin configured
policies we could provide different upper thresholds to different
clients/tenants and this could prove to be a useful feature in
multitenant deployments to avoid starvation/noisy neighbor class of
problems. Has any thought gone in this direction?

>
> The throttling is achieved using the Token Bucket Filter algorithm
> (TBF). TBF
> is already used by bitrot's bitd signer (which is a client process) in
> gluster to regulate the CPU intensive check-sum calculation. By putting the
> logic on the brick side, multiple clients- selfheal, bitrot, rebalance or
> even the mounts themselves can avail the benefits of throttling.
>
> The TBF algorithm in a nutshell is as follows: There is a bucket which
> is filled
> at a steady (configurable) rate with tokens. Each FOP will need a fixed
> amount
> of tokens to be processed. If the bucket has that many tokens, the FOP is
> allowed and that many tokens are removed from the bucket. If not, the FOP is
> queued until the bucket is filled.
>
> The xlator will need to reside above io-threads and can have different
> buckets,
> one per client. There has to be a communication mechanism between the
> client and
> the brick (IPC?) to tell what FOPS need to be regulated from it, and the
> no. of
> tokens needed etc. These need to be re configurable via appropriate
> mechanisms.
> Each bucket will have a token filler thread which will fill the tokens
> in it.
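
(A compressed sketch of the TBF mechanics described above, just to make the
bucket/token bookkeeping concrete; the names and the refill step are
placeholders rather than bitrot's actual implementation.)

#include <stdint.h>

/* Minimal token-bucket sketch: a filler refills 'tokens' at a steady
 * rate; each FOP needs a fixed cost and is queued when short. */
struct tbf_bucket {
        uint64_t tokens;      /* tokens currently available        */
        uint64_t max_tokens;  /* bucket capacity                   */
        uint64_t fill_rate;   /* tokens added per refill interval  */
};

/* Called periodically by the per-bucket filler thread. */
static void
tbf_refill (struct tbf_bucket *b)
{
        b->tokens += b->fill_rate;
        if (b->tokens > b->max_tokens)
                b->tokens = b->max_tokens;
}

/* Returns 1 if the FOP may proceed (tokens consumed), 0 if it must be
 * queued until the filler tops the bucket up again. */
static int
tbf_admit (struct tbf_bucket *b, uint64_t fop_cost)
{
        if (b->tokens < fop_cost)
                return 0;
        b->tokens -= fop_cost;
        return 1;
}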

If there is one bucket per client and one thread per bucket, it would be
difficult to scale as the number of clients increase. How can we do this
better?

> The main thread will enqueue heals in a list in the bucket if there aren't
> enough tokens. Once the token filler detects some FOPS can be serviced,
> it will
> send a cond-broadcast to a dequeue thread which will process (stack
> wind) all
> the FOPS that have the required no. of tokens from all buckets.
>
> This is just a high level abstraction: requesting feedback on any aspect of
> this feature. what kind of mechanism is best between the client/bricks for
> tuning various parameters? What other requirements do you foresee?
>

I am in favor of having administrator defined policies or templates
(collection of policies) being used to provide the tuning parameter per
client or a set of clients. We could even have a default template per
use case etc. Is there a specific need to have this negotiation between
clients and servers?

Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

2016-01-25 Thread Richard Wareing
Hey Pranith,

>Maybe give clients a second (or more) chance to "refresh" their locks - in the 
>sense, when a lock is about to be revoked, notify the client which can then 
>call for a refresh to confirm its locks' validity. This would require 
>some maintenance work on the client to keep track of locked regions.

So we've thought about this as well; however, the approach I'd rather take is 
that we (long term) eliminate any need for multi-hour locking.  This would put 
the responsibility on the SHD/rebalance/bitrot daemons to take out another lock 
request once in a while to signal to the POSIX locks translator that they are 
still there and alive.

The world we want to be in is that a lock held > N minutes is most _definitely_ 
a bug or a broken client and should be revoked.  With this patch it's simply a 
heuristic to make a judgement call; in our world, however, we've seen that once 
you have 1000's of lock requests piled up... it's only a matter of time before 
your entire cluster is going to collapse.  So the "correctness" of the locking 
behavior, or however much you might upset SHD/bitrot/rebalance, is a completely 
secondary concern over the availability and stability of the cluster itself.

For folks that want to use this feature conservatively, they shouldn't revoke 
based on time, but rather based on (lock request) queue depth; if you are in a 
situation like I've described above it's almost certainly a bug or a situation 
not fully understood by developers.

Richard



From: Venky Shankar [yknev.shan...@gmail.com]
Sent: Sunday, January 24, 2016 9:36 PM
To: Pranith Kumar Karampuri
Cc: Richard Wareing; Gluster Devel
Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for 
features/locks xlator (v3.7.x)


On Jan 25, 2016 08:12, "Pranith Kumar Karampuri" 
<pkara...@redhat.com> wrote:
>
>
>
> On 01/25/2016 02:17 AM, Richard Wareing wrote:
>>
>> Hello all,
>>
>> Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation 
>> feature which has had a significant impact on our GFS cluster reliability.  
>> As such I wanted to share the patch with the community, so here's the 
>> bugzilla report:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1301401
>>
>> =
>> Summary:
>> Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster 
>> instability and eventual complete unavailability due to failures in 
>> releasing entry/inode locks in a timely manner.
>>
>> Classic symptoms of this are increased brick (and/or gNFSd) memory usage due to 
>> the high number of (lock request) frames piling up in the processes.  The 
>> failure-mode results in bricks eventually slowing down to a crawl due to 
>> swapping, or OOMing due to complete memory exhaustion; during this period 
>> the entire cluster can begin to fail.  End-users will experience this as 
>> hangs on the filesystem, first in a specific region of the file-system and 
>> ultimately the entire filesystem as the offending brick begins to turn into 
>> a zombie (i.e. not quite dead, but not quite alive either).
>>
>> Currently, these situations must be handled by an administrator detecting & 
>> intervening via the "clear-locks" CLI command.  Unfortunately this doesn't 
>> scale for large numbers of clusters, and it depends on the correct 
>> (external) detection of the locks piling up (for which there is little 
>> signal other than state dumps).
>>
>> This patch introduces two features to remedy this situation:
>>
>> 1. Monkey-unlocking - This is a feature targeted at developers (only!) to 
>> help track down crashes due to stale locks, and prove the utility of the lock 
>> revocation feature.  It does this by silently dropping 1% of unlock 
>> requests; simulating bugs or mis-behaving clients.
>>
>> The feature is activated via:
>> features.locks-monkey-unlocking 
>>
>> You'll see the message
>> "[] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY 
>> LOCKING (forcing stuck lock)!" ... in the logs indicating a request has been 
>> dropped.
>>
>> 2. Lock revocation - Once enabled, this feature will revoke a 
>> *contended* lock (i.e. if nobody else asks for the lock, we will not revoke 
>> it) either by the amount of time the lock has been held, how many other lock 
>> requests are waiting on the lock to be freed, or some combination of both.  
>> Clients which are losing their locks will be notified by receiving EAGAIN 
>> (send back to their callback function).
>>
>> The feature is activated via these 

Re: [Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

2016-01-24 Thread Richard Wareing
Yup, per-domain would be useful; the patch itself currently honors domains as 
well, so locks in different domains will not be touched during revocation.

In our case we actually prefer to pull the plug on SHD/DHT domains to ensure 
clients do not hang.  This is important for DHT self-heals, which cannot be 
disabled via any option; we've found that in most cases, once we reap the lock, 
another properly behaving client comes along and completes the DHT heal 
properly.

Richard


Sent from my iPhone

On Jan 24, 2016, at 6:42 PM, Pranith Kumar Karampuri 
mailto:pkara...@redhat.com>> wrote:



On 01/25/2016 02:17 AM, Richard Wareing wrote:
Hello all,

Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation 
feature which has had a significant impact on our GFS cluster reliability.  As 
such I wanted to share the patch with the community, so here's the bugzilla 
report:

https://bugzilla.redhat.com/show_bug.cgi?id=1301401

=
Summary:
Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster instability 
and eventual complete unavailability due to failures in releasing entry/inode 
locks in a timely manner.

Classic symptoms of this are increased brick (and/or gNFSd) memory usage due to 
the high number of (lock request) frames piling up in the processes.  The 
failure-mode results in bricks eventually slowing down to a crawl due to 
swapping, or OOMing due to complete memory exhaustion; during this period the 
entire cluster can begin to fail.  End-users will experience this as hangs on 
the filesystem, first in a specific region of the file-system and ultimately 
the entire filesystem as the offending brick begins to turn into a zombie (i.e. 
not quite dead, but not quite alive either).

Currently, these situations must be handled by an administrator detecting & 
intervening via the "clear-locks" CLI command.  Unfortunately this doesn't 
scale for large numbers of clusters, and it depends on the correct (external) 
detection of the locks piling up (for which there is little signal other than 
state dumps).

This patch introduces two features to remedy this situation:

1. Monkey-unlocking - This is a feature targeted at developers (only!) to help 
track down crashes due to stale locks, and prove the utility of the lock 
revocation feature.  It does this by silently dropping 1% of unlock requests; 
simulating bugs or mis-behaving clients.

The feature is activated via:
features.locks-monkey-unlocking 

You'll see the message
"[] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY LOCKING 
(forcing stuck lock)!" ... in the logs indicating a request has been dropped.

2. Lock revocation - Once enabled, this feature will revoke a *contended* lock 
(i.e. if nobody else asks for the lock, we will not revoke it) either by the 
amount of time the lock has been held, how many other lock requests are waiting 
on the lock to be freed, or some combination of both.  Clients which are losing 
their locks will be notified by receiving EAGAIN (send back to their callback 
function).

The feature is activated via these options:
features.locks-revocation-secs 
features.locks-revocation-clear-all [on/off]
features.locks-revocation-max-blocked 

Recommended settings are: 1800 seconds for a time-based timeout (give clients 
the benefit of the doubt).  Choosing a max-blocked value requires some 
experimentation depending on your workload, but generally values of hundreds to 
low thousands work (it's normal for many tens of locks to be taken out when 
files are being written @ high throughput).

I really like this feature. One question though: self-heal and rebalance domain 
locks are active until the self-heal/rebalance is complete, which can take more 
than 30 minutes if the files are in TBs. I will try to see what we can do to 
handle these without increasing the revocation-secs too much. Maybe we can come 
up with per-domain revocation timeouts. Comments are welcome.

Pranith

=

The patch supplied will patch cleanly against the v3.7.6 release tag, and 
probably any 3.7.x release & master (the posix locks xlator is rarely touched).

Richard







___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-24 Thread Richard Wareing
Here's my tips:

1. General C tricks
- learn to use vim or emacs & read their manuals; customize to suit your style
- use vim w/ pathogen plugins for auto formatting (don't use tabs!) & syntax
- use ctags to jump around functions
- Use ASAN & valgrind to check for memory leaks and heap corruption
- learn to use "git bisect" to quickly find where regressions were introduced & 
revert them
- Use a window manager like tmux or screen

2. Gluster specific tricks
- Alias "ggrep" to grep through all Gluster source files for some string and 
show you the line numbers
- Alias "gvim" or "gemacs" to open any source file without full path, eg. "gvim 
afr.c"
- GFS specific gdb macros to dump out pretty formatting of various structs 
(Jeff Darcy has some of these IIRC)
- Write prove tests...for everything you write, and any bug you fix.  Make them 
deterministic (timing/races shouldn't matter).
- Bugs/races and/or crashes which are hard or impossible to repro often require 
the creation of a developer specific feature to simulate the failure and 
efficiently code/test a fix.  Example: "monkey-unlocking" in the lock 
revocation patch I just posted.
- That edge case you are ignoring because you think it's impossible/unlikely?  
We will find/hit it in 48hrs at large scale (seriously, we will); handle it 
correctly, or at a minimum write a (kernel-style) "OOPS"-type log message.

That's all I have off the top of my head.  I'll give example aliases in another 
reply.

Richard

Sent from my iPhone

> On Jan 22, 2016, at 6:14 AM, Raghavendra Talur  wrote:
> 
> HI All,
> 
> I am sure there are many tricks hidden under sleeves of many Gluster 
> developers.
> I realized this when speaking to new developers. It would be good to have a 
> searchable thread of such tricks.
> 
> Just reply back on this thread with the tricks that you have and I promise I 
> will collate them and add them to developer guide.
> 
> 
> Looking forward to be amazed!
> 
> Thanks,
> Raghavendra Talur
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel&d=CwICAg&c=5VD0RTtNlTh3ycd41b3MUw&r=qJ8Lp7ySfpQklq3QZr44Iw&m=wVrGhYdkvCanDEZF0xOyVbFg0am_GxaoXR26Cvp7H2U&s=JOrY0up51BoZOq2sKaNJQHPzqKiUS3Bwgn7fr5VPXjw&e=
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

2016-01-24 Thread Richard Wareing
Hello all,

Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation 
feature which has had a significant impact on our GFS cluster reliability.  As 
such I wanted to share the patch with the community, so here's the bugzilla 
report:

https://bugzilla.redhat.com/show_bug.cgi?id=1301401

=
Summary:
Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster instability 
and eventual complete unavailability due to failures in releasing entry/inode 
locks in a timely manner.

Classic symptoms of this are increased brick (and/or gNFSd) memory usage due to 
the high number of (lock request) frames piling up in the processes.  The 
failure-mode results in bricks eventually slowing down to a crawl due to 
swapping, or OOMing due to complete memory exhaustion; during this period the 
entire cluster can begin to fail.  End-users will experience this as hangs on 
the filesystem, first in a specific region of the file-system and ultimately 
the entire filesystem as the offending brick begins to turn into a zombie (i.e. 
not quite dead, but not quite alive either).

Currently, these situations must be handled by an administrator detecting & 
intervening via the "clear-locks" CLI command.  Unfortunately this doesn't 
scale for large numbers of clusters, and it depends on the correct (external) 
detection of the locks piling up (for which there is little signal other than 
state dumps).

This patch introduces two features to remedy this situation:

1. Monkey-unlocking - This is a feature targeted at developers (only!) to help 
track down crashes due to stale locks, and prove the utility of the lock 
revocation feature.  It does this by silently dropping 1% of unlock requests; 
simulating bugs or mis-behaving clients.

The feature is activated via:
features.locks-monkey-unlocking 

You'll see the message
"[] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY LOCKING 
(forcing stuck lock)!" ... in the logs indicating a request has been dropped.

2. Lock revocation - Once enabled, this feature will revoke a *contended* lock 
(i.e. if nobody else asks for the lock, we will not revoke it) either by the 
amount of time the lock has been held, how many other lock requests are waiting 
on the lock to be freed, or some combination of both.  Clients which are losing 
their locks will be notified by receiving EAGAIN (send back to their callback 
function).

The feature is activated via these options:
features.locks-revocation-secs 
features.locks-revocation-clear-all [on/off]
features.locks-revocation-max-blocked 

Recommended settings are: 1800 seconds for a time-based timeout (give clients 
the benefit of the doubt).  Choosing a max-blocked value requires some 
experimentation depending on your workload, but generally values of hundreds to 
low thousands work (it's normal for many tens of locks to be taken out when 
files are being written @ high throughput).

=
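
(Conceptually, the revocation decision the options above control looks
something like the sketch below; the struct and names are placeholders, not
the actual features/locks code.)

#include <time.h>

/* Sketch: only a *contended* lock is ever revoked, and only when it has
 * been held longer than locks-revocation-secs or has more waiters than
 * locks-revocation-max-blocked. */
struct held_lock_info {
        time_t granted_at;    /* when the lock was granted        */
        int    num_blocked;   /* lock requests waiting behind it  */
};

static int
should_revoke (const struct held_lock_info *l, time_t now,
               int revocation_secs, int max_blocked)
{
        if (l->num_blocked == 0)
                return 0;   /* uncontended: never revoked */
        if (revocation_secs && (now - l->granted_at) > revocation_secs)
                return 1;
        if (max_blocked && l->num_blocked > max_blocked)
                return 1;
        return 0;
}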

The patch supplied will patch cleanly against the v3.7.6 release tag, and 
probably any 3.7.x release & master (the posix locks xlator is rarely touched).

Richard



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Feature: FOP Statistics JSON Dumps

2015-09-22 Thread Richard Wareing
Hey Ben,

So the UI for it is simply to read it from /var/lib/glusterd/stats.  For 
example for gNFSd you can simply do this:

cat /var/lib/glusterd/stats/glusterfs_nfsd.dump


To see the output.  The reason we favor this "procfs" style interface is that:

1. There are 0 dependencies on CLIs which can hang.
2. All dumps are independent of one another; if gNFSd on that host is having 
issues, this shouldn't prevent glusterfsd from sending us stats.
3. The output can be sent to an analytics/alarm engine of your choice.  Or 
simply run grep w/ "watch" in a loop to watch "live" when doing debugging.

Since we have this feature... we never use "profile" at all actually: there's 
really no need, since you have the data 24x7 at 5-second intervals. You only 
need to enable diagnostics.latency-measurement and diagnostics.count-fop-hits, 
and set diagnostics.ios-dump-interval to non-zero, and the data will land in 
/var/lib/glusterd/stats/.dump .

Bug is updated w/ example output, but here's a teaser:


{
*SNIP*
"gluster.nfsd.inter.fop.removexattr.latency_ave_usec": "0.00",
"gluster.nfsd.inter.fop.removexattr.latency_min_usec": "0.00",
"gluster.nfsd.inter.fop.removexattr.latency_max_usec": "0.00",
"gluster.nfsd.inter.fop.opendir.per_sec": "2.60",
"gluster.nfsd.inter.fop.opendir.latency_ave_usec": "1658.92",
"gluster.nfsd.inter.fop.opendir.latency_min_usec": "715.00",
"gluster.nfsd.inter.fop.opendir.latency_max_usec": "7179.00",
"gluster.nfsd.inter.fop.fsyncdir.per_sec": "0.00",
"gluster.nfsd.inter.fop.fsyncdir.latency_ave_usec": "0.00",
"gluster.nfsd.inter.fop.fsyncdir.latency_min_usec": "0.00",
"gluster.nfsd.inter.fop.fsyncdir.latency_max_usec": "0.00",
"gluster.nfsd.inter.fop.access.per_sec": "43.19",
"gluster.nfsd.inter.fop.access.latency_ave_usec": "323.51",
"gluster.nfsd.inter.fop.access.latency_min_usec": "144.00",
"gluster.nfsd.inter.fop.access.latency_max_usec": "6639.00",
"gluster.nfsd.inter.fop.create.per_sec": "0.00",
*SNIP*
}

There are also aggregate counters, tracking from process birth to death, which 
are exported as well.

Richard



From: Ben England [bengl...@redhat.com]
Sent: Tuesday, September 22, 2015 11:04 AM
To: Richard Wareing
Cc: gluster-devel@gluster.org
Subject: Re: [Gluster-devel] Feature: FOP Statistics JSON Dumps

Richard, what's great about your patch (besides lockless counters) is:

- JSON easier to parse (particularly in python).  Compare to parsing "gluster 
volume profile" output, which is much more difficult.  This will enable tools 
to display profiling data in a user-friendly way.  Would be nice if you 
attached a sample output to the bz 1261700.

- client side capture - io-stats translator is at the top of the translator 
stack so we would see latencies just like the application sees them.  "gluster 
volume profile" provides server-side latencies but this can be deceptive and 
fails to report "user experience" latencies.

I'm not that clear on the UI for it, would be nice if "gluster volume " command 
could be set up to automatically poll this data at a fixed rate like many other 
perf utilities (example: iostat), so that user could capture a Gluster profile 
over time with a single command; at present the support team has to give them a 
script to do it.  This would make it trivial for a user to share what their 
application is doing from a Gluster perspective, as well as how Gluster is 
performing from the client's perspective./usr/sbin/gluster utility can run 
on the client now since it is in gluster-cli RPM right?

So in other words it would be great to replace this:

gluster volume profile $volume_name start
gluster volume profile $volume_name info > /tmp/past
for min in `seq 1 $sample_count` ; do
  sleep $sample_interval
  gluster volume profile $volume_name info
done > gvp.log
gluster volume profile $volume_name stop

With this:

gluster volume profile $volume_name $sample_interval $sample_count > gvp.log

And be able to run this command on the client to use your patch there.

thx

-ben

- Original Message -
> From: "Richard Wareing" 
> To: gluster-devel@gluster.org
> Sent: Wednesday, September 9, 2015 10:24:54 PM
> Subject: [Gluster-devel] Feature: FOP Statistics JSON Dumps
>
> Hey all,
>
> I just uploaded a clean patch for our FOP statistics dump feature @
> https://bugzilla.redhat.com/show_bug.cgi?id=1261700 .
>
> Patches cleanly to v3.6.x/v3.7.x release branches, also includes io-stats
> support for intel arch atomic operations (ifdef'd for portability) such that
> you can collect data 24x7 with a negligible latency hit in the IO path.
> We've been using this for quite some time and there appeared to be some
> interest at the dev summit to have this in mainline; so here it is.
>
> Take a look, and I hope you find it useful.
>
> Richard
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Feature: Automagic split-brain resolution for v3.6.x

2015-09-10 Thread Richard Wareing
Ok, one more patch and then I'm taking a break from porting for a bit :).  This 
one we've been using for a few years (in some form or another) and it has 
really helped folks embrace GlusterFS; at our scale, workflows cannot be 
interrupted for nearly any reason... using heuristics to resolve split-brain is 
almost always preferable to human intervention.

Here's the patch summary from the diff; bugzilla bug @ 
https://bugzilla.redhat.com/show_bug.cgi?id=1262161 :

=
Summary:
- Split-brain sucks, and generally split-brains are trivial to resolve based on
  some very basic heuristics: time, size or majority.  This patch
  introduces cluster.favorite-child-by-[ctime|mtime|size|majority]
  options which do just this (a rough sketch of the idea follows below).
- It's important to note that although AFR2 has some un-split
  capabilities, they don't really handle the more common case (in our
  experience) where the change logs are truly in conflict.  This patch
  does handle these cases.
- Added A/B tests for the features such that they verify WITHOUT the
  feature enabled the file is indeed split-brained, then again test with the
  feature enabled to show it is readable and the correct file was
  chosen.  This should fix the problem where in v3.6 we falsely believed
  that split-brain resolution was implemented when in fact it was not.
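
(As promised above, a rough standalone sketch of what "favorite child"
selection amounts to, shown here for mtime and size; the real option plumbing
and AFR structures obviously differ, and "majority" would instead pick the
largest group of replicas that agree.)

#include <sys/stat.h>

/* Sketch: given per-replica stat data, pick the winning copy by newest
 * mtime or by largest size. */
static int
favorite_by_mtime (const struct stat *replicas, int count)
{
        int i, best = 0;

        for (i = 1; i < count; i++)
                if (replicas[i].st_mtime > replicas[best].st_mtime)
                        best = i;
        return best;
}

static int
favorite_by_size (const struct stat *replicas, int count)
{
        int i, best = 0;

        for (i = 1; i < count; i++)
                if (replicas[i].st_size > replicas[best].st_size)
                        best = i;
        return best;
}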


Patches cleanly to release-3.6 branch, and with a little massaging this patch 
can be made to work with v3.7.x .

Enjoy!

Richard

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Feature: Tunable FOP sampling for v3.6.x/v3.7.x

2015-09-10 Thread Richard Wareing
Hello again,

Following up on the FOP statistics dump feature, here's our FOP sampling patch 
as well.  This feature allows you to sample a 1:N ratio of FOPs, such that they 
can be later analyzed to track down mis-behaving clients, calculate P99/P95 FOP 
service times, audit traffic and probably other things I'm forgetting to 
mention.
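
(As a quick illustration of the P99/P95 angle: once you have the sampled
latencies, the percentile is just a sort and an index, roughly as below.  This
is an offline post-processing sketch, not part of the patch itself.)

#include <stdint.h>
#include <stdlib.h>

static int
cmp_u64 (const void *a, const void *b)
{
        uint64_t x = *(const uint64_t *) a;
        uint64_t y = *(const uint64_t *) b;

        return (x > y) - (x < y);
}

/* Nearest-rank percentile over sampled FOP latencies (usec); 'pct' is
 * e.g. 95 or 99.  Sorts the array in place. */
static uint64_t
latency_percentile (uint64_t *lat, size_t n, unsigned pct)
{
        size_t rank;

        if (n == 0)
                return 0;
        qsort (lat, n, sizeof (*lat), cmp_u64);
        rank = (pct * n + 99) / 100;   /* ceil(pct * n / 100) */
        if (rank > 0)
                rank--;                /* 0-based index       */
        return lat[rank];
}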


The patch can be had @ https://bugzilla.redhat.com/show_bug.cgi?id=1262092 (it 
does require the FOP stats dump patch to work, so patch that first!)


Here's the details from the patch commit description:



debug/io-stats: FOP sampling feature

- Using the sampling feature you can record details about every Nth FOP.
  The fields in each sample are: FOP type, hostname, uid, gid, FOP priority,
  port and time taken (latency) to fulfill the request.
- Implemented using a ring buffer which is not malloc/calloc'd in the IO path;
  this should make the sampling process pretty cheap (a rough sketch follows
  below).
- DNS resolution done @ dump time not @ sample time for performance w/
  cache
- Metrics can be used for both diagnostics, traffic/IO profiling as well
  as P95/P99 calculations
- To control this feature there are two new volume options:
  diagnostics.fop-sample-interval - The sampling interval, e.g. 1 means
  sample every FOP, 100 means sample every 100th FOP
  diagnostics.fop-sample-buf-size - The size (in bytes) of the ring
  buffer used to store the samples.  In the event more samples
  are collected in the stats dump interval than can be held in this buffer,
  the oldest samples shall be discarded.  Samples are stored in the log
  directory under /var/log/glusterfs/samples.
- Uses DNS cache written by sshre...@fb.com (Thank-you!), the DNS cache
  TTL is controlled by the diagnostics.stats-dnscache-ttl-sec option
  and defaults to 24hrs.
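
(The ring buffer mentioned above boils down to something like the sketch
below; the field names, sizes and layout are placeholders rather than the
io-stats implementation.)

#include <stddef.h>
#include <stdint.h>

/* One recorded sample: FOP type, identity of the caller and latency.
 * The client address is stored raw; DNS resolution happens at dump
 * time, as described above. */
struct fop_sample {
        int      fop;            /* FOP type                        */
        uint32_t uid, gid;       /* caller credentials              */
        uint16_t port;           /* client port                     */
        uint64_t latency_usec;   /* time taken to fulfill the FOP   */
        char     addr[64];       /* client address (resolved later) */
};

/* Fixed-size ring, allocated once at init so nothing is allocated in
 * the IO path; when it wraps, the oldest samples are overwritten. */
struct sample_ring {
        struct fop_sample *buf;
        size_t             slots;   /* number of entries       */
        size_t             next;    /* next slot to write into */
};

static void
ring_record (struct sample_ring *r, const struct fop_sample *s)
{
        r->buf[r->next] = *s;
        r->next = (r->next + 1) % r->slots;
}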

===

Thanks go to David Hasson for reviewing the code at our end, and Shreyas 
Siravara for his (high performance) DNS cache implementation.  The direction we 
(and by "we" I really mean Shreyas) plan on taking this work is load 
shaping/throttling based on host prefixes, uids, gids etc since this patch 
exposes this information in a concise manner which is out of the IO path.

Richard
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Next v3.6 maint release suggested commits

2015-09-10 Thread Richard Wareing
Greetings,

Just a heads up, there's a very nasty deadlock bug w/ multi-core epoll in 
v3.6.x (latest branch) which appears to be due to DHT doing naughty things 
(racing, among other things).  So to fix these issues we had to pull in these 
(trivial cherry-pick) commits from master:

565ef0d826b356d188028410575da1f0fa9416b7
5a0aced9802d12f45cebf1b318e20676395570f9
b387c485c069c201ddabebe00c7176f960f4be32
2691a650f94ed9f82ecb8dda5d9b12f52147dc51
532566802b58d847d1ff1818ea857b583049ee76
50a92b2d72a43a940b0e38340d161fa7687a7104
df83769d4c6c6bb13df70751640b2a90af0064d7
9f8b4c2ef3523f91213a00edec863f322e92a6d5
68dfe137c7dfa08fb601257309f3db7e3ce6e4e8
e57a5eef4ea0da579986e9b61fec47a50ddf2b8a
2a3b7086957b095a5bbb5325aab0f765496e8491
393de0d7fd4309f4262a8747183f164e8f6d23c1
48aa51d49800bafbad5825165d4798cbbe592f5a
4fcf231aac51668742114320068b1570cee75667
8ac26cf2302dc452d7448a769b6b117dbdaaf05c
40da6554acabedebc2259a8c867159e41e1079c2
39bbf0ed97c3a1e8bf2966b2561f964013cde606
20a014332504efbc0204a591cad257abb167fce7
83b65187148e1dd145752c61660b8e5902e9a94f
d60943ff2f6d58c129a66308f29986b471762210
6588d8f2a8975f032cb1567eef5ed35e5f992357
4ac9399be4f620abdaeee1054a48458c8238b907
a7e30beef3ab41985e7435dc1a90cde30e07dc47

(yes this is a lot of commits...but it fixes a lot of brokenness :) ).

So for the next maintenance release it might be wise to include these fixes to 
better the experience for v3.6.x users out there who do not maintain their own 
builds.  Without these, multi-core epoll under high load is prone to deadlocks 
(repro'able by rsync'ing [rsync -avR --inplace] 100k or so small files from 
one tmpfs mount to another tmpfs-backed GFS cluster on a development/test 
machine).  Not sure if it's worth filing a bug report since the issue has 
clearly been fixed in newer releases.

Richard

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Feature: FOP Statistics JSON Dumps

2015-09-09 Thread Richard Wareing
Hey all,

I just uploaded a clean patch for our FOP statistics dump feature @ 
https://bugzilla.redhat.com/show_bug.cgi?id=1261700 .

Patches cleanly to v3.6.x/v3.7.x release branches, also includes io-stats 
support for intel arch atomic operations (ifdef'd for portability) such that 
you can collect data 24x7 with a negligible latency hit in the IO path.  We've 
been using this for quite some time and there appeared to be some interest at 
the dev summit to have this in mainline; so here it is.

Take a look, and I hope you find it useful.

Richard

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GlusterD 2.0 status updates

2015-09-08 Thread Richard Wareing
@Kaushal:

WRT Thrift, SSL and non-blocking IO are supported.  I'll find out where the C 
implementation stands; it's possible patches haven't made it upstream for some 
reason.  It's also good to know there is no plan to move away from SUN-RPC for 
brick<->client comms.

On concurrency & Python, I think this is where I'd make the argument that if 
the multi-threaded capabilities of the high-level language are a problem, or 
even a concern, those features should be done in C, as they are clearly 
features where performance is critical; as such they don't meet the bar for 
using the high-level language, which should be reserved for cases where 
performance isn't an issue.  We should also consider that features or 
code-paths which are slow (but deemed acceptably so) due to the higher-order 
language choice at 20-50 bricks may not be acceptably slow at 1,000 or 10,000 
bricks; so these things should be carefully considered (which so far they seem 
to be, which is good).

To be clear, I think Go is a great choice based on the reasons you've cited, so 
long as there's a plan in place to re-write the Python components and we're 
comfortable with the limited pool of developers who are likely to contribute 
(which is admittedly already a problem given the GlusterFS core is in C; though 
I suspect there are way more C coders out there than Go).  If, however, this 
isn't realistically possible, then I think it needs another look IMHO.

@Atin, on being language agnostic/pluggable, I think it makes sense from an 
_external_ (feature/plugin) POV, but within the core code-base things should be 
cohesive and done in a consistent manner (from language selection, RPC 
frameworks, testing, doc style etc).  Consider the nightmare of building an 
automated build/test infra for a project which uses 3 or 4 languages?  Library 
management?  Tracking down/fixing bugs in the core lang libs?  

Also, was there any reason why C++ (11/14) wasn't considered as well?  It's 
kind of a nice middle ground between Python/Go and C, as it has lots of the 
higher-level features, rich libraries, mature tool chains, massive developer 
pool, and supports both Thrift (_very_ mature) and Protocol Buffers.

Richard



From: Kaushal M [kshlms...@gmail.com]
Sent: Monday, September 07, 2015 5:20 AM
To: Richard Wareing
Cc: Atin Mukherjee; Gluster Devel
Subject: Re: [Gluster-devel] GlusterD 2.0 status updates

Hi Richard,
Thanks a lot for you feedback. I've done my replies inline.

On Sat, Sep 5, 2015 at 5:46 AM, Richard Wareing  wrote:
> Hey Atin (and the wider community),
>
> This looks interesting, though I have a couple questions:
>
> 1. Language choice - Why the divergence from Python (which I'm no fan of) 
> which is already heavily used in GlusterFS?  It seems a bit strange to me to 
> introduce yet another language into the GlusterFS code base.  Doing this will 
> make things less cohesive, harder to test, make it more difficult for 
> contributors to understand the code base and improve their coding skills to 
> be effective contributors.  I'm a bit concerned we are setting a precedent 
> that development will switch to the new flavor of the day.  If a decision has 
> been made to shift away from Python for the portions of GlusterFS where 
> performance isn't a concern, will the portions currently written in Python be 
> re-written as well?  I also question the wisdom of a language with a shallow 
> developer pool, and a less open development process (somewhat of an ironic choice 
> IMHO).
>

One of our aims for GlusterD-2.0 was to switch to a higher level
language. While C is good for solving lower level performance critical
problems, it isn't best suited for the kind of management tasks we
want GlusterD to focus on. The choice of Go over Python as the higher
level language, was mainly driven by the following
- Go is easier to get the hang of for a developer coming from a C
background. IMO for a developer new to both Go and Python, it's easier
to start producing working code in Go. The structure and syntax of the
language and the related tools make it easier.
- Go has built into the language support (think goroutines, channels)
for easily implementing the concurrency patterns that we have in the
current GlusterD codebase. This makes it easier for us think about
newer designs based on our understanding of the existing
implementation.
- We have concerns about the concurrency and threading capabilities of
Python. We have faced a lot of problems with doing concurrency and
threading in GlusterD (though this is mostly down to bad design).
Python has known issues with threading, which doesn't give us
confidence as python novices.
- Go has a pretty good standard library (possibly the best standard
library), which provides us with almost everything required. This
reduces the number of dependencie

Re: [Gluster-devel] GlusterD 2.0 status updates

2015-09-04 Thread Richard Wareing
Hey Atin (and the wider community),

This looks interesting, though I have a couple questions: 

1. Language choice - Why the divergence from Python (which I'm no fan of) which 
is already heavily used in GlusterFS?  It seems a bit strange to me to 
introduce yet another language into the GlusterFS code base.  Doing this will 
make things less cohesive, harder to test, make it more difficult for 
contributors to understand the code base and improve their coding skills to be 
effective contributors.  I'm a bit concerned we are setting a precedent that 
development will switch to the new flavor of the day.  If a decision has been 
made to shift away from Python for the portions of GlusterFS where performance 
isn't a concern, will the portions currently written in Python be re-written as 
well?  I also question the wisdom of a language with a shallow developer pool, 
and a less open development process (somewhat of an ironic choice IMHO).

2. RPC framework - What's the reasoning behind using Protocol Buffers vs 
Thrift?  I'm admittedly biased here (since it's heavily used here at FB), 
however Thrift supports far more languages, has a larger user-base, features 
better data structure support, has exceptions and has a more open development 
process (it's an Apache project).  It's mentioned folks are "uncomfortable" 
with GLib, exactly why?  Has anyone done any latency benchmarks on the 
serialization/de-serialization to ensure we don't shoot ourselves in the foot 
by moving away from XDR for brick<->client/gNFSd communication?  The low 
latency communication between bricks & clients is to me a _critical_ component 
to GlusterFS's success; adding weight to the protocol or (worse) making it 
easier to add weight to me is unwise.

So far things are moving towards 3-4 languages (Python, C, Go, sprinkle of 
BASH) and 2 RPC frameworks.  No language or RPC mechanism is perfect, but the 
proficiency of the coder at the keyboard is _far_ more important.  IMHO we 
should focus on 1 low level high-performance language (C) and 1 higher level 
language for other components where high performance isn't required (geo-rep, 
glusterd etc), as it will encourage higher proficiency in the chosen languages 
and less fractured knowledge amongst developers.

My 2 cents.

Richard



From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on 
behalf of Atin Mukherjee [amukh...@redhat.com]
Sent: Monday, August 31, 2015 10:04 PM
To: Gluster Devel
Subject: [Gluster-devel] GlusterD 2.0 status updates

Here is a quick summary of what we accomplished over last one month:

1. The skeleton of GlusterD 2.0 codebase is now available @ [1] and the
same is integrated with gerrithub.

2. REST endpoints for basic commands like volume
create/start/stop/delete/info/list have been implemented. They need a little
bit more polishing to strictly follow the heketi APIs

3. The team has worked on coming up with a cross-language lightweight RPC
framework using pbrpc, and the same can be found at [2]. It also has a
pbcodec package which provides a protobuf-based rpc.ClientCodec and
rpc.ServerCodec that can be used with the rpc package in Go's standard library

4. We also worked on the first cut of volfile generation and it's
integrated in the repository.


The plan for next month is as follows:

1. Focus on the documentation along with publishing the design document
2. Unit tests
3. Come up with the initial design & a basic prototype for transaction
framework.

[1] https://github.com/kshlm/glusterd2
[2] https://github.com/kshlm/pbrpc

Thanks,
Atin
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster IPv6 bugfixes (Bug 1117886)

2015-06-15 Thread Richard Wareing
Hey Nithin,

We have IPv6 going as well (v3.4.x & v3.6.x), so I might be able to help out 
here and perhaps combine our efforts.  We did something similar here, however 
we also tackled the NFS side of the house, which required a bunch of changes 
due to how port registration w/ portmapper changed in IPv6 vs IPv4.  You 
effectively have to use "libtirpc" to do all the port registrations with IPv6.

We can offer up our patches for this work and hopefully things can be combined 
such that end-users can simply do "vol set  transport-address-family 
" and voila they have whatever support they desire.

I'll see if we can get this posted to bug 1117886 this week.

Richard




From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on 
behalf of Nithin Kumar Dabilpuram [nithind1...@yahoo.in]
Sent: Saturday, June 13, 2015 9:12 PM
To: gluster-devel@gluster.org
Subject: [Gluster-devel] Gluster IPv6 bugfixes (Bug 1117886)




Hi,

Can I contribute to this bug fix ? I've worked on Gluster IPv6 functionality 
bugs in 3.3.2 in my past organization and was able to successfully bring up 
gluster on IPv6 link local addresses as well.

Please find my work-in-progress patch. I'll raise a Gerrit review once testing 
is done. I was successfully able to create volumes with 3 peers and add bricks. 
I'll continue testing other basic functionality and see what needs to be 
modified. Any other suggestions?

Brief info about the patch:
Here I'm trying to use "transport.address-family" option in 
/etc/glusterfs/glusterd.vol file and then propagate the same to server and 
client vol files and their translators.

In this way when user mentions "transport.address-family inet6" in its 
glusterd.vol file, all glusterd servers open AF_INET6 sockets and then the same 
information is stored in glusterd_volinfo and used when generating vol config 
files.
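
(In other words, roughly the following at socket-creation time; this is a
sketch of the idea only, not the actual rpc/socket transport code.)

#include <string.h>
#include <sys/socket.h>

/* Sketch: map the transport.address-family option value onto the socket
 * family used when the transport is created.  The default stays inet. */
static int
address_family_from_option (const char *opt)
{
        if (opt && strcmp (opt, "inet6") == 0)
                return AF_INET6;
        return AF_INET;
}

/* e.g.: sock = socket (address_family_from_option (opt), SOCK_STREAM, 0); */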

-thanks
Nithin


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster Coreutils

2015-06-15 Thread Richard Wareing
Hey all,

I'm Craig's manager for the duration of his internship @ FB, so I thought I'd 
better chime in here :).  As Craig mentioned, our project plan is to implement 
C-based CLI utilities similar to what we have for NFS CLI utilities (we'll be 
open sourcing this in the coming days so you can see what we have there).  Our 
time to completion is very aggressive for a project coded in C (~10 weeks or 
less), so we need to keep our project goals very focused (avoiding feature 
creep, architecture rabbit holes, etc.) to keep it on schedule and come out with a 
working "product" that scales to our use-cases.

As such, rather than have "too many cook in the kitchen", I'd propose the 
initial implementation of ls/tail/cat/put/mkdir/rm/stat/cp be implemented by 
Craig, and we'll put the code up on the public repo as early as possible for 
public review and comment.  We've definitely heard some great ideas from the 
list, and we've changed our approach to include:

1. Shell based interaction
2. Single binary (behavior changes based on the invoking symlink, as sketched below; any cons to this?)
3. Separate code repo/packaging (we'll probably go C99 because of this since we 
are no longer bound by the legacy C89 standards of the GFS repo).
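
(For point 2, the usual trick is busybox-style dispatch on argv[0]; below is a
self-contained sketch with made-up applet names, just to show the shape of it.)

#include <libgen.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical applets; the real utilities (ls/cat/put/mkdir/...) would
 * each get an entry here and a symlink pointing at the single binary. */
static int
do_gfcat (int argc, char **argv)
{
        (void) argc; (void) argv;
        puts ("gfcat: not implemented yet");
        return 0;
}

static int
do_gfls (int argc, char **argv)
{
        (void) argc; (void) argv;
        puts ("gfls: not implemented yet");
        return 0;
}

int
main (int argc, char **argv)
{
        const char *name = basename (argv[0]);

        if (strcmp (name, "gfcat") == 0)
                return do_gfcat (argc, argv);
        if (strcmp (name, "gfls") == 0)
                return do_gfls (argc, argv);

        fprintf (stderr, "unknown applet: %s\n", name);
        return 1;
}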

On the language debate (Python vs Go vs C), rather than get into this debate 
(which has good arguments on all sides) I'd simply encourage anyone who feels 
strongly about implementing the CLI in a different language to simply do so!  
More choice and competition among tools is never a bad thing, as it will only 
make them both better.  Suffice it to say though, we feel strongly about the C 
based implementation so we are taking it on to provide parity with our NFS CLI 
utilities (which I'm working to get open-sourced in the coming weeks).

And finally, with respect to scaling: we feel reasonably confident we can scale 
the tools without any sort of proxy to work on the order of 100's of 
simultaneous CLI clients hitting a cluster.  We are going to experiment with 
some "proxy" approaches others to try to scale this to 1000's of CLI clients 
into a single cluster.  Though, the proxy will be optional as we feel the 
simplicity/reliability of a standalone client (hitting a standard load-balanced 
or DNS RR end point) will work well for vast majority of users (envision a "-p 
" option in the tools for those who need to really scale them).

Richard


From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on 
behalf of Craig Cabrey [craigcab...@fb.com]
Sent: Monday, June 15, 2015 10:50 AM
To: Poornima Gurusiddaiah
Cc: gluster-devel@gluster.org
Subject: Re: [Gluster-devel] Gluster Coreutils

The initial versions of the commands will establish independent connections to 
a cluster node (specified by a Gluster URL scheme). Then I think I could evolve 
the code to implement the shell idea that was in the design doc. Then, if I 
have time left, I could try to tackle the daemon that would keep connections 
open (whether this would be a client side or a server side proxy is yet to be 
determined). I'm leaving that for the end since that would require the most 
design and I don't want to end up going down a rabbit hole at this point trying 
to fight DBUS, for example.

I have been reading through the document posted earlier to get a sense of the 
overall direction the project should move in. As for the language, I already 
have functioning code in C and that's a requirement of my project, so that's 
what I'm moving forward with.

Craig

> On Jun 15, 2015, at 12:04 AM, Poornima Gurusiddaiah  
> wrote:
>
> Hi Craig,
>
> That's cool! I was more interested in knowing how you plan to implement the 
> commands.
> To be specific, do you plan to connect/disconnect(glfs_init/glfs_fini) to the 
> gluster
> server for each command or persist the connection across commands?
>
> Regards,
> Poornima
>
> - Original Message -
>> From: "Craig Cabrey" 
>> To: "Joe Julian" 
>> Cc: gluster-devel@gluster.org
>> Sent: Monday, June 15, 2015 3:19:07 AM
>> Subject: Re: [Gluster-devel] Gluster Coreutils
>>
>> I've already started writing the utilities in C per my internship project.
>> I'll push these up when ready (most probably sometime this week) as a POC.
>>
>> Maybe then we can look into implementing with Python?
>>
>> Craig
>>
>>> On Jun 14, 2015, at 2:47 PM, Joe Julian  wrote:
>>>
>>> I was thinking the other way around. Write it in python then optimize if
>>> it's necessary.
 On 06/14/2015 02:45 PM, chris holcombe wrote:
 Maybe we could write these in C and setup python bindings for them.
 Thoughts?  I'm down with writing them in C.  I could use more practice.

> On 06/14/2015 02:36 PM, Joe Julian wrote:
> I would prefer python.
>
>> On 06/14/2015 11:18 AM, Niels de Vos wrote:
>>> On Sat, Jun 13, 2015 at 06:45:45PM +0530, M S Vishwanath Bhat wrote:
>>> On 12 June 2015 at 23:59, chris holcombe 
>>> wrote:
>>>
 Yeah I have this r