Re: [Gluster-devel] NetBSD regression tests: reviews required

2014-12-03 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> This is a friendly reminder that I still have the following pending:
> 
> > http://review.gluster.com/9071
> > http://review.gluster.com/9075
> > http://review.gluster.com/9074
> > http://review.gluster.com/9216  [2]
> > http://review.gluster.com/9217
> > http://review.gluster.com/9219
> > http://review.gluster.com/9220
And let me add an important one that fixes complete failures of the
triggered NetBSD regression:
http://review.gluster.org/9232

This improves the Linux stat(1) emulation so that it can handle multiple
files. The first test in the regression test suite now relies on that, and
what is currently in tree just hangs badly.
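
For illustration, Linux stat(1) takes several operands in a single
invocation, e.g. (the exact format string and files used by the test may
differ):

    stat -c '%n %F' file1 file2 dir1    # one line of output per operand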


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] gluster buildrpms does not work for Centos 6.5

2014-12-03 Thread Prakash Madiraju
On both the master and release-3.6 branches, I am not able to build gluster
RPMs on CentOS 6.5.

>>cd extras/LinuxRPM
>>make glusterrpms

+ cp -pr extras/clear_xattrs.sh
../glusterfs/extras/LinuxRPM/rpmbuild/BUILDROOT/glusterfs-3.6.1-0.23.git471292e.el6.x86_64/usr/share/doc/glusterfs-server-3.6.1
+ exit 0


RPM build errors:
File not found:
/extras/LinuxRPM/rpmbuild/BUILDROOT/glusterfs-3.6.1-0.23.git471292e.el6.x86_64/etc/init.d/glusterd
make: *** [rpms] Error 1
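
For anyone trying to reproduce: the tree was prepared in the usual way
before this, roughly (configure flags omitted, adjust as needed):

    ./autogen.sh
    ./configure
    cd extras/LinuxRPM
    make glusterrpms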

Any help?

Thanks
Prakash
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Introducing gdash - A simple GlusterFS dashboard

2014-12-03 Thread Aravinda

Hi All,

I created a small, locally installable web app called gdash, a simple
dashboard for GlusterFS.


"gdash is a super-young project, which shows GlusterFS volume 
information about local, remote clusters. This app is based on 
GlusterFS's capability of executing gluster volume info and gluster 
volume status commands for a remote server using --remote-host option."
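
The underlying CLI capability referred to above looks like this (the host
name is illustrative):

    gluster --remote-host=server1 volume info
    gluster --remote-host=server1 volume status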


It is very easy to install using pip or easy_install.
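
A minimal sketch of getting it running (assuming the pip package is named
gdash and installs a gdash command; see the blog post for options):

    sudo pip install gdash
    gdash    # starts the dashboard locally; open it in a browser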

Check my blog post for more details (with screenshots).
http://aravindavk.in/blog/introducing-gdash/

Comments and Suggestions Welcome.

--
regards
Aravinda
http://aravindavk.in


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Glusterd 'Management Volume' proposal

2014-12-03 Thread Jeff Darcy
> 1) With the current scheme in glusterd, the O(N^2) behavior is because the
> configuration is replicated to every peer in the cluster, correct?

No, the O(n^2) behavior is for the probe/heartbeat stuff.  Config
replication is only O(n) but it's problematic because it doesn't
handle partitions and consistency very well.

> - We do have the limitation now that some clients _may_ not have got the
> latest graph (one of the configuration items here). With the new
> proposal, is there any thought regarding resolving this? Is it
> required? I assume brick nodes have this strict enforcement today as
> well as in the future.

What I would expect is that the *servers* hear about updates through
some sort of "watch" mechanism, then each is responsible for notifying
its own clients.  Note that a client which is connected to multiple
servers might therefore get multiple notifications for the same event,
so we need to recognize that a "change" to the same graph as before is
a no-op and respond accordingly (which I believe we already do).

> 2) With a >1000 node setup, is it intended that we have a cascade
> functionality to handle configuration changes? I.e. there is a defined
> set of _watchers_ of the configuration cluster, and each in turn serves a
> set of peers for their _watch_ functionality?
> 
> This may be overkill (i.e. requiring cascading), but is it required
> when we consider cases like Geo-rep or tiers in different data centers
> etc.? They need configuration updates, and all of them watching the
> configuration cluster may be a problem requiring attention.

Cascading seems like overkill as long as we're talking about simple
notification and not some more complex sort of thing like 2PC.  A
single config server notifying 1000 other servers directly shouldn't
be all that big a deal.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] small-file performance feature sub-pages

2014-12-03 Thread Ben England
I've expanded the specifications of two proposals for improving small-file
performance into new feature pages, referenced at the bottom of this list (not
in priority order, I hope?).  Could we possibly review these proposals at a
gluster.org meeting this month?

http://www.gluster.org/community/documentation/index.php/Features#Proposed_Features.2FIdeas

The new feature pages under this page are:

http://www.gluster.org/community/documentation/index.php/Features/stat-xattr-cache
 - proposed enhancement to POSIX translator for small-file performance
http://www.gluster.org/community/documentation/index.php/Features/composite-operations
 - changes to reduce round trips for small-file performance

Specifically, the stat-xattr-cache proposal does not require Gluster 4.0 - it
could be implemented today.  These pages are referenced by the Features/Planning40
page and also by the Features/Feature_Smallfile_Perf page.

Comments and feedback are appreciated.  There have been other related proposals
from Rudra Siva concerning round-trip reduction in
http://supercolony.gluster.org/pipermail/gluster-devel/2014-November/042741.html .

-ben
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] snapshot restore and USS

2014-12-03 Thread Vijay Bellur

On 12/01/2014 05:36 PM, Raghavendra Bhat wrote:

What we can do is: while sending the lookup on .snaps (again, say "ls
/dir/.snaps"), add a key within the dict which snapview-server can look
for. That key is a kind of hint from snapview-client to snapview-server
that the parent gfid of this particular lookup call exists and is a valid
one. When snapview-server gets the lookup on the parent gfid as part of
resolution from protocol/server, it can look at the dict for the key.
If the key is set, then it simply returns success for that lookup.

This way we can handle many situations, such as entering .snaps from a
directory which was created after taking the latest snapshot.

Please provide feedback on the above approach (the hint being set in the
dict).


Looks good to me.

Regards,
Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Glusterd 'Management Volume' proposal

2014-12-03 Thread Shyam
Top posting, as these are mostly queries rather than comments on the MV as
described below.


1) With the current scheme in glusterd, the O(N^2) behavior is because the
configuration is replicated to every peer in the cluster, correct?


- In the new approach (either MV or otherwise), the idea is to maintain 
a configuration cluster, or a set of nodes that have configuration 
related information in them, correct?


- The rest of the peers get the latest configuration as this changes
(which is the watch functionality that Jeff brings out); this part of the
requirement is not covered in the proposal. It would help if this were
elaborated as well.


- We do have the limitation now that some clients _may_ not have got the
latest graph (one of the configuration items here). With the new
proposal, is there any thought regarding resolving this? Is it
required? I assume brick nodes have this strict enforcement today as
well as in the future.


2) With a >1000 node setup, is it intended that we have a cascade
functionality to handle configuration changes? I.e. there is a defined
set of _watchers_ of the configuration cluster, and each in turn serves a
set of peers for their _watch_ functionality?


This may be overkill (i.e. requiring cascading), but is it required
when we consider cases like Geo-rep or tiers in different data centers
etc.? They need configuration updates, and all of them watching the
configuration cluster may be a problem requiring attention.


On to the MV proposal:
- Using a smaller, pure replicate gluster volume, sans a few xlators,
with locking enforced by its consumers, seems like a good way
to solve the replication, consistency and hence availability of the
configuration information.


- And as you mention, a POSIX-y interface with an application on top of
it seems heavyweight for the key-value store that a configuration
volume presents.


- We still need watcher functionality and possibly cascading support.

I am _not_ well-versed enough in the internals of etcd (or the other
frameworks being discussed) to compare what we can leverage from it, what
functionality it lacks, or the production worthiness of the code.


Going by your initial statements, the concern seems to be the
dependency on another component, in terms of releases and required bug
fixes. I would go on to state that if the infrastructure is production
ready, then managing the dependency would be relatively easy. The real
challenge would be how much effort needs to be spent understanding its
internals, and whether we need to do that at all, for us to be able to
support this in gluster deployments. Any clues or ideas on this, to help
make a decision?


Shyam

On 11/19/2014 02:22 AM, Krishnan Parthasarathi wrote:

All,

We have been thinking of many approaches to address some of Glusterd's
correctness (during failures and at scale) and scalability concerns. A recent
email thread on Glusterd-2.0 was along these lines. While that discussion is
still valid, we have been considering dogfooding as a viable option to solve
our problems. This is not the first time this has been mentioned, but for
various reasons it didn't really take off. The following proposal solves
Glusterd's requirement for a distributed (consistent) store using a GlusterFS
volume. Then who manages that GlusterFS volume? To find answers for that and
more, read further.

[The following content is also available here:
https://gist.github.com/krisis/945e45e768ef1c4e446d
Please keep the discussions on the mailing list and _not_ in github, for
traceability reasons.]


##Abstract

Glusterd, the management daemon for GlusterFS, maintains the volume and cluster
configuration store using a home-grown replication algorithm. Some of its
shortcomings are as follows.

- Involves O(N^2) (in number of nodes) network messages to replicate
   configuration changes for every command

- Doesn't rely on quorum and is not resilient to network partitions

- Recovery of nodes that come back online can choke the network at scale

The thousand node glusterd proposal[1], one of the more mature proposals
addressing the above problems, recommends use of a consistent distributed
store like consul/etcd for maintaining the volume and cluster configuration.
While the technical merits of this approach make it compelling, the operational
challenges, like coordinating between the two communities for releases and
bug-fixes, could get out of hand.  An alternate approach[2] is to use a
replicated GlusterFS volume as the distributed store instead. The remainder of
this email explains how a GlusterFS volume could be used to store configuration
information.


##Technical details

We will refer to the replicated GlusterFS volume used for storing configuration
as the Management volume (MV). The following section describes how MV would be
managed.


###MV management

To begin with we can restrict the MV to a pure replicated volume with a maximum
of 3 bricks on 3 different nodes[3]. The brick path can be 

Re: [Gluster-devel] NetBSD regression tests: reviews required

2014-12-03 Thread Emmanuel Dreyfus
On Mon, Dec 01, 2014 at 05:49:54AM +0100, Emmanuel Dreyfus wrote:
> Here is the latest list of NetBSD fixes for regression tests:

Hi

This is a friendly reminder that I still have the following pending:

> http://review.gluster.com/9071
> http://review.gluster.com/9075
> http://review.gluster.com/9074
> http://review.gluster.com/9216  [2]
> http://review.gluster.com/9217
> http://review.gluster.com/9219
> http://review.gluster.com/9220
> 
> [2] Here I fix the symptom rather than the cause. Hints are welcome to
> help fix the cause, but perhaps the symptom fix could be merged as an
> interim solution so that glustershd stops crashing during the test.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious regression of tests/basic/mgmt_v3-locks.t

2014-12-03 Thread Atin Mukherjee


On 12/03/2014 07:36 PM, Justin Clift wrote:
> On Tue, 02 Dec 2014 10:05:36 +0530
> Atin Mukherjee  wrote:
> 
>> It's on my radar; I am in the process of analysing it. The last patch set
>> was on the cleanup part of the test cases; I felt the changes could
>> have solved it, but I am afraid they didn't. I tried to reproduce it
>> multiple times in my local setup but couldn't. Initial analysis finds
>> that one of the glusterd nodes got disconnected while multiple volume set
>> transactions were in progress; the reason is still unknown.  I will keep
>> you posted once I find any significant details.
> 
> The worrying thing (to me), is that this could be a bug that's happening
> for people in real world usage, and not just something happening in our
> regression testing.
Agreed, Justin. It does look like a bug.
I found something significant now. I wrote a small script to execute two
volume set commands in parallel (for two different volumes) in a loop of
1000 iterations, and found that glusterd hung after a few iterations.
Glusterd was not dead, but it did not respond at all.
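
The script is essentially along these lines (volume names and the option
being set are illustrative, not necessarily the exact ones used):

    #!/bin/bash
    # run two 'volume set' transactions concurrently on two different
    # (pre-existing) volumes, 1000 times
    for i in $(seq 1 1000); do
        gluster volume set vol1 performance.write-behind off &
        gluster volume set vol2 performance.write-behind on &
        wait
    done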

Following is the backtrace:

Thread 6 (Thread 0x7f6f6ecf5700 (LWP 30417)):
#0  0x7f6f76d67fbd in nanosleep () from /lib64/libpthread.so.0
#1  0x7f6f77a04014 in gf_timer_proc (ctx=0x1396010) at timer.c:170
#2  0x7f6f76d60f33 in start_thread () from /lib64/libpthread.so.0
#3  0x7f6f766a7ded in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f6f6e4f4700 (LWP 30418)):
#0  0x7f6f76d684f1 in sigwait () from /lib64/libpthread.so.0
#1  0x004079e7 in glusterfs_sigwaiter (arg=) at
glusterfsd.c:1728
#2  0x7f6f76d60f33 in start_thread () from /lib64/libpthread.so.0
#3  0x7f6f766a7ded in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f6f6dcf3700 (LWP 30419)):
#0  0x7f6f76d6759d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f6f76d63194 in _L_lock_874 () from /lib64/libpthread.so.0
#2  0x7f6f76d63093 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x7f6f766e2612 in _dl_addr () from /lib64/libc.so.6
#4  0x7f6f766b67d5 in backtrace_symbols_fd () from /lib64/libc.so.6
#5  0x7f6f779fdc88 in gf_backtrace_fillframes (
buf=buf@entry=0x13eb280 "(-->
/usr/local/lib/libglusterfs.so.0(synctask_yield+0x2c)[0x7f6f77a21cac]
(--> /usr/local/lib/libglusterfs.so.0(+0x57dc9)[0x7f6f77a21dc9] (-->
/usr/local/lib/libglusterfs.so.0(synclock_lock+0x16)[0x7"...) at
common-utils.c:3335
#6  0x7f6f77a03631 in gf_backtrace_save (
buf=buf@entry=0x13eb280 "(-->
/usr/local/lib/libglusterfs.so.0(synctask_yield+0x2c)[0x7f6f77a21cac]
(--> /usr/local/lib/libglusterfs.so.0(+0x57dc9)[0x7f6f77a21dc9] (-->
/usr/local/lib/libglusterfs.so.0(synclock_lock+0x16)[0x7"...) at
common-utils.c:3391
#7  0x7f6f77a21cac in synctask_yield (task=task@entry=0x13eadf0) at
syncop.c:296
#8  0x7f6f77a21dc9 in __synclock_lock (lock=lock@entry=0x13e0138) at
syncop.c:760
#9  0x7f6f77a24586 in synclock_lock (lock=lock@entry=0x13e0138) at
syncop.c:784
#10 0x7f6f6ca3242d in glusterd_big_locked_cbk (req=0x13e34ec,
iov=0x0, count=0,
myframe=0x7f6f759ff24c, fn=0x7f6f6ca766d0 <_gd_syncop_stage_op_cbk>)
at glusterd-rpc-ops.c:207
#11 0x7f6f777be9ff in rpc_clnt_submit (rpc=rpc@entry=0x13e3260,
prog=prog@entry=0x7f6f6cce9ca0 , procnum=procnum@entry=3,
cbkfn=cbkfn@entry=0x7f6f6ca75330 ,
proghdr=proghdr@entry=0x17eb970, proghdrcount=proghdrcount@entry=1,
progpayload=progpayload@entry=0x0,
progpayloadcount=progpayloadcount@entry=0,
iobref=iobref@entry=0x7f6f4bd4a9c0, frame=frame@entry=0x7f6f759ff24c,
rsphdr=rsphdr@entry=0x0, rsphdr_count=rsphdr_count@entry=0,
rsp_payload=rsp_payload@entry=0x0,
rsp_payload_count=rsp_payload_count@entry=0,
rsp_iobref=rsp_iobref@entry=0x0) at rpc-clnt.c:1601
#12 0x7f6f6ca76470 in gd_syncop_submit_request (rpc=0x13e3260,
req=req@entry=0x7f6f60239830, local=local@entry=0x17ebaa0,
cookie=cookie@entry=0x13e2770, prog=0x7f6f6cce9ca0 ,
procnum=procnum@entry=3, cbkfn=cbkfn@entry=0x7f6f6ca75330
,
xdrproc=0x7f6f775a4910 ) at
glusterd-syncop.c:198
#13 0x7f6f6ca773d6 in gd_syncop_mgmt_stage_op
(peerinfo=peerinfo@entry=0x13e2770,
args=args@entry=0x17ebaa0,
my_uuid=my_uuid@entry=0x13deff8
"y\205mKLMF\340\241\366\340\037#\227$x/var/lib/glusterd",
recv_uuid=recv_uuid@entry=0x17eba90 "", op=op@entry=11,
dict_out=dict_out@entry=0x7f6f754045e4,
op_ctx=op_ctx@entry=0x7f6f754045e4)
at glusterd-syncop.c:749
#14 0x7f6f6ca77d4b in gd_stage_op_phase (peers=peers@entry=0x13defe0,
op=, op_ctx=op_ctx@entry=0x7f6f754045e4,
req_dict=0x7f6f754045e4,
op_errstr=op_errstr@entry=0x17ec2a8, npeers=npeers@entry=1) at
glusterd-syncop.c:1169
#15 0x7f6f6ca790b8 in gd_sync_task_begin
(op_ctx=op_ctx@entry=0x7f6f754045e4,
req=req@entry=0x13da238) at glusterd-syncop.c:1619
#16 0x7f6f6ca7928c in glusterd_op_begin_synctask
(req=req@entry=0x13da238,
   

Re: [Gluster-devel] Volume management proposal (4.0)

2014-12-03 Thread Jeff Darcy
> As I read this, I assume this is to ease administration, and not to ease
> the code complexity mentioned above, right?
> 
> The code complexity needs to be eased, but I would assume that is a
> by-product of this change.

Correct.  The goal is an easy-to-understand way for *users* to create
and administer volumes that address the complexity of multiple storage
types and workloads.  Cleaning up the volgen mess is just a (welcome)
side effect.

> > (B) Each volume has a graph representing steps 6a through 6c above (i.e.
> > up to DHT).  Only primary volumes have a (second) graph representing 6d
> > and 7 as well.
> 
> Do we intend to break this up into multiple secondary volumes, i.e. an
> admin can create pure replicate secondary volume(s) and then create a
> further secondary volume from these, adding, say, DHT?

Yes, absolutely.  Once this is implemented, I expect to see multi-level
hierarchies quite often.  The most common use case would probably be for
tiering plus some sort of segregation by user/workload.  For example:

   tenant -+- tier -+- DHT + AFR/NSR on SSDs
           |        |
           |        +- tier -+- DHT + AFR/NSR on disks
           |                 |
           |                 +- DHT + EC on disks
           |
           +- tier -+- DHT + AFR/NSR
                    |
                    +- DHT + EC

Here we'd have five secondary volumes using DHT plus something else.  A
user could set options on them, add bricks to them, rebalance them, and
so on.  The three "tier" volumes are also secondary, composed from the
first five.  Users would almost certainly have to set options separately on
each one to define different tiering policies.  Finally we have the "tenant"
volume, which segregates by user/workload and is composed of the top
two tier volumes.  This is the only one that gets a full
performance-translator stack pushed on top, the only one that can be
explicitly started/stopped, and the only one that shows up in volume
status by default.
   
> I ask this for 2 reasons:
> If we bunch up everything till 6c, we may not reduce admin complexity
> when creating volumes that involve multiple tiers, so we should/could
> allow creating secondary volumes and then further secondary volumes.
> 
> If we do _not_ bunch up, then we would have several secondary volumes,
> and the settings (as I think about it) for each secondary volume
> become a bit more non-intuitive. IOW, we are dealing with a chain of
> secondary volumes, each with its own name, and would initiate admin
> operations (like rebalance) on possibly each of these. Not sure if I am
> portraying the complexity that I see well here.

Yes, there is still some complexity.  For example, a "rebalance" on a
DHT volume really does rebalance.  A "rebalance" on a "tenant" volume is
more of a reassignment/migration.  Both are valuable.  A user might wish
to do them separately, so it's important that we expose both *somehow*.
Exposing the DHT subtree as a secondary volume seems like an intuitive
way to do that, but there are others.

> Maybe a brief example of how this works would help clarify some thoughts.

Besides the above, here's a SWAG of what the CLI commands might look
like:

# Create the three "base" secondary volumes for userA.
volume create userA-fast replica 2 host1:brick1 ...
volume create userA-medium replica 2 host2:brick2 ...
volume create userA-slow disperse 8 host3:brick3 ...

# Combine those into userA's full config.
volume create userA-lower tier userA-medium userA-slow
volume create userA tier userA-fast userA-lower

# Now create user B's setup.
volume create userB-fast replica 2 host4:brick4 ...
volume create userB-slow disperse 8 host5:brick5 ...
volume create userB tier userB-fast userB-slow

# Combine them all into one volume and start the whole thing.
volume create allusers tenant userA userB
volume start allusers

So much for creation.  What about administrative actions later?

# Add some space to user A's slow tier.
volume add-brick userA-slow host6:brick6
volume rebalance userA-slow

# Reallocate space between user A and user B.
volume set allusers quota-userA 40%
volume set allusers quota-userB 60%
volume rebalance allusers

Does that help?
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Volume management proposal (4.0)

2014-12-03 Thread Shyam

On 12/02/2014 10:07 AM, Jeff Darcy wrote:

I've been thinking and experimenting around some of the things we need
in this area to support 4.0 features, especially data classification

http://www.gluster.org/community/documentation/index.php/Features/data-classification

Before I suggest anything, a little background on how brick and volume
management *currently* works.

(1) Users give us bricks, specified by host:path pairs.

(2) We assign each brick a unique ID, and create a "hidden" directory
structure in .glusterfs to support our needs.

(3) When bricks are combined into a volume, we create a bunch of
volfiles.

(4) There is one volfile per brick, consisting of a linear "stack" of
translators from storage/posix (which interacts with the local file
system) up to protocol/server (which listens for connections from
clients).

(5) When the volume is started, we start one glusterfsd process for each
brick volfile.

(6) There is also a more tree-like volfile for clients, constructed as
follows:

(6a) We start with a protocol/client translator for each brick.

(6b) We combine bricks into N-way sets using AFR, EC, etc.

(6c) We combine those sets using DHT.

(6d) We push a bunch of (mostly performance-related) translators on top.

(7) When a volume is mounted, we fetch the volfile and instantiate all of
the translators described there, plus mount/fuse to handle the local
file system interface.  For GFAPI it's the same except for mount/fuse.

(8) There are also volfiles for NFS, self-heal daemons, quota daemons,
snapshots, etc.  I'm going to ignore those for now.

The code for all of this is in glusterd-volgen.c, but I don't recommend
looking at it for long because it's one of the ugliest hairballs I've
ever seen.  In fact, you'd be hard pressed to recognize the above
sequence of steps in that code.  Pieces that belong together are
splattered all over.  Pieces that should remain separate are mashed
together.  Pieces that should use common code use copied code instead.
As a prerequisite for adding new functionality, what's already there
needs to be heavily refactored so it makes some sense.

So . . . about that new functionality.  The core idea of data
classification is to apply step 6c repeatedly, with variants of DHT that
do tiering or various other kinds of intelligent placement instead of
the hash-based random placement we do now.  "NUFA" and "switch" are
already examples of this.  In fact, their needs drove some of the code
structure that makes data classification (DC) possible.

The trickiest question with DC has always been how the user specifies
these complex placement policies, which we then turn into volfiles.  In
the interests of maximizing compatibility with existing scripts and user
habits, what I propose is that we do this by allowing the user to
combine existing volumes into a new higher-level volume.  This is
similar to how the tiering prototype already works, except that
"combining" volumes is more general than "attaching" a cache volume in
that specific context.  There are also some other changes we should make
to do this right.


As I read this, I assume this is to ease administration, and not to ease
the code complexity mentioned above, right?


The code complexity needs to be eased, but I would assume that is a
by-product of this change.




(A) Each volume has an explicit flag indicating whether it is a
"primary" volume to be mounted etc. directly by users or a "secondary"
volume incorporated into another.

(B) Each volume has a graph representing steps 6a through 6c above (i.e.
up to DHT).  Only primary volumes have a (second) graph representing 6d
and 7 as well.


Do we intend to break this up into multiple secondary volumes, i.e. an
admin can create pure replicate secondary volume(s) and then create a
further secondary volume from these, adding, say, DHT?


I ask this for 2 reasons:
If we bunch up everything till 6c, we may not reduce admin complexity
when creating volumes that involve multiple tiers, so we should/could
allow creating secondary volumes and then further secondary volumes.


If we do _not_ bunch up, then we would have several secondary volumes,
and the settings (as I think about it) for each secondary volume
become a bit more non-intuitive. IOW, we are dealing with a chain of
secondary volumes, each with its own name, and would initiate admin
operations (like rebalance) on possibly each of these. Not sure if I am
portraying the complexity that I see well here.




(C) The graph/volfile for a primary volume might contain references to
secondary volumes.  These references are resolved at the same time that
6d and 7 are applied, yielding a complete graph without references.

(D) Secondary volumes may not be started and stopped by the user.
Instead, a secondary volume is automatically started or stopped along
with its primary.

(E) The user must specify an explicit option to see the status of
secondary volumes.  Without this option, secondary volumes are hidden
and

Re: [Gluster-devel] Spurious regression of tests/basic/mgmt_v3-locks.t

2014-12-03 Thread Justin Clift
On Tue, 02 Dec 2014 10:05:36 +0530
Atin Mukherjee  wrote:

> It's on my radar; I am in the process of analysing it. The last patch set
> was on the cleanup part of the test cases; I felt the changes could
> have solved it, but I am afraid they didn't. I tried to reproduce it
> multiple times in my local setup but couldn't. Initial analysis finds
> that one of the glusterd nodes got disconnected while multiple volume set
> transactions were in progress; the reason is still unknown.  I will keep
> you posted once I find any significant details.

The worrying thing (to me), is that this could be a bug that's happening
for people in real world usage, and not just something happening in our
regression testing.

Until we've figured it out / understand the root cause, we don't know.

:(

+ Justin

-- 
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Volume management proposal (4.0)

2014-12-03 Thread Jeff Darcy
> > (E) The user must specify an explicit option to see the status of
> > secondary volumes.  Without this option, secondary volumes are hidden
> > and status for their constituent bricks will be shown as though they
> > were (directly) part of the corresponding primary volume.
>
> IIUC, secondary volumes are internal representations and do not get
> exposed to the user, then why do we need to provide an explicit option
> for the status? Correct me if my understanding is wrong.

They are exposed to the user, but only in a limited way.  For example,
the user can still set options on a secondary volume, add or remove
bricks, initiate a rebalance, and so on.  What they can't do is start
or mount it separately from its primary.
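
For instance, reusing the hypothetical names from the earlier example in this
thread, tuning and growing a secondary volume would be allowed, while
lifecycle operations would only be driven through the primary:

    # allowed on a secondary volume
    volume set userA-slow cluster.min-free-disk 10%
    volume add-brick userA-slow host6:brick6
    volume rebalance userA-slow

    # not allowed -- the secondary follows its primary's lifecycle
    volume start userA-slow      # rejected
    volume start allusers        # starts userA-slow along with the rest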
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] question on glustershd

2014-12-03 Thread Emmanuel Dreyfus
On Wed, Dec 03, 2014 at 05:31:10AM -0500, Krutika Dhananjay wrote:
> We (AFR team) had a discussion about this and came to the conclusion that the 
> code doing tryinodelk()s in metadata self-heal can be safely removed, for 
> which I will be sending a patch. 
> 
> With this patch (once it is out), could you run this test a couple of times 
> again and let us know if the inodelk collision logs are still appearing? 

Sure!

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] question on glustershd

2014-12-03 Thread Krutika Dhananjay
Emmanuel, 

We (AFR team) had a discussion about this and came to the conclusion that the 
code doing tryinodelk()s in metadata self-heal can be safely removed, for which 
I will be sending a patch. 

With this patch (once it is out), could you run this test a couple of times 
again and let us know if the inodelk collision logs are still appearing? 

-Krutika 

- Original Message -

> From: "Emmanuel Dreyfus" 
> To: "Krutika Dhananjay" 
> Cc: "Gluster Devel" 
> Sent: Wednesday, December 3, 2014 3:15:23 PM
> Subject: Re: [Gluster-devel] question on glustershd

> On Wed, Dec 03, 2014 at 04:02:13AM -0500, Krutika Dhananjay wrote:
> > What was the test that led to this?

> In tests/basic/afr/entry-self-heal.t the print_pending_heals test line 258
> spots the problem. It always reports spb_heal spb_me_heal.

> --
> Emmanuel Dreyfus
> m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] question on glustershd

2014-12-03 Thread Emmanuel Dreyfus
On Wed, Dec 03, 2014 at 04:02:13AM -0500, Krutika Dhananjay wrote:
> What was the test that led to this? 

In tests/basic/afr/entry-self-heal.t the print_pending_heals test line 258
spots the problem. It always reports spb_heal spb_me_heal.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] question on glustershd

2014-12-03 Thread Krutika Dhananjay
What was the test that led to this? 
-Krutika 

- Original Message -

> From: "Emmanuel Dreyfus" 
> To: "Krutika Dhananjay" 
> Cc: "Emmanuel Dreyfus" , "Gluster Devel"
> 
> Sent: Wednesday, December 3, 2014 2:27:21 PM
> Subject: Re: [Gluster-devel] question on glustershd

> On Wed, Dec 03, 2014 at 01:39:56AM -0500, Krutika Dhananjay wrote:
> > Come to think of it, it does not really matter whether the two bricks are
> > on the same node or not.
> > In either case, there may not be a lock contention between healers
> > associated with different bricks, irrespective of whether they are part of
> > the same SHD or SHDs on different nodes.

> The traces I have been collecting suggest the two healers lock the same
> inodes. Here is what happens when gluster volume heal full is invoked:
> two inodes, each of them locking on each subvolume.

> [afr-self-heald.c:699:afr_shd_full_healer]
> 0-patchy-replicate-0: starting full sweep on subvol patchy-client-0
> [afr-self-heald.c:699:afr_shd_full_healer]
> 0-patchy-replicate-0: starting full sweep on subvol patchy-client-1
> [afr-self-heal-metadata.c:328:afr_selfheal_metadata]
> 0-XXXmanu: afr_selfheal_tryinodelk 3fb88af1-fe9b-421a-a197-3bf2fc88768b
> [client.c:1672:client_inodelk]
> 0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-0 F_WRLCK
> [client.c:1672:client_inodelk]
> 0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-1 F_WRLCK
> [afr-self-heal-metadata.c:328:afr_selfheal_metadata]
> 0-XXXmanu: afr_selfheal_tryinodelk 3fb88af1-fe9b-421a-a197-3bf2fc88768b
> [client.c:1672:client_inodelk]
> 0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-0 F_WRLCK
> [client.c:1672:client_inodelk]
> 0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-1 F_WRLCK
> --
> Emmanuel Dreyfus
> m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] question on glustershd

2014-12-03 Thread Emmanuel Dreyfus
On Wed, Dec 03, 2014 at 01:39:56AM -0500, Krutika Dhananjay wrote:
> Come to think of it, it does not really matter whether the two bricks are on 
> the same node or not. 
> In either case, there may not be a lock contention between healers associated 
> with different bricks, irrespective of whether they are part of the same SHD 
> or SHDs on different nodes. 

The traces I have been collecting suggest the two healers lock the same
inodes. Here is what happens when gluster volume heal full is invoked:
two inodes, each of them locking on each subvolume.

[afr-self-heald.c:699:afr_shd_full_healer]
0-patchy-replicate-0: starting full sweep on subvol patchy-client-0
[afr-self-heald.c:699:afr_shd_full_healer]
0-patchy-replicate-0: starting full sweep on subvol patchy-client-1
[afr-self-heal-metadata.c:328:afr_selfheal_metadata] 
0-XXXmanu: afr_selfheal_tryinodelk 3fb88af1-fe9b-421a-a197-3bf2fc88768b
[client.c:1672:client_inodelk]
0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-0 F_WRLCK
[client.c:1672:client_inodelk]
0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-1 F_WRLCK
[afr-self-heal-metadata.c:328:afr_selfheal_metadata]
0-XXXmanu: afr_selfheal_tryinodelk 3fb88af1-fe9b-421a-a197-3bf2fc88768b
[client.c:1672:client_inodelk]
0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-0 F_WRLCK
[client.c:1672:client_inodelk]
0-XXXmanu: INODELK patchy-replicate-0:self-heal patchy-client-1 F_WRLCK
-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] New wiki page with applications that have support for Gluster

2014-12-03 Thread Niels de Vos
Hi all,

I've just started a page that contains some applications that have
support for Gluster. The page is far from complete, but this is a wiki,
so I hope others are interested in extending it too.

http://www.gluster.org/community/documentation/index.php/Native_Gluster_support_in_Applications

Please have a look at it, and add your favorite applications. I hope
that this page can evolve to a useful resource for existing and future
Gluster users.

Comments and ideas are very much welcome.

Thanks,
Niels


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel