Re: [Gluster-devel] regarding inode-unref on root inode

2014-06-25 Thread Raghavendra Bhat

On Tuesday 24 June 2014 08:17 PM, Pranith Kumar Karampuri wrote:

Does anyone know why inode_unref is a no-op for the root inode?

I see the following code in inode.c

 static inode_t *
 __inode_unref (inode_t *inode)
 {
         if (!inode)
                 return NULL;

         if (__is_root_gfid (inode->gfid))
                 return inode;
         ...
 }


I think it's done with the intention that the root inode should *never* be
removed from the active inodes list (not even accidentally). So an unref on
the root inode is a no-op. I don't know whether there are any other reasons.
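
For reference, the root GFID is the well-known UUID
00000000-0000-0000-0000-000000000001, so the check presumably boils down to
something like this (a simplified sketch, not the exact libglusterfs code):

 #include <string.h>

 /* Simplified sketch: the root GFID is 15 zero bytes followed by a 1;
  * __is_root_gfid() presumably performs an equivalent comparison. */
 static int
 is_root_gfid (const unsigned char gfid[16])
 {
         static const unsigned char root_gfid[16] = { [15] = 1 };

         return (memcmp (gfid, root_gfid, 16) == 0);
 }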


Regards,
Raghavendra Bhat



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] [Gluster-users] glusterfs-3.5.1 released

2014-06-25 Thread Lalatendu Mohanty

On 06/24/2014 03:45 PM, Gluster Build System wrote:


SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz

This release is made off jenkins-release-73

-- Gluster Build System
___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
RPMs for el5-7 (RHEL, CentOS, etc.) are available at 
download.gluster.org [1].


[1] http://download.gluster.org/pub/gluster/glusterfs/LATEST/

Thanks,
Lala
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] regarding inode-unref on root inode

2014-06-25 Thread Pranith Kumar Karampuri


On 06/25/2014 11:52 AM, Raghavendra Bhat wrote:

On Tuesday 24 June 2014 08:17 PM, Pranith Kumar Karampuri wrote:

Does anyone know why inode_unref is a no-op for the root inode?

I see the following code in inode.c

 static inode_t *
 __inode_unref (inode_t *inode)
 {
         if (!inode)
                 return NULL;

         if (__is_root_gfid (inode->gfid))
                 return inode;
         ...
 }


I think it's done with the intention that the root inode should *never* be
removed from the active inodes list (not even accidentally). So an unref on
the root inode is a no-op. I don't know whether there are any other reasons.

Thanks, that helps.

Pranith.


Regards,
Raghavendra Bhat



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

2014-06-25 Thread Anders Blomdell

On 2014-06-24 22:26, Shyamsundar Ranganathan wrote:
 - Original Message -
 From: Anders Blomdell anders.blomd...@control.lth.se
 To: Niels de Vos nde...@redhat.com
 Cc: Shyamsundar Ranganathan srang...@redhat.com, Gluster Devel 
 gluster-devel@gluster.org, Susant Palai
 spa...@redhat.com
 Sent: Tuesday, June 24, 2014 4:09:52 AM
 Subject: Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on 
 directories

 On 2014-06-23 12:03, Niels de Vos wrote:
 On Tue, Jun 17, 2014 at 11:49:26AM -0400, Shyamsundar Ranganathan wrote:
 You may be looking at the problem being fixed here [1].

 On a lookup, an attribute mismatch was not being healed across
 directories, and this patch attempts to address that. Currently
 the version of the patch does not heal the S_ISUID and S_ISGID bits,
 which is work in progress (but easy enough to incorporate and test
 based on the patch at [1]).

 On a separate note, add-brick just adds a brick to the cluster; the
 lookup is where the heal (or creation of the directory across all
 subvolumes in the DHT xlator) is done.

 I assume that this is not a regression between 3.5.0 and 3.5.1? If that
 is the case, we can pull the fix in 3.5.2 because 3.5.1 really should
 not get delayed much longer.
 No, it does not work in 3.5.0 either :-(
 
 I ran these tests using your scripts and observed similar behavior and need 
 to dig into this a little further to understand how to make this work 
 reliably.
This might be a root cause; it should probably be resolved first:

https://bugzilla.redhat.com/show_bug.cgi?id=1113050

 
 
 
 The proposed patch does not work as intended, with the following hierarchy:
 
   755   0:0        /mnt/gluster
   2777  0:1000     /mnt/gluster/test
   2755  1000:1000  /mnt/gluster/test/dir1
   2755  1000:1000  /mnt/gluster/test/dir1/dir2
 
 In the roughly 25% of cases where my test script does trigger a
 self-heal on disk2, 10% end up with (giving an access error on the client):
 
   0     0:0        /data/disk2/gluster/test
   755   1000:1000  /data/disk2/gluster/test/dir1
   755   1000:1000  /data/disk2/gluster/test/dir1/dir2
 or
 
   2777  0:1000     /data/disk2/gluster/test
   0     0:0        /data/disk2/gluster/test/dir1
   755   1000:1000  /data/disk2/gluster/test/dir1/dir2
 
 or
 
   2777  0:1000     /data/disk2/gluster/test
   2755  1000:1000  /data/disk2/gluster/test/dir1
   0     0:0        /data/disk2/gluster/test/dir1/dir2
 
 
 and 73% end up with either partially healed directories
 (/data/disk2/gluster/test/dir1/dir2 or /data/disk2/gluster/test/dir1
 missing) or the sgid bit [randomly] set on some of the directories.
 
 Since I don't even understand how to reliably trigger
 a self-heal of the directories, I'm currently clueless
 as to the reason for this behaviour.
 
 So, I think that the comment from susant in
 http://review.gluster.org/#/c/6983/3/xlators/cluster/dht/src/dht-common.c:
 
   susant palaiJun 13 9:04 AM
 
I think we dont have to worry about that.
Rebalance does not interfere with directory SUID/GID/STICKY bits.
 
 unfortunately is wrong :-(, and I'm in too deep to understand how to
 fix this at the moment.
 

 Currently in the test case rebalance is not run, so the above comment in
 relation to rebalance is somewhat different than what is observed. Just a note.
I stand corrected :-) 
So far only self-heal has interfered.
 
 
 
 N.B.: with mode 00777 on the /mnt/gluster/test directory
 I have not been able to trigger any unreadable directories.
 
 /Anders
 

 Thanks,
 Niels


 Shyam

 [1] http://review.gluster.org/#/c/6983/

 - Original Message -
 From: Anders Blomdell anders.blomd...@control.lth.se
 To: Gluster Devel gluster-devel@gluster.org
 Sent: Tuesday, June 17, 2014 10:53:52 AM
 Subject: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on
  directories

 With a glusterfs-3.5.1-0.3.beta2.fc20.x86_64 with a reverted
 3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to local lack of IPv4
 addresses), I get
 weird behavior if I:

 1. Create a directory with suid/sgid/sticky bit set (/mnt/gluster/test)
 2. Make a subdirectory of #1 (/mnt/gluster/test/dir1)
 3. Do an add-brick

 Before add-brick

755 /mnt/gluster
   7775 /mnt/gluster/test
   2755 /mnt/gluster/test/dir1

 After add-brick

755 /mnt/gluster
   1775 /mnt/gluster/test
755 /mnt/gluster/test/dir1

 On the server it looks like this:

   7775 /data/disk1/gluster/test
   2755 /data/disk1/gluster/test/dir1
   1775 /data/disk2/gluster/test
755 /data/disk2/gluster/test/dir1

 Filed as bug:

   https://bugzilla.redhat.com/show_bug.cgi?id=1110262

 If somebody can point me to where the logic of add-brick is placed, I can
 give
 it a shot (a find/grep on mkdir didn't immediately point me to the right
 place).



/Anders

- -- 
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. 

Re: [Gluster-devel] Data classification proposal

2014-06-25 Thread Krishnan Parthasarathi

- Original Message -
  For the short term, wouldn't it be OK to disallow adding bricks in counts
  that are not a multiple of the group size?
 
 In the *very* short term, yes.  However, I think that will quickly
 become an issue for users who try to deploy erasure coding because those
 group sizes will be quite large.  As soon as we implement tiering, our
 very next task - perhaps even before tiering gets into a release -
 should be to implement automatic brick splitting.  That will bring other
 benefits as well, such as variable replication levels to handle the
 sanlock case, or overlapping replica sets to spread a failed brick's
 load over more peers.
 

OK. Do you have some initial ideas on how we could 'split' bricks? I ask this
to see if I can work on splitting bricks while the data classification format is
being ironed out.

thanks,
Krish
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Data classification proposal

2014-06-25 Thread Xavier Hernandez
On Wednesday 25 June 2014 08:35:05 Jeff Darcy wrote:
  For the short term, wouldn't it be OK to disallow adding bricks in counts
  that are not a multiple of the group size?
 
 In the *very* short term, yes.  However, I think that will quickly
 become an issue for users who try to deploy erasure coding because those
 group sizes will be quite large.  As soon as we implement tiering, our
 very next task - perhaps even before tiering gets into a release -
 should be to implement automatic brick splitting.  That will bring other
 benefits as well, such as variable replication levels to handle the
 sanlock case, or overlapping replica sets to spread a failed brick's
 load over more peers.

If I understand correctly the proposed data-classification architecture, each 
server will have a number of bricks that will be dynamically modified as 
needed: as more data-classifying conditions are defined, a new layer of 
translators will be added (a new DHT or AFR, or something else) and some or 
all existing bricks will be split to accommodate the new and, maybe, 
overlapping condition.

How will space be allocated to each new sub-brick? Some sort of thin
provisioning, or will it be distributed evenly on each split?

If using thin-provisioning, it will be hard to determine real available space. 
If using a fixed amount, we can get to scenarios where a file cannot be 
written even if there seems to be enough free space. This can already happen 
today if using very big files on almost full bricks. I think brick splitting 
can accentuate this.

Also, the addition of multiple layered DHT translators, as it's implemented
today, could add a lot more latency, especially on directory listings.

Another problem I see is that splitting bricks will require a rebalance, which
is a costly operation. It doesn't seem right to require such an expensive
operation every time you add a new condition to an already created volume.

Maybe I've missed something important?

Thanks,

Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Data classification proposal

2014-06-25 Thread Jeff Darcy
 If I understand correctly the proposed data-classification
 architecture, each server will have a number of bricks that will be
 dynamically modified as needed: as more data-classifying conditions
 are defined, a new layer of translators will be added (a new DHT or
 AFR, or something else) and some or all existing bricks will be split
 to accommodate the new and, maybe, overlapping condition.

Correct.

 How will space be allocated to each new sub-brick? Some sort of thin
 provisioning, or will it be distributed evenly on each split?

That's left to the user.  The latest proposal, based on discussion of
the first, is here:

https://docs.google.com/presentation/d/1e8tuh9DKNi9eCMrdt5vetppn1D3BiJSmfR7lDW2wRvA/edit?usp=sharing

That has an example of assigning percentages to the sub-bricks created
by a rule (i.e. a subvolume in a potentially multi-tiered
configuration).  Other possibilities include relative weights used to
determine percentages, or total thin provisioning where sub-bricks
compete freely for available space.  It's certainly a fruitful area for
discussion.
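
As a trivial illustration of the relative-weight option (the numbers are made
up and this isn't part of any proposal), the weights would simply be
normalized into percentages:

 #include <stdio.h>

 /* Trivial illustration only: two made-up sub-brick weights normalized
  * into capacity percentages. */
 int
 main (void)
 {
         double weights[] = { 3.0, 1.0 };   /* hypothetical rule weights */
         size_t i, n = sizeof (weights) / sizeof (weights[0]);
         double total = 0.0;

         for (i = 0; i < n; i++)
                 total += weights[i];

         for (i = 0; i < n; i++)            /* prints 75% and 25% */
                 printf ("sub-brick %zu: %.0f%%\n", i,
                         100.0 * weights[i] / total);

         return 0;
 }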

 If using thin-provisioning, it will be hard to determine real
 available space.  If using a fixed amount, we can get to scenarios
 where a file cannot be written even if there seems to be enough free
 space. This can already happen today if using very big files on almost
 full bricks. I think brick splitting can accentuate this.

Is this really common outside of test environments, given the sizes of
modern disks and files?  Even in cases where it might happen, doesn't
striping address it?

We have a whole bunch of problems in this area.  If multiple bricks are
on the same local file system, their capacity will be double-counted.
If a second local file system is mounted over part of a brick, the
additional space won't be counted at all.  We do need a general solution
to this, but I don't think that solution needs to be part of data
classification unless there's a specific real-world scenario that DC
makes worse.

 Also, the addition of multiple layered DHT translators, as it's
 implemented today, could add a lot more latency, especially on
 directory listings.

With http://review.gluster.org/#/c/7702/ this should be less of a
problem.  Also, lookups across multiple tiers are likely to be rare in
most use cases.  For example, for the name-based filtering (sanlock)
case, a given file should only *ever* be in one tier so only that tier
would need to be searched.  For the activity-based tiering case, the
vast majority of lookups will be for hot files which are (not
accidentally) in the first tier.  The only real problem is with *failed*
lookups, e.g. during create.  We can address that by adding stubs
(similar to linkfiles) in the upper tier, but I'd still want to wait
until it's proven necessary.  What I would truly resist is any solution
that involves building tier awareness directly into (one instance of)
DHT.  Besides requiring a much larger development effort in the present,
it would throw away the benefit of modularity and hamper other efforts
in the future.  We need tiering and brick splitting *now*, especially as
a complement to erasure coding which many won't be able to use
otherwise.  As far as I can tell, stacking translators is the fastest
way to get there.

 Another problem I see is that splitting bricks will require a
 rebalance, which is a costly operation. It doesn't seem right to
 require such an expensive operation every time you add a new condition
 to an already created volume.

Yes, rebalancing is expensive, but that's no different for split bricks
than whole ones.  Any time you change the definition of what should go
where, you'll have to move some data into compliance and that's
expensive.  However, such operations are likely to be very rare.  It's
highly likely that most uses of this feature will consist of a simple
two-tier setup defined when the volume is created and never changed
thereafter, so the only rebalancing would be within a tier - i.e. the
exact same thing we do today in homogeneous volumes (maybe even slightly
better).  The only use case I can think of that would involve *frequent*
tier-config changes is multi-tenancy, but adding a new tenant should
only affect new data and not require migration of old data.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Weekly GlusterFS Community meeting minutes

2014-06-25 Thread Justin Clift
Thanks to everyone who attended and participated in our
Weekly Community Meeting today. :)

The points which stand out as most important/interesting are:

  * GlusterFS 3.4.5 beta1 tarball and rpms to be created
soon (next few days likely)

This is pretty much GlusterFS 3.4.4-2 with the
memory leak fix backported by Martin Svec

  * GlusterFS 3.6 feature freeze date is not far away
now - 5th July

  * Automatic NetBSD build testing is almost working

  * Automatic FreeBSD build testing will be set up in
next few weeks

  * James Shubin has been working on btrfs pieces for
puppet-gluster.  He'd really like people to test
it out:

  https://github.com/purpleidea/puppet-gluster/tree/feat/btrfs

:)

Meeting minutes:

  
http://meetbot.fedoraproject.org/gluster-meeting/2014-06-25/gluster-meeting.2014-06-25-15.10.html

Full meeting logs:

  
http://meetbot.fedoraproject.org/gluster-meeting/2014-06-25/gluster-meeting.2014-06-25-15.10.log.html

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Reviewing patches early

2014-06-25 Thread Jeff Darcy
Justin asked me, as the group's official Grumpy Old Man, to send a note
reminding people about the importance of reviewing patches early.  Here
it is.  As I see it, we've historically had two problems with reviews.

(1) Patches that don't get reviewed at all.

(2) Patches that have to be re-worked continually due to late reviews.

We've made a lot of progress on (1), especially with the addition of
more maintainers, so this is about (2).  As a patch gets older, it
becomes increasingly likely that it will be rebased and regression tests
will have to be re-run because of merge conflicts.  This isn't a problem
for features to which Red Hat has graciously assigned more than one
developer, as they review each others' work and the patch gets merged
quickly (sometimes before other interested parties have even had a
chance to see it in the queue but that's a different problem).  However,
it creates a problem for *every other patch*, which might now have to
be rebased, etc. - even those that are older and more important to users and
up against tighter deadlines.  This priority inversion can often be
avoided if people who intend to review a patch would do so sooner, so
that all of the review re-work can be done before new merge conflicts
are created.  Given the differences in time zones throughout our group,
each round of such unnecessary work can cost an entire day, leading to
even more potential for further merge conflicts.  It's a vicious cycle
that we need to break.  Please, get all of those complaints about tabs
and spaces and variable names in *early*, and help us keep the
improvements flowing to our users.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-release-3.4 released

2014-06-25 Thread Gluster Build System


SRC: 
http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-release-3.4.tar.gz

This release is made off jenkins-release-78

-- Gluster Build System
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

2014-06-25 Thread Shyamsundar Ranganathan
Hi Anders,

There are multiple problems that I see in the test provided; here is an answer
to one of them and the reason why it occurs. It gets into the code and
functions a bit, but the bottom line is that on one code path the setattr that
DHT does misses setting the SGID bit, causing the problem observed.

- When a directory is healed on a newly added brick, it loses the SGID mode bit

This is happening for two reasons:

mkdir does not honor the SGID mode bit [1]. So when the directory is initially
created while there is a single brick, an strace of the mkdir command shows an
fchmod that actually changes the mode of the directory to add the SGID bit to it.
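
A small standalone illustration of that behaviour (the path is just an
example, the parent is assumed to carry no SGID bit, and a typical 022 umask
is assumed):

 #include <stdio.h>
 #include <sys/stat.h>
 #include <sys/types.h>

 /* mkdir(2) does not honor S_ISGID from its mode argument (see NOTES in
  * man 2 mkdir), so a follow-up chmod is needed to actually set the bit. */
 int
 main (void)
 {
         const char *path = "/tmp/sgid-demo";
         struct stat st;

         mkdir (path, 02755);                  /* SGID in mode is ignored */
         stat (path, &st);
         printf ("after mkdir: %04o\n", st.st_mode & 07777);  /* 0755 */

         chmod (path, 02755);                  /* set the bit explicitly  */
         stat (path, &st);
         printf ("after chmod: %04o\n", st.st_mode & 07777);  /* 2755 */

         return 0;
 }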

In DHT we get into dht_lookup_dir_cbk as a part of the lookup when creating
the new directory .../dir2, because the graph has changed due to a brick
addition (otherwise we would have gone into the revalidate path where the
previous fix was made). Here we call dht_selfheal_directory, which creates the
missing directories with the expected attributes.

DHT winds a call to mkdir as a part of dht_selfheal_directory (in
dht_selfheal_dir_mkdir, where it winds a mkdir to every subvolume that is
missing the directory) with the right mode bits (in this case including the
SGID bit). When the POSIX layer on the brick calls mkdir, the SGID bit is not
set on the newly created directory due to [1].

After calling mkdir, DHT winds a setattr to set the mode bits straight, but it
ends up using the mode bits returned in the iatt (stat) information by the
just-concluded mkdir wind, which has the SGID bit missing, because posix_mkdir
returns the stat information from a stat done after the mkdir. Hence we never
end up setting the SGID bit in the setattr part of DHT.

Rectification of the problem would be in dht_selfheal_dir_mkdir_cbk (some more
analysis still needs to be closed out), where we need to pass the right mode
bits to the subsequent dht_selfheal_dir_setattr to set on the directories.
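
Conceptually (this is not the actual DHT patch; heal_dir_mode below is a
made-up name purely for illustration), the difference is between restoring the
mode from the fresh stat and restoring the mode that was originally requested:

 #include <sys/types.h>
 #include <sys/stat.h>

 /* Made-up illustration, not DHT code: after a healing mkdir, the mode
  * must be restored from the *requested* mode; re-using the mode from a
  * stat of the fresh directory would re-apply the SGID-less mode that
  * mkdir produced. */
 static int
 heal_dir_mode (const char *path, mode_t requested_mode)
 {
         struct stat st;

         if (mkdir (path, requested_mode) != 0 || stat (path, &st) != 0)
                 return -1;

         /* buggy variant: chmod (path, st.st_mode & 07777); */
         return chmod (path, requested_mode);
 }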

I will provide a patch for the above issue after testing it with the provided
script, possibly tomorrow. This would make the directory equal on all the
bricks, and further discrepancies from the mount point or on the backend
should not be seen.

One of the other problems seems to stem from which stat information we pick in
DHT to return to the mount. The above fix would take care of that issue as
well, but it still needs some understanding and possible correction.

[1] see NOTES in man 2 mkdir

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Bug#751888: glusterfs-server: creating symlinks generates errors

2014-06-25 Thread Justin Clift
On 20/06/2014, at 2:32 PM, Matteo Checcucci wrote:
 On 06/20/2014 03:05 PM, Ravishankar N wrote:
 Yes, just sent a patch for review on master:
 http://review.gluster.org/#/c/8135/
 Once it gets accepted, will back-port it to the 3.5 branch
 I am looking forward to seeing it back-ported and integrated in the debian 
 package.

Btw, the backport of this was merged into the release-3.5 branch
yesterday.  It'll be in GlusterFS 3.5.2.

Hope that helps. :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Ignore the rackspace-regression-2GB-triggered queue in Jenkins for now

2014-06-25 Thread Justin Clift
On 25/06/2014, at 11:19 PM, Justin Clift wrote:
 There's a new rackspace-regression-2GB-triggered queue in Jenkins on
 build.gluster.org.
 
 Please ignore it for now.  I'm just experimenting with having Gerrit
 automatically trigger regression tests.


This seems to be working ok, so I've enabled it.

Failure/success now IS going into Gerrit.  If this turns out to
work ok, we won't need to manually trigger new regression tests. :)

If any weirdness seems to happen, feel free to disable this and
manually trigger regression tests like normal. :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Reviewing patches early

2014-06-25 Thread Justin Clift
On 26/06/2014, at 1:40 AM, Pranith Kumar Karampuri wrote:
snip
 While I agree with everything you said, complaining about tabs/spaces should
 be done by a script. Something like http://review.gluster.com/#/c/5404

+1

And we can use a git trigger to reject future patches that have tabs in
them.

For bonus points, we should put info on the wiki on how to configure our
editors to do spaces properly, e.g. .vimrc settings and that kind of thing.
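
Purely as a sketch of what such a check could look like (reading a unified
diff on stdin; names and behaviour here are illustrative, not an agreed-upon
implementation):

 #include <stdio.h>

 /* Sketch only: flag added lines in a unified diff (read from stdin)
  * that start with a tab.  A real check would live in a script wired
  * into ./rfc.sh or a Gerrit/git trigger. */
 int
 main (void)
 {
         char line[4096];
         int  lineno = 0, found = 0;

         while (fgets (line, sizeof (line), stdin)) {
                 lineno++;
                 if (line[0] == '+' && line[1] == '\t') {  /* skips "+++" headers */
                         fprintf (stderr, "tab indent at diff line %d\n", lineno);
                         found = 1;
                 }
         }
         return found;      /* non-zero exit fails the check */
 }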

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Reviewing patches early

2014-06-25 Thread Justin Clift
On 26/06/2014, at 2:12 AM, Pranith Kumar Karampuri wrote:
 On 06/26/2014 06:19 AM, Justin Clift wrote:
 On 26/06/2014, at 1:40 AM, Pranith Kumar Karampuri wrote:
 snip
  While I agree with everything you said, complaining about tabs/spaces
  should be done by a script. Something like
  http://review.gluster.com/#/c/5404
 +1
 
 And we can use a git trigger to reject future patches that have tabs in
 them.
  We can probably do it at the time of './rfc.sh'; that is probably easier as
  well. Have the script in the repo and run it against the patches that are to
  be submitted.


Whatever works. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel