Re: [Gluster-devel] Troubleshooting and Diagnostic tools for Gluster

2016-01-27 Thread Raghavendra Bhat
I have a script written to analyze the log messages of a gluster process.

It scans the log file and identifies the log messages with ERROR
and WARNING levels.
It lists the functions (with either ERROR or WARNING logs) and their
percentage of occurrence.

It also lists the MSGIDs for ERROR and WARNING logs and their percentage of
occurrence.

A sample o/p of the script:

[root@hal9000 ~]# ./log_analyzer.sh /var/log/glusterfs/mnt-glusterfs.log
Number Percentage Function
7 0.49 __socket_rwv
4 0.28 mgmt_getspec_cbk
4 0.28 gf_timer_call_after
3 0.21 rpc_clnt_reconfig
2 0.14 fuse_thread_proc
2 0.14 fini
2 0.14 cleanup_and_exit
1 0.07 _ios_dump_thread
1 0.07 fuse_init
1 0.07 fuse_graph_setup

= Error Functions 

7 0.49 __socket_rwv
2 0.14 cleanup_and_exit

Number Percentage MSGID
958 67.99 109066
424 30.09 109036
3 0.21 114057
3 0.21 114047
3 0.21 114046
3 0.21 114035
3 0.21 114020
3 0.21 114018
3 0.21 108031
2 0.14 101190
1 0.07 7962
1 0.07 108006
1 0.07 108005
1 0.07 108001
1 0.07 100030

= Error MSGIDs 

1 0.07 108006
1 0.07 108001

It can be found here:

https://github.com/raghavendrabhat/threaded-io/blob/master/log_analyzer.sh

Do you think it can be added to the repo?

Regards,
Raghavendra

On Wed, Jan 27, 2016 at 3:44 AM, Aravinda  wrote:

> Hi,
>
> I am happy to share the `glustertool` project, which is an
> infrastructure for adding more tools for Gluster.
>
> https://github.com/aravindavk/glustertool
>
> The following tools are available with the initial release (`glustertool
> TOOLNAME [ARGS..]`):
>
> 1. gfid - To get the GFID of a given path (Mount or Backend)
> 2. changelogparser - To parse the Gluster Changelog
> 3. xtime - To get Xtime from brick backend
> 4. stime - To get Stime from brick backend
> 5. volmark - To get Volmark details from Gluster mount
>
> rpm/deb packages are not yet available, install this using `sudo
> python setup.py install`
>
> Once installed, run `glustertool list` to see the list of available tools.
> `glustertool doc TOOLNAME` shows the documentation for the tool and
> `glustertool TOOLNAME --help` shows the usage of the tool.
>
> More tools can be added to this collection easily using `newtool`
> utility available in this repo.
>
> # ./newtool 
>
> Read more about adding tools here
> https://github.com/aravindavk/glustertool/blob/master/CONTRIBUTING.md
>
> You can create an issue in github requesting more tools for Gluster
> https://github.com/aravindavk/glustertool/issues
>
> Comments & Suggestions Welcome
>
> regards
> Aravinda
>
> On 10/23/2015 11:42 PM, Vijay Bellur wrote:
>
>> On Friday 23 October 2015 04:16 PM, Aravinda wrote:
>>
>>> Hi Gluster developers,
>>>
>>> In this mail I am proposing troubleshooting documentation and
>>> Gluster Tools infrastructure.
>>>
>>> Tool to search in documentation
>>> ===
>>> We recently added message Ids to each error message in Gluster. Some
>>> of the error messages are self explanatory. But some error messages
>>> require manual intervention to fix the issue. How about identifying
>>> the error messages which require more explanation and creating
>>> documentation for the same? Even though information about some
>>> errors is available in the documentation, it is very difficult to search
>>> and relate it to the error message. It will be very useful if we create a
>>> tool which looks through the documentation and tells us exactly what to do.
>>>
>>> For example (illustrative purpose only):
>>> glusterdoc --explain GEOREP0003
>>>
>>>  SSH configuration issue. This error is seen when Pem keys from all
>>>  master nodes are not distributed properly to Slave
>>>  nodes. Use the Geo-replication create command with the force option to
>>>  redistribute the keys. If the issue still persists, look for any errors
>>>  while running hook scripts in the Glusterd log file.
>>>
>>>
>>> Note: Inspired by the rustc --explain command
>>> https://twitter.com/jaredforsyth/status/626960244707606528
>>>
>>> If we don't know the message id, we can still search the
>>> available documentation, like:
>>>
>>>  glusterdoc --search 
>>>
>>> These commands can be programmatically consumed, for example
>>> `--json` will return the output in JSON format. This enables UI
>>> developers to automatically show help messages when they display
>>> errors.
>>>
>>> Gluster Tools infrastructure
>>> 
>>> Are our Gluster log files sufficient for root-causing issues? Is
>>> that error caused by a misconfiguration? Geo-replication status is
>>> showing faulty. Where to find the reason for Faulty?
>>>
>>> Sac (surs AT redhat.com) mentioned that he is working on gdeploy and many
>>> developers
>>> are using their own tools. How about providing a common infrastructure (say
>>> gtool/glustertool) to host all these tools?
>>>
>>>
>> Would this be a repository with individual tools being git submodules or
>> something similar? Is there also a plan to bundle the set of tools into a
>> binary package?
>>
>> Looks like a g

Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-27 Thread Raghavendra Bhat
There is already a patch submitted for moving the TBF part to libglusterfs. It
is under review.
http://review.gluster.org/#/c/12413/
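
For anyone who has not looked at TBF before, the core idea described in
Ravi's mail below boils down to something like this (a minimal, generic
sketch in C with illustrative names; it is not the gluster implementation
and it ignores per-client buckets, FOP queueing and configuration):

#include <pthread.h>
#include <unistd.h>

struct tbf {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        long            tokens;   /* tokens currently in the bucket */
        long            rate;     /* tokens added per second        */
        long            max;      /* bucket capacity                */
};

/* Filler thread: tops up the bucket at a steady rate and wakes waiters. */
static void *tbf_filler(void *arg)
{
        struct tbf *t = arg;

        for (;;) {
                sleep(1);
                pthread_mutex_lock(&t->lock);
                t->tokens = (t->tokens + t->rate > t->max)
                            ? t->max : t->tokens + t->rate;
                pthread_cond_broadcast(&t->cond);
                pthread_mutex_unlock(&t->lock);
        }
        return NULL;
}

/* Each operation asks for 'need' tokens and blocks until they exist. */
static void tbf_throttle(struct tbf *t, long need)
{
        pthread_mutex_lock(&t->lock);
        while (t->tokens < need)
                pthread_cond_wait(&t->cond, &t->lock);
        t->tokens -= need;
        pthread_mutex_unlock(&t->lock);
}

static void tbf_init(struct tbf *t, long rate, long max)
{
        pthread_t filler;

        pthread_mutex_init(&t->lock, NULL);
        pthread_cond_init(&t->cond, NULL);
        t->tokens = 0;
        t->rate   = rate;
        t->max    = max;
        pthread_create(&filler, NULL, tbf_filler, t);
        pthread_detach(filler);
}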


Regards,
Raghavendra

On Mon, Jan 25, 2016 at 2:26 AM, Venky Shankar  wrote:

> On Mon, Jan 25, 2016 at 11:06:26AM +0530, Ravishankar N wrote:
> > Hi,
> >
> > We are planning to introduce a throttling xlator on the server (brick)
> > process to regulate FOPS. The main motivation is to solve complaints about
> > AFR selfheal taking too much of CPU resources (due to too many fops for
> > entry self-heal, rchecksums for data self-heal etc.)
> >
> > The throttling is achieved using the Token Bucket Filter algorithm (TBF).
> > TBF is already used by bitrot's bitd signer (which is a client process) in
> > gluster to regulate the CPU intensive check-sum calculation. By putting the
> > logic on the brick side, multiple clients - selfheal, bitrot, rebalance or
> > even the mounts themselves - can avail the benefits of throttling.
>
>   [Providing current TBF implementation link for completeness]
>
>
> https://github.com/gluster/glusterfs/blob/master/xlators/features/bit-rot/src/bitd/bit-rot-tbf.c
>
> Also, it would be beneficial to have the core TBF implementation as part of
> libglusterfs so as to be consumable by the server side xlator component to
> throttle dispatched FOPs and for daemons to throttle anything that's
> outside
> "brick" boundary (such as cpu, etc..).
>
> >
> > The TBF algorithm in a nutshell is as follows: There is a bucket which is
> > filled at a steady (configurable) rate with tokens. Each FOP will need a
> > fixed amount of tokens to be processed. If the bucket has that many tokens,
> > the FOP is allowed and that many tokens are removed from the bucket. If not,
> > the FOP is queued until the bucket is filled.
> >
> > The xlator will need to reside above io-threads and can have different
> > buckets, one per client. There has to be a communication mechanism between
> > the client and the brick (IPC?) to tell what FOPS need to be regulated from
> > it, and the no. of tokens needed etc. These need to be reconfigurable via
> > appropriate mechanisms. Each bucket will have a token filler thread which
> > will fill the tokens in it. The main thread will enqueue heals in a list in
> > the bucket if there aren't enough tokens. Once the token filler detects that
> > some FOPS can be serviced, it will send a cond-broadcast to a dequeue thread
> > which will process (stack wind) all the FOPS that have the required no. of
> > tokens from all buckets.
> >
> > This is just a high level abstraction: requesting feedback on any aspect of
> > this feature. What kind of mechanism is best between the client/bricks for
> > tuning various parameters? What other requirements do you foresee?
> >
> > Thanks,
> > Ravi
>
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] distributed files/directories and [cm]time updates

2016-01-26 Thread Raghavendra Bhat
Hi Xavier,

There is a patch sent for review which implements a metadata cache in the
posix layer. What the changes do is this:

Whenever there is a fresh lookup on an object (file/directory/symlink),
the posix xlator saves the stat attributes of that object in its cache.
As of now, whenever there is a fop on an object, posix tries to build the
HANDLE of the object by looking into the gfid based backend (i.e. the
.glusterfs directory) and doing a stat to check if the gfid exists. The patch
makes changes to posix to check its own cache first and return if it can find
the attributes. If not, it then looks into the actual gfid backend.

But as of now, there is no cache invalidation. Whenever there is a
setattr() fop to change the attributes of an object, the new stat info is
saved in the cache once the fop is successful on disk.
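
To illustrate the shape of that cache-first path, here is a simplified sketch
in C with hypothetical names (it is not the code from the patch itself):

#include <sys/stat.h>

struct md_cache_entry {
        unsigned char gfid[16];   /* object identity                    */
        struct stat   buf;        /* stat attributes saved on lookup    */
        int           valid;      /* set once the entry has been filled */
};

/* Serve the stat from the cache if we have it; otherwise fall back to a
 * stat() on the gfid handle under .glusterfs and populate the cache. */
static int stat_cached_or_backend(struct md_cache_entry *e,
                                  const char *handle_path, struct stat *out)
{
        if (e->valid) {
                *out = e->buf;            /* served from cache, no disk I/O */
                return 0;
        }
        if (stat(handle_path, &e->buf) != 0)
                return -1;                /* gfid missing on the backend    */
        e->valid = 1;                     /* filled on the fresh lookup     */
        *out = e->buf;
        return 0;
}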

The patch can be found here. (http://review.gluster.org/#/c/12157/).

Regards,
Raghavendra

On Tue, Jan 26, 2016 at 2:51 AM, Xavier Hernandez 
wrote:

> Hi Pranith,
>
> On 26/01/16 03:47, Pranith Kumar Karampuri wrote:
>
>> hi,
>>Traditionally gluster has been using ctime/mtime of the
>> files/dirs on the bricks as stat output. Problem we are seeing with this
>> approach is that, software which depends on it gets confused when there
>> are differences in these times. Tar especially gives "file changed as we
>> read it" whenever it detects ctime differences when stat is served from
>> different bricks. The way we have been trying to solve it is to serve
>> the stat structures from same brick in afr, max-time in dht. But it
>> doesn't avoid the problem completely. Because there is no way to change
>> ctime at the moment(lutimes() only allows mtime, atime), there is little
>> we can do to make sure ctimes match after self-heals/xattr
>> updates/rebalance. I am wondering if anyone of you solved these problems
>> before, if yes how did you go about doing it? It seems like applications
>> which depend on this for backups get confused the same way. The only way
>> out I see it is to bring ctime to an xattr, but that will need more iops
>> and gluster has to keep updating it on quite a few fops.
>>
>
> I did think about this when I was writing ec at the beginning. The idea
> was that the point in time at which each fop is executed was controlled by
> the client by adding a special xattr to each regular fop. Of course this
> would require support inside the storage/posix xlator. At that time, adding
> the needed support to other xlators seemed too complex for me, so I decided
> to do something similar to afr.
>
> Anyway, the idea was like this: for example, when a write fop needs to be
> sent, dht/afr/ec sets the current time in a special xattr, for example
> 'glusterfs.time'. It can be done in a way that if the time is already set
> by a higher xlator, it's not modified. This way DHT could set the time in
> fops involving multiple afr subvolumes. For other fops, would be afr who
> sets the time. It could also be set directly by the top most xlator (fuse),
> but that time could be incorrect because lower xlators could delay the fop
> execution and reorder it. This would need more thinking.
>
> That xattr will be received by storage/posix. This xlator will determine
> what times need to be modified and will change them. In the case of a
> write, it can decide to modify mtime and, maybe, atime. For a mkdir or
> create, it will set the times of the new file/directory and also the mtime
> of the parent directory. It depends on the specific fop being processed.
>
> mtime, atime and ctime (or even others) could be saved in a special posix
> xattr instead of relying on the file system attributes that cannot be
> modified (at least for ctime).
>
> This solution doesn't require extra fops, so it seems quite clean to me.
> The additional I/O needed in posix could be minimized by implementing a
> metadata cache in storage/posix that would read all metadata on lookup and
> update it on disk only at regular intervals and/or on invalidation. All
> fops would read/write into the cache. This would even reduce the number of
> I/O we are currently doing for each fop.
>
> Xavi
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] glusterfs-3.6.7 released

2015-12-02 Thread Raghavendra Bhat
Hi,

glusterfs-3.6.7 has been released and the packages for RHEL/Fedora/Centos
can be found here:
http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/

Requesting people running 3.6.x to please try it out and let us know if
there are any issues.

This release fixes the bugs listed below that have been addressed since 3.6.6
was made available. Thanks to all who submitted patches and reviewed the changes.

1283690 - core dump in protocol/client:client_submit_request
1283144 - glusterfs does not register with rpcbind on restart
1277823 - [upgrade] After upgrade from 3.5 to 3.6, probing a new 3.6
node is moving the peer to rejected state
1277822 - glusterd: probing a new node(>=3.6) from 3.5 cluster is
moving the peer to rejected state

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] netbsd failures in 3.6 release

2015-11-04 Thread Raghavendra Bhat
Hi,

We have been observing netbsd failures in the 3.6 branch for a few months and I
have been merging patches by ignoring the netbsd failures. The last few 3.6
releases were made without considering the netbsd failures. IIRC there was a
discussion about it back when the netbsd tests started failing, and it was
decided that we shall ignore the 3.6 netbsd errors. I am not sure if it was
discussed over IRC or as part of some patch (over gerrit).

Emmanuel? Do you recollect any discussions about it?

But I think it would be better to discuss it here and see what can be
done. Please provide feedback.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] REMINDER: Weekly gluster community meeting to start in 30 minutes

2015-10-14 Thread Raghavendra Bhat
Hi All,

In 30 minutes from now we will have the regular weekly Gluster
Community meeting.

Meeting details:
- location: #gluster-meeting on Freenode IRC
- date: every Wednesday
- time: 12:00 UTC, 14:00 CEST, 17:30 IST
(in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-community-meetings

Currently the following items are listed:
* Roll Call
* Status of last week's action items
* Gluster 3.7
* Gluster 3.8
* Gluster 3.6
* Gluster 3.5
* Gluster 4.0
* Open Floor
- bring your own topic!

The last topic has space for additions. If you have a suitable topic to
discuss, please add it to the agenda.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] REMINDER: Weekly gluster community meeting to start in 30 minutes

2015-09-30 Thread Raghavendra Bhat
Hi All,

In 30 minutes from now we will have the regular weekly Gluster
Community meeting.

Meeting details:
- location: #gluster-meeting on Freenode IRC
- date: every Wednesday
- time: 12:00 UTC, 14:00 CEST, 17:30 IST
(in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-community-meetings

Currently the following items are listed:
* Roll Call
* Status of last week's action items
* Gluster 3.7
* Gluster 3.8
* Gluster 3.6
* Gluster 3.5
* Gluster 4.0
* Open Floor
- bring your own topic!

The last topic has space for additions. If you have a suitable topic to
discuss, please add it to the agenda.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2015-09-28 Thread Raghavendra Bhat
Hi Oleksandr,

You are right. The description should have said it is the limit on the
number of inodes in the lru list of the inode cache. I have sent a patch
for that.
http://review.gluster.org/#/c/12242/
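
In other words, it caps the number of unused inodes kept in the lru list of
the inode cache (16384 by default, which matches the ~16383 lru entries in
your statedump), not megabytes of memory. If that is more caching than you
want, the limit can be lowered with, for example:

gluster volume set <VOLNAME> network.inode-lru-limit 8192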

Regards,
Raghavendra Bhat


On Thu, Sep 24, 2015 at 1:44 PM, Oleksandr Natalenko <
oleksa...@natalenko.name> wrote:

> I've checked statedump of volume in question and haven't found lots of
> iobuf as mentioned in that bugreport.
>
> However, I've noticed that there are lots of LRU records like this:
>
> ===
> [conn.1.bound_xl./bricks/r6sdLV07_vd0_mail/mail.lru.1]
> gfid=c4b29310-a19d-451b-8dd1-b3ac2d86b595
> nlookup=1
> fd-count=0
> ref=0
> ia_type=1
> ===
>
> In fact, there are 16383 of them. I've checked "gluster volume set help"
> in order to find something LRU-related and have found this:
>
> ===
> Option: network.inode-lru-limit
> Default Value: 16384
> Description: Specifies the maximum megabytes of memory to be used in the
> inode cache.
> ===
>
> Is there an error in the description stating "maximum megabytes of memory"?
> Shouldn't it mean "maximum number of LRU records"? If not, is it true
> that the inode cache could grow up to 16 GiB for a client, and one must lower
> the network.inode-lru-limit value?
>
> Another thought: we've enabled write-behind, and the default
> write-behind-window-size value is 1 MiB. So, one may conclude that with
> lots of small files written, write-behind buffer could grow up to
> inode-lru-limit×write-behind-window-size=16 GiB? Who could explain that to
> me?
>
> 24.09.2015 10:42, Gabi C wrote:
>
>> oh, my bad...
>> could be this one?
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1126831
>> Anyway, on ovirt+gluster I experienced similar behavior...
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] glusterfs 3.6.6 released

2015-09-24 Thread Raghavendra Bhat
Hi,

glusterfs-3.6.6 has been released and the packages for RHEL/Fedora/Centos
can be found here.
http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/

Requesting people running 3.6.x to please try it out and let us know if
there are any issues.

This release fixes the bugs listed below that have been addressed since 3.6.5
was made available. Thanks to all who submitted patches and reviewed the changes.

1259578 - [3.6.x] quota usage gets miscalculated when loc->gfid is NULL
1247972 - quota/marker: lk_owner is null while acquiring inodelk in rename
operation
1252072 - POSIX ACLs as used by a FUSE mount can not use more than 32 groups
1256245 - AFR: gluster v restart force or brick process restart doesn't
heal the files
1258069 - gNFSd: NFS mount fails with "Remote I/O error"
1173437 - [RFE] changes needed in snapshot info command's xml output.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] [posix-compliance] unlink and access to file through open fd

2015-09-04 Thread Raghavendra Bhat

On 09/04/2015 12:43 PM, Raghavendra Gowdappa wrote:

All,

Posix allows access to a file through open fds even if the name associated with
the file is deleted. While this works in glusterfs for most cases, there are
some corner cases where we fail.
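
(For reference, this is the plain POSIX behaviour in question; a small
standalone illustration against a local filesystem, not gluster code:)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char    buf[32];
        ssize_t n;
        int     fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);

        write(fd, "still readable\n", 15);
        unlink("testfile");   /* the name is gone, the fd keeps the inode alive */

        lseek(fd, 0, SEEK_SET);
        n = read(fd, buf, sizeof(buf) - 1);   /* still succeeds */
        buf[n > 0 ? n : 0] = '\0';
        printf("%s", buf);

        close(fd);            /* only now is the inode actually reclaimed */
        return 0;
}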

1. Reboot of brick:
===

With the reboot of a brick, the fd is lost. unlink would've deleted both the
gfid and path links to the file and we would lose the file. As a solution,
perhaps we should create a hardlink to the file (say in .glusterfs) which gets
deleted only when the last fd is closed?

2. Graph switch:
=

The issue is captured in bz 1259995 [1]. Pasting the content from bz verbatim:
Consider the following sequence of operations:
1. fd = open ("/mnt/glusterfs/file");
2. unlink ("/mnt/glusterfs/file");
3. Do a graph-switch, let's say by adding a new brick to the volume.
4. Migration of the fd to the new graph fails. This is because as part of migration we
do a lookup and open. But the lookup fails as the file is already deleted, and hence
migration fails and the fd is marked bad.

In fact this test case is already present in our regression tests, though the
test only checks whether the fd is marked as bad. But the expectation behind filing
this bug is that migration should succeed. This is possible since there is an
fd opened on the brick through the old graph, and hence it can be duped using the
dup() syscall.

Of course the solution outlined here doesn't cover the case where the file is not
present on the brick at all. For e.g., a new brick was added to the replica set and
that new brick doesn't contain the file. Now, since the file is deleted, how does
replica heal that file to the new brick, etc.

But at least this can be solved for those cases where the file was present on a
brick and an fd was already opened.


Du,

For this 2nd example (where the file is opened, unlinked and a graph 
switch happens), there was a patch submitted long back.


http://review.gluster.org/#/c/5428/

Regards,
Raghavendra Bhat


3. Open-behind and unlink from a different client:
==

While open-behind handles unlink from the same client (through which the open was
performed), if unlink and open are done from two different clients, the file is
lost. I cannot think of any good solution for this.

I wanted to know whether these problems are real enough to channel our efforts 
to fix these issues. Comments are welcome in terms of solutions or other 
possible scenarios which can lead to this issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1259995

regards,
Raghavendra.
___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.6.5 released

2015-08-26 Thread Raghavendra Bhat


Hi,

glusterfs-3.6.5 has been released and the packages for RHEL/Fedora/Centos can 
be found here.
http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/

The Ubuntu packages can be found here:
https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.6.

Requesting people running 3.6.x to please try it out and let us know if there 
are any issues.

This release fixes the bugs listed below that have been addressed since 3.6.4
was made available. Thanks to all who submitted patches and reviewed the changes.

1247959 - Statfs is hung because of frame loss in quota
1247970 - huge mem leak in posix xattrop
1234096 - rmtab file is a bottleneck when lot of clients are accessing a volume 
through NFS
1254421 - glusterd fails to get the inode size for a brick
1247964 - Disperse volume: Huge memory leak of glusterfsd process
1218732 - gluster snapshot status --xml gives back unexpected non xml output
1250836 - [upgrade] After upgrade from 3.5 to 3.6 onwards version, bumping up 
op-version failed
1244117 - unix domain sockets on Gluster/NFS are created as fifo/pipe
1243700 - GlusterD crashes when management encryption is enabled
1235601 - tar on a glusterfs mount displays "file changed as we read it" even 
though the file was not changed

Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [release-3.6] compile error: 'GF_REPLACE_OP_START' undeclared

2015-08-18 Thread Raghavendra Bhat

On 08/18/2015 12:39 PM, Avra Sengupta wrote:

+ Adding Raghavendra Bhat.

When is the next GA planned on this branch? And can we take patches into
this branch while this is being investigated?


Regards,
Avra



I am planning to make the release by the end of this week. I can accept 
the patches if they fix some critical bug. But it would be better if 
the issue being investigated is fixed.


Regards,
Raghavendra Bhat


On 08/18/2015 12:07 PM, Avra Sengupta wrote:
Still hitting this on freebsd and netbsd smoke runs on the release-3.6 
branch. Are we merging patches on the release-3.6 branch for now even 
with these failures? I have two such patches that need to be merged.


Regards,
Avra

On 07/06/2015 02:32 PM, Niels de Vos wrote:

On Mon, Jul 06, 2015 at 02:19:07PM +0530, Raghavendra Bhat wrote:

On 07/06/2015 01:39 PM, Niels de Vos wrote:

On Mon, Jul 06, 2015 at 12:09:28PM +0530, Raghavendra Bhat wrote:

On 07/06/2015 09:52 AM, Kaushal M wrote:

I checked on NetBSD-7.0_BETA and FreeBSD-10.1. I couldn't reproduce
this. I'll try on NetBSD-6 next.

~kaushal
I think it has to be included before 3.6.4 is made G.A. I can wait till the
fix for this issue is merged before making 3.6.4. Does it sound ok? Or
should I go ahead with 3.6.4 and make a quick 3.6.5 with this fix?

I only care about getting http://review.gluster.org/11335 merged :-)

This is a patch I promised to take into release-3.5. It would be nicer
to have this change included in the release-3.6 branch before I merge
the 3.5 backport. At the moment, 3.5.5 is waiting on this patch. But I
do not think you really need to delay 3.6.4 off for that one. It should
be fine if it lands in 3.6.5. (The compile error looks more like a 3.6.4
blocker.)

Niels

Niels,

The patch you mentioned has received the acks and also has passed the linux
regression tests. But it seems to have failed the netbsd regression tests.

Yes, at least the smoke tests on NetBSD and FreeBSD fail with the
compile error mentioned in the subject of this email :)

Thanks,
Niels



Regards,
Raghavendra Bhat


Regards,
Raghavendra Bhat

On Mon, Jul 6, 2015 at 8:38 AM, Kaushal M  
wrote:
Krutika hit this last week, and let us (GlusterD maintainers) know of
it. I volunteered to look into this, but couldn't find time. I'll do
it now.

~kaushal

On Sun, Jul 5, 2015 at 10:43 PM, Atin Mukherjee
 wrote:
I remember Krutika reporting it a few days back. So it seems like it's not
fixed yet. If there is no taker I will send a patch tomorrow.

-Atin
Sent from one plus one

On Jul 5, 2015 9:58 PM, "Niels de Vos"  wrote:

Hi,

it seems that the current release-3.6 branch does not compile on
FreeBSD and NetBSD (not sure why it compiles on CentOS-6). These errors
are thrown:

   --- glusterd_la-glusterd-op-sm.lo ---
 CC   glusterd_la-glusterd-op-sm.lo

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c: In function 'glusterd_op_start_rb_timer':
/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:3685:19: error: 'GF_REPLACE_OP_START' undeclared (first use in this function)
/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:3685:19: note: each undeclared identifier is reported only once for each function it appears in
/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c: In function 'glusterd_bricks_select_status_volume':
/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:5800:34: warning: unused variable 'snapd'
   *** [glusterd_la-glusterd-op-sm.lo] Error code 1


Could someone send a (pointer to the) backport that addresses this?


Thanks,
Niels


On Sun, Jul 05, 2015 at 08:59:32AM -0700, Gluster Build 
System (Code

Review) wrote:

Gluster Build System has posted comments on this change.

Change subject: nfs: make it possible to disable 
nfs.mount-rmtab
.. 




Patch Set 1: -Verified

Build Failed

http://build.gluster.org/job/compare-bug-version-and-git-branch/9953/ 
:

SUCCESS

http://build.gluster.org/job/freebsd-smoke/8551/ : FAILURE

http://build.gluster.org/job/smoke/19820/ : SUCCESS

http://build.gluster.org/job/netbsd6-smoke/7808/ : FAILURE

--
To view, visit http://review.gluster.org/11335
To unsubscribe, visit http://review.gluster.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d
Gerrit-PatchSet: 1
Gerrit-Project: glusterfs
Gerrit-Branch: release-3.6
Gerrit-Owner: Niels de Vos 
Gerrit-Reviewer: Gluster Build System 


Gerrit-Reviewer: Kaleb KEITHLEY 
Gerrit-Reviewer: NetBSD Build System 


Gerrit-Reviewer: Niels de Vos 
Gerrit-Reviewer: Raghavendra Bhat 
Gerrit-Reviewer: jiffin tony Thottan 
Gerrit-HasComments: No

___
Gluster-devel mailing list
Gluster-d

Re: [Gluster-devel] v3.6.3 doesn't respect default ACLs?

2015-08-10 Thread Raghavendra Bhat

On 08/10/2015 09:56 PM, Niels de Vos wrote:

On Wed, Jul 29, 2015 at 04:00:48PM +0530, Raghavendra Bhat wrote:

On 07/27/2015 08:30 PM, Glomski, Patrick wrote:

I built a patched version of 3.6.4 and the problem does seem to be fixed
on a test server/client when I mounted with those flags (acl,
resolve-gids, and gid-timeout). Seeing as it was a test system, I can't
really provide anything meaningful as to the performance hit seen without
the gid-timeout option. Thank you for implementing it so quickly, though!

Is there any chance of getting this fix incorporated in the upcoming 3.6.5
release?

Patrick

I am planning to include this fix in 3.6.5. This fix is still under review.
Once it is accepted in master, it can be backported to the release-3.6 branch. I
will wait till then and make 3.6.5.

I don't think there is a tracker bug for 3.6.5 yet? Or at least I could
not find it by an alias.

https://bugzilla.redhat.com/show_bug.cgi?id=1252072 is used to get the
backport in release-3.6.x, please review and merge :-)

Thanks,
Niels


This is the 3.6.5 tracker bug. I will merge the patch once the regression 
tests have passed.


https://bugzilla.redhat.com/show_bug.cgi?id=1250544.

Regards,
Raghavendra Bhat


Regards,
Raghavendra Bhat



On Thu, Jul 23, 2015 at 6:27 PM, Niels de Vos <nde...@redhat.com> wrote:

On Tue, Jul 21, 2015 at 10:30:04PM +0200, Niels de Vos wrote:
> On Wed, Jul 08, 2015 at 03:20:41PM -0400, Glomski, Patrick wrote:
> > Gluster devs,
> >
> > I'm running gluster v3.6.3 (both server and client side). Since my
> > application requires more than 32 groups, I don't mount with
ACLs on the
> > client. If I mount with ACLs between the bricks and set a
default ACL on
> > the server, I think I'm right in stating that the server
should respect
> > that ACL whenever a new file or folder is made.
>
> I would expect that the ACL gets in herited on the brick. When a new
> file is created without the default ACL, things seem to be
wrong. You
> mention that creating the file directly on the brick has the correct
> ACL, so there must be some Gluster component interfering.
>
> You reminded me on IRC about this email, and that helped a lot.
Its very
> easy to get distracted when trying to investigate things from the
> mailinglists.
>
> I had a brief look, and I think we could reach a solution. An
ugly patch
> for initial testing is ready. Well... it compiles. I'll try to
run some
> basic tests tomorrow and see if it improves things and does not
crash
> immediately.
>
> The change can be found here:
> http://review.gluster.org/11732
>
> It basically adds a "resolve-gids" mount option for the FUSE client.
> This causes the fuse daemon to call getgrouplist() and retrieve
all the
> groups for the UID that accesses the mountpoint. Without this
option,
> the behavior is not changed, and /proc/$PID/status is used to
get up to
> 32 groups (the $PID is the process that accesses the mountpoint).
>
> You probably want to also mount with "gid-timeout=N" where N is
seconds
> that the group cache is valid. In the current master branch this
is set
> to 300 seconds (like the sssd default), but if the groups of a used
> rarely change, this value can be increased. Previous versions had a
> lower timeout which could cause resolving the groups on almost each
> network packet that arrives (HUGE performance impact).
>
> When using this option, you may also need to enable
server.manage-gids.
> This option allows using more than ~93 groups on the bricks. The
network
> packets can only contain ~93 groups, when server.manage-gids is
enabled,
> the groups are not sent in the network packets, but are resolved
on the
> bricks with getgrouplist().

The patch linked above had been tested, corrected and updated. The
change works for me on a test-system.

A backport that you should be able to include in a package for 3.6 can
be found here: http://termbin.com/f3cj
Let me know if you are not familiar with rebuilding patched packages,
and I can build a test-version for you tomorrow.

On glusterfs-3.6, you will want to pass a gid-timeout mount option
too.
The option enables caching of the resolved groups that the uid belongs
too, if caching is not enebled (or expires quickly), you will probably
notice a preformance hit. Newer version of GlusterFS set the
timeout to
300 seconds (like the default timeout sssd uses).

Please test and let me know if this fixes your use case.

Thanks,
Niels


>
> Cheers,
> Niels
>
&

Re: [Gluster-devel] release schedule for glusterfs

2015-08-05 Thread Raghavendra Bhat

On 08/05/2015 05:57 PM, Humble Devassy Chirammal wrote:

Hi Raghavendra,

This LGTM. However, is there any guideline on:

How many beta releases for each minor release, and what is the gap 
between these releases?


--Humble



I am not sure about the beta releases. As per my understanding there are 
no beta releases happening in the release-3.5 branch or the latest 
release-3.7 branch. I was doing beta releases for the release-3.6 branch, 
but I am also thinking of moving away from that and making 3.6.5 directly 
(and also future release-3.6 releases).


Regards,
Raghavendra Bhat



On Wed, Aug 5, 2015 at 5:12 PM, Raghavendra Bhat <rab...@redhat.com> wrote:



Hi,

In previous community meeting it was discussed to come up with a
schedule for glusterfs releases. It was discussed that each of the
supported release branches (3.5, 3.6 and 3.7) will make a new
release every month.

The previous releases of them happened at below dates.

glusterfs-3.5.5 -> 9th July
glusterfs-3.6.4 -> 13th July
glusterfs-3.7.3 -> 29th July.

Is it ok to slightly align those dates? i.e. on 10th of every
month 3.5 based release would happen (in general the oldest
supported and most stable release branch). On 20th of every month
3.6 based release would happen (In general, the release branch
which is being stabilized). And on 30th of every month 3.7 based
release would happen (in general, the latest release branch).

Please provide feedback. Once a schedule is finalized we can put
that information in gluster.org.

Regards,
Raghavendra Bhat

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] release schedule for glusterfs

2015-08-05 Thread Raghavendra Bhat


Hi,

In previous community meeting it was discussed to come up with a 
schedule for glusterfs releases. It was discussed that each of the 
supported release branches (3.5, 3.6 and 3.7) will make a new release 
every month.


The previous releases of them happened at below dates.

glusterfs-3.5.5 -> 9th July
glusterfs-3.6.4 -> 13th July
glusterfs-3.7.3 -> 29th July.

Is it ok to slightly align those dates? i.e. on 10th of every month 3.5 
based release would happen (in general the oldest supported and most 
stable release branch). On 20th of every month 3.6 based release would 
happen (In general, the release branch which is being stabilized). And 
on 30th of every month 3.7 based release would happen (in general, the 
latest release branch).


Please provide feedback. Once a schedule is finalized we can put that 
information in gluster.org.


Regards,
Raghavendra Bhat

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] v3.6.3 doesn't respect default ACLs?

2015-07-29 Thread Raghavendra Bhat

On 07/27/2015 08:30 PM, Glomski, Patrick wrote:
I built a patched version of 3.6.4 and the problem does seem to be 
fixed on a test server/client when I mounted with those flags (acl, 
resolve-gids, and gid-timeout). Seeing as it was a test system, I 
can't really provide anything meaningful as to the performance hit 
seen without the gid-timeout option. Thank you for implementing it so 
quickly, though!


Is there any chance of getting this fix incorporated in the upcoming 
3.6.5 release?


Patrick


I am planning to include this fix in 3.6.5. This fix is still under 
review. Once it is accepted in master, it can be backported to the 
release-3.6 branch. I will wait till then and make 3.6.5.
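
(For anyone who wants to test in the meantime: the combination discussed in
Niels' mail below amounts to mounting the client with the acl, resolve-gids
and gid-timeout options, e.g. "-o acl,resolve-gids,gid-timeout=300", and
enabling server.manage-gids on the volume. The resolve-gids mount option is
only available once the patched build/backport is in place.)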


Regards,
Raghavendra Bhat




On Thu, Jul 23, 2015 at 6:27 PM, Niels de Vos <nde...@redhat.com> wrote:


On Tue, Jul 21, 2015 at 10:30:04PM +0200, Niels de Vos wrote:
> On Wed, Jul 08, 2015 at 03:20:41PM -0400, Glomski, Patrick wrote:
> > Gluster devs,
> >
> > I'm running gluster v3.6.3 (both server and client side). Since my
> > application requires more than 32 groups, I don't mount with
ACLs on the
> > client. If I mount with ACLs between the bricks and set a
default ACL on
> > the server, I think I'm right in stating that the server
should respect
> > that ACL whenever a new file or folder is made.
>
> I would expect that the ACL gets in herited on the brick. When a new
> file is created without the default ACL, things seem to be
wrong. You
> mention that creating the file directly on the brick has the correct
> ACL, so there must be some Gluster component interfering.
>
> You reminded me on IRC about this email, and that helped a lot.
Its very
> easy to get distracted when trying to investigate things from the
> mailinglists.
>
> I had a brief look, and I think we could reach a solution. An
ugly patch
> for initial testing is ready. Well... it compiles. I'll try to
run some
> basic tests tomorrow and see if it improves things and does not
crash
> immediately.
>
> The change can be found here:
> http://review.gluster.org/11732
>
> It basically adds a "resolve-gids" mount option for the FUSE client.
> This causes the fuse daemon to call getgrouplist() and retrieve
all the
> groups for the UID that accesses the mountpoint. Without this
option,
> the behavior is not changed, and /proc/$PID/status is used to
get up to
> 32 groups (the $PID is the process that accesses the mountpoint).
>
> You probably want to also mount with "gid-timeout=N" where N is
seconds
> that the group cache is valid. In the current master branch this
is set
> to 300 seconds (like the sssd default), but if the groups of a used
> rarely change, this value can be increased. Previous versions had a
> lower timeout which could cause resolving the groups on almost each
> network packet that arrives (HUGE performance impact).
>
> When using this option, you may also need to enable
server.manage-gids.
> This option allows using more than ~93 groups on the bricks. The
network
> packets can only contain ~93 groups, when server.manage-gids is
enabled,
> the groups are not sent in the network packets, but are resolved
on the
> bricks with getgrouplist().

The patch linked above had been tested, corrected and updated. The
change works for me on a test-system.

A backport that you should be able to include in a package for 3.6 can
be found here: http://termbin.com/f3cj
Let me know if you are not familiar with rebuilding patched packages,
and I can build a test-version for you tomorrow.

On glusterfs-3.6, you will want to pass a gid-timeout mount option
too.
The option enables caching of the resolved groups that the uid belongs
too, if caching is not enebled (or expires quickly), you will probably
notice a preformance hit. Newer version of GlusterFS set the
timeout to
300 seconds (like the default timeout sssd uses).

Please test and let me know if this fixes your use case.

Thanks,
Niels


>
> Cheers,
> Niels
>
> > Maybe an example is in order:
> >
> > We first set up a test directory with setgid bit so that our new
> > subdirectories inherit the group.
> > [root@gfs01a hpc_shared]# mkdir test; cd test; chown
pglomski.users .;
> > chmod 2770 .; getfacl .
> > # file: .
> > # owner: pglomski
> > # group: users
> > # flags: -s-
> > user::rwx
> > group::rwx
> > other::---

Re: [Gluster-devel] "gluster vol start" is failing when glusterfs is compiled with debug enable .

2015-07-21 Thread Raghavendra Bhat

On 07/22/2015 09:50 AM, Atin Mukherjee wrote:


On 07/22/2015 12:50 AM, Anand Nekkunti wrote:

Hi All
"gluster vol start" is failing when glusterfs is compiled with debug
enable .
Link: :https://bugzilla.redhat.com/show_bug.cgi?id=1245331

*brick start is failing with fallowing error:*
2015-07-21 19:01:59.408729] I [MSGID: 100030] [glusterfsd.c:2296:main]
0-/usr/local/sbin/glusterfsd: Started running /usr/local/sbin/glusterfsd
version 3.8dev (args: /usr/local/sbin/glusterfsd -s 192.168.0.4
--volfile-id VOL.192.168.0.4.tmp-BRICK1 -p
/var/lib/glusterd/vols/VOL/run/192.168.0.4-tmp-BRICK1.pid -S
/var/run/gluster/0a4faf3d8d782840484629176ecf307a.socket --brick-name
/tmp/BRICK1 -l /var/log/glusterfs/bricks/tmp-BRICK1.log --xlator-option
*-posix.glusterd-uuid=4ec09b0c-6043-40f0-bc1a-5cc312d49a78 --brick-port
49152 --xlator-option VOL-server.listen-port=49152)
[2015-07-21 19:02:00.075574] I [MSGID: 101190]
[event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2015-07-21 19:02:00.078905] W [MSGID: 101095]
[xlator.c:189:xlator_dynload] 0-xlator: /usr/local/lib/libgfdb.so.0:
undefined symbol: gf_sql_str2sync_t
[2015-07-21 19:02:00.078947] E [MSGID: 101002] [graph.y:211:volume_type]
0-parser: Volume 'VOL-changetimerecorder', line 16: type
'features/changetimerecorder' is not valid or not found on this machine
[2015-07-21 19:02:00.079020] E [MSGID: 101019] [graph.y:319:volume_end]
0-parser: "type" not specified for volume VOL-changetimerecorder
[2015-07-21 19:02:00.079150] E [MSGID: 100026]
[glusterfsd.c:2151:glusterfs_process_volfp] 0-: failed to construct the
graph
[2015-07-21 19:02:00.079399] W [glusterfsd.c:1214:cleanup_and_exit]
(-->/usr/local/sbin/glusterfsd(mgmt_getspec_cbk+0x343) [0x40df64]
-->/usr/local/sbin/glusterfsd(glusterfs_process_volfp+0x1a2) [0x409b58]
-->/usr/local/sbin/glusterfsd(cleanup_and_exit+0x77) [0x407a6f] ) 0-:
received signum (0), shutting down

I am not able to hit this though.


This seems to be the case of inline functions being considered as 
undefined symbols. There has been a discussion about it on the mailing 
list.


https://www.gluster.org/pipermail/gluster-devel/2015-June/045942.html
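
Roughly, the pitfall looks like this (a generic C99 example with made-up
names, not the actual libgfdb code):

/* header shared by several .c files */
inline int str2sync(const char *s)   /* C99 'inline' without 'static'/'extern' */
{
        return s && s[0] == 's';
}

/* caller.c */
int uses_it(const char *s)
{
        return str2sync(s);   /* with optimisation the call is inlined; in a
                                 -O0 debug build a real call is emitted, but no
                                 translation unit ever provides an external
                                 definition of str2sync, so the shared object
                                 ends up with an undefined symbol */
}

/* Fix: provide exactly one external definition, e.g. in a single .c file:
 *         extern inline int str2sync(const char *s);
 */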

Regards,
Raghavendra Bhat



Thanks&Regards
Anand.N



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.6.4 released

2015-07-15 Thread Raghavendra Bhat


Hi,

glusterfs-3.6.4 has been released and the packages for 
RHEL/Fedora/Centos can be found here.

http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/

Requesting people running 3.6.x to please try it out and let us know if 
there are any issues.


This release fixes the bugs listed below that have been addressed since 3.6.3
was made available. Thanks to all who submitted patches and reviewed the changes.


1184626 - Community Repo RPMs don't include attr package as a dependency
1215421 - Fails to build on x32
1219967 - glusterfsd core dumps when cleanup and socket disconnect 
routines race

1138897 - NetBSD port
1218167 - [GlusterFS 3.6.3]: Brick crashed after setting up SSL/TLS in 
I/O access path with error: "E [socket.c:2495:socket_poller] 
0-tcp.gluster-native-volume-3G-1-server: error in polling loop"

1211840 - glusterfs-api.pc versioning breaks QEMU
1204140 - "case sensitive = no" is not honored when "preserve case = 
yes" is present in smb.conf
1230242 - `ls' on a directory which has files with mismatching gfid's 
does not list anything

1230259 -  Honour afr self-heal volume set options from clients
1122290 - Issues reported by Cppcheck static analysis tool
1227670 - wait for sometime before accessing the activated snapshot
1225745 - [AFR-V2] - afr_final_errno() should treat op_ret > 0 also as 
success

1223891 - readdirp return 64bits inodes even if enable-ino32 is set
1206429 - Maintainin local transaction peer list in op-sm framework
1217419 - DHT:Quota:- brick process crashed after deleting .glusterfs 
from backend

1225072 - OpenSSL multi-threading changes break build in RHEL5 (3.6.4beta1)
1215419 - Autogenerated files delivered in tarball
1224624 - cli: Excessive logging
1217423 - glusterfsd crashed after directory was removed from the mount 
point, while self-heal and rebalance  were running on 
the volume
1241785 - Gluster commands timeout on SSL enabled system, after adding 
new node to trusted storage pool

1241275 - Peer not recognized after IP address change
1234846 - GlusterD does not store updated peerinfo objects.
1238074 - protocol/server doesn't reconfigure auth.ssl-allow options
1233036 - Fix shd coredump from tests/bugs/glusterd/bug-948686.t

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] on patch #11553

2015-07-08 Thread Raghavendra Bhat

On 07/07/2015 12:30 PM, Raghavendra G wrote:

+ vijay mallikarjuna for quotad has similar concerns

+ Raghavendra Bhat for snapd might've similar concerns.


Snapd also uses protocol/server at the top of the graph. So the fix for 
protocol/server should be good enough.


Regards,
Raghavendra Bhat



On Tue, Jul 7, 2015 at 12:02 PM, Raghavendra Gowdappa <rgowd...@redhat.com> wrote:


+gluster-devel

- Original Message -
> From: "Raghavendra Gowdappa" mailto:rgowd...@redhat.com>>
> To: "Krishnan Parthasarathi" mailto:kpart...@redhat.com>>
> Cc: "Nithya Balachandran" mailto:nbala...@redhat.com>>, "Anoop C S" mailto:achir...@redhat.com>>
> Sent: Tuesday, 7 July, 2015 11:32:01 AM
> Subject: on patch #11553
>
> KP,
>
> Though the crash because of lack of init while fops are in
progress is
> solved, concerns addressed by [1] are still valid. Basically
what we need to
> guarantee is that when is it safe to wind fops through a
particular subvol
> of protocol/server. So, if some xlators are doing things in
events like
> CHILD_UP (like trash), server_setvolume should wait for CHILD_UP
on a
> particular subvol before accepting a client. So, [1] is
necessary but
> following changes need to be made:
>
> 1. protocol/server _can_ have multiple subvol as children. In
that case we
> should track whether the exported subvol has received CHILD_UP
and only
> after a successful CHILD_UP on that subvol connections to that
subvol can be
> accepted.
> 2. It is valid (though not a common thing on brick process) that
some subvols
> can be up and some might be down. So, child readiness should be
localised to
> that subvol instead of tracking readiness at protocol/server level.
>
> So, please revive [1] and send it with corrections and I'll
merge it.
>
> [1] http://review.gluster.org/11553
>
> regards,
> Raghavendra.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




--
Raghavendra G


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] healing of bad objects (marked by scrubber)

2015-07-07 Thread Raghavendra Bhat

Adding the correct gluster-devel id.

Regards,
Raghavendra Bhat

On 07/08/2015 11:38 AM, Raghavendra Bhat wrote:


Hi,

In the bit-rot feature, the scrubber marks corrupted objects (objects whose 
data has gone bad) as bad objects (via an extended attribute). If the 
volume is a replicate volume and an object in one of the replicas goes 
bad, the client is still able to see the data via the good copy present in 
the other replica. But as of now, self-heal does not heal the bad objects. 
So the method to heal a bad object is to remove it directly from the 
backend and let self-heal take care of healing it from the good copy.


The above method has a problem. The bit-rot-stub xlator sitting in the 
brick graph remembers an object as bad in its inode context (either 
when the object was being marked bad by the scrubber, or during the first 
lookup of the object if it was already marked bad). Bit-rot-stub uses 
that info to block any read/write operations on such bad objects. So 
it also blocks any operation attempted by self-heal to correct the 
object (the object was deleted directly in the backend, so the in-memory 
inode will still be present and considered valid).


There are 2 methods that I think can solve the issue.

1) In server_lookup_cbk, if the lookup of an object fails due to 
ENOENT *AND* the lookup is a revalidate lookup, then forget the 
inode associated with that object (not just unlinking the dentry, 
but forgetting the inode as well, iff there are no more dentries 
associated with the inode). At least this way the inode would be 
forgotten, and later when self-heal wants to correct the object, it has 
to create a new object (the object was removed directly from the 
backend), which happens with the creation of a new in-memory inode, and 
read/write operations by the self-heal daemon will not be blocked.

I have sent a patch for review for the above method:
http://review.gluster.org/#/c/11489/

OR

2) Do not block write operations on the bad object if the operation is 
coming from self-heal; allow it to completely heal the file and, once 
healing is done, remove the bad-object information from the inode context.
The requests coming from the self-heal daemon can be identified by checking 
their pid (it is negative). But if the self-heal is happening from 
the glusterfs client itself, I am not sure whether self-heal happens 
with a -ve pid for the frame or with the same pid as that of the frame of 
the original fop which triggered the self-heal. Pranith? Can you 
clarify this?
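
To make option (2) concrete, the check would be roughly of this shape (an
illustrative sketch only, not actual bit-rot-stub code):

/* Allow a write on an object only if it is not marked bad, or if the
 * request comes from self-heal, which gluster marks with a negative pid
 * on the call frame. */
static int allow_write_on_object(int object_is_bad, int frame_pid)
{
        if (!object_is_bad)
                return 1;     /* healthy object: always allowed     */
        if (frame_pid < 0)
                return 1;     /* self-heal traffic: let it repair   */
        return 0;             /* everyone else stays blocked        */
}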


Please provide feedback.

Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [release-3.6] compile error: 'GF_REPLACE_OP_START' undeclared

2015-07-06 Thread Raghavendra Bhat

On 07/06/2015 01:39 PM, Niels de Vos wrote:

On Mon, Jul 06, 2015 at 12:09:28PM +0530, Raghavendra Bhat wrote:

On 07/06/2015 09:52 AM, Kaushal M wrote:

I checked on NetBSD-7.0_BETA and FreeBSD-10.1. I couldn't reproduce
this. I'll try on NetBSD-6 next.

~kaushal

I think it has to be included before 3.6.4 is made G.A. I can wait till the
fix for this issue is merged before making 3.6.4. Does it sound ok? Or
should I go ahead with 3.6.4 and make a quick 3.6.5 with this fix?

I only care about getting http://review.gluster.org/11335 merged :-)

This is a patch I promised to take into release-3.5. It would be nicer
to have this change included in the release-3.6 branch before I merge
the 3.5 backport. At the moment, 3.5.5 is waiting on this patch. But I
do not think you really need to delay 3.6.4 off for that one. It should
be fine if it lands in 3.6.5. (The compile error looks more like a 3.6.4
blocker.)

Niels


Niels,

The patch you mentioned has received the acks and also has passed the 
linux regression tests. But it seems to have failed the netbsd regression tests.


Regards,
Raghavendra Bhat


Regards,
Raghavendra Bhat


On Mon, Jul 6, 2015 at 8:38 AM, Kaushal M  wrote:

Krutika hit this last week, and let us (GlusterD maintainers) know of
it. I volunteered to look into this, but couldn't find time. I'll do
it now.

~kaushal

On Sun, Jul 5, 2015 at 10:43 PM, Atin Mukherjee
 wrote:

I remember Krutika reporting it a few days back. So it seems like it's not
fixed yet. If there is no taker I will send a patch tomorrow.

-Atin
Sent from one plus one

On Jul 5, 2015 9:58 PM, "Niels de Vos"  wrote:

Hi,

it seems that the current release-3.6 branch does not compile on
FreeBSD and NetBSD (not sure why it compiles on CentOS-6). These errors
are thrown:

   --- glusterd_la-glusterd-op-sm.lo ---
 CC   glusterd_la-glusterd-op-sm.lo

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:
In function 'glusterd_op_start_rb_timer':

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:3685:19:
error: 'GF_REPLACE_OP_START' undeclared (first use in this function)

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:3685:19:
note: each undeclared identifier is reported only once for each function it
appears in

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:
In function 'glusterd_bricks_select_status_volume':

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:5800:34:
warning: unused variable 'snapd'
   *** [glusterd_la-glusterd-op-sm.lo] Error code 1


Could someone send a (pointer to the) backport that addresses this?

Thanks,
Niels


On Sun, Jul 05, 2015 at 08:59:32AM -0700, Gluster Build System (Code
Review) wrote:

Gluster Build System has posted comments on this change.

Change subject: nfs: make it possible to disable nfs.mount-rmtab
..


Patch Set 1: -Verified

Build Failed

http://build.gluster.org/job/compare-bug-version-and-git-branch/9953/ :
SUCCESS

http://build.gluster.org/job/freebsd-smoke/8551/ : FAILURE

http://build.gluster.org/job/smoke/19820/ : SUCCESS

http://build.gluster.org/job/netbsd6-smoke/7808/ : FAILURE

--
To view, visit http://review.gluster.org/11335
To unsubscribe, visit http://review.gluster.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d
Gerrit-PatchSet: 1
Gerrit-Project: glusterfs
Gerrit-Branch: release-3.6
Gerrit-Owner: Niels de Vos 
Gerrit-Reviewer: Gluster Build System 
Gerrit-Reviewer: Kaleb KEITHLEY 
Gerrit-Reviewer: NetBSD Build System 
Gerrit-Reviewer: Niels de Vos 
Gerrit-Reviewer: Raghavendra Bhat 
Gerrit-Reviewer: jiffin tony Thottan 
Gerrit-HasComments: No

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [release-3.6] compile error: 'GF_REPLACE_OP_START' undeclared

2015-07-05 Thread Raghavendra Bhat

On 07/06/2015 09:52 AM, Kaushal M wrote:

I checked on NetBSD-7.0_BETA and FreeBSD-10.1. I couldn't reproduce
this. I'll try on NetBSD-6 next.

~kaushal


I think it has to be included before 3.6.4 is made G.A. I can wait till 
the fix for this issue is merged before making 3.6.4. Does it sound ok? 
Or should I go ahead with 3.6.4 and make a quick 3.6.5 with this fix?


Regards,
Raghavendra Bhat



On Mon, Jul 6, 2015 at 8:38 AM, Kaushal M  wrote:

Krutika hit this last week, and let us (GlusterD maintainers) know of
it. I volunteered to look into this, but couldn't find time. I'll do
it now.

~kaushal

On Sun, Jul 5, 2015 at 10:43 PM, Atin Mukherjee
 wrote:

I remember Krutika reporting it a few days back. So it seems like it's not
fixed yet. If there is no taker I will send a patch tomorrow.

-Atin
Sent from one plus one

On Jul 5, 2015 9:58 PM, "Niels de Vos"  wrote:

Hi,

it seems that the current release-3.6 branch does not compile on
FreeBSD and NetBSD (not sure why it compiles on CentOS-6). These errors
are thrown:

   --- glusterd_la-glusterd-op-sm.lo ---
 CC   glusterd_la-glusterd-op-sm.lo

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:
In function 'glusterd_op_start_rb_timer':

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:3685:19:
error: 'GF_REPLACE_OP_START' undeclared (first use in this function)

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:3685:19:
note: each undeclared identifier is reported only once for each function it
appears in

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:
In function 'glusterd_bricks_select_status_volume':

/home/jenkins/root/workspace/netbsd6-smoke/xlators/mgmt/glusterd/src/glusterd-op-sm.c:5800:34:
warning: unused variable 'snapd'
   *** [glusterd_la-glusterd-op-sm.lo] Error code 1


Could someone send a (pointer to the) backport that addresses this?

Thanks,
Niels


On Sun, Jul 05, 2015 at 08:59:32AM -0700, Gluster Build System (Code
Review) wrote:

Gluster Build System has posted comments on this change.

Change subject: nfs: make it possible to disable nfs.mount-rmtab
..


Patch Set 1: -Verified

Build Failed

http://build.gluster.org/job/compare-bug-version-and-git-branch/9953/ :
SUCCESS

http://build.gluster.org/job/freebsd-smoke/8551/ : FAILURE

http://build.gluster.org/job/smoke/19820/ : SUCCESS

http://build.gluster.org/job/netbsd6-smoke/7808/ : FAILURE

--
To view, visit http://review.gluster.org/11335
To unsubscribe, visit http://review.gluster.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d
Gerrit-PatchSet: 1
Gerrit-Project: glusterfs
Gerrit-Branch: release-3.6
Gerrit-Owner: Niels de Vos 
Gerrit-Reviewer: Gluster Build System 
Gerrit-Reviewer: Kaleb KEITHLEY 
Gerrit-Reviewer: NetBSD Build System 
Gerrit-Reviewer: Niels de Vos 
Gerrit-Reviewer: Raghavendra Bhat 
Gerrit-Reviewer: jiffin tony Thottan 
Gerrit-HasComments: No

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] tests/bugs/snapshot/bug-1109889.t - snapd crash

2015-07-03 Thread Raghavendra Bhat

On 07/03/2015 03:37 PM, Atin Mukherjee wrote:

http://build.gluster.org/job/rackspace-regression-2GB-triggered/11898/consoleFull
has caused a crash in snapd with the following bt:


This seems to have crashed in server_setvolume (i.e. before the graph 
could be properly made available for I/O; the snapview-server xlator is 
yet to come into the picture). Still, I will try to reproduce it on my 
local setup and see what might be causing this.



Regards,
Raghavendra Bhat



#0  0x7f11e2ed3ded in gf_client_put (client=0x0, detached=0x0)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/client_t.c:294
#1  0x7f11d4eeac96 in server_setvolume (req=0x7f11c000195c)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/protocol/server/src/server-handshake.c:710
#2  0x7f11e2c1e05c in rpcsvc_handle_rpc_call (svc=0x7f11d001b160,
trans=0x7f11cac0, msg=0x7f11c0001810)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:698
#3  0x7f11e2c1e3cf in rpcsvc_notify (trans=0x7f11cac0,
mydata=0x7f11d001b160, event=RPC_TRANSPORT_MSG_RECEIVED,
 data=0x7f11c0001810) at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpcsvc.c:792
#4  0x7f11e2c23ad7 in rpc_transport_notify (this=0x7f11cac0,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f11c0001810)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:538
#5  0x7f11d841787b in socket_event_poll_in (this=0x7f11cac0)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2285
#6  0x7f11d8417dd1 in socket_event_handler (fd=13, idx=3,
data=0x7f11cac0, poll_in=1, poll_out=0, poll_err=0)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2398
#7  0x7f11e2ed79ec in event_dispatch_epoll_handler
(event_pool=0x13bb040, event=0x7f11d4eb9e70)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:570
#8  0x7f11e2ed7dda in event_dispatch_epoll_worker (data=0x7f11d000dc10)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:673
#9  0x7f11e213e9d1 in start_thread () from ./lib64/libpthread.so.0
#10 0x7f11e1aa88fd in clone () from ./lib64/libc.so.6



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.6.4beta2 released

2015-07-02 Thread Raghavendra Bhat

Hi,

glusterfs-3.6.4beta2 has been released and the packages for 
RHEL/Fedora/CentOS can be found here.

http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.4beta2/

Requesting people running 3.6.x to please try it out and let us know if 
there are any issues.


This release supposedly fixes the bugs listed below since 3.6.4beta1 was 
made available. Thanks to all who submitted patches, reviewed the changes.


1230242 - `ls' on a directory which has files with mismatching gfid's 
does not list anything

1230259 -  Honour afr self-heal volume set options from clients
1122290 - Issues reported by Cppcheck static analysis tool
1227670 - wait for sometime before accessing the activated snapshot
1225745 - [AFR-V2] - afr_final_errno() should treat op_ret > 0 also as 
success

1223891 - readdirp return 64bits inodes even if enable-ino32 is set
1206429 - Maintaining local transaction peer list in op-sm framework
1217419 - DHT:Quota:- brick process crashed after deleting .glusterfs 
from backend

1225072 - OpenSSL multi-threading changes break build in RHEL5 (3.6.4beta1)
1215419 - Autogenerated files delivered in tarball
1224624 - cli: Excessive logging
1217423 - glusterfsd crashed after directory was removed from the mount 
point, while self-heal and rebalance  were running on 
the volume



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bad file access (bit-rot + AFR)

2015-06-29 Thread Raghavendra Bhat

On 06/27/2015 03:28 PM, Venky Shankar wrote:



On 06/27/2015 02:32 PM, Raghavendra Bhat wrote:

Hi,

There is a patch that is submitted for review to deny access to 
objects which are marked as bad by scrubber (i.e. the data of the 
object might have been corrupted in the backend).


http://review.gluster.org/#/c/11126/10
http://review.gluster.org/#/c/11389/4

The above  2 patch sets solve the problem of denying access to the 
bad objects (they have passed regression and received a +1 from 
venky). But in our testing we found that there is a race window 
(depending upon the scrubber frequency the race window can be larger) 
where there is a possibility of self-heal daemon healing the contents 
of the bad file before scrubber can mark it as bad.


I am not sure whether this issue can be hit when the data truly gets 
corrupted in the backend. But in our testing, to simulate backend 
corruption we modify the contents of the file directly in the 
backend. In this case, before the scrubber can mark the object as 
bad, the self-heal daemon kicks in and heals the contents of the bad 
file onto the good copy. Or, before the scrubber marks the file as bad, 
if the client accesses it, AFR finds that there is a mismatch in 
metadata (since we modified the contents of the file in the backend) 
and does data and metadata self-healing, thus copying the contents of 
the bad copy onto the good copy. From then onwards the clients accessing 
that object always get bad data.


I understand from Ravi (ranaraya@) that AFR-v2 would choose the 
"biggest" file as the source, provided that the afr xattrs are "clean" 
(AFR-v1 would give back EIO). If a file is modified directly on the 
brick but the size is left unchanged, contents can be served from 
either copy. For self-heal to detect anomalies, there needs to be 
verification (checksum/signature) at each stage of its operation. But 
this might be too heavy on the I/O side. We could still cache mtime 
[but update on client I/O] after pre-check, but this still would not 
catch bit flips (unless a filesystem scrub is done).


Thoughts?



Yes. Even if one wants to verify just before healing the file, the time 
taken to verify the checksum might be large if the file size is large, 
which might affect self-heal performance.


Regards,
Raghavendra Bhat



Pranith, do you have any solution for this? Venky and I are trying to 
come up with a solution for this.


But does this issue block the above patches in any way? (Those 2 
patches are still needed to deny access to objects once they are 
marked as bad by the scrubber).



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




[Gluster-devel] xattr creation failure in posix_lookup

2015-06-29 Thread Raghavendra Bhat


Hi,

In posix_lookup, it allocates a dict for storing the values of the 
extended attributes and other hint keys set into the xdata of call path 
(i.e. wind path) by higher xlators (such as quick-read, bit-rot-stub etc).


But if the creation of new dict fails, then a NULL dict is returned in 
the callback path. There might be many xlators for which the key-value 
information present in the dict might be very important for making 
certain decisions (Ex: In bit-rot-stub it tries to fetch an extended 
attribute which tells whether the object is bad or not. If the key 
is present in the dict, it means the object is bad, and the xlator updates 
the same in the inode context. Later, when there is any read/modify 
operation on that object, the fop is failed instead of being allowed to 
continue).


Now suppose in posix_lookup the dict creation fails, then posix simply 
proceeds with the lookup operation and if other stat operations 
succeeded, then lookup will return success with NULL dict.


        if (xdata && (op_ret == 0)) {
                xattr = posix_xattr_fill (this, real_path, loc, NULL, -1,
                                          xdata, &buf);
        }

The above piece of code in posix_lookup creates a new dict called 
@xattr. The return value of posix_xattr_fill is not checked.


So in this case, as per the bit-rot-stub example mentioned above, there 
is a possibility that the object being looked up is a bad object (marked 
by the scrubber). Since the lookup succeeded but the bad-object xattr 
is not obtained in the callback (the dict itself being NULL), the bit-rot-stub 
xlator does not mark that object as bad and might allow further 
read/write requests, thus allowing bad data to be served.


There might be other xlators as well dependent upon the xattrs being 
returned in lookup.


Should we fail lookup if the dict creation fails?
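
To make the suggestion concrete, below is a minimal, illustrative sketch 
(not an actual patch) of how the snippet quoted above could fail the 
lookup when the dict cannot be created; the 'out' label and the choice of 
ENOMEM are assumptions made only for this sketch:

        /* Illustrative only: fail the lookup instead of unwinding with a
         * NULL dict when the xattr dict could not be built. */
        if (xdata && (op_ret == 0)) {
                xattr = posix_xattr_fill (this, real_path, loc, NULL, -1,
                                          xdata, &buf);
                if (!xattr) {
                        op_ret   = -1;
                        op_errno = ENOMEM;
                        goto out;
                }
        }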

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] bad file access (bit-rot + AFR)

2015-06-27 Thread Raghavendra Bhat

Hi,

There is a patch that is submitted for review to deny access to objects 
which are marked as bad by scrubber (i.e. the data of the object might 
have been corrupted in the backend).


http://review.gluster.org/#/c/11126/10
http://review.gluster.org/#/c/11389/4

The above  2 patch sets solve the problem of denying access to the bad 
objects (they have passed regression and received a +1 from venky). But 
in our testing we found that there is a race window (depending upon the 
scrubber frequency the race window can be larger) where there is a 
possibility of self-heal daemon healing the contents of the bad file 
before scrubber can mark it as bad.


I am not sure whether this issue can be hit when the data truly gets 
corrupted in the backend. But in our testing, to simulate backend 
corruption we modify the contents of the file directly in the backend. 
In this case, before the scrubber can mark the object as bad, the 
self-heal daemon kicks in and heals the contents of the bad file onto the 
good copy. Or, before the scrubber marks the file as bad, if the client 
accesses it, AFR finds that there is a mismatch in metadata (since we 
modified the contents of the file in the backend) and does data and 
metadata self-healing, thus copying the contents of the bad copy onto the 
good copy. From then onwards the clients accessing that object always get 
bad data.


Pranith, do you have any solution for this? Venky and I are trying to 
come up with a solution for this.


But does this issue block the above patches in any way? (Those 2 patches 
are still needed to deny access to objects once they are marked as bad 
by the scrubber).



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failure with test-case ./tests/basic/tier/tier.t

2015-06-26 Thread Raghavendra Bhat

On 06/26/2015 04:00 PM, Ravishankar N wrote:



On 06/26/2015 03:57 PM, Vijaikumar M wrote:

Hi

Upstream regression failure with test-case ./tests/basic/tier/tier.t

My patch #11315 regression failed twice with 
test-case ./tests/basic/tier/tier.t. Anyone seeing this issue with 
other patches?




Yes, one of my patches failed today too: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11461/consoleFull


-Ravi


I have also faced failures in tier.t a couple of times.

Regards,
Raghavendra Bhat

http://build.gluster.org/job/rackspace-regression-2GB-triggered/11396/consoleFull 

http://build.gluster.org/job/rackspace-regression-2GB-triggered/11456/consoleFull 




Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Valgrind + glusterfs

2015-06-24 Thread Raghavendra Bhat

On 06/25/2015 09:57 AM, Pranith Kumar Karampuri wrote:

hi,
   Does anyone know why glusterfs hangs with valgrind?

Pranith


Yes, I have faced it too. It used to work before, but recently it is not 
working: glusterfs hangs when run with valgrind.

Not sure why it is hanging.


Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




[Gluster-devel] Bad file access in bit-rot-detection

2015-06-08 Thread Raghavendra Bhat


Hi,

As part of the bit-rot detection feature, a file whose data has changed 
due to some backend errors is marked as a bad file by the scrubber (it sets 
an extended attribute indicating it is a bad file). Access to the 
bad file then has to be denied (to prevent wrong data from being served).


In the bit-rot-stub xlator (the xlator which does object versioning and 
sends notifications to BitD upon object modification) the check for 
whether the file is bad or not can be done in lookup: if the xattr 
is set, the object can be marked as bad within its inode context as 
well. But the problem is what if the object was not marked as bad at the 
time of lookup and was marked bad later. When a fop such as open, 
readv or writev then comes, the fop should not be allowed. If it is a fuse 
client from which the file is being accessed, then it is probably OK to 
rely only on lookups (to check whether it is bad), as fuse sends lookups 
before sending fops. But for NFS, once the lookup is done and the filehandle 
is available, further lookups are not sent. In that case relying only on 
lookup to check whether it is a bad file is not sufficient.


Below 3 solutions in bit-rot-stub xlator seem to address the above issue.

1) Whenever a fop such as open, readv or writev comes, check in the 
inode context whether it is a bad file. If the context does not say so, 
send a getxattr of the bad-file xattr on that file. If it is present, then 
set the bad-file attribute in the inode context and fail the fop.


But for the above operation, a getxattr call has to be sent downwards for 
almost every open, readv or writev. If the file is already identified as bad, 
then the getxattr might not be necessary; but for good files the extra getxattr 
might affect performance.


OR

2) Set a key in xdata whenever open, readv, or writev comes (in 
bit-rot-stub xlator) and send it downwards. The posix xlator can look 
into the xdata and if the key for bad file identification is present, 
then it can do getxattr as part of open or readv or writev itself and 
send the response back in xdata itself.


Not sure whether the above method is ok or not as it overloads open, 
readv and writev. Apart from that, the getxattr disk operation is still 
done.


OR

3) Once the file is identified as bad, the scrubber marks it as bad (via 
setxattr) by sending a call to the bit-rot-stub xlator. The bit-rot-stub 
xlator marks the file as bad in the inode context once it receives the 
notification from the scrubber that a file is bad. This saves those getxattr 
calls being made from other fops (either in the bit-rot-stub xlator or the 
posix xlator).


But the tricky part is what happens if the inode gets forgotten or the brick 
restarts. I think in that case checking in the lookup call is sufficient (as 
in both inode forgets and brick restarts, a lookup will definitely come if 
there is an access to that file).


Please provide feedback on the above 3 methods (a rough sketch of the 
inode-context caching used by methods 1 and 3 follows below). If there are 
any other solutions which might solve this issue, they are welcome.
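
For illustration, here is a rough sketch of the inode-context caching that 
methods 1 and 3 rely on. This is not the actual bit-rot-stub code: the 
helper bad_object_present_on_disk() and the use of EIO are assumptions made 
only for this example (inode_ctx_get/inode_ctx_set are the usual 
libglusterfs helpers):

/* Sketch only -- not the actual bit-rot-stub implementation. */
static int32_t
bad_object_check_and_cache (xlator_t *this, inode_t *inode)
{
        uint64_t bad = 0;

        /* Fast path: the verdict is already cached in the inode context,
         * so no extra getxattr is needed. */
        if ((inode_ctx_get (inode, this, &bad) == 0) && bad)
                return -EIO;

        /* Slow path (the extra getxattr that method 1 worries about):
         * bad_object_present_on_disk() is a hypothetical helper that
         * checks the bad-object xattr on the backend. */
        if (bad_object_present_on_disk (this, inode)) {
                bad = 1;
                inode_ctx_set (inode, this, &bad);
                return -EIO;
        }

        return 0;
}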


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.6.4beta1 released

2015-05-27 Thread Raghavendra Bhat


Hi,

glusterfs-3.6.4beta1 has been released and can be found here.
http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.4beta1/

This release supposedly fixes the bugs listed below since 3.6.3 was made 
available. Thanks to all who submitted patches, reviewed the changes.



1184626 - Community Repo RPMs don't include attr package as a dependency
1215421 - Fails to build on x32
1219967 - glusterfsd core dumps when cleanup and socket disconnect 
routines race

1138897 - NetBSD port
1218167 - [GlusterFS 3.6.3]: Brick crashed after setting up SSL/TLS in 
I/O access path with error: "E [socket.c:2495:socket_poller] 
0-tcp.gluster-native-volume-3G-1-server: error in polling loop"

1211840 - glusterfs-api.pc versioning breaks QEMU
1204140 - "case sensitive = no" is not honored when "preserve case = 
yes" is present in smb.conf



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious regression status

2015-05-08 Thread Raghavendra Bhat

On Thursday 07 May 2015 10:50 AM, Sachin Pandit wrote:


- Original Message -

From: "Vijay Bellur" 
To: "Pranith Kumar Karampuri" , "Gluster Devel" 
, "Rafi Kavungal
Chundattu Parambil" , "Aravinda" , "Sachin 
Pandit" ,
"Raghavendra Bhat" , "Kotresh Hiremath Ravishankar" 

Sent: Wednesday, May 6, 2015 10:53:01 PM
Subject: Re: [Gluster-devel] spurious regression status

On 05/06/2015 06:52 AM, Pranith Kumar Karampuri wrote:

hi,
Please backport the patches that fix spurious regressions to 3.7
as well. This is the status of regressions now:

   * ./tests/bugs/quota/bug-1035576.t (Wstat: 0 Tests: 24 Failed: 2)

   * Failed tests:  20-21

   *
   
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8329/consoleFull


   * ./tests/bugs/snapshot/bug-1112559.t: 1 new core files

   *
   
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8308/consoleFull

   * One more occurrence -

   * Failed tests:  9, 11

   *
   
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8430/consoleFull


Rafi - this seems to be a test unit contributed by you. Can you please
look into this one?



   * ./tests/geo-rep/georep-rsync-changelog.t (Wstat: 256 Tests: 3 Failed:
   0)

   * Non-zero exit status: 1

   *
   http://build.gluster.org/job/rackspace-regression-2GB-triggered/8168/console



Aravinda/Kotresh - any update on this? If we do not intend enabling
geo-replication tests in regression runs for now, this should go off the
list.


   * ./tests/basic/quota-anon-fd-nfs.t (failed-test: 21)

   * Happens in: master
 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/8147/consoleFull)
 

   * Being investigated by: ?


Sachin - does this happen anymore or should we move it off the list?

quota-anon-fd.t failure is consistent in NetBSD, whereas in linux
apart from test failure mentioned in etherpad I did not see this
failure again in the regression runs. However, I remember Pranith
talking about hitting this issue again.



   * tests/features/glupy.t

   * nuked tests 7153, 7167, 7169, 7173, 7212


Emmanuel's investigation should help us here. Thanks!


   * tests/basic/volume-snapshot-clone.t

   * http://review.gluster.org/#/c/10053/

   * Came back on April 9

   * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/


Rafi - does this happen anymore? If fixed due to subsequent commits, we
should look at dropping this test from is_bad_test() in run-tests.sh.


   * tests/basic/uss.t

   * https://bugzilla.redhat.com/show_bug.cgi?id=1209286

   * http://review.gluster.org/#/c/10143/

   * Came back on April 9

   * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6660/

   * ./tests/bugs/glusterfs/bug-867253.t (Wstat: 0 Tests: 9 Failed: 1)

   * Failed test:  8


Raghu - does this happen anymore? If fixed due to subsequent commits, we
should look at dropping this test from is_bad_test() in run-tests.sh.

-Vijay



As per the Jenkins output uss.t is failing in this test case

TEST stat $M0/.history/snap6/aaa

And it is failing with the below error.

stat: cannot stat `/mnt/glusterfs/0/.history/snap6/aaa': No such file or 
directory


It is a bit strange, as before doing this check the file is created in the 
mount point and then the snapshot is taken. I am not sure whether it is 
unable to reach the file itself or its parent directory (which 
represents the snapshot of the volume, i.e. in this case 
/mnt/glusterfs/0/.history/snap6).


So I have sent a patch to check for the parent directory (i.e. stat on 
it). It will help us get more information.

http://review.gluster.org/10671

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious regression status

2015-05-07 Thread Raghavendra Bhat

On Wednesday 06 May 2015 10:53 PM, Vijay Bellur wrote:

On 05/06/2015 06:52 AM, Pranith Kumar Karampuri wrote:

hi,
   Please backport the patches that fix spurious regressions to 3.7
as well. This is the status of regressions now:

  * ./tests/bugs/quota/bug-1035576.t (Wstat: 0 Tests: 24 Failed: 2)

  * Failed tests:  20-21

  * 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8329/consoleFull



  * ./tests/bugs/snapshot/bug-1112559.t: 1 new core files

  * 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8308/consoleFull


  * One more occurrence -

  * Failed tests:  9, 11

  * 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8430/consoleFull



Rafi - this seems to be a test unit contributed by you. Can you please 
look into this one?



  * ./tests/geo-rep/georep-rsync-changelog.t (Wstat: 256 Tests: 3 
Failed: 0)


  * Non-zero exit status: 1

  * 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8168/console





Aravinda/Kotresh - any update on this? If we do not intend enabling 
geo-replication tests in regression runs for now, this should go off 
the list.




  * ./tests/basic/quota-anon-fd-nfs.t (failed-test: 21)

  * Happens in: master
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/8147/consoleFull)

  * Being investigated by: ?



Sachin - does this happen anymore or should we move it off the list?




  * tests/features/glupy.t

  * nuked tests 7153, 7167, 7169, 7173, 7212



Emmanuel's investigation should help us here. Thanks!



  * tests/basic/volume-snapshot-clone.t

  * http://review.gluster.org/#/c/10053/

  * Came back on April 9

  * 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/




Rafi - does this happen anymore? If fixed due to subsequent commits, 
we should look at dropping this test from is_bad_test() in run-tests.sh.




  * tests/basic/uss.t

  * https://bugzilla.redhat.com/show_bug.cgi?id=1209286

  * http://review.gluster.org/#/c/10143/

  * Came back on April 9

  * 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/6660/


  * ./tests/bugs/glusterfs/bug-867253.t (Wstat: 0 Tests: 9 Failed: 1)

  * Failed test:  8



Raghu - does this happen anymore? If fixed due to subsequent commits, 
we should look at dropping this test from is_bad_test() in run-tests.sh.


-Vijay


I tried to reproduce the issue and it did not happen in my setup. So I 
am planning to get a slave machine and test it there.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.6.3 released

2015-04-27 Thread Raghavendra Bhat


Hi,

glusterfs-3.6.3 has been released and can be found here.
http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/.

This release supposedly fixes the bugs listed below since 3.6.2 was made 
available. Thanks to all who submitted patches, reviewed the changes.



1187526 - Disperse volume mounted through NFS doesn't list any 
files/directories
1188471 - When the volume is in stopped state/all the bricks are down 
mount of the volume hangs
1201484 - glusterfs-3.6.2 fails to build on Ubuntu Precise: 
'RDMA_OPTION_ID_REUSEADDR' undeclared

1202212 - Performance enhancement for RDMA
1189023 - Directories not visible anymore after add-brick, new brick 
dirs not part of old bricks

1202673 - Perf: readdirp in replicated volumes causes performance degrade
1203081 - Entries in indices/xattrop directory not removed appropriately
1203648 - Quota: Build ancestry in the lookup
1199936 - readv on /var/run/6b8f1f2526c6af8a87f1bb611ae5a86f.socket 
failed when NFS is disabled

1200297 - cli crashes when listing quota limits with xml output
1201622 - Convert quota size from n-to-h order before using it
1194141 - AFR : failure in self-heald.t
1201624 - Spurious failure of tests/bugs/quota/bug-1038598.t
1194306 - Do not count files which did not need index heal in the first 
place as successfully healed
1200258 - Quota: features.quota-deem-statfs is "on" even after disabling 
quota.

1165938 - Fix regression test spurious failures
1197598 - NFS logs are filled with system.posix_acl_access messages
1199577 - mount.glusterfs uses /dev/stderr and fails if the device does 
not exist

1197598 - NFS logs are filled with system.posix_acl_access messages
1188066 - logging improvements in marker translator
1191537 - With afrv2 + ext4, lookups on directories with large offsets 
could result in duplicate/missing entries

1165129 - libgfapi: use versioned symbols in libgfapi.so for compatibility
1179136 - glusterd: Gluster rebalance status returns failure
1176756 - glusterd: remote locking failure when multiple synctask 
transactions are run
1188064 - log files get flooded when removexattr() can't find a 
specified key or value

1165938 - Fix regression test spurious failures
1192522 - index heal doesn't continue crawl on self-heal failure
1193970 - Fix spurious ssl-authz.t regression failure (backport)
1138897 - NetBSD port
1184527 - Some newly created folders have root ownership although 
created by unprivileged user
1181977 - gluster vol clear-locks vol-name path kind all inode return IO 
error in a disperse volume

1159471 - rename operation leads to core dump
1173528 - Change in volume heal info command output
1186119 - tar on a gluster directory gives message "file changed as we 
read it" even though no updates to file in progress
1183716 - Force replace-brick lead to the persistent write(use dd) 
return Input/output error

1138897 - NetBSD port
1178590 - Enable quota(default) leads to heal directory's xattr failed.
1182490 - Internal ec xattrs are allowed to be modified
1187547 - self-heal-algorithm with option "full" doesn't heal sparse 
files correctly
1174170 - Glusterfs outputs a lot of warnings and errors when quota is 
enabled

1212684 - GlusterD segfaults when started with management SSL


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Raghavendra Bhat

On Thursday 02 April 2015 05:50 PM, Jeff Darcy wrote:

I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND
like other xlators which also use mem_get for local (such as AFR). I am
suspecting crypt not doing mem_put might be the reason for the bug
mentioned.

My understanding was that mem_put should be called automatically from
FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop
completes (e.g. at FUSE or GFAPI).  On the other hand, I see that AFR
and others call mem_put themselves, without zeroing the local pointer.
In my (possibly no longer relevant) experience, freeing local myself
without zeroing the pointer would lead to a double free, and I don't
see why that's not the case here.  What am I missing?


As per my understanding, the xlators which get local via mem_get should 
do the below things in the callback function just before unwinding:


1) save frame->local pointer (i.e. local = frame->local);
2) STACK_UNWIND
3) mem_put (local)

After STACK_UNWIND and before mem_put, any reference to an fd, inode or 
dict that might be present in the local should be unrefed (and any other 
allocated resources present in the local should be freed), which is why 
mem_put is done last. To avoid a double free in FRAME_DESTROY, 
frame->local is set to NULL before doing STACK_UNWIND.


I suspect not doing one of the above three operations (maybe either the 1st 
or the 3rd) in the crypt xlator might be the reason for the bug.
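
As an illustration of the above sequence, a minimal callback sketch (with 
generic names, not taken from the crypt xlator; my_local_t is a placeholder 
type) would look like this:

int32_t
xxx_flush_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
               int32_t op_ret, int32_t op_errno, dict_t *xdata)
{
        my_local_t *local = NULL;

        local = frame->local;      /* 1) save the local pointer           */
        frame->local = NULL;       /*    so FRAME_DESTROY won't free it   */

        STACK_UNWIND_STRICT (flush, frame, op_ret, op_errno, xdata); /* 2) */

        /* unref any fd/inode/dict refs held in local here */

        mem_put (local);           /* 3) return local to the mem-pool     */

        return 0;
}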


Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Raghavendra Bhat

On Thursday 02 April 2015 01:00 PM, Pranith Kumar Karampuri wrote:


On 04/02/2015 12:27 AM, Raghavendra Talur wrote:



On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift wrote:


On 1 Apr 2015, at 10:57, Emmanuel Dreyfus  wrote:
> Hi
>
> crypt.t was recently broken in NetBSD regression. The glusterfs returns
> a node with file type invalid to FUSE, and that breaks the test.
>
> After running a git bisect, I found the offending commit after which
> this behavior appeared:
>8a2e2b88fc21dc7879f838d18cd0413dd88023b7
>mem-pool: invalidate memory on GF_FREE to aid debugging
>
> This means the bug has always been there, but this debugging aid
> caused it to be reliable.

Sounds like that commit is a good win then. :)

Harsha/Pranith/Lala, your names are on the git blame for crypt.c...
any ideas? :)


I found one issue that local is not allocated using GF_CALLOC and 
with a mem-type.

This is a patch which *might* fix it.

diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h 
b/xlators/encryption/crypt/src/crypt-mem-types.h

index 2eab921..c417b67 100644
--- a/xlators/encryption/crypt/src/crypt-mem-types.h
+++ b/xlators/encryption/crypt/src/crypt-mem-types.h
@@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ {
gf_crypt_mt_key,
gf_crypt_mt_iovec,
gf_crypt_mt_char,
+gf_crypt_mt_local,
gf_crypt_mt_end,
 };
diff --git a/xlators/encryption/crypt/src/crypt.c 
b/xlators/encryption/crypt/src/crypt.c

index ae8cdb2..63c0977 100644
--- a/xlators/encryption/crypt/src/crypt.c
+++ b/xlators/encryption/crypt/src/crypt.c
@@ -48,7 +48,7 @@ static crypt_local_t 
*crypt_alloc_local(call_frame_t *frame, xlator_t *this,

 {
crypt_local_t *local = NULL;
-   local = mem_get0(this->local_pool);
+local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local);
local was using memory from the pool earlier (i.e. with mem_get0()), 
which seems ok to me. Changing it this way will add a memory 
allocation in the fop I/O path; avoiding that is exactly why xlators 
generally use the mem-pool approach.


Pranith


I think the crypt xlator should do a mem_put of local after doing 
STACK_UNWIND, like other xlators which also use mem_get for local (such 
as AFR). I suspect crypt not doing mem_put might be the reason for 
the bug mentioned.


Regards,
Raghavendra Bhat


if (!local) {
gf_log(this->name, GF_LOG_ERROR, "out of memory");
return NULL;


Niels should be able to recognize if this is sufficient fix or not.

Thanks,
Raghavendra Talur

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift


___
Gluster-devel mailing list
Gluster-devel@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-devel




--
Raghavendra Talur





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




[Gluster-devel] glusterfs-3.6.3beta2 released

2015-04-01 Thread Raghavendra Bhat

Hi

glusterfs-3.6.3beta2 has been released and can be found here.
http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.3beta2/

This beta release supposedly fixes the bugs listed below since 
3.6.3beta1 was made available. Thanks to all who submitted the patches, 
reviewed the changes.



1187526 - Disperse volume mounted through NFS doesn't list any 
files/directories
1188471 - When the volume is in stopped state/all the bricks are down 
mount of the volume hangs
1201484 - glusterfs-3.6.2 fails to build on Ubuntu Precise: 
'RDMA_OPTION_ID_REUSEADDR' undeclared

1202212 - Performance enhancement for RDMA
1189023 - Directories not visible anymore after add-brick, new brick 
dirs not part of old bricks

1202673 - Perf: readdirp in replicated volumes causes performance degrade
1203081 - Entries in indices/xattrop directory not removed appropriately
1203648 - Quota: Build ancestry in the lookup
1199936 - readv on /var/run/6b8f1f2526c6af8a87f1bb611ae5a86f.socket 
failed when NFS is disabled

1200297 - cli crashes when listing quota limits with xml output
1201622 - Convert quota size from n-to-h order before using it
1194141 - AFR : failure in self-heald.t
1201624 - Spurious failure of tests/bugs/quota/bug-1038598.t
1194306 - Do not count files which did not need index heal in the first 
place as successfully healed
1200258 - Quota: features.quota-deem-statfs is "on" even after disabling 
quota.

1165938 - Fix regression test spurious failures
1197598 - NFS logs are filled with system.posix_acl_access messages
1199577 - mount.glusterfs uses /dev/stderr and fails if the device does 
not exist

1197598 - NFS logs are filled with system.posix_acl_access messages
1188066 - logging improvements in marker translator
1191537 - With afrv2 + ext4, lookups on directories with large offsets 
could result in duplicate/missing entries

1165129 - libgfapi: use versioned symbols in libgfapi.so for compatibility
1179136 - glusterd: Gluster rebalance status returns failure
1176756 - glusterd: remote locking failure when multiple synctask 
transactions are run
1188064 - log files get flooded when removexattr() can't find a 
specified key or value

1165938 - Fix regression test spurious failures
1192522 - index heal doesn't continue crawl on self-heal failure
1193970 - Fix spurious ssl-authz.t regression failure (backport)


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [PATCH ANNOUNCE] BitRot : Object signing

2015-02-19 Thread Raghavendra Bhat


Hi,

These are the patches.

http://review.gluster.org/#/c/9705/
http://review.gluster.org/#/c/9706/
http://review.gluster.org/#/c/9707/
http://review.gluster.org/#/c/9708/
http://review.gluster.org/#/c/9709/
http://review.gluster.org/#/c/9710/
http://review.gluster.org/#/c/9711/
http://review.gluster.org/#/c/9712/

Regards,
Raghavendra Bhat

On Thursday 19 February 2015 07:34 PM, Venky Shankar wrote:

Hi folks,

Listed below is the initial patchset for the upcoming bitrot detection 
feature targeted for GlusterFS 3.7. As of now, these set of patches 
implement object signing. Myself and Raghavendra (rabhat@) are working 
on pending items (scrubber, etc..) and would be sending those patches 
shortly. Since this is the initial patch set, it might be prone to 
bugs (as we speak rabhat@ is chasing a memory leak :-)).


There is an upcoming event on Google+ Hangout regarding bitrot on 
Tuesday, 24th March. The hangout session would cover implementation 
details (algorithm, flow, etc..) and would be beneficial for anyone 
from code reviewers, users or generally interested parties. Please 
plan to attend if possible: http://goo.gl/dap9rF


As usual, comments/suggestions are more than welcome.

Thanks,
Venky (overclk on #freenode)



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.6.3beta1 released

2015-02-13 Thread Raghavendra Bhat

Hi

glusterfs-3.6.3beta1 has been released and can be found here.
http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.3beta1/

This beta release supposedly fixes the bugs listed below since 3.6.2 was 
made available. Thanks to all who submitted the patches, reviewed the 
changes.


1138897 - NetBSD port
1184527 - Some newly created folders have root ownership although 
created by unprivileged user

1181977 - gluster vol clear
1159471 - rename operation leads to core dump
1173528 - Change in volume heal info command output
1186119 - tar on a gluster directory gives message "file changed as we 
read it" even though no updates to file in progress

1183716 - Force replace
1178590 - Enable quota(default) leads to heal directory's xattr failed.
1182490 - Internal ec xattrs are allowed to be modified
1187547 - self-heal-algorithm with option "full" doesn't heal sparse 
files correctly
1174170 - Glusterfs outputs a lot of warnings and errors when quota is 
enabled
1186119 - tar on a gluster directory gives message "file changed as we 
read it" even though no updates to file in progress


Regards,
Raghavendra Bhat

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.6.2 volume heal

2015-02-02 Thread Raghavendra Bhat

On Monday 02 February 2015 09:07 PM, David F. Robinson wrote:
I upgraded one of my bricks from 3.6.1 to 3.6.2 and I can no longer do 
a 'gluster volume heal homegfs info'.  It hangs and never returns any 
information.
I was trying to ensure that gfs01a had finished healing before 
upgrading the other machines (gfs01b, gfs02a, gfs02b) in my 
configuration (see below).

'gluster volume homegfs statistics' still works fine.
Do I need to upgrade my other bricks to get the 'gluster volume heal 
homegfs info' working?  Or, should I fix this issue before upgrading 
my other machines?

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.allow-insecure: on
network.ping-timeout: 10
storage.owner-gid: 100
geo-replication.indexing: off
geo-replication.ignore-pid-check: on
changelog.changelog: on
changelog.fsync-interval: 3
changelog.rollover-time: 15
server.manage-gids: on


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


CCing Pranith, the maintainer of replicate. In the meantime can you 
please provide the logs from the machine where you have upgraded?


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] netbsd build failure

2015-01-23 Thread Raghavendra Bhat


Hi Emmanuel,

You have mentioned that patch http://review.gluster.org/#/c/9469/ breaks 
the build on netbsd. Where is it failing?
I tried to check the link for netbsd tests for the patch 
(http://build.gluster.org/job/netbsd6-smoke/2431/). But I got the below 
error.



 Status Code: 404

Exception:
Stacktrace:

(none)


Regards,
Raghavendra Bhat

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] managing of THIS

2015-01-21 Thread Raghavendra Bhat


Hi,

In glusterfs, at the time the process comes up, it creates 5 pthread 
keys (for saving THIS, syncop, uuid buf, lkowner buf and syncop ctx). 
gfapi does the same thing in its glfs_new function. But with User 
Serviceable Snapshots (where a glusterfs process spawns multiple gfapi 
instances, one per snapshot) this leads to more and more consumption of 
pthread keys. In fact the old keys are lost (as the same variables are 
used for creating the keys) and eventually the process runs out of 
pthread keys after 203 snapshots (the maximum allowed number is 1024 per 
process). So to avoid this, pthread key creation can be done only once 
(using pthread_once, in which the globals_init function is called).
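
A minimal sketch of the pthread_once idea (the key and function names here 
are illustrative, not the actual glusterfs ones):

#include <pthread.h>

static pthread_once_t globals_once = PTHREAD_ONCE_INIT;
static pthread_key_t  this_key;   /* e.g. the key used to save THIS */

static void
globals_init_once (void)
{
        /* Runs exactly once per process, no matter how many gfapi
         * instances (glfs_new calls) are created later, so the number
         * of keys stays constant instead of growing per snapshot. */
        pthread_key_create (&this_key, NULL);
        /* ... create the remaining keys (syncop, uuid buf, ...) here ... */
}

void
ensure_globals_inited (void)
{
        pthread_once (&globals_once, globals_init_once);
}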


But now a new problem arises. Say glfs_new (or glfs_init, etc.) was 
called from the snapview-server xlator. gfapi calls THIS for some of 
its operations, such as properly accounting the memory within the xlator 
while allocating a new structure. But when gfapi calls THIS, it gets 
snapview-server's pointer. Since snapview-server does not know about 
gfapi's internal structures, it asserts at the time of allocation.

For now, a patch has been sent to handle the issue by turning off 
memory-accounting for snapshot daemon.

(http://review.gluster.org/#/c/9430).

But if memory-accounting has to be turned on for snapshot daemon, then 
the above problem has to be fixed.

2 ways that can be used for fixing the issue are:

1) Add the datastructures that are used by gfapi to libglusterfs (and 
hence their mem-types as well), so that any xlator that is calling gfapi 
functions (such as snapview-server as of now) will be aware of the 
memory types used by gfapi and hence will not cause problems, when 
memory accounting has to be done as part of allocations and frees.


OR

2) Properly manage THIS by introducing a new macro similar to STACK_WIND 
(for now it can be called STACK_API_WIND). The macro will be much 
simpler than STACK_WIND as it need not create new frames before handing 
over the call to the next layer. Before handing over the call to gfapi 
(any call, such as glfs_new, or fops such as glfs_h_open), it saves THIS in 
a variable and calls the gfapi function given as an argument. After the 
function returns, it sets THIS back to the value it had before the gfapi 
function was called.


Ex:

#define STACK_API_WIND(this, fn, ret, params...)        \
        do {                                            \
                xlator_t *old_THIS = NULL;              \
                                                        \
                old_THIS = this;                        \
                ret = fn (params);                      \
                THIS = old_THIS;                        \
        } while (0)

A caller (as of now the snapview-server xlator) would call the macro like this:
STACK_API_WIND (this, glfs_h_open, glfd, fs, object, flags);


Please provide feedback; any suggestions or solutions to handle the 
mentioned issue are welcome.


Regards,
Raghavendra Bhat

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.6.2beta2

2015-01-16 Thread Raghavendra Bhat


Hi

glusterfs-3.6.2beta2 has been released and can be found here.
http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.2beta2/


This beta release supposedly fixes the bugs listed below 3.6.2beta1 was 
made available. Thanks to all who submitted the patches, reviewed the 
changes.



 1180404 - nfs server restarts when a snapshot is deactivated
 1180411 - CIFS:[USS]: glusterfsd OOM killed when 255 snapshots were 
browsed at CIFS mount and Control+C is issued
 1180070 - [AFR] getfattr on fuse mount gives error : Software caused 
connection abort

 1175753 - [readdir-ahead]: indicate EOF for readdirp
 1175752 - [USS]: On a successful lookup, snapd logs are filled with 
Warnings "dict OR key (entry-point) is NULL"

 1175749 - glusterfs client crashed while migrating the fds
 1179658 - Add brick fails if parent dir of new brick and existing 
brick is same and volume was accessed using libgfapi and smb.
 1146524 - glusterfs.spec.in - synch minor diffs with fedora dist-git 
glusterfs.spec
 1175744 - [USS]: Unable to access .snaps after snapshot restore after 
directories were deleted and recreated
 1175742 - [USS]: browsing .snaps directory with CIFS fails with 
"Invalid argument"
 1175739 - [USS]: Non root user who has no access to a directory, from 
NFS mount, is able to access the files under .snaps under that directory
 1175758 - [USS] : Rebalance process tries to connect to snapd and in 
case when snapd crashes it might affect rebalance process
 1175765 - USS]: When snapd is crashed gluster volume stop/delete 
operation fails making the cluster in inconsistent state

 1173528 - Change in volume heal info command output
 1166515 - [Tracker] RDMA support in glusterfs
 1166505 - mount fails for nfs protocol in rdma volumes
 1138385 - [DHT:REBALANCE]: Rebalance failures are seen with error 
message " remote operation failed: File exists"

 1177418 - entry self-heal in 3.5 and 3.6 are not compatible
 1170954 - Fix mutex problems reported by coverity scan
 1177899 - nfs: ls shows "Permission denied" with root-squash
 1175738 - [USS]: data unavailability for a period of time when USS is 
enabled/disabled
 1175736 - [USS]:After deactivating a snapshot trying to access the 
remaining activated snapshots from NFS mount gives 'Invalid argument' error

 1175735 - [USS]: snapd process is not killed once the glusterd comes back
 1175733 - [USS]: If the snap name is same as snap-directory than cd to 
virtual snap directory fails
 1175756 - [USS] : Snapd crashed while trying to access the snapshots 
under .snaps directory
 1175755 - SNAPSHOT[USS]:gluster volume set for uss doesnot check any 
boundaries
 1175732 - [SNAPSHOT]: nouuid is appended for every snapshoted brick 
which causes duplication if the original brick has already nouuid
 1175730 - [USS]: creating file/directories under .snaps shows wrong 
error message
 1175754 - [SNAPSHOT]: before the snap is marked to be deleted if the 
node goes down than the snaps are propagated on other nodes and glusterd 
hungs

 1159484 - ls -alR can not heal the disperse volume
 1138897 - NetBSD port
 1175728 - [USS]: All uss related logs are reported under 
/var/log/glusterfs, it makes sense to move it into subfolder

 1170548 - [USS] : don't display the snapshots which are not activated
 1170921 - [SNAPSHOT]: snapshot should be deactivated by default when 
created
 1175694 - [SNAPSHOT]: snapshoted volume is read only but it shows rw 
attributes in mount

 1161885 - Possible file corruption on dispersed volumes
 1170959 - EC_MAX_NODES is defined incorrectly
 1175645 - [USS]: Typo error in the description for USS under "gluster 
volume set help"

 1171259 - mount.glusterfs does not understand -n option

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] bit rot

2015-01-06 Thread Raghavendra Bhat


Hi,

As per the design discussion it was mentioned that there will be one 
BitD running per node which will take care of all the bricks of all the 
volumes running on that node. But here one thing that becomes 
important is doing graph changes for the BitD process upon 
enabling/disabling of bit-rot functionality for the volumes. With more 
and more graph changes, there is a higher chance of BitD running out of 
memory (as of now the older graphs in glusterfs are not cleaned up).


So for now it will be better to have one BitD per volume per node. In 
this case, there will not be graph changes in BitD. It will be started 
for a volume upon enabling bit-rot functionality for that volume and 
will be brought down when bit-rot is disabled for a volume.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] handling statfs call in USS

2015-01-05 Thread Raghavendra Bhat

On Monday 29 December 2014 01:19 PM, RAGHAVENDRA TALUR wrote:

On Sun, Dec 28, 2014 at 5:03 PM, Vijay Bellur  wrote:

On 12/24/2014 02:30 PM, Raghavendra Bhat wrote:


Hi,

I have a doubt. In user serviceable snapshots as of now statfs call is
not implemented. There are 2 ways how statfs can be handled.

1) Whenever snapview-client xlator gets statfs call on a path that
belongs to snapshot world, it can send the
statfs call to the main volume itself, with the path and the inode being
set to the root of the main volume.

OR

2) It can redirect the call to the snapshot world (the snapshot demon
which talks to all the snapshots of that particular volume) and send
back the reply that it has obtained.


Each entry in .snaps can be thought of as a specially mounted read-only
filesystem and doing a statfs in such a filesystem should generate
statistics associated with that. So approach 2. seems more appropriate.

I agree with Vijay here. Treating each entry in .snaps as a specially mounted
read-only filesystem will be required to send proper error codes to Samba.


Yeah, makes sense. But one challenge is: if someone does statfs on the 
.snaps directory itself, what should be done? Because .snaps is a virtual 
directory. I can think of 2 ways (a rough sketch of the first follows below):
1) Make the snapview-server xlator return 0s when it receives statfs on 
.snaps, so that the output is similar to the one obtained when statfs 
is done on /proc.

OR, if the above output is not right,
2) If statfs comes on .snaps, then wind the call to the regular volume 
itself. Anything beyond .snaps will be sent to the snapshot world.
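
A rough sketch of option 1 (only the reply is shown; assuming it is 
unwound from the snapview-server statfs handler, and keeping a sane block 
size is an extra assumption so that callers dividing by f_bsize are not 
confused):

        struct statvfs buf = {0, };

        /* Report zeroed statistics for the virtual .snaps entry point,
         * similar to what a statfs on /proc reports. */
        buf.f_bsize  = 4096;
        buf.f_frsize = 4096;
        /* blocks/bfree/bavail/files/ffree stay 0 */

        STACK_UNWIND_STRICT (statfs, frame, 0, 0, &buf, NULL);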


Regards,
Raghavendra Bhat


-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel







Re: [Gluster-devel] [Gluster-users] 3.6.2beta1

2014-12-25 Thread Raghavendra Bhat

On Friday 26 December 2014 12:22 PM, Raghavendra Bhat wrote:


Hi,

glusterfs-3.6.2beta1 has been released and the rpms can be found here.


Regards,
Raghavendra Bhat
___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Oops. Sorry. Missed the link

 http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.2beta1/


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] 3.6.2beta1

2014-12-25 Thread Raghavendra Bhat


Hi,

glusterfs-3.6.2beta1 has been released and the rpms can be found here.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] handling statfs call in USS

2014-12-24 Thread Raghavendra Bhat


Hi,

I have a doubt. In user serviceable snapshots as of now statfs call is 
not implemented. There are 2 ways how statfs can be handled.


1) Whenever snapview-client xlator gets statfs call on a path that 
belongs to snapshot world, it can send the
statfs call to the main volume itself, with the path and the inode being 
set to the root of the main volume.


OR

2) It can redirect the call to the snapshot world (the snapshot demon 
which talks to all the snapshots of that particular volume) and send 
back the reply that it has obtained.


Please provide feedback.

Regards,
Raghavendra Bhat

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] explicit lookup of inodes linked via readdirp

2014-12-23 Thread Raghavendra Bhat

On Thursday 18 December 2014 12:58 PM, Raghavendra Gowdappa wrote:


- Original Message -

From: "Raghavendra Bhat" 
To: "Gluster Devel" 
Cc: "Anand Avati" 
Sent: Thursday, December 18, 2014 12:31:41 PM
Subject: [Gluster-devel] explicit lookup of inodes linked via readdirp


Hi,

In fuse I saw that, as part of resolving an inode, an explicit lookup is
done on it if the inode is found to be linked via readdirp (at the time
of linking in readdirp, fuse sets a flag in the inode context). It is
done because many xlators such as afr depend upon the lookup call for many
things such as healing.

Yes. But the lookup is a nameless lookup and hence is not sufficient. 
Some of the functionalities that get affected AFAIK are:
1. dht cannot create/heal directories and their layouts.
2. afr cannot identify gfid mismatch of a file across its subvolumes, since to 
identify a gfid mismatch we need a name.

From what I heard, afr relies on crawls done by the self-heal daemon for 
named lookups. But dht is worst hit in terms of maintaining directory structure 
on newly added bricks (this problem is slightly different, since we don't hit 
it because of a nameless lookup after readdirp; instead it is because of a lack 
of a named lookup on the file after a graph switch. Nevertheless I am clubbing 
both because a named lookup would've solved the issue). I have a feeling that 
different components have built their own way of handling what is essentially 
the same issue. It's better we devise a single comprehensive solution.


But that logic is not there in gfapi. I am thinking of introducing that
mechanism in gfapi as well, where as part of resolve it checks if the
inode is linked from readdirp. And if so it will do an explicit lookup
on that inode.

As you've mentioned, a lookup gives afr a chance to heal the file. So it's 
needed in gfapi too. However, you have to speak to the afr folks to discuss 
whether a nameless lookup is sufficient.


As per my understanding, this change in gfapi creates the same chances as 
fuse. When I tried with fuse, where I had a file that needed to be 
healed, doing ls and cat on the file actually triggered a self-heal on it. 
So even with gfapi, the change creates the same chances of healing as fuse.
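
A very rough sketch of the idea for the gfapi resolve path; the 
NEEDS_LOOKUP flag and the glfs_force_lookup() helper are made-up names 
standing in for whatever flag would be set while linking in readdirp and 
for the actual explicit lookup call:

/* Sketch only: not actual gfapi code. */
static int
resolve_check_readdirp (xlator_t *subvol, inode_t *inode)
{
        uint64_t ctx = 0;

        inode_ctx_get (inode, subvol, &ctx);
        if (!(ctx & NEEDS_LOOKUP))   /* set when linked via readdirp */
                return 0;            /* already looked up explicitly */

        /* Send an explicit lookup so that xlators like afr get a
         * chance to heal, then clear the flag in the caller. */
        return glfs_force_lookup (subvol, inode);
}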


Regards,
Raghavendra Bhat



NOTE: It can be done in NFS server as well.

Dht in NFS setup is also hit because of lack of named-lookups resulting in 
non-healing of directories on newly added brick.


Please provide feedback.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] patches for 3.6.2

2014-12-22 Thread Raghavendra Bhat

On Tuesday 23 December 2014 11:09 AM, Atin Mukherjee wrote:

Can you please take in http://review.gluster.org/#/c/9328/ for 3.6.2?

~Atin

On 12/19/2014 02:05 PM, Raghavendra Bhat wrote:

Hi,

glusterfs-3.6.2beta1 has been released. I am planning to make 3.6.2
before end of this year. If there are some patches that has to go in for
3.6.2, please send them by EOD 23-12-2014 (i.e. coming Tuesday) so that
I can make a 3.6.2 release sooner.

As of now, these are the bugs in new or assigned state.
https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&classification=Community&f1=blocked&list_id=3106878&o1=substring&product=GlusterFS&query_format=advanced&v1=1163723



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Sure. Will do it.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] patches for 3.6.2

2014-12-19 Thread Raghavendra Bhat


Hi,

glusterfs-3.6.2beta1 has been released. I am planning to make 3.6.2
before the end of this year. If there are any patches that have to go in for
3.6.2, please send them by EOD 23-12-2014 (i.e. the coming Tuesday) so that
I can make the 3.6.2 release sooner.


As of now, these are the bugs in new or assigned state.
https://bugzilla.redhat.com/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&classification=Community&f1=blocked&list_id=3106878&o1=substring&product=GlusterFS&query_format=advanced&v1=1163723


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.6.1 issue

2014-12-18 Thread Raghavendra Bhat

On Tuesday 16 December 2014 10:59 PM, David F. Robinson wrote:
Gluster 3.6.1 seems to be having an issue creating symbolic links.  To 
reproduce this issue, I downloaded the file 
dakota-6.1-public.src_.tar.gz from

https://dakota.sandia.gov/download.html
# gunzip dakota-6.1-public.src_.tar.gz
# tar -xf dakota-6.1-public.src_.tar
# cd dakota-6.1.0.src/examples/script_interfaces/TankExamples/DakotaList
# ls -al
### Results from my old storage system (non gluster)
corvidpost5:TankExamples/DakotaList> ls -al
total 12
drwxr-x--- 2 dfrobins users  112 Dec 16 12:12 ./
drwxr-x--- 6 dfrobins users  117 Dec 16 12:12 ../
lrwxrwxrwx 1 dfrobins users   25 Dec 16 12:12 EvalTank.py -> ../tank_model/EvalTank.py
lrwxrwxrwx 1 dfrobins users   24 Dec 16 12:12 FEMTank.py -> ../tank_model/FEMTank.py

-rwx--x--- 1 dfrobins users  734 Nov  7 11:05 RunTank.sh*
-rw--- 1 dfrobins users 1432 Nov  7 11:05 dakota_PandL_list.in
-rw--- 1 dfrobins users 1860 Nov  7 11:05 dakota_Ponly_list.in
### Results from gluster (broken links that have no permissions)
corvidpost5:TankExamples/DakotaList> ls -al
total 5
drwxr-x--- 2 dfrobins users  166 Dec 12 08:43 ./
drwxr-x--- 6 dfrobins users  445 Dec 12 08:43 ../
---------- 1 dfrobins users    0 Dec 12 08:43 EvalTank.py
---------- 1 dfrobins users    0 Dec 12 08:43 FEMTank.py
-rwx--x--- 1 dfrobins users  734 Nov  7 11:05 RunTank.sh*
-rw--- 1 dfrobins users 1432 Nov  7 11:05 dakota_PandL_list.in
-rw--- 1 dfrobins users 1860 Nov  7 11:05 dakota_Ponly_list.in
===
David F. Robinson, Ph.D.
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310 [cell]
704.799.7974 [fax]
david.robin...@corvidtec.com
http://www.corvidtechnologies.com


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Hi David,

Can you please provide the log files? You can find them in 
/var/log/glusterfs.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] explicit lookup of inodes linked via readdirp

2014-12-17 Thread Raghavendra Bhat


Hi,

In fuse I saw that, as part of resolving an inode, an explicit lookup is
done on it if the inode is found to be linked via readdirp (at the time
of linking in readdirp, fuse sets a flag in the inode context). It is
done because many xlators such as afr depend upon the lookup call for many
things such as healing.


But that logic is not there in gfapi. I am thinking of introducing that
mechanism in gfapi as well, where as part of resolve it checks whether the
inode was linked from readdirp, and if so it does an explicit lookup
on that inode.


NOTE: It can be done in NFS server as well.

Please provide feedback.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] telldir/seekdir portability fixes

2014-12-17 Thread Raghavendra Bhat

On Wednesday 17 December 2014 04:22 PM, Emmanuel Dreyfus wrote:

Raghavendra Bhat  wrote:


I tried to push the above patch, but it failed with a merge conflict. Can
you please rebase and send it?

Done, it is passing regression tests right now.



I have pushed the change.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] telldir/seekdir portability fixes

2014-12-17 Thread Raghavendra Bhat

On Wednesday 17 December 2014 02:21 PM, Emmanuel Dreyfus wrote:

Hello

Any chance http://review.gluster.org/9071 gets merged (and
http://review.gluster.org/9084 for release-3.6)? It has been waiting for
review for more than a month now.
I tried to push the above patch, but it failed with a merge conflict. Can
you please rebase and send it?


Regards,
Raghavendra Bhat


This is the remainder of a fix that was partially done in
http://review.gluster.org/8933, and that part has been operating without
a hitch for a while.

Without the fix, self heal breaks on NetBSD if it needs to iterate on a
directory (that is: content is more than 128k). That is a big roadblock.



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] key for getting real file name

2014-12-01 Thread Raghavendra Bhat


Hi,

As per my understanding, samba calls the get_real_filename API to get the
actual case-sensitive name of an entry, and get_real_filename issues a
getxattr call with a key of the form get_real_filename:.


When I checked the glusterfs plugin's code in samba, it uses the key
"user.glusterfs.get_real_filename:". In glusterfs (posix and DHT),
upon getting the getxattr call, we check whether it is a get_real_filename
request using the key "glusterfs.get_real_filename:".


They are not the same. Is it supposed to be like that, or should we use the
same key in both places (i.e. the samba plugin and glusterfs)?
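To make the mismatch concrete, here is a small standalone comparison of the two
keys. The file name "MyFile.txt" and the program itself are only for illustration,
and whether some layer strips the "user." namespace before posix/dht sees the key
is exactly the open question:

#include <stdio.h>
#include <string.h>

#define SAMBA_KEY_PREFIX   "user.glusterfs.get_real_filename:"  /* used by the samba plugin */
#define GLUSTER_KEY_PREFIX "glusterfs.get_real_filename:"       /* checked in posix and dht */

int
main (void)
{
        /* what the samba plugin would send for a file "MyFile.txt" */
        const char *key = SAMBA_KEY_PREFIX "MyFile.txt";

        /* a plain prefix comparison on the gluster side */
        if (strncmp (key, GLUSTER_KEY_PREFIX, strlen (GLUSTER_KEY_PREFIX)) == 0)
                printf ("recognised as a get_real_filename request\n");
        else
                printf ("prefix mismatch\n");

        return 0;
}

As written, the plain prefix comparison fails, which is why I am asking whether
the two sides are really meant to use different keys.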



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] snapshot restore and USS

2014-12-01 Thread Raghavendra Bhat

On Monday 01 December 2014 04:51 PM, Raghavendra G wrote:



On Fri, Nov 28, 2014 at 6:48 PM, RAGHAVENDRA TALUR <raghavendra.ta...@gmail.com> wrote:


On Thu, Nov 27, 2014 at 2:59 PM, Raghavendra Bhat <rab...@redhat.com> wrote:
> Hi,
>
> With USS to access snapshots, we depend on last snapshot of the
volume (or
> the latest snapshot) to resolve some issues.
> Ex:
> Say there is a directory called "dir" within the root of the
volume and USS
> is enabled. Now when .snaps is accessed from "dir" (i.e.
/dir/.snaps), first
> a lookup is sent on /dir which snapview-client xlator passes
onto the normal
> graph till posix xlator of the brick. Next the lookup comes on
/dir/.snaps.
> snapview-client xlator now redirects this call to the snap
daemon (since
> .snaps is a virtual directory to access the snapshots). The
lookup comes to
> snap daemon with parent gfid set to the gfid of "/dir" and the
basename
> being set to ".snaps". Snap daemon will first try to resolve the
parent gfid
> by trying to find the inode for that gfid. But since that gfid
was not
> looked up before in the snap daemon, it will not be able to find
the inode.
> So now to resolve it, snap daemon depends upon the latest
snapshot. i.e. it
> tries to look up the gfid of /dir in the latest snapshot and if
it can get
> the gfid, then lookup on /dir/.snaps is also successful.

From the user point of view, I would like to be able to enter into the
.snaps anywhere.
To be able to do that, we can turn the dependency upside down, instead
of listing all
snaps in the .snaps dir, lets just show whatever snapshots had
that dir.


Currently readdir in snap-view server is listing _all_ the snapshots.
However if you try to do "ls" on a snapshot which doesn't contain this
directory (say dir/.snaps/snap3), I think it returns ESTALE/ENOENT.
So, to get what you've explained above, readdir(p) should filter out
those snapshots which don't contain this directory (to do that, it
has to look up dir on each of the snapshots).


Raghavendra Bhat explained the problem and also a possible solution to
me in person. There are some pieces missing in the problem description
as explained in the mail (but not in the discussion we had). The
problem explained here occurs when you restore a snapshot (say snap3)
in which the directory got created, but the directory was deleted before
the next snapshot. So, the directory doesn't exist in snap2 and snap4, but
exists only in snap3. Now, when you restore snap3, "ls" on dir/.snaps should
show nothing. Now, what should the result of lookup (gfid-of-dir, ".snaps") be?


1. We can blindly return a virtual inode, assuming there is at least
one snapshot that contains dir. If fops come on specific snapshots (e.g.,
dir/.snaps/snap4), they'll anyway fail with ENOENT (since dir is not
present on any snaps).
2. We can choose to return ENOENT if we figure out that dir is not
present on any snaps.


The problem we are trying to solve here is how to achieve 2. One
simple solution is to look up  on all the snapshots,
and if every lookup fails with ENOENT, we can return ENOENT. The other
solution is to look up only in the snapshots before and after (if both are
present, otherwise just in the latest snapshot). If both fail, then we can
be sure that no snapshot contains that directory.


Rabhat, Correct me if I've missed out anything :).




If a readdir on .snaps entered from a non-root directory has to show the
list of only those snapshots where the directory (or rather the gfid of the
directory) is present, then the way to achieve it will be a bit costly.


When readdir comes on .snaps entered from a non-root directory (say ls
/dir/.snaps), the following operations have to be performed:
1) We have the names of all the snapshots in an array. So, do a nameless
lookup on the gfid of /dir on all the snapshots.
2) Based on which snapshots have sent success for the above lookup, build
a new array or list of snapshots.

3) Then send the above new list as the readdir entries.

But the above operation is costlier, because just to serve one readdir
request we have to make a lookup on each snapshot (if there are 256
snapshots, then we have to make 256 lookup calls over the network).
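For illustration, here is a tiny standalone sketch of steps 1-3 above. The snapshot
names and the stubbed nameless_lookup() are made up for the example; in snapd the
answer would come from a real nameless lookup over the network, once per snapshot
per readdir:

#include <stdio.h>
#include <string.h>

static const char *all_snaps[] = { "snap1", "snap2", "snap3", "snap4" };

/* stub: pretend only snap2 and snap3 contain the directory's gfid */
static int
nameless_lookup (const char *snap, const char *dir_gfid)
{
        (void) dir_gfid;
        return (!strcmp (snap, "snap2") || !strcmp (snap, "snap3")) ? 0 : -1;
}

int
main (void)
{
        const char *filtered[4];
        size_t      i, n = 0;

        /* steps 1 and 2: one lookup per snapshot, keep only the hits */
        for (i = 0; i < 4; i++)
                if (nameless_lookup (all_snaps[i], "gfid-of-dir") == 0)
                        filtered[n++] = all_snaps[i];

        /* step 3: the filtered list is what readdir would return */
        for (i = 0; i < n; i++)
                printf ("%s\n", filtered[i]);

        return 0;
}

The filtering itself is trivial; the cost lies entirely in the per-snapshot lookups
described above.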


One more thing is resource usage. As of now a snapshot is inited
(i.e. via gfapi a connection is established with the corresponding
snapshot volume, which is equivalent to a mounted volume) only when that
snapshot is accessed (from a fops point of view, when a lookup comes on the
snapshot entry, say "ls /dir/.snaps/snap1"). Now, to serve readdir, all
the snapshots will be accessed and all the snapshots get initialized.
This means there can be 256 instances of gfapi connections.

[Gluster-devel] snapshot restore and USS

2014-11-27 Thread Raghavendra Bhat

Hi,

With USS to access snapshots, we depend on the last snapshot of the volume
(or the latest snapshot) to resolve some issues.

Ex:
Say there is a directory called "dir" within the root of the volume and 
USS is enabled. Now when .snaps is accessed from "dir" (i.e. 
/dir/.snaps), first a lookup is sent on /dir which snapview-client 
xlator passes onto the normal graph till posix xlator of the brick. Next 
the lookup comes on /dir/.snaps. snapview-client xlator now redirects 
this call to the snap daemon (since .snaps is a virtual directory to 
access the snapshots). The lookup comes to snap daemon with parent gfid 
set to the gfid of "/dir" and the basename being set to ".snaps". Snap 
daemon will first try to resolve the parent gfid by trying to find the 
inode for that gfid. But since that gfid was not looked up before in the 
snap daemon, it will not be able to find the inode. So now to resolve 
it, snap daemon depends upon the latest snapshot. i.e. it tries to look 
up the gfid of /dir in the latest snapshot and if it can get the gfid, 
then lookup on /dir/.snaps is also successful.


But there can be some confusion in the case of snapshot restore. Say
there are 5 snapshots (snap1, snap2, snap3, snap4, snap5) for a volume
vol, and say the volume is restored to snap3. If there was a directory called
"/a" at the time of taking snap3 and it was later removed, then after the
snapshot restore, accessing .snaps from that directory (in fact from all the
directories which were present while taking snap3) might cause problems.
Because now the original volume is nothing but snap3, and when the snap daemon
gets the lookup on "/a/.snaps", it tries to find the gfid of "/a"
in the latest snapshot (which is snap5); if "/a" was removed after
taking snap3, then the lookup of "/a" in snap5 fails and thus the lookup
of "/a/.snaps" will also fail.



Possible Solution:
One possible solution that can be helpful in this case is: whenever
glusterd sends the list of snapshots to the snap daemon after a
snapshot restore, send the list in such a way that the snapshot
previous to the restored snapshot is sent as the latest snapshot (in the
example above, since snap3 is restored, glusterd should send snap2 as
the latest snapshot to the snap daemon).


But in the above solution also there is a problem. If there are only 2
snapshots (snap1, snap2) and the volume is restored to the first
snapshot (snap1), there is no previous snapshot to look at, and glusterd
will send only one name in the list, which is snap2, but that is in a
state ahead of the volume.


A patch has been submitted for review to handle this
(http://review.gluster.org/#/c/9094/).
In the patch, because of the above confusions, snapd tries to consult
the adjacent snapshots of the restored snapshot to resolve the gfids.
As per the 5-snapshot example, it looks at snap2 and snap4
(i.e. it looks into snap2 first, and if that fails it then looks into snap4).
If there is no previous snapshot, it looks at the next snapshot (the 2-snapshot
example). If there is no next snapshot, it looks at the previous snapshot.
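For illustration, here is a small standalone sketch of that ordering; the helper
name pick_adjacent() and the main() are made up, and the indices refer to positions
in the original snapshot list:

#include <stdio.h>

/* fill 'out' with the snapshot indices to consult: previous first, then
 * next; handles the "no previous"/"no next" edge cases */
static int
pick_adjacent (int restored, int snapcount, int out[2])
{
        int n = 0;

        if (restored > 0)
                out[n++] = restored - 1;   /* previous snapshot, e.g. snap2 */
        if (restored + 1 < snapcount)
                out[n++] = restored + 1;   /* next snapshot, e.g. snap4 */

        return n;   /* number of candidates (0 if it was the only snapshot) */
}

int
main (void)
{
        const char *snaps[] = { "snap1", "snap2", "snap3", "snap4", "snap5" };
        int         order[2];
        int         i, n;

        /* volume restored to snap3 (index 2) */
        n = pick_adjacent (2, 5, order);
        for (i = 0; i < n; i++)
                printf ("consult %s\n", snaps[order[i]]);

        return 0;
}

With 5 snapshots and snap3 restored it prints snap2 and then snap4; with only 2
snapshots and snap1 restored it would print just snap2.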


Please provide feedback on how this issue can be handled.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Wanted: maintainer for release-3.6!

2014-10-28 Thread Raghavendra Bhat

On Monday 27 October 2014 06:34 PM, Vijay Bellur wrote:

Hi All,

As we move closer to the release of 3.6.0, we are looking to add a 
release maintainer for the 3.6.0 development branch (release-3.6). 
Primary requirements for being this release maintainer would include:


1. Ensure regular minor releases of 3.6
2. Actively manage patches in the release-3.6 queue in conjunction 
with component maintainers

3. Be very passionate about quality of release-3.6

In short, the release maintainer gets to influence the destiny of 3.6 
to a very great extent :).


Please let me know if you are interested in taking up this opportunity.

Thanks,
Vijay


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


I am in.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] if/else coding style :-)

2014-10-13 Thread Raghavendra Bhat

On Monday 13 October 2014 05:31 PM, Pranith Kumar Karampuri wrote:

hi,
 Why are we moving away from this coding style?:
if (x) {
/*code*/
} else {
/* code */
}

Pranith


For me, the script that checks the coding style (checkpatch.pl, which is
present in the extras directory within the glusterfs repo) flagged the above
style as an error when I executed rfc.sh to submit the patch. And the patch
will not be submitted to gerrit if errors are reported by the script.



Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




[Gluster-devel] documentation on inode and dentry management

2014-09-23 Thread Raghavendra Bhat


Hi,

I have sent a patch to add the info on how glusterfs manages inodes and 
dentries.

http://review.gluster.org/#/c/8815/

Please review it and provide feedback to improve it.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Patches to be merged before 3.6 branching

2014-07-14 Thread Raghavendra Bhat

On Monday 14 July 2014 07:33 PM, Vijay Bellur wrote:

Hi All,

I intend creating the 3.6 branch tomorrow. After that, the branch will 
be restricted to bug fixes only. If you have any major patches to be 
reviewed and merged for release-3.6, please update this thread.


Thanks,
Vijay


I have 2 patches for review.

[1] has got a +1 and passed the regression tests. It just has to be merged if it
looks fine.

[2] has passed the regression tests but needs to be reviewed.

[1]: http://review.gluster.org/#/c/8230/
[2]: http://review.gluster.org/#/c/8150/

Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




[Gluster-devel] Reviews for User Serviceable Snapshot related patches

2014-07-08 Thread Raghavendra Bhat


Hi,

I have submitted 2 patches for improving User Serviceable Snapshots 
experience. Both of them have passed regression tests.


[1] contains changes which improve snapview-server's compatibility with the
NFS server. The changes include:
   A) Make the snapview-server xlator handle listxattr calls properly, so
that nfs can get the acl list.
   B) Changes in the NFS server to link inodes to the inode table as part
of readdirp, so that the virtual inodes (along with the gfids) generated
as part of readdirp are not lost.


[2] contains changes in the snapview-server xlator to register an rpc
callback with glusterd, so that glusterd can notify snapview-server
whenever new snapshots are created and snapview-server can configure its
list of snapshots properly, instead of asking glusterd for the list of
snapshots at regular intervals.


Please review those patches and provide feedback.

[1] http://review.gluster.org/#/c/8230/
[2] http://review.gluster.org/#/c/8150/


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] inode linking in GlusterFS NFS server

2014-07-07 Thread Raghavendra Bhat

On Tuesday 08 July 2014 01:21 AM, Anand Avati wrote:
On Mon, Jul 7, 2014 at 12:48 PM, Raghavendra Bhat <rab...@redhat.com> wrote:



Hi,

As per my understanding nfs server is not doing inode linking in
readdirp callback. Because of this there might be some errors
while dealing with virtual inodes (or gfids). As of now meta,
gfid-access and snapview-server (used for user serviceable
snapshots) xlators makes use of virtual inodes with random gfids.
The situation is this:

Say User serviceable snapshot feature has been enabled and there
are 2 snapshots ("snap1" and "snap2"). Let /mnt/nfs be the nfs
mount. Now the snapshots can be accessed by entering .snaps
directory.  Now if snap1 directory is entered and *ls -l* is done
(i.e. "cd /mnt/nfs/.snaps/snap1" and then "ls -l"),  the readdirp
fop is sent to the snapview-server xlator (which is part of a
daemon running for the volume), which talks to the corresponding
snapshot volume and gets the dentry list. Before unwinding it
would have generated random gfids for those dentries.

Now nfs server upon getting readdirp reply, will associate the
gfid with the filehandle created for the entry. But without
linking the inode, it would send the readdirp reply back to nfs
client. Now next time when nfs client makes some operation on one
of those filehandles, nfs server tries to resolve it by finding
the inode for the gfid present in the filehandle. But since the
inode was not linked in readdirp, inode_find operation fails and
it tries to do a hard resolution by sending the lookup operation
on that gfid to the normal main graph. (The information on whether
the call should be sent to main graph or snapview-server would be
present in the inode context. But here the lookup has come on a
gfid with a newly created inode where the context is not there
yet. So the call would be sent to the main graph itself). But
since the gfid is a randomly generated virtual gfid (not present
on disk), the lookup operation fails giving error.

As per my understanding this can happen with any xlator that deals
with virtual inodes (by generating random gfids).

I can think of these 2 methods to handle this:
1)  do inode linking for readdirp also in nfs server
2)  If lookup operation fails, snapview-client xlator (which
actually redirects the fops on snapshot world to snapview-server
by looking into the inode context) should check if the failed
lookup is a nameless lookup. If so, AND the gfid of the inode is
NULL AND lookup has come from main graph, then instead of
unwinding the lookup with failure, send it to snapview-server
which might be able to find the inode for the gfid (as the gfid
was generated by itself, it should be able to find the inode for
that gfid unless and until it has been purged from the inode table).


Please let me know if I have missed anything. Please provide feedback.



That's right. NFS server should be linking readdirp_cbk inodes just 
like FUSE or protocol/server. It has been OK without virtual gfids 
thus far.


I did the changes to link inodes in readdirp_cbk in the nfs server. It seems
to work fine. Do we need the second change as well (i.e. the change in
snapview-client to redirect fresh nameless lookups to snapview-server)?
With the nfs server linking the inodes in readdirp, I think the
second change might not be needed.
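Roughly, such a linking loop looks like the one below. This is only a sketch
modelled on what fuse does in its readdirp callback, not the exact nfs3 change,
and the function and variable names here are placeholders:

/* libglusterfs headers */
#include "glusterfs.h"
#include "inode.h"
#include "gf-dirent.h"

static void
nfs_link_readdirp_entries (inode_t *parent, gf_dirent_t *entries)
{
        gf_dirent_t *entry      = NULL;
        inode_t     *link_inode = NULL;

        list_for_each_entry (entry, &entries->list, list) {
                if (!entry->inode)
                        continue;

                /* link the inode so a later gfid-only resolve can find it,
                 * instead of falling back to a hard lookup on the main graph */
                link_inode = inode_link (entry->inode, parent,
                                         entry->d_name, &entry->d_stat);
                if (!link_inode)
                        continue;

                inode_lookup (link_inode);   /* mark it as looked up */
                inode_unref (link_inode);    /* drop the ref taken by inode_link */
        }
}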


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] inode linking in GlusterFS NFS server

2014-07-07 Thread Raghavendra Bhat


Hi,

As per my understanding, the nfs server is not doing inode linking in the
readdirp callback. Because of this there might be some errors while
dealing with virtual inodes (or gfids). As of now the meta, gfid-access and
snapview-server (used for user serviceable snapshots) xlators make use
of virtual inodes with random gfids. The situation is this:


Say User serviceable snapshot feature has been enabled and there are 2 
snapshots ("snap1" and "snap2"). Let /mnt/nfs be the nfs mount. Now the 
snapshots can be accessed by entering .snaps directory.  Now if snap1 
directory is entered and *ls -l* is done (i.e. "cd 
/mnt/nfs/.snaps/snap1" and then "ls -l"),  the readdirp fop is sent to 
the snapview-server xlator (which is part of a daemon running for the 
volume), which talks to the corresponding snapshot volume and gets the 
dentry list. Before unwinding it would have generated random gfids for 
those dentries.


Now nfs server upon getting readdirp reply, will associate the gfid with 
the filehandle created for the entry. But without linking the inode, it 
would send the readdirp reply back to nfs client. Now next time when nfs 
client makes some operation on one of those filehandles, nfs server 
tries to resolve it by finding the inode for the gfid present in the 
filehandle. But since the inode was not linked in readdirp, inode_find 
operation fails and it tries to do a hard resolution by sending the 
lookup operation on that gfid to the normal main graph. (The information 
on whether the call should be sent to main graph or snapview-server 
would be present in the inode context. But here the lookup has come on a 
gfid with a newly created inode where the context is not there yet. So 
the call would be sent to the main graph itself). But since the gfid is 
a randomly generated virtual gfid (not present on disk), the lookup 
operation fails giving error.


As per my understanding this can happen with any xlator that deals with 
virtual inodes (by generating random gfids).


I can think of these 2 methods to handle this:
1)  do inode linking for readdirp also in nfs server
2)  If lookup operation fails, snapview-client xlator (which actually 
redirects the fops on snapshot world to snapview-server by looking into 
the inode context) should check if the failed lookup is a nameless 
lookup. If so, AND the gfid of the inode is NULL AND lookup has come 
from main graph, then instead of unwinding the lookup with failure, send 
it to snapview-server which might be able to find the inode for the gfid 
(as the gfid was generated by itself, it should be able to find the 
inode for that gfid unless and until it has been purged from the inode 
table).



Please let me know if I have missed anything. Please provide feedback.

Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failure (bug-1112559.t)

2014-07-04 Thread Raghavendra Bhat


Hi,

I think the regression test bug-1112559.t is causing some spurious
failures. I see some regression jobs failing because of it.



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] regarding inode_link/unlink

2014-07-04 Thread Raghavendra Bhat

On Friday 04 July 2014 05:39 PM, Pranith Kumar Karampuri wrote:


On 07/04/2014 04:28 PM, Raghavendra Gowdappa wrote:


- Original Message -

From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" , "Anand Avati" 
, "Brian Foster"
, "Raghavendra Gowdappa" , 
"Raghavendra Bhat" 

Sent: Friday, July 4, 2014 3:44:29 PM
Subject: regarding inode_link/unlink

hi,
   I have a doubt about when a particular dentry_unset thus
inode_unref on parent dir happens on fuse-bridge in gluster.
When a file is looked up for the first time fuse_entry_cbk does
'inode_link' with parent-gfid/bname. Whenever an unlink/rmdir/(lookup
gives ENOENT) happens then corresponding inode unlink happens. The
question is, will the present set of operations lead to leaks:
1) Mount 'M0' creates a file 'a'
2) Mount 'M1' of same volume deletes file 'a'

M0 never touches 'a' anymore. When will inode_unlink happen for such
cases? Will it lead to memory leaks?
Kernel will eventually send forget (a) on M0 and that will clean up
the dentries and the inode. It's equivalent to a file being looked up and
never used again (deleting doesn't matter in this case).
Do you know the trigger points for that? When I do 'touch a' on the 
mount point and leave the system like that, forget is not coming.

If I do unlink on the file then forget is coming.

Pranith


As per my understanding, forgets can come in one of the situations below:
1) The file/directory is removed.
2) As per the above example where there are 2 mounts: say one of the mounts
removes the file/directory. The other client will get a forget if it tries to
access the removed file/directory.
3) In the glusterfs fuse client the lru limit for the inodes is infinity. If
the lru-limit is finite, then forgets can come once the lru-limit is crossed.

4) When drop caches is done, the kernel will send forgets.

Please let me know if I have missed anything.

Regards,
Raghavendra Bhat





Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Need clarification regarding the "force" option for snapshot delete.

2014-06-30 Thread Raghavendra Bhat

On Friday 27 June 2014 10:47 AM, Raghavendra Talur wrote:

Inline.

- Original Message -
From: "Atin Mukherjee" 
To: "Sachin Pandit" , "Gluster Devel" 
, gluster-us...@gluster.org
Sent: Thursday, June 26, 2014 3:30:31 PM
Subject: Re: [Gluster-devel] Need clarification regarding the "force" option 
for snapshot delete.



On 06/26/2014 01:58 PM, Sachin Pandit wrote:

Hi all,

We had some concerns regarding the snapshot delete "force" option;
that is the reason why we thought of getting advice from everyone out here.

Currently when we give "gluster snapshot delete ", it gives a notification
saying "mentioned snapshot will be deleted, Do you still want to continue
(y/n)?".
As soon as the user presses "y" it will delete the snapshot.

Our new proposal is: when a user issues the snapshot delete command "without force",
the user should be given a notification saying to use the "force" option to
delete the snap.

In that case "gluster snapshot delete " becomes useless apart
from throwing a notification. If we can ensure snapshot delete all works
only with "force" option then we can have gluster snapshot delete
 to work as it is now.

~Atin

Agree with Atin here, asking user to execute same command with force appended is
not right.



When snapshot delete command is issued with "force" option then the user should
be given a notification saying "Mentioned snapshot will be deleted, Do you still
want to continue (y/n)".

The reason we thought of bringing this up is because we have planned to
introduce a command "gluster snapshot delete all" which deletes all the
snapshots in a system, and "gluster snapshot delete volume " which deletes
all the snapshots in the mentioned volume. If a user accidentally issues one
of the above mentioned commands and presses "y", then they might lose a few
or more snapshots present in the volume/system (thinking it will ask for
notification for each delete).

It will be good to have this feature, asking for y for every delete.
When force is used we don't ask confirmation for each. Similar to rm -f.

If that is not feasible as of now, is something like this better?

Case 1 : Single snap
[root@snapshot-24 glusterfs]# gluster snapshot delete snap1
Deleting snap will erase all the information about the snap.
Do you still want to continue? (y/n) y
[root@snapshot-24 glusterfs]#

Case 2: Delete all system snaps
[root@snapshot-24 glusterfs]# gluster snapshot delete all
Deleting  snaps stored on the system
Do you still want to continue? (y/n) y
[root@snapshot-24 glusterfs]#

Case 3: Delete all volume snaps
[root@snapshot-24 glusterfs]# gluster snapshot delete volume volname
Deleting  snaps for the volume volname
Do you still want to continue? (y/n) y
[root@snapshot-24 glusterfs]#

The idea here being: if the warnings for different commands are different,
then users may pause for a moment to read and check the message.
We could even list the snaps to be deleted even if we don't ask for
confirmation for each.

Raghavendra Talur


Agree with Raghavendra Talur. It would be better to ask the user even without
the force option. The method suggested by Talur above seems neat.


Regards,
Raghavendra Bhat


Do you think notification would be more than enough, or do we need to introduce
a "force" option ?

--
Current procedure:
--

[root@snapshot-24 glusterfs]# gluster snapshot delete snap1
Deleting snap will erase all the information about the snap.
Do you still want to continue? (y/n)


Proposed procedure:
---

[root@snapshot-24 glusterfs]# gluster snapshot delete snap1
Please use the force option to delete the snap.

[root@snapshot-24 glusterfs]# gluster snapshot delete snap1 force
Deleting snap will erase all the information about the snap.
Do you still want to continue? (y/n)
--

We are looking forward for the feedback on this.

Thanks,
Sachin Pandit.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] regarding inode-unref on root inode

2014-06-24 Thread Raghavendra Bhat

On Tuesday 24 June 2014 08:17 PM, Pranith Kumar Karampuri wrote:

Does anyone know why inode_unref is no-op for root inode?

I see the following code in inode.c

 static inode_t *
 __inode_unref (inode_t *inode)
 {
 if (!inode)
 return NULL;

 if (__is_root_gfid(inode->gfid))
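        /* special case: for the root inode, unref returns it untouched,
         * so it can never drop off the active list (see the note below) */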
 return inode;
 ...
}


I think it's done with the intention that the root inode should *never*
get removed from the active inodes list (not even accidentally). So unref
on the root inode is a no-op. I don't know whether there are any other
reasons.


Regards,
Raghavendra Bhat



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] glupy test failing

2014-06-20 Thread Raghavendra Bhat

On Friday 20 June 2014 09:44 PM, Justin Clift wrote:

On 20/06/2014, at 3:49 PM, Vijay Bellur wrote:


Side-effect of merging this patch [1]. Have reverted the change to let 
regression tests pass.


That seems to have fixed it.

+ Justin


Yeah. It has been fixed. Thanks :)

Regards,
Raghavendra Bhat


--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glupy test failing

2014-06-20 Thread Raghavendra Bhat


Hi,

I am seeing the glupy.t test failing in some test cases. It is failing
on my local machine as well (with latest master). Is it a genuine
failure or a spurious one?


/tests/features/glupy.t(Wstat: 0 Tests: 6 
Failed: 2)

  Failed tests:  2, 6

As per the logfile of the fuse mount done in the testcase this is the 
error:



[2014-06-20 14:15:53.038826] I [MSGID: 100030] [glusterfsd.c:1998:main] 
0-glusterfs: Started running glusterfs version 3.5qa2 (args: glusterfs 
-f /d/backends/glupytest.vol /mnt/glusterfs/0)
[2014-06-20 14:15:53.059484] E [glupy.c:2382:init] 0-vol-glupy: Python 
import failed
[2014-06-20 14:15:53.059575] E [xlator.c:425:xlator_init] 0-vol-glupy: 
Initialization of volume 'vol-glupy' failed, review your volfile again
[2014-06-20 14:15:53.059587] E [graph.c:322:glusterfs_graph_init] 
0-vol-glupy: initializing translator failed
[2014-06-20 14:15:53.059595] E [graph.c:525:glusterfs_graph_activate] 
0-graph: init failed
[2014-06-20 14:15:53.060045] W [glusterfsd.c:1182:cleanup_and_exit] (--> 
0-: received signum (0), shutting down
[2014-06-20 14:15:53.060090] I [fuse-bridge.c:5561:fini] 0-fuse: 
Unmounting '/mnt/glusterfs/0'.
[2014-06-20 14:19:01.867378] I [MSGID: 100030] [glusterfsd.c:1998:main] 
0-glusterfs: Started running glusterfs version 3.5qa2 (args: glusterfs 
-f /d/backends/glupytest.vol /mnt/glusterfs/0)
[2014-06-20 14:19:01.897158] E [glupy.c:2382:init] 0-vol-glupy: Python 
import failed
[2014-06-20 14:19:01.897241] E [xlator.c:425:xlator_init] 0-vol-glupy: 
Initialization of volume 'vol-glupy' failed, review your volfile again
[2014-06-20 14:19:01.897252] E [graph.c:322:glusterfs_graph_init] 
0-vol-glupy: initializing translator failed
[2014-06-20 14:19:01.897260] E [graph.c:525:glusterfs_graph_activate] 
0-graph: init failed
[2014-06-20 14:19:01.897635] W [glusterfsd.c:1182:cleanup_and_exit] (--> 
0-: received signum (0), shutting down
[2014-06-20 14:19:01.897677] I [fuse-bridge.c:5561:fini] 0-fuse: 
Unmounting '/mnt/glusterfs/0'.



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Shall we revert quota-anon-fd.t?

2014-06-10 Thread Raghavendra Bhat

On Wednesday 11 June 2014 08:21 AM, Pranith Kumar Karampuri wrote:

hi,
   I see that quota-anon-fd.t is causing too many spurious failures. I 
think we should revert it and raise a bug so that it can be fixed and 
committed again along with the fix.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


+1
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] autodelete in snapshots

2014-06-04 Thread Raghavendra Bhat

On Wednesday 04 June 2014 11:23 AM, Rajesh Joseph wrote:


- Original Message -

From: "M S Vishwanath Bhat" 
To: "Rajesh Joseph" 
Cc: "Vijay Bellur" , "Seema Naik" , "Gluster 
Devel"

Sent: Tuesday, June 3, 2014 5:55:27 PM
Subject: Re: [Gluster-devel] autodelete in snapshots

On 3 June 2014 15:21, Rajesh Joseph  wrote:



- Original Message -
From: "M S Vishwanath Bhat" 
To: "Vijay Bellur" 
Cc: "Seema Naik" , "Gluster Devel" <
gluster-devel@gluster.org>
Sent: Tuesday, June 3, 2014 1:02:08 AM
Subject: Re: [Gluster-devel] autodelete in snapshots




On 2 June 2014 20:22, Vijay Bellur < vbel...@redhat.com > wrote:



On 04/23/2014 05:50 AM, Vijay Bellur wrote:


On 04/20/2014 11:42 PM, Lalatendu Mohanty wrote:


On 04/16/2014 11:39 AM, Avra Sengupta wrote:


The whole purpose of introducing the soft-limit is, that at any point
of time the number of
snaps should not exceed the hard limit. If we trigger auto-delete on
hitting hard-limit, then
the purpose itself is lost, because at that point we would be taking a
snap, making the limit
hard-limit + 1, and then triggering auto-delete, which violates the
sanctity of the hard-limit.
Also what happens when we are at hard-limit + 1, and another snap is
issued, while auto-delete
is yet to process the first delete. At that point we end up at
hard-limit + 1. Also what happens
if for a particular snap the auto-delete fails.

We should see the hard-limit, as something set by the admin keeping in
mind the resource consumption
and at no-point should we cross this limit, come what may. If we hit
this limit, the create command
should fail asking the user to delete snaps using the "snapshot
delete" command.

The two options Raghavendra mentioned are applicable for the
soft-limit only, in which cases on
hitting the soft-limit

1. Trigger auto-delete

or

2. Log a warning-message, for the user saying the number of snaps is
exceeding the snap-limit and
display the number of available snaps

Now which of these should happen also depends on the user, because the
auto-delete option
is configurable.

So if the auto-delete option is set as true, auto-delete should be
triggered and the above message
should also be logged.

But if the option is set as false, only the message should be logged.

This is the behaviour as designed. Adding Rahul, and Seema in the
mail, to reflect upon the
behaviour as well.

Regards,
Avra

This sounds correct. However, we need to make sure that the usage or
documentation around this is good enough, so that users
understand each of the limits correctly.


It might be better to avoid the usage of the term "soft-limit".
soft-limit as used in quota and other places generally has an alerting
connotation. Something like "auto-deletion-limit" might be better.


I still see references to "soft-limit" and auto deletion seems to get
triggered upon reaching soft-limit.

Why is the ability to auto delete not configurable? It does seem pretty
nasty to go about deleting snapshots without obtaining explicit consent
from the user.

I agree with Vijay here. It's not good to delete a snap (even though it is
the oldest) without explicit consent from the user.

FYI it took me more than 2 weeks to figure out that my snaps were getting
autodeleted after reaching the "soft-limit". For all I knew I had not done
anything, and my snap restores were failing.

I propose to remove the terms "soft" and "hard" limit. I believe there
should be a limit (just "limit") after which all snapshot creates should
fail with proper error messages. And there can be a water-mark after which
the user should get warning messages. So below is my proposal.

auto-delete + snap-limit: If the snap-limit is set to n, the next snap create
(the (n+1)th) will succeed only if auto-delete is set to on/true/1, and the
oldest snap will get deleted automatically. If autodelete is set to off/false/0,
the (n+1)th snap create will fail with a proper error message from the gluster
CLI command. But again, by default autodelete should be off.

snap-water-mark: This should come into the picture only if autodelete is turned
off. It should not have any meaning if auto-delete is turned ON. Basically
its usage is to warn the user that the limit is almost being reached and
it is time for the admin to decide which snaps should be deleted (or which
should be kept).

*my two cents*

-MS


The reason for having a hard-limit is to stop snapshot creation once we
reach this limit. This helps to keep control over the resource
consumption. Therefore if we only had this one limit (as snap-limit) then
there would be no question of auto-delete: auto-delete can only be triggered
once the count crosses a limit. Therefore we introduced the concept of a
soft-limit and a hard-limit. As the name suggests, once the hard-limit is
reached no more snaps will be created.


Perhaps I could have been clearer. The auto-delete value does come into
the picture when the limit is reached.

There is a limit 'n' (snap-limit), and when we reach this limit, what
happens to the next snap creat

Re: [Gluster-devel] inode lru limit

2014-06-02 Thread Raghavendra Bhat

On Tuesday 03 June 2014 07:00 AM, Raghavendra Gowdappa wrote:


- Original Message -

From: "Raghavendra Bhat" 
To: gluster-devel@gluster.org
Cc: "Anand Avati" 
Sent: Monday, June 2, 2014 6:41:30 PM
Subject: Re: [Gluster-devel] inode lru limit

On Monday 02 June 2014 11:06 AM, Raghavendra G wrote:





On Fri, May 30, 2014 at 2:24 PM, Raghavendra Bhat < rab...@redhat.com >
wrote:



Hi,

Currently the lru-limit of the inode table in brick processes is 16384. There
is a option to configure it to some other value. The protocol/server uses
inode_lru_limit variable present in its private structure while creating the
inode table (whose default value is 16384). When the option is reconfigured
via volume set option the protocol/server's inode_lru_limit variable present
in its private structure is changed. But the actual size of the inode table
still remains same as old one. Only when the brick is restarted the newly
set value comes into picture. Is it ok? Should we change the inode table's
lru_limit variable also as part of reconfigure? If so, then probably we
might have to remove the extra inodes present in the lru list by calling
inode_table_prune.

Yes, I think we should change the inode table's lru limit too and call
inode_table_prune. From what I know, I don't think this change would cause
any problems.


But as of now the inode table is bound to bound_xl which is associated with
the client_t object for the client being connected. As part of fops we can
get the bound_xl (thus the inode table) from the rpc request
(req->trans->xl_private). But in reconfigure we get just the xlator pointer
of protocol/server and dict containing new options.

So what I am planning is this. If the xprt_list (transport list corresponding
to the clients mounted) is empty, then just set the private structure's
variable for lru limit (which will be used to create the inode table when a
client mounts). If xprt_list of protocol/server's private structure is not
empty, then get one of the transports from that list and get the client_t
object corresponding to the transport, from which bound_xl is obtained (all
the client_t objects share the same inode table) . Then from bound_xl
pointer to inode table is got and its variable for lru limit is also set to
the value specified via cli and inode_table_prune is called to purge the
extra inodes.

In the above proposal if there are no active clients, lru limit of itable is 
not reconfigured. Here are two options to improve correctness of your proposal.
If there are no active clients, then there will not be any itable. The
itable will be created when the first client connects to the brick. And while
creating the itable we use the inode_lru_limit variable present in
protocol/server's private structure, and the inode table that is created also
saves the same value.

1. On a successful handshake, you check whether the lru_limit of itable is 
equal to configured value. If not equal, set it to the configured value and 
prune the itable. The cost is that you check inode table's lru limit on every 
client connection.
On a successful handshake, for the first client the inode table will be created
with the lru_limit value saved in protocol/server's private structure. For
further handshakes, since the inode table is already there, new inode tables
will not be created. So instead of waiting for a new handshake to happen to set
the lru_limit and purge the inode table, I think it's better to do it as
part of reconfigure itself.


2. Traverse through the list of all xlators (since there is no easy way of 
finding potential candidates for bound_xl other than peaking into options 
specific to authentication) and if there is an itable associated with that 
xlator, set its lru limit and prune it. The cost here is traversing the list of 
xlators. However, our xlator list in brick process is relatively small, this 
shouldn't have too much performance impact.

Comments are welcome.


Regards,
Raghavendra Bhat

Does it sound OK?

Regards,
Raghavendra Bhat







Please provide feedback


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel



--
Raghavendra G




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] inode lru limit

2014-06-02 Thread Raghavendra Bhat

On Monday 02 June 2014 11:06 AM, Raghavendra G wrote:



On Fri, May 30, 2014 at 2:24 PM, Raghavendra Bhat <rab...@redhat.com> wrote:



Hi,

Currently the lru-limit of the inode table in brick processes is
16384. There is a option to configure it to some other value. The
protocol/server uses inode_lru_limit variable present in its
private structure while creating the inode table (whose default
value is 16384). When the option is reconfigured via volume set
option the protocol/server's inode_lru_limit variable present in
its private structure is changed. But the actual size of the inode
table still remains same as old one. Only when the brick is
restarted the newly set value comes into picture. Is it ok? Should
we change the inode table's lru_limit variable also as part of
reconfigure? If so, then probably we might have to remove the
extra inodes present in the lru list by calling inode_table_prune.


Yes, I think we should change the inode table's lru limit too and call 
inode_table_prune. From what I know, I don't think this change would 
cause any problems.




But as of now the inode table is bound to bound_xl, which is associated
with the client_t object for the client being connected. As part of fops
we can get the bound_xl (and thus the inode table) from the rpc request
(req->trans->xl_private). But in reconfigure we get just the xlator
pointer of protocol/server and a dict containing the new options.


So what I am planning is this. If the xprt_list (the transport list
corresponding to the clients mounted) is empty, then just set the
private structure's variable for the lru limit (which will be used to create
the inode table when a client mounts). If the xprt_list of protocol/server's
private structure is not empty, then get one of the transports from that
list and get the client_t object corresponding to the transport, from
which bound_xl is obtained (all the client_t objects share the same
inode table). Then from bound_xl the pointer to the inode table is obtained,
its variable for the lru limit is also set to the value specified via the cli,
and inode_table_prune is called to purge the extra inodes.
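For illustration, here is a small standalone model of that reconfigure path. The
toy_* structs below only mimic the private structure -> xprt_list -> client_t ->
bound_xl -> itable chain and are not the real gluster types; toy_prune() stands in
for inode_table_prune:

#include <stdio.h>
#include <stdint.h>

struct toy_itable   { uint32_t lru_limit; uint32_t lru_size; };
struct toy_bound_xl { struct toy_itable *itable; };
struct toy_client   { struct toy_bound_xl *bound_xl; };
struct toy_xprt     { struct toy_client *client; struct toy_xprt *next; };
struct toy_conf     { uint32_t inode_lru_limit; struct toy_xprt *xprt_list; };

/* stand-in for inode_table_prune(): drop entries beyond the limit */
static void
toy_prune (struct toy_itable *t)
{
        if (t->lru_size > t->lru_limit)
                t->lru_size = t->lru_limit;
}

static void
reconfigure_lru_limit (struct toy_conf *conf, uint32_t new_limit)
{
        /* always remember the value: it is used when the first client
         * connects and the itable gets created */
        conf->inode_lru_limit = new_limit;

        if (!conf->xprt_list)
                return;                 /* no clients, hence no itable yet */

        /* all clients share one itable through bound_xl, so any transport
         * in the list will do */
        struct toy_itable *itable = conf->xprt_list->client->bound_xl->itable;

        itable->lru_limit = new_limit;
        toy_prune (itable);             /* purge the now-excess inodes */
}

int
main (void)
{
        struct toy_itable   itable = { 16384, 16384 };
        struct toy_bound_xl bxl    = { &itable };
        struct toy_client   client = { &bxl };
        struct toy_xprt     xprt   = { &client, NULL };
        struct toy_conf     conf   = { 16384, &xprt };

        reconfigure_lru_limit (&conf, 1024);
        printf ("lru_limit=%u lru_size=%u\n",
                (unsigned) itable.lru_limit, (unsigned) itable.lru_size);
        return 0;
}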


Does it sound OK?

Regards,
Raghavendra Bhat




    Please provide feedback


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




--
Raghavendra G


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] struct dirent in snapview-server.c

2014-06-02 Thread Raghavendra Bhat

On Monday 02 June 2014 06:47 AM, Harshavardhana wrote:

It is always possible to translate structures; the question is whether
it is useful or not. d_off is the offset of this struct dirent within
the buffer for the whole directory returned by the getdents(2) system call.

Since glusterfs does not use getdents(2) but the upper level
opendir(3)/readdir(3), which use getdents(2) themselves, it never has
the whole buffer, and therefore I am not sure it can make any use of
d_off.

Understood, that makes sense.


Hi Emmanuel,

I have raised a bug for it 
(https://bugzilla.redhat.com/show_bug.cgi?id=1103591) and have sent a 
patch to handle the issue (http://review.gluster.org/#/c/7946/). Please 
let me know if this handles the issue.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] inode lru limit

2014-05-30 Thread Raghavendra Bhat


Hi,

Currently the lru-limit of the inode table in brick processes is 16384.
There is an option to configure it to some other value. The
protocol/server xlator uses the inode_lru_limit variable present in its
private structure while creating the inode table (whose default value is
16384). When the option is reconfigured via the volume set option, the
protocol/server's inode_lru_limit variable present in its private
structure is changed, but the actual size of the inode table still
remains the same as the old one. Only when the brick is restarted does the
newly set value come into the picture. Is that ok? Should we change the inode
table's lru_limit variable also as part of reconfigure? If so, then probably we
might have to remove the extra inodes present in the lru list by calling
inode_table_prune.


Please provide feedback


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel