Re: [Gluster-devel] [Gluster-users] Release 6.5: Expected tagging on 5th August

2019-08-01 Thread Soumya Koduri

Hi Hari,

[1] is a critical patch which addresses an issue affecting upcall 
processing by applications such as NFS-Ganesha. As soon as it gets 
merged in master, I shall backport it to the release-7/6/5 branches. Kindly 
consider the same.


Thanks,
Soumya

[1] https://review.gluster.org/#/c/glusterfs/+/23108/

On 8/1/19 12:21 PM, Hari Gowtham wrote:

Hi,

Expected tagging date for release-6.5 is August 5th, 2019.

Please ensure required patches are backported, are passing regressions,
and are appropriately reviewed for easy merging and tagging on that date.


___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Release 4.1.10: Expected tagging on July 15th

2019-07-16 Thread Soumya Koduri



On 7/11/19 5:34 PM, Pasi Kärkkäinen wrote:

On Tue, Jul 09, 2019 at 02:32:46PM +0530, Hari Gowtham wrote:

Hi,

Expected tagging date for release-4.1.10 is July 15th, 2019.

NOTE: This is the last release for the 4 series.

Branch 4 will be EOLed after this. So if there are any critical patches,
please ensure they are backported, are passing regressions, and are
appropriately reviewed for easy merging and tagging on the date.



Glusterfs 4.1.9 did close this issue:

"gfapi: do not block epoll thread for upcall notifications":
https://bugzilla.redhat.com/show_bug.cgi?id=1694563


But more patches are needed to properly fix the issue, so it'd be really nice to 
have these patches backported to 4.1.10 as well:

"gfapi: fix incorrect initialization of upcall syncop arguments":
https://bugzilla.redhat.com/show_bug.cgi?id=1718316

"Upcall: Avoid sending upcalls for invalid Inode":
https://bugzilla.redhat.com/show_bug.cgi?id=1718338


This gfapi/upcall issue gets easily triggered with nfs-ganesha, and causes a "complete 
IO hang", as can be seen here:

"Complete IO hang on CentOS 7.5":
https://github.com/nfs-ganesha/nfs-ganesha/issues/335


Thanks Pasi.

@Hari,

The smoke tests have passed for these patches now. Kindly merge them.

Thanks,
Soumya




Thanks,

-- Pasi

  

--
Regards,
Hari Gowtham.
  
___


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel





[Gluster-devel] Backlog/Improvements tracking

2019-04-30 Thread Soumya Koduri

Hi,

To track any new features or improvements we are currently using GitHub. 
I assume those issues refer to the ones which are actively being worked 
on. How do we track backlogs which may not get addressed (at least in 
the near future)?


For example, I am planning to close a couple of RFE BZs [1]..[3] which were 
filed to improve the upcall mechanism, as there is no active development 
happening in those areas. But at the same time I would like to retain the 
list for future reference (in case any new member would like to take them up).


Can we use GitHub itself to track all the feature gaps of a component in 
one issue (note: it may stay in the open state forever), or is it better to 
document these as limitations in the admin/developer guide and close the BZ/issue?


Thanks,
Soumya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1214654
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1214644
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1200264
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Issue with posix locks

2019-04-01 Thread Soumya Koduri



On 4/1/19 2:23 PM, Xavi Hernandez wrote:
On Mon, Apr 1, 2019 at 10:15 AM Soumya Koduri <skod...@redhat.com> wrote:




On 4/1/19 10:02 AM, Pranith Kumar Karampuri wrote:
 >
 >
 > On Sun, Mar 31, 2019 at 11:29 PM Soumya Koduri
mailto:skod...@redhat.com>
 > <mailto:skod...@redhat.com <mailto:skod...@redhat.com>>> wrote:
 >
 >
 >
 >     On 3/29/19 11:55 PM, Xavi Hernandez wrote:
 >      > Hi all,
 >      >
 >      > there is one potential problem with posix locks when used in a
 >      > replicated or dispersed volume.
 >      >
 >      > Some background:
 >      >
 >      > Posix locks allow any process to lock a region of a file
multiple
 >     times,
 >      > but a single unlock on a given region will release all
previous
 >     locks.
 >      > Locked regions can be different for each lock request and
they can
 >      > overlap. The resulting lock will cover the union of all locked
 >     regions.
 >      > A single unlock (the region doesn't necessarily need to
match any
 >     of the
 >      > ranges used for locking) will create a "hole" in the currently
 >     locked
 >      > region, independently of how many times a lock request covered
 >     that region.
 >      >
 >      > For this reason, the locks xlator simply combines the
locked regions
 >      > that are requested, but it doesn't track each individual
lock range.
 >      >
 >      > Under normal circumstances this works fine. But there are
some cases
 >      > where this behavior is not sufficient. For example, suppose we
 >     have a
 >      > replica 3 volume with quorum = 2. Given the special nature
of posix
 >      > locks, AFR sends the lock request sequentially to each one
of the
 >      > bricks, to avoid that conflicting lock requests from other
 >     clients could
 >      > require to unlock an already locked region on the client
that has
 >     not
 >      > got enough successful locks (i.e. quorum). An unlock here not
 >     only would
 >      > cancel the current lock request. It would also cancel any
previously
 >      > acquired lock.
 >      >
 >
 >     I may not have fully understood, please correct me. AFAIU, lk
xlator
 >     merges locks only if both the lk-owner and the client opaque
matches.
 >
 >     In the case which you have mentioned above, considering clientA
 >     acquired
 >     locks on majority of quorum (say nodeA and nodeB) and clientB
on nodeC
 >     alone- clientB now has to unlock/cancel the lock it acquired
on nodeC.
 >
 >     You are saying the it could pose a problem if there were already
 >     successful locks taken by clientB for the same region which
would get
 >     unlocked by this particular unlock request..right?
 >
 >     Assuming the previous locks acquired by clientB are shared
(otherwise
 >     clientA wouldn't have got granted lock for the same region on
nodeA &
 >     nodeB), they would still hold true on nodeA & nodeB  as the
unlock
 >     request was sent to only nodeC. Since the majority of quorum
nodes
 >     still
 >     hold the locks by clientB, this isn't serious issue IMO.
 >
 >     I haven't looked into heal part but would like to understand
if this is
 >     really an issue in normal scenarios as well.
 >
 >
 > This is how I understood the code. Consider the following case:
 > Nodes A, B, C have locks with start and end offsets: 5-15 from
mount-1
 > and lock-range 2-3 from mount-2.
 > If mount-1 requests nonblocking lock with lock-range 1-7 and in
parallel
 > lets say mount-2 issued unlock of 2-3 as well.
 >
 > nodeA got unlock from mount-2 with range 2-3 then lock from
mount-1 with
 > range 1-7, so the lock is granted and merged to give 1-15
 > nodeB got lock from mount-1 with range 1-7 before unlock of 2-3
which
 > leads to EAGAIN which will trigger unlocks on granted lock in
mount-1
 > which will end up doing unlock of 1-7 on nodeA leading to lock-range
 > 8-15 instead of the original 5-15 on nodeA. Whereas nodeB and
nodeC will
 > have range 5-15.
 >
 > Let me know if my understanding is wrong.

Both of us mentioned the same points. So in the example you gave,
mount-1 lost its previous lock on nodeA but majority of the quorum

Re: [Gluster-devel] Issue with posix locks

2019-04-01 Thread Soumya Koduri



On 4/1/19 10:02 AM, Pranith Kumar Karampuri wrote:



On Sun, Mar 31, 2019 at 11:29 PM Soumya Koduri <skod...@redhat.com> wrote:




On 3/29/19 11:55 PM, Xavi Hernandez wrote:
 > Hi all,
 >
 > there is one potential problem with posix locks when used in a
 > replicated or dispersed volume.
 >
 > Some background:
 >
 > Posix locks allow any process to lock a region of a file multiple
times,
 > but a single unlock on a given region will release all previous
locks.
 > Locked regions can be different for each lock request and they can
 > overlap. The resulting lock will cover the union of all locked
regions.
 > A single unlock (the region doesn't necessarily need to match any
of the
 > ranges used for locking) will create a "hole" in the currently
locked
 > region, independently of how many times a lock request covered
that region.
 >
 > For this reason, the locks xlator simply combines the locked regions
 > that are requested, but it doesn't track each individual lock range.
 >
 > Under normal circumstances this works fine. But there are some cases
 > where this behavior is not sufficient. For example, suppose we
have a
 > replica 3 volume with quorum = 2. Given the special nature of posix
 > locks, AFR sends the lock request sequentially to each one of the
 > bricks, to avoid that conflicting lock requests from other
clients could
 > require to unlock an already locked region on the client that has
not
 > got enough successful locks (i.e. quorum). An unlock here not
only would
 > cancel the current lock request. It would also cancel any previously
 > acquired lock.
 >

I may not have fully understood, please correct me. AFAIU, lk xlator
merges locks only if both the lk-owner and the client opaque matches.

In the case which you have mentioned above, considering clientA
acquired
locks on majority of quorum (say nodeA and nodeB) and clientB on nodeC
alone- clientB now has to unlock/cancel the lock it acquired on nodeC.

You are saying the it could pose a problem if there were already
successful locks taken by clientB for the same region which would get
unlocked by this particular unlock request..right?

Assuming the previous locks acquired by clientB are shared (otherwise
clientA wouldn't have got granted lock for the same region on nodeA &
nodeB), they would still hold true on nodeA & nodeB  as the unlock
request was sent to only nodeC. Since the majority of quorum nodes
still
hold the locks by clientB, this isn't serious issue IMO.

I haven't looked into heal part but would like to understand if this is
really an issue in normal scenarios as well.


This is how I understood the code. Consider the following case:
Nodes A, B, C have locks with start and end offsets: 5-15 from mount-1 
and lock-range 2-3 from mount-2.
If mount-1 requests nonblocking lock with lock-range 1-7 and in parallel 
lets say mount-2 issued unlock of 2-3 as well.


nodeA got unlock from mount-2 with range 2-3 then lock from mount-1 with 
range 1-7, so the lock is granted and merged to give 1-15
nodeB got lock from mount-1 with range 1-7 before unlock of 2-3 which 
leads to EAGAIN which will trigger unlocks on granted lock in mount-1 
which will end up doing unlock of 1-7 on nodeA leading to lock-range 
8-15 instead of the original 5-15 on nodeA. Whereas nodeB and nodeC will 
have range 5-15.


Let me know if my understanding is wrong.


Both of us mentioned the same points. So in the example you gave, 
mount-1 lost its previous lock on nodeA, but the majority of the quorum 
(nodeB and nodeC) still have the previous lock (range: 5-15) intact. So 
this shouldn't ideally lead to any issues, as other conflicting locks are 
blocked or failed by the majority of the nodes (provided there are no brick 
dis/re-connects).


Wrt brick disconnects/re-connects: if we can get general lock-healing 
support (not getting into implementation details atm), that should take 
care of correcting the lock range on nodeA as well, right?


That said, I am not suggesting that we should stick to the existing behavior; 
I am just trying to get clarification to check whether we can avoid any 
overhead/side-effects of maintaining multiple locks.


Thanks,
Soumya





Thanks,
Soumya

 > However, when something goes wrong (a brick dies during a lock
request,
 > or there's a network partition or some other weird situation), it
could
 > happen that even using sequential locking, only one brick
succeeds the
 > lock request. In this case, AFR should cancel the previous lock
(and it
 > does), but this also cancels any previously acquired lock on that
 > region, which is not good.
  

Re: [Gluster-devel] Issue with posix locks

2019-03-31 Thread Soumya Koduri




On 3/29/19 11:55 PM, Xavi Hernandez wrote:

Hi all,

there is one potential problem with posix locks when used in a 
replicated or dispersed volume.


Some background:

Posix locks allow any process to lock a region of a file multiple times, 
but a single unlock on a given region will release all previous locks. 
Locked regions can be different for each lock request and they can 
overlap. The resulting lock will cover the union of all locked regions. 
A single unlock (the region doesn't necessarily need to match any of the 
ranges used for locking) will create a "hole" in the currently locked 
region, independently of how many times a lock request covered that region.
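To see those semantics outside of gluster, here is a small standalone sketch using plain POSIX fcntl() on a local file (the /tmp path is just a placeholder; nothing gluster-specific): two overlapping locks merge into one region, and a single unlock punches a hole in it regardless of how many lock calls covered that range.

/* Minimal sketch: plain POSIX record locks on a temp file. Two
 * overlapping write locks merge into one region (0-14); a single
 * unlock of 4-7 punches a hole in it. A forked child (a different
 * lock owner) probes the result with F_GETLK. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void set_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = start, .l_len = len };
    if (fcntl(fd, F_SETLK, &fl) == -1) { perror("F_SETLK"); exit(1); }
}

int main(void)
{
    int fd = open("/tmp/lock-demo", O_RDWR | O_CREAT, 0600);

    set_lock(fd, F_WRLCK, 0, 10);   /* lock bytes 0-9                     */
    set_lock(fd, F_WRLCK, 5, 10);   /* lock bytes 5-14: merged into 0-14  */
    set_lock(fd, F_UNLCK, 4, 4);    /* one unlock punches a hole at 4-7   */

    if (fork() == 0) {              /* child: a different lock owner      */
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                               .l_start = 5, .l_len = 1 };
        fcntl(fd, F_GETLK, &probe);
        printf("byte 5: %s\n", probe.l_type == F_UNLCK ? "free" : "locked");

        probe.l_type = F_WRLCK; probe.l_start = 8; probe.l_len = 1;
        fcntl(fd, F_GETLK, &probe);
        printf("byte 8: %s\n", probe.l_type == F_UNLCK ? "free" : "locked");
        _exit(0);
    }
    wait(NULL);
    close(fd);
    return 0;
}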


For this reason, the locks xlator simply combines the locked regions 
that are requested, but it doesn't track each individual lock range.


Under normal circumstances this works fine. But there are some cases 
where this behavior is not sufficient. For example, suppose we have a 
replica 3 volume with quorum = 2. Given the special nature of posix 
locks, AFR sends the lock request sequentially to each one of the 
bricks, to avoid a situation where conflicting lock requests from other 
clients would require unlocking an already locked region on the client 
that has not got enough successful locks (i.e. quorum). An unlock here 
would not only cancel the current lock request; it would also cancel any 
previously acquired lock.




I may not have fully understood, please correct me. AFAIU, the lk xlator 
merges locks only if both the lk-owner and the client opaque match.


In the case you have mentioned above, considering clientA acquired 
locks on a majority of the quorum (say nodeA and nodeB) and clientB on nodeC 
alone, clientB now has to unlock/cancel the lock it acquired on nodeC.


You are saying that it could pose a problem if there were already 
successful locks taken by clientB for the same region, which would get 
unlocked by this particular unlock request, right?


Assuming the previous locks acquired by clientB are shared (otherwise 
clientA wouldn't have been granted the lock for the same region on nodeA & 
nodeB), they would still hold true on nodeA & nodeB, as the unlock 
request was sent only to nodeC. Since the majority of the quorum nodes still 
hold clientB's locks, this isn't a serious issue IMO.


I haven't looked into the heal part but would like to understand whether this is 
really an issue in normal scenarios as well.


Thanks,
Soumya

However, when something goes wrong (a brick dies during a lock request, 
or there's a network partition or some other weird situation), it could 
happen that, even using sequential locking, the lock request succeeds on 
only one brick. In this case, AFR should cancel the previous lock (and it 
does), but this also cancels any previously acquired lock on that 
region, which is not good.


A similar thing can happen if we try to recover (heal) posix locks that 
were active after a brick has been disconnected (for any reason) and 
then reconnected.


To fix all these situations we need to change the way posix locks are 
managed by the locks xlator. One possibility would be to embed the lock 
request inside an inode transaction using inodelk. Since inodelks do not 
suffer from this problem, the following posix lock could be sent safely. 
However, this implies an additional network request, which could cause 
some performance impact. Eager-locking could minimize the impact in some 
cases. However, this approach won't work for lock recovery after a 
disconnect.


Another possibility is to send a special partial posix lock request 
which won't be immediately merged with already existing locks once 
granted. An additional confirmation request of the partial posix lock 
will be required to fully grant the current lock and merge it with the 
existing ones. This requires a new network request, which will add 
latency, and makes everything more complex since there would be more 
combinations of states in which something could fail.


So I think one possible solution would be the following:

1. Keep each posix lock as an independent object in locks xlator. This 
will make it possible to "invalidate" any already granted lock without 
affecting already established locks.


2. Additionally, we'll keep a sorted list of non-overlapping segments of 
locked regions. And we'll count, for each region, how many locks are 
referencing it. One lock can reference multiple segments, and each 
segment can be referenced by multiple locks.


3. An additional lock request that overlaps with an existing segment 
can cause this segment to be split to satisfy the non-overlapping property.


4. When an unlock request is received, all segments intersecting with 
the region are eliminated (it may require some segment splits on the 
edges), and the unlocked region is subtracted from each lock associated 
to the segment. If a lock gets an empty region, it's removed.


5. We'll create a special "remove lock" request that doesn't unlock a 
region but removes an already 
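A rough sketch of the bookkeeping in points 1 to 3 above, with illustrative structures only (this is not the actual locks xlator code): each posix lock remains an independent object, while a sorted list of non-overlapping segments counts how many locks cover each byte range.

/* Illustrative sketch of points 1-3 above, not locks xlator code:
 * every posix lock stays an independent object, and a sorted list of
 * non-overlapping segments tracks how many locks reference each range. */
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

struct posix_lock {                /* point 1: one object per lock call */
    uint64_t owner;                /* lk-owner */
    off_t    start;
    off_t    end;
    struct posix_lock *next;
};

struct lock_segment {              /* point 2: non-overlapping segment  */
    off_t    start;
    off_t    end;
    unsigned refcount;             /* how many locks reference it       */
    struct lock_segment *next;     /* kept sorted by start offset       */
};

/* point 3: split an existing segment at 'at' so that a new, partially
 * overlapping lock can reference only the part it actually covers. */
static struct lock_segment *
segment_split(struct lock_segment *seg, off_t at)
{
    struct lock_segment *right = calloc(1, sizeof(*right));

    right->start    = at;
    right->end      = seg->end;
    right->refcount = seg->refcount;   /* both halves keep the old refs */
    right->next     = seg->next;

    seg->end  = at - 1;
    seg->next = right;
    return right;
}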

Re: [Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Soumya Koduri



On 3/27/19 12:55 PM, Xavi Hernandez wrote:

Hi Raghavendra,

On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:


All,

Glusterfs cleans up POSIX locks held on an fd when the client/mount
through which those locks are held disconnects from bricks/server.
This helps Glusterfs to not run into a stale lock problem later (For
eg., if application unlocks while the connection was still down).
However, this means the lock is no longer exclusive as other
applications/clients can acquire the same lock. To communicate that
locks are no longer valid, we are planning to mark the fd (which has
POSIX locks) bad on a disconnect so that any future operations on
that fd will fail, forcing the application to re-open the fd and
re-acquire locks it needs [1].


Wouldn't it be better to retake the locks when the brick is reconnected, 
if the lock is still in use?


BTW, the referenced bug is not public. Should we open another bug to 
track this ?



Note that with AFR/replicate in picture we can prevent errors to
application as long as Quorum number of children "never ever" lost
connection with bricks after locks have been acquired. I am using
the term "never ever" as locks are not healed back after
re-connection and hence first disconnect would've marked the fd bad
and the fd remains so even after re-connection happens. So, its not
just Quorum number of children "currently online", but Quorum number
of children "never having disconnected with bricks after locks are
acquired".


I think this requisite is not feasible. In a distributed file system, 
sooner or later all bricks will be disconnected. It could be because of 
failures or because an upgrade is done, but it will happen.


The difference here is how long fds are kept open. If applications open 
and close files frequently enough (i.e. the fd is not kept open longer 
than it takes to have more than Quorum bricks disconnected) then 
there's no problem. The problem can only appear for applications that 
open files for a long time and also use posix locks. In this case, the 
only good solution I see is to retake the locks on brick reconnection.



However, this use case is not affected if the application don't
acquire any POSIX locks. So, I am interested in knowing
* whether your use cases use POSIX locks?
* Is it feasible for your application to re-open fds and re-acquire
locks on seeing EBADFD errors?


I think that many applications are not prepared to handle that.


+1 to all the points mentioned by Xavi. This has been a day-1 issue for 
all applications using locks (like NFS-Ganesha and Samba). Not many 
applications re-open fds and re-acquire locks; on receiving EBADFD, that 
error is most likely propagated to the application's clients.
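For illustration, this is roughly what "re-open the fd and re-acquire locks" would mean for a plain POSIX application (a hedged sketch; as noted above, few real applications attempt this, and any state guarded by the lost lock may have changed in between):

/* Hedged sketch of the "re-open and re-acquire" pattern asked about
 * above; plain POSIX calls, not NFS-Ganesha/Samba code. */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static int relock(int fd, off_t start, off_t len)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = start, .l_len = len };
    return fcntl(fd, F_SETLKW, &fl);
}

/* Write that recovers once from a bad fd (EBADF here; the thread refers
 * to EBADFD) by re-opening and re-locking. Any data guarded by the old
 * lock may have changed in between, which is exactly why most
 * applications do not attempt this. */
ssize_t write_with_recovery(int *fdp, const char *path, const void *buf,
                            size_t count, off_t lock_start, off_t lock_len)
{
    ssize_t ret = write(*fdp, buf, count);

    if (ret == -1 && errno == EBADF) {
        close(*fdp);
        *fdp = open(path, O_RDWR);
        if (*fdp == -1 || relock(*fdp, lock_start, lock_len) == -1)
            return -1;
        ret = write(*fdp, buf, count);     /* retry under the new lock */
    }
    return ret;
}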


I agree with Xavi that it's better to heal/re-acquire the locks on brick 
reconnect, before the brick accepts any fresh requests. I also suggest 
making this healing mechanism generic enough (if possible) to heal any 
server-side state (like upcall, leases etc.).


Thanks,
Soumya



Xavi


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7

regards,
Raghavendra

___
Gluster-users mailing list
gluster-us...@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users




___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] requesting review available gluster* plugins in sos

2019-03-19 Thread Soumya Koduri




On 3/19/19 9:49 AM, Sankarshan Mukhopadhyay wrote:

sos is (as might just be widely known) an extensible, portable support
data collection tool primarily aimed at Linux distributions and other
UNIX-like operating systems.

At present there are 2 gluster* plugins, and I'd like to request that
the maintainers do a quick review that these sufficiently cover the
topics needed to help diagnose issues.


There is one plugin available for nfs-ganesha as well - 
https://github.com/sosreport/sos/blob/master/sos/plugins/nfsganesha.py


It needs a minor update. Sent a pull request for the same - 
https://github.com/sosreport/sos/pull/1593


Kindly review.

Thanks,
Soumya


This is a lead up to requesting more usage of the sos tool to diagnose
issues we see reported.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Release 6: Kick off!

2019-01-24 Thread Soumya Koduri

Hi Shyam,

Sorry for the late response. I just realized that we had two more new 
APIs, glfs_setattr/fsetattr, which use 'struct stat', made public [1]. As 
mentioned in one of the patchset review comments, since the goal is to 
move to glfs_stat in release-6, do we need to update these APIs as well 
to use the new struct? Or shall we retain them in FUTURE for now and 
address this in the next minor release? Please suggest.


Thanks,
Soumya

[1] https://review.gluster.org/#/c/glusterfs/+/21734/


On 1/23/19 8:43 PM, Shyam Ranganathan wrote:

On 1/23/19 6:03 AM, Ashish Pandey wrote:


Following is the patch I am working and targeting -
https://review.gluster.org/#/c/glusterfs/+/21933/


This is a bug fix, and the patch size at the moment is also small in
lines changed. Hence, even if it misses branching, the fix can be backported.

Thanks for the heads up!
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [NFS-Ganesha-Devel] Re: Problems about cache virtual glusterfs ACLs for ganesha in md-cache

2018-10-12 Thread Soumya Koduri



On 10/12/18 5:55 PM, Kinglong Mee wrote:

On 2018/10/12 14:34, Soumya Koduri wrote:
On 10/12/18 7:22 AM, Kinglong Mee wrote:

On 2018/10/11 19:09, Soumya Koduri wrote:

NFS-Ganesha's md-cache layer already does extensive caching of attributes and 
ACLs of each file looked upon. Do you see any additional benefit with turning 
on gluster md-cache as well?  More replies inline..


Yes, I think.

The logic is different between md-cache and ganesha's cache:
Ganesha caches xattr data based on a timeout; on timeout, ganesha gets it from 
the back-end glusterfs.
Md-cache caching depends on a timeout too, but md-cache can delay the timeout in 
some cases.


Could you please list out which xattrs fall into that category? AFAIK, like 
the iatt, all xattrs are invalidated post-timeout in the md-cache xlator.


The iatt's expiry time is ->ia_time + timeout; the xattrs' expiry time is ->xa_time 
+ timeout.
Most FOP replies (read/write/truncate...) contain the postattr incidentally
and update ->ia_time.
But ->xa_time cannot be updated the way ->ia_time is now; it is only updated in 
mdc_lookup_cbk.

I will add a case to update ->xa_time, when updating ->ia_time, if the file's 
mtime/ctime has not changed.
And I will let the stat/fstat/setattr/fsetattr/getxattr/fgetxattr replies contain 
the xattrs incidentally and update ->xa_time too.


+1
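A tiny sketch of the freshness rule being described (struct and field names here are illustrative assumptions, not the actual md-cache definitions):

/* Sketch of the timeout rule discussed above; the struct/field names
 * are illustrative, not the real md-cache (mdc) definitions. The iatt
 * stays valid until ia_time + timeout, xattrs until xa_time + timeout,
 * and xa_time may be refreshed along with ia_time when the post-op
 * mtime/ctime show the file did not change. */
#include <stdbool.h>
#include <time.h>

struct mdc_cache_times {
    time_t ia_time;   /* last refresh of the cached iatt   */
    time_t xa_time;   /* last refresh of the cached xattrs */
};

static bool mdc_iatt_valid(const struct mdc_cache_times *t, time_t timeout)
{
    return time(NULL) < t->ia_time + timeout;
}

static bool mdc_xatt_valid(const struct mdc_cache_times *t, time_t timeout)
{
    return time(NULL) < t->xa_time + timeout;
}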




By turning on both these caches, the process shall consume more memory and CPU 
to store and invalidate the same set of attributes at two different layers, right? 
Do you see much better performance when compared to that cost?


I think md-cache support for caching virtual ACLs is a functional update;
ganesha can use it via the newly added option, or not.

If a smaller timeout is set for ganesha's cache, it times out frequently,
but md-cache can still return the cached xattr.

I do not have comparison data between them.
After testing, we can choose the better combination:
1. md-cache on and ganesha's cache on;
2. md-cache on but ganesha's cache off;
3. md-cache off but ganesha's cache on.



Yes. Please share your findings if you get to test these combinations.

Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [NFS-Ganesha-Devel] Re: Problems about cache virtual glusterfs ACLs for ganesha in md-cache

2018-10-12 Thread Soumya Koduri



On 10/12/18 12:04 PM, Soumya Koduri wrote:
1. Cache it separately as posix ACLs (a new option maybe 
"cache-glusterfs-acl" is added);

 And make sure _posix_xattr_get_set fills them when lookup requests.



I am not sure if posix layer can handle it. Virtual xattrs are 
in-memory and not stored on disk. They are converted to/from posix-acl 
in posix-acl xlator. So FWIU, posix-acl xlator should handle setting 
these attributes as part of LOOKUP response if needed. Same shall 
apply for any virtual xattr cached in md-cache. Request Poornima to 
comment.


Posix-acl can handle it correctly now.


Okay.





At a time, any gfapi consumer would use either posix-acl or virtual 
glusterfs ACLs. So having two options to selectively choose which one 
of them to cache sounds better to me instead of unnecessarily storing 
two different representations of the same ACL.


Make sense.
I will add another option for virtual glusterfs ACLs in md-cache.


Cool. thanks!

-Soumya



thanks,
Kinglong Mee

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [NFS-Ganesha-Devel] Re: Problems about cache virtual glusterfs ACLs for ganesha in md-cache

2018-10-12 Thread Soumya Koduri



On 10/12/18 7:22 AM, Kinglong Mee wrote:

On 2018/10/11 19:09, Soumya Koduri wrote:

NFS-Ganesha's md-cache layer already does extensive caching of attributes and 
ACLs of each file looked upon. Do you see any additional benefit with turning 
on gluster md-cache as well?  More replies inline..


Yes, I think.

The logic is different between md-cache and ganesha's cache:
Ganesha caches xattr data based on a timeout; on timeout, ganesha gets it from 
the back-end glusterfs.
Md-cache caching depends on a timeout too, but md-cache can delay the timeout in 
some cases.


Could you please list out which xattrs fall into that category? AFAIK, 
like the iatt, all xattrs are invalidated post-timeout in the md-cache xlator.


By turning on both these caches, the process shall consume more memory 
and CPU to store and invalidate the same set of attributes at two different 
layers, right? Do you see much better performance when compared to that 
cost?


Thanks,
Soumya






On 10/11/18 7:47 AM, Kinglong Mee wrote:

Cc nfs-ganesha,

Md-cache has option "cache-posix-acl" that controls caching of posix ACLs
("system.posix_acl_access"/"system.posix_acl_default") and virtual glusterfs 
ACLs
("glusterfs.posix.acl"/"glusterfs.posix.default_acl") now.

But, _posix_xattr_get_set does not fill virtual glusterfs ACLs when lookup 
requests.
So, md-cache caches bad virtual glusterfs ACLs.

After I turn on "cache-posix-acl" option to cache ACLs at md-cache, nfs client 
gets many EIO errors.

https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/427305

There are two choices for caching virtual glusterfs ACLs in md-cache:
1. Cache it separately as posix ACLs (a new option maybe "cache-glusterfs-acl" 
is added);
     And make sure _posix_xattr_get_set fills them when lookup requests.



I am not sure if posix layer can handle it. Virtual xattrs are in-memory and 
not stored on disk. They are converted to/from posix-acl in posix-acl xlator. 
So FWIU, posix-acl xlator should handle setting these attributes as part of 
LOOKUP response if needed. Same shall apply for any virtual xattr cached in 
md-cache. Request Poornima to comment.


Posix-acl can handle it correctly now.



At a time, any gfapi consumer would use either posix-acl or virtual glusterfs 
ACLs. So having two options to selectively choose which one of them to cache 
sounds better to me instead of unnecessarily storing two different 
representations of the same ACL.


Make sense.
I will add another option for virtual glusterfs ACLs in md-cache.

thanks,
Kinglong Mee



Thanks,
Soumya


2. Does not cache it, only cache posix ACLs;
     If gfapi request it, md-cache lookup according posix ACL at cache,
     if exist, make the virtual glusterfs ACL locally and return to gfapi;
     otherwise, send the request to glusterfsd.

Virtual glusterfs ACLs are another format of posix ACLs; they are larger than 
posix ACLs,
and always exist no matter whether the real posix ACL exists or not.




So, I'd prefer #2.
Any comments are welcome.

thanks,
Kinglong Mee

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel




___
Devel mailing list -- de...@lists.nfs-ganesha.org
To unsubscribe send an email to devel-le...@lists.nfs-ganesha.org



Re: [Gluster-devel] Problems about cache virtual glusterfs ACLs for ganesha in md-cache

2018-10-11 Thread Soumya Koduri
NFS-Ganesha's md-cache layer already does extensive caching of 
attributes and ACLs of each file looked upon. Do you see any additional 
benefit with turning on gluster md-cache as well?  More replies inline..


On 10/11/18 7:47 AM, Kinglong Mee wrote:

Cc nfs-ganesha,

Md-cache has an option "cache-posix-acl" that controls caching of posix ACLs
("system.posix_acl_access"/"system.posix_acl_default") and virtual glusterfs ACLs
("glusterfs.posix.acl"/"glusterfs.posix.default_acl") now.

But _posix_xattr_get_set does not fill virtual glusterfs ACLs when handling lookup
requests, so md-cache caches bad virtual glusterfs ACLs.

After I turn on the "cache-posix-acl" option to cache ACLs in md-cache, the nfs client
gets many EIO errors.

https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/427305

There are two choices for caching virtual glusterfs ACLs in md-cache:
1. Cache them separately, like posix ACLs (a new option, maybe "cache-glusterfs-acl",
is added); and make sure _posix_xattr_get_set fills them when handling lookup requests.



I am not sure if the posix layer can handle it. Virtual xattrs are in-memory 
and not stored on disk. They are converted to/from posix ACLs in the 
posix-acl xlator. So FWIU, the posix-acl xlator should handle setting these 
attributes as part of the LOOKUP response if needed. The same shall apply for 
any virtual xattr cached in md-cache. I request Poornima to comment.


At any given time, a gfapi consumer would use either posix ACLs or virtual 
glusterfs ACLs. So having two options to selectively choose which one of 
them to cache sounds better to me than unnecessarily storing two 
different representations of the same ACL.


Thanks,
Soumya
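For context, a minimal gfapi sketch of a consumer fetching the virtual ACL xattr discussed in this thread (volume name, host and path are placeholders; error handling trimmed):

/* Minimal gfapi sketch: fetch the virtual glusterfs ACL xattr named in
 * this thread. "testvol", "server1" and "/file1" are placeholders. */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    char buf[1024];
    glfs_t *fs = glfs_new("testvol");               /* placeholder volume */

    glfs_set_volfile_server(fs, "tcp", "server1", 24007);
    if (glfs_init(fs) != 0)
        return 1;

    /* "glusterfs.posix.acl" is the virtual xattr discussed above;
     * "system.posix_acl_access" would be the on-disk posix ACL instead. */
    ssize_t len = glfs_getxattr(fs, "/file1", "glusterfs.posix.acl",
                                buf, sizeof(buf));
    printf("glusterfs.posix.acl: %zd bytes\n", len);

    glfs_fini(fs);
    return 0;
}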


2. Do not cache them; only cache posix ACLs.
If gfapi requests a virtual ACL, md-cache looks up the corresponding posix ACL in
its cache; if it exists, it builds the virtual glusterfs ACL locally and returns it
to gfapi; otherwise, it sends the request to glusterfsd.

Virtual glusterfs ACLs are another format of posix ACLs; they are larger than 
posix ACLs,
and always exist no matter whether the real posix ACL exists or not.




So, I'd prefer #2.
Any comments are welcome.

thanks,
Kinglong Mee

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] On making performance.parallel-readdir as a default option

2018-09-24 Thread Soumya Koduri

Please find my comments inline.

On 9/22/18 8:56 AM, Raghavendra Gowdappa wrote:



On Fri, Sep 21, 2018 at 11:25 PM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:


Hi all,

We've a feature performance.parallel-readdir [1] that is known to
improve performance of readdir operations [2][3][4]. The option is
especially useful when distribute scale is relatively large (>10)
and is known to improve performance of readdir operations even on
smaller scale of distribute count 1 [4].

However, this option is not enabled by default. I am here proposing
to make this as a default feature.

But, there are some important things to be addressed in
readdir-ahead (which is core part of parallel-readdir), before we
can do so:

To summarize issues with readdir-ahead:
* There seems to be one prominent problem of missing dentries with
parallel-readdir. There was one problem discussed on tech-list just
yesterday. I've heard about this recurrently earlier too. Not sure
whether this is the problem of missing unlink/rmdir/create etc fops
(see below) in readdir-ahead. ATM, no RCA.


IMHO, this is a must fix to enable this option by default.


* fixes to maintain stat-consistency in dentries pre-fetched have
not made into downstream yet (though merged upstream [5]).
* readdir-ahead doesn't implement directory modification fops like
rmdir/create/symlink/link/unlink/rename. This means the cache won't be
updated with newer content, even on a single mount, till it is consumed
by the application or purged.


As you had explained, since this affects cache-consistency, this as well 
needs to be addressed.



* dht linkto-files should store relative positions of subvolumes
instead of absolute subvolume name, so that changes to immediate
child won't render them stale.


FWIU from your explanation, this may affect performance for a brief 
moment when the option is turned on, but it doesn't as such result in 
incorrect results. So considering that these options are usually 
configured at the beginning of the volume configuration and not toggled 
often, this may not be a blocker.




* Features parallel-readdir depends on to be working should be
enabled automatically even though they were off earlier when
parallel-readdir is enabled [6].


Since readdir-ahead is one such option which has not been turned on by 
default till now, and most of the above-mentioned issues are with 
readdir-ahead, would it be helpful if we enable only readdir-ahead for a 
few releases, get enough testing done, and then consider parallel-readdir?



Thanks,
Soumya


I've listed important known issues above. But we can discuss which
are the blockers for making this feature as a default.

Thoughts?

[1] http://review.gluster.org/#/c/16090/
[2]

https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf
(sections on small directory)
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1628807#c35

[4] https://www.spinics.net/lists/gluster-users/msg34956.html
[5] http://review.gluster.org/#/c/glusterfs/+/20639/
[6] https://bugzilla.redhat.com/show_bug.cgi?id=1631406

regards,
Raghavendra


___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Release 4.0: Schedule and scope clarity (responses needed)

2017-11-21 Thread Soumya Koduri

Hi Shyam,

Now, glusterfs/github [1] reads ~50 issues as being targets in 4.0, and 
among this about 2-4 are marked closed (or done).


Ask1: Request each of you to go through the issue list and coordinate 
with a maintainer, to either mark an issues milestone correctly (i.e 
retain it in 4.0 or move it out) and also leave a comment on the issue 
about its readiness.


Ask 2: If there are issues that you are working on and are not marked 
against the 4.0 milestone, please do the needful for the same.


We would like to get leases support included as an experimental feature in 4.0. 
Please consider the same. Details are in [1].


Thanks,
Soumya

[1] https://github.com/gluster/glusterfs/issues/350



Ask 3: Please mail the devel list, on features that are making it to 
4.0, so that the project board can be rightly populated with the issue.


Ask 4: If the 4.0 branching date was extended by another 4 weeks, would 
that enable you to finish additional features that are already marked 
for 4.0? This helps us move the needle on branching to help land the 
right set of features.


Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] [Gluster-Maintainers] Release 3.13: (STM release) Details

2017-09-27 Thread Soumya Koduri

Hi Shyam,

On 09/11/2017 07:51 PM, Shyam Ranganathan wrote:

Hi,

The next community release of Gluster is 3.13, which is a short term 
maintenance release is slated for release on 30th Nov [1] [2]. Thus 
giving a 2 month head room to get to 4.0 work done, while maintaining 
the cadence of releasing every 3 months.


This mail is to announce the scope, schedule and request contributors 
for the features that will make it to 3.13.


1) Scope:
3.13 is expected to be *lean* in scope, as most work is expected to be 
around 4.0 in the coming months. This STM can be considered as a sneak 
peek for some 4.0 features, that will *not* break any compatibility in 
the 3.x release line.


2) The release calendar looks as follows,
- Feature proposal end date: Sep-22-2017
- Branching: Oct-16-2017
- Release: Nov-30-2017

- Feature proposal:
   - Contributors need to request for features that will be a part of 
the 3.13 release, sending a mail to the devel list, and including the 
github issue # for the feature




Sorry for the delay. I would like to propose the below feature for the 3.13 
release:


gfapi: APIs needed to register cbk functions for upcalls

Summary: Instead of polling continuously for upcalls, we need APIs for 
applications to be able to register for upcall events and have the 
corresponding callback functions invoked.


github issue: #315 [1]

Patch (under review): https://review.gluster.org/#/c/18349/

Please let me know if this can be targeted for 3.13 release.

Thanks,
Soumya
[1] https://github.com/gluster/glusterfs/issues/315
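To illustrate the intent of the proposal (the names below are hypothetical stand-ins, not the symbols defined in the patch under review), the callback-based flow an application would follow looks roughly like this:

/* Illustrative sketch of the callback-style interface proposed above.
 * The names below (app_upcall_cbk, register_upcall_cbk, ...) are
 * hypothetical stand-ins, NOT the gfapi symbols under review; they only
 * show how an application would hand the library a callback instead of
 * polling for upcall events. */
#include <stdio.h>

struct upcall_event {            /* stand-in for the gfapi upcall object */
    const char *path;
    int         reason;          /* e.g. cache invalidation, lease recall */
};

typedef void (*app_upcall_cbk)(struct upcall_event *ev, void *cookie);

static app_upcall_cbk registered_cbk;     /* stored by the "library" side */
static void *registered_cookie;

/* stand-in for the proposed registration API */
static void register_upcall_cbk(app_upcall_cbk cbk, void *cookie)
{
    registered_cbk    = cbk;
    registered_cookie = cookie;
}

/* what the library side would do when an upcall arrives, instead of
 * waiting for the application to poll */
static void deliver_upcall(struct upcall_event *ev)
{
    if (registered_cbk)
        registered_cbk(ev, registered_cookie);
}

static void ganesha_style_cbk(struct upcall_event *ev, void *cookie)
{
    printf("[%s] invalidate %s (reason %d)\n",
           (const char *)cookie, ev->path, ev->reason);
}

int main(void)
{
    register_upcall_cbk(ganesha_style_cbk, "consumer-1");

    struct upcall_event ev = { "/dir/file1", 1 };
    deliver_upcall(&ev);         /* simulates the library firing the callback */
    return 0;
}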


- Branching: date by when declared features (appearing in the release 
lane, in the github project board [3]) need to be completed, incomplete 
or features that are not ready by the branching date, would be pushed 
out to the next release.


Thanks,
Shyam

[1] Github milestone: https://github.com/gluster/glusterfs/milestone/6
[2] Release schedule: https://www.gluster.org/release-schedule/
[3] Github project board: https://github.com/gluster/glusterfs/projects/1
___
maintainers mailing list
maintain...@gluster.org
http://lists.gluster.org/mailman/listinfo/maintainers

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Proposed Protocol changes for 4.0: Need feedback.

2017-09-01 Thread Soumya Koduri



On 08/11/2017 06:04 PM, Amar Tumballi wrote:

Hi All,

Below are the proposed protocol changes (ie, XDR changes on the wire) we 
are thinking for Gluster 4.0.


Poornima and I were discussing whether we can include the volume uuid as part of 
the handshake protocol between protocol/client and protocol/server, so that 
clients do not re-connect if the volume was deleted and recreated with 
the same name, eliminating potential issues at upper layers [1].


We haven't looked into the details, but the idea is to have glusterd2 send 
the volume uuid as part of the GETSPEC request to clients & brick processes, 
which shall be used by protocol/client & protocol/server (maybe along 
with the vol name as well) during HNDSK_SETVOLUME.


Poornima/Ram,

Please add if I missed out anything.

Thanks,
Soumya


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1463191
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 3.12 and 4.0: Thoughts on scope

2017-05-18 Thread Soumya Koduri



On 05/16/2017 02:10 PM, Kaushal M wrote:

On 16 May 2017 06:16, "Shyam" > wrote:

Hi,

Let's start a bit early on 3.12 and 4.0 roadmap items, as there have
been quite a few discussions around this in various meetups.

Here is what we are hearing (or have heard), so if you are working
on any of these items, do put up your github issue, and let us know
which release you are targeting these for.

If you are working on something that is not represented here, shout
out, and we can get that added to the list of items in the upcoming
releases.

Once we have a good collection slotted into the respective releases
(on github), we can further announce the same in the users list as well.

3.12:
1. Geo-replication to cloud (ie, s3 or glacier like storage target)
2. Basic level of throttling support on server side to manage the
self-heal processes running.
3. Brick Multiplexing (Better support, more control)
4. GFID to path improvements
5. Resolve issues around disconnects and ping-timeouts
6. Halo with hybrid mode was supposed to be with 3.12
7. Procedures and code for +1 scaling the cluster?
8. Lookup-optimized turned on by default.
9. Thin client (or server side clustering) - phase 1.


10. We also have the IPV6 patch by FB. This was supposed to go into 3.11 but
hasn't. The main thing blocking this is having an actual IPV6
environment to test it in.


11. Also we would like to propose support for leases and lock-owner via 
gfAPI in 3.12.


There are already POC patches sent by Poornima and Anoop. They need 
testing (which has started) and updates. I have raised a github issue [1] to 
track the same.






4.0: (more thematic than actual features at the moment)
1. Separation of Management and Filesystem layers (aka GlusterD2
related efforts)
2. Scaling Distribution logic
3. Better consistency with rename() and link() operations
4. Thin client || Clustering Logic on server side - Phase 2
5. Quota: re-look at optimal support
6. Improvements in debug-ability and more focus on testing coverage
based on use-cases.

  7. Zero-copy Writes

There was some effort put in by Sachin wrt this feature [2]. I would like 
to take it forward and propose the design changes needed (if any) for it to be 
consumed by external applications (at least existing ones like 
NFS-Ganesha or Samba). Github issue# [3]


Thanks,
Soumya

[1] https://github.com/gluster/glusterfs/issues/213
[2] https://review.gluster.org/#/c/14784/
[3] https://github.com/gluster/glusterfs/issues/214



Components moving out of support in possibly 4.0
- Stripe translator
- AFR with just 2 subvolume (either use Arbiter or 3 way replicate)
- Re-validate few performance translator's presence.

Thanks,
Shyam

___
Gluster-devel mailing list
Gluster-devel@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-devel





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] [Gluster-users] Announcing release 3.11 : Scope, schedule and feature tracking

2017-04-26 Thread Soumya Koduri

Hi Shyam,

On 04/25/2017 07:38 PM, Shyam wrote:

On 04/25/2017 07:40 AM, Pranith Kumar Karampuri wrote:



On Thu, Apr 13, 2017 at 8:17 PM, Shyam > wrote:

On 02/28/2017 10:17 AM, Shyam wrote:
1) Halo - Initial Cut (@pranith)


Sorry for the delay in response. Due to some other work engagements, I
couldn't spend time on this. I think I can get this done if there is a
one-week grace period, by 5th May. Or I can get this done for 3.12.0. Do let
me know what you think.


Let us backport this to 3.11 post-branching; that way the schedule is
kept as is. It would help to stick to the schedule if this gets
backported by May-05th.

Considering this request, any other feature that needs a few more days
to be completed can target this date, by when (post branching) we need
the backport of the feature to the 3.11 branch.


If that is the case, can [1] be considered for 3.11 as well? I hadn't 
proposed it earlier as I wasn't sure whether it would be ready on time. I now 
have the initial prototype working. I will submit subsequent patches for any 
additional changes or test scripts.


Thanks,
Soumya

[1] https://github.com/gluster/glusterfs/issues/174



Thanks,
Shyam
___
Gluster-users mailing list
gluster-us...@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Proposal for an extended READDIRPLUS operation via gfAPI

2017-04-21 Thread Soumya Koduri

Hi,

We currently have the readdirplus operation to fetch stat information for each of the 
dirents. But that may not be sufficient, and applications often need 
extra information; for example, NFS-Ganesha-like applications which 
operate on handles need to generate a handle for each of the dirents 
returned. This requires extra calls to the backend, in this case 
LOOKUP (which is a very expensive operation), resulting in quite slow
readdir performance.

To address that, I am introducing this new API, using which applications can
request any extra information to be returned as part of the
readdirplus response [1]:

Patch: https://review.gluster.org/#/c/15663
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1442950
Github issue#  https://github.com/gluster/glusterfs/issues/174

The interface is designed to be somewhat aligned with the xstat [2] format, as 
suggested by Niels. This helps in extending this operation in the future.


The synopsis of this new API (the arguments to be passed and how it can 
be used) is mentioned in the patch [3].


Since the initial requirement is to return handles, I have defined it 
with a glfs_h_* prefix for now, but it may well be used by applications 
that do not need handles (like SMB). Suggestions are welcome.
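As an illustration of the interface shape only (all names below are hypothetical stand-ins; the actual synopsis is in the patch [3]), an extended readdirplus entry could carry the requested extras per dirent:

/* Hypothetical stand-ins only, not the gfapi symbols from the patch:
 * the caller says which extras it wants per dirent, and each returned
 * entry carries the dirent plus the requested pieces, avoiding a
 * per-entry LOOKUP afterwards. */
#include <dirent.h>
#include <stdint.h>
#include <sys/stat.h>

#define XDIRP_WANT_STAT   0x1   /* caller wants stat/iatt per entry     */
#define XDIRP_WANT_HANDLE 0x2   /* caller wants an NFS-style handle     */

struct xreaddirplus_entry {
    struct dirent  dent;
    struct stat    st;            /* filled if XDIRP_WANT_STAT   is set */
    unsigned char  handle[64];    /* filled if XDIRP_WANT_HANDLE is set */
};

/* hypothetical call shape: fetch the next entry, honouring 'flags' */
int xreaddirplus_next(void *dir_handle, uint32_t flags,
                      struct xreaddirplus_entry *out);

/* A handle-based consumer such as NFS-Ganesha would ask for both pieces
 * in one call. */
static uint32_t ganesha_request_flags(void)
{
    return XDIRP_WANT_STAT | XDIRP_WANT_HANDLE;
}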


The current changes are a POC and need to be tested extensively, but we have 
seen a huge performance improvement (with the initial patch-set, at least on 
a single-brick volume).


Request for comments/suggestions on any improvements needed on the 
interface.


Thanks,
Soumya

[1] https://review.gluster.org/#/c/15663
[2] https://lwn.net/Articles/394298/
[3] https://review.gluster.org/#/c/15663/8/api/src/glfs-handles.h
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] What does xdata mean? "gfid-req"?

2017-03-20 Thread Soumya Koduri



On 03/18/2017 06:51 PM, Zhitao Li wrote:

Hello, everyone,


I am investigating the difference between stat and lookup operations in
GlusterFS now. In the translator named "md_cache", a stat operation will
generally hit the cache, while a lookup operation will miss the cache.


The reason is that for the lookup operation, md_cache will check whether the
xdata is satisfied. In my case, lookup will include the xdata key "gfid-req"
filled by fuse-bridge. However, in md_cache, this check never passes
because the load flag of the mdc_key "gfid-req" is always 0.


The client (in this case fuse-bridge) generates a gfid and sets it as the xdata 
'gfid-req' key during the first lookup, so as to let the server heal a 
file/dir with a missing gfid (if any) using the newly generated one.


I guess md-cache ignores a LOOKUP fop with this xdata key set, as it 
implies that it is the first lookup done by the client. Even if it doesn't 
filter it out, the file/dir entry will not be present in the
cache at that point. Subsequent LOOKUPs should be served from md-cache. Poornima 
(cc'ed) shall be able to clarify the actual reason.
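A rough sketch of the idea (libuuid stands in here so the example is self-contained; the real fuse-bridge code uses gluster's dict_t helpers such as dict_set_static_bin instead):

/* Rough sketch, not the actual fuse-bridge code: on the first lookup of
 * an entry, the client generates a candidate gfid and attaches it under
 * the "gfid-req" key so the server can assign it if the on-disk file
 * has no gfid yet. libuuid stands in for gluster's dict helpers. */
#include <stdio.h>
#include <uuid/uuid.h>

struct xdata_stub {                 /* stand-in for a gluster dict_t */
    char    key[16];
    uuid_t  value;
};

static void attach_gfid_hint(struct xdata_stub *xdata)
{
    uuid_generate(xdata->value);                  /* new candidate gfid */
    snprintf(xdata->key, sizeof(xdata->key), "gfid-req");
}

int main(void)
{
    struct xdata_stub xdata;
    char str[37];

    attach_gfid_hint(&xdata);
    uuid_unparse(xdata.value, str);
    printf("%s = %s\n", xdata.key, str);          /* sent with the LOOKUP */
    return 0;
}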


Thanks,
Soumya




Could anyone tell me why "gfid-req" is filled by
fuse-bridge.c (fuse_getattr: nodeid==1->lookup)? What does it mean? And
how is xdata used?





If no xdata, what would happen?

Thank you!


Best regards,
Zhitao Li

Sent from Outlook 


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Consistent time attributes (ctime, atime and mtime) across replica set and distribution set

2017-03-20 Thread Soumya Koduri



On 03/20/2017 08:53 AM, Vijay Bellur wrote:



On Sun, Mar 19, 2017 at 10:14 AM, Amar Tumballi <atumb...@redhat.com
<mailto:atumb...@redhat.com>> wrote:



On Thu, Mar 16, 2017 at 6:52 AM, Soumya Koduri <skod...@redhat.com
<mailto:skod...@redhat.com>> wrote:



On 03/16/2017 02:27 PM, Mohammed Rafi K C wrote:



On 03/15/2017 11:31 PM, Soumya Koduri wrote:

Hi Rafi,

I haven't thoroughly gone through design. But have few
comments/queries which I have posted inline for now .

On 02/28/2017 01:11 PM, Mohammed Rafi K C wrote:

Thanks for the reply , Comments are inline



On 02/28/2017 12:50 PM, Niels de Vos wrote:

On Tue, Feb 28, 2017 at 11:21:55AM +0530,
Mohammed Rafi K C wrote:

Hi All,


We discussed the problem $subject in the
mail thread [1]. Based on the
comments and suggestions I will summarize
the design (Made as
points for
simplicity.)


1) As part of each fop, top layer will
generate a time stamp and
pass it
to the down along with other param.

1.1) This will bring a dependency for
NTP synced clients along
with
servers

What do you mean with "top layer"? Is this on
the Gluster client, or
does the time get inserted on the bricks?

It is the top layer (master xlator) in client graph
like fuse, gfapi,
nfs . My mistake I should have mentioned . Sorry for
that.


These clients shouldn't include internal client
processes like
rebalance, self-heal daemons right? IIUC from [1], we
should avoid
changing times during rebalance and self-heals.

Also what about fops generated from the underlying layers -
getxattr/setxattr which may modify these time attributes?


Since the time stamps are appended from master xlators like
fuse , we
will not have the timestamp for internal daemons as they
don't have
master xlator loaded. internal fops won't generate new
timestamp , even
if we are sending an internal fops from say dht, it will
have only one
time genrated by fuse. So I think this is fine.


Okay. But since self-heal and snapview-server (atleast) seem to
be using gfapi, how can gfapi differentiate between these
internal clients and other applications (like NFS/SMB)? I
thought we need intelligence on server-side to identify such
clients based on pid and then avoid updating timestamp sent by them.


Very good point. Considering we should be using gfapi in future for
most of the internal processes, this becomes more important. Should
we just add it as argument to glfs_init() options? If you set a flag
during init, the ctime xattr is not generated for the fops which
otherwise generate and send them.




Most internal clients are recognized by their special PIDs
(gf_special_pid_t) in various translators. We could store this PID
information in a global location and use that to determine whether
timestamp generation is needed. This way we will not be required to
change any api signature for distinguishing internal clients.

+1. Changing the API (even with a symbol version) would result in some 
inconvenience for existing applications. Additionally, if this 
support is made optional (via an option), it shall be easier to isolate 
these changes in the gfapi/gluster code path.
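A standalone sketch of the PID-based check being discussed (the struct below is a stand-in for the call frame's root, not the real call_frame_t):

/* Sketch of the idea above: gluster's internal clients (self-heal,
 * rebalance, quota, ...) announce themselves through special negative
 * client PIDs (gf_special_pid_t), so a translator can skip generating
 * time attributes for them by looking at the PID carried in the call
 * frame. The struct is a stand-in for frame->root. */
#include <stdbool.h>
#include <stdio.h>

struct call_root_stub {
    int pid;                 /* negative for internal gluster daemons */
};

static bool needs_ctime_update(const struct call_root_stub *root)
{
    /* external clients (fuse mounts, gfapi applications) have pid > 0 */
    return root->pid > 0;
}

int main(void)
{
    struct call_root_stub app = { .pid = 1234 };  /* an external client       */
    struct call_root_stub shd = { .pid = -6 };    /* an internal daemon (e.g.) */

    printf("app: %d, internal: %d\n",
           needs_ctime_update(&app), needs_ctime_update(&shd));
    return 0;
}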


Thanks,
Soumya


Regards,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] REMINDER: Gluster Community Bug Triage meeting at 12:00 UTC (~in 5 minutes)

2016-11-15 Thread Soumya Koduri

Hi all,

Apologies for the late notice.

This meeting is scheduled for anyone who is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
  (https://webchat.freenode.net/?channels=gluster-meeting  )
- date: every Tuesday
- time: 12:00 UTC
  (in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last weeks action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.

Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Possible problem introduced by http://review.gluster.org/15573

2016-10-21 Thread Soumya Koduri



On 10/21/2016 02:03 PM, Xavier Hernandez wrote:

Hi Niels,

On 21/10/16 10:03, Niels de Vos wrote:

On Fri, Oct 21, 2016 at 09:03:30AM +0200, Xavier Hernandez wrote:

Hi,

I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
consistently
see a crash each time an attempt to connect to the volume is made.


Thanks, that likely is the same bug as
https://bugzilla.redhat.com/1379241 .


I'm not sure it's the same problem. The crash in my case happens always
and immediately. When creating an image, the file is created but its size is
0. The stack trace is also quite different.


Right. The issue reported in bug 1379241 looks like the one we hit with 
client-io-threads enabled (already discussed on gluster-devel). 
Disabling that option may prevent the crash seen.


Thanks,
Soumya



Xavi



Satheesaran, could you revert commit 7a50690 from the build that you
were testing, and see if that causes the problem to go away again? Let
me know of you want me to provide RPMs for testing.

Niels



The backtrace of the crash shows this:

#0  pthread_spin_lock () at
../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x7fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
#2  0x7fe53482ba18 in glfs_io_async_cbk (op_ret=,
op_errno=0, frame=, cookie=0x7fe526c67040,
iovec=iovec@entry=0x0, count=count@entry=0)
at glfs-fops.c:839
#3  0x7fe53482beed in glfs_fsync_async_cbk (frame=,
cookie=, this=, op_ret=,
op_errno=,
prebuf=, postbuf=0x7fe5217fe890, xdata=0x0) at
glfs-fops.c:1382
#4  0x7fe520be2eb7 in ?? () from
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
#5  0x7fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#6  0x7fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#7  0x7fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
at dht-inode-read.c:873
#8  0x7fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
, iov=0x7fe526c61040, count=8, myframe=0x7fe52ceef130) at
client-rpc-fops.c:975
#9  0x7fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
clnt@entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
#10 0x7fe53432056c in rpc_clnt_notify (trans=,
mydata=0x7fe526fafaf0, event=, data=0x7fe526c3a1c0) at
rpc-clnt.c:962
#11 0x7fe53431c8a3 in rpc_transport_notify (this=,
event=, data=) at rpc-transport.c:541
#12 0x7fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
socket.c:2267
#13 0x7fe5283eaf37 in socket_event_handler (fd=,
idx=5,
data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#14 0x7fe5345ab3f6 in event_dispatch_epoll_handler
(event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
#15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at
event-epoll.c:674
#16 0x7fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
pthread_create.c:309
#17 0x7fe53214962d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The fd being unreferenced contains this:

(gdb) print *fd
$6 = {
  pid = 97649,
  flags = 2,
  refcount = 0,
  inode_list = {
next = 0x7fe523f7206c,
prev = 0x7fe523f7206c
  },
  inode = 0x0,
  lock = {
spinlock = 1,
mutex = {
  __data = {
__lock = 1,
__count = 0,
__owner = 0,
__nusers = 0,
__kind = 0,
__spins = 0,
__elision = 0,
__list = {
  __prev = 0x0,
  __next = 0x0
}
  },
  __size = "\001", '\000' ,
  __align = 1
}
  },
  _ctx = 0x7fe52ec31c40,
  xl_count = 11,
  lk_ctx = 0x7fe526c126a0,
  anonymous = _gf_false
}

fd->inode is NULL, explaining the cause of the crash. We also see that
fd->refcount is already 0. So I'm wondering if this couldn't be an extra
fd_unref() introduced by that patch.

The crash seems to happen immediately after a graph switch.

Xavi

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Possible problem introduced by http://review.gluster.org/15573

2016-10-21 Thread Soumya Koduri

Hi Xavi,

On 10/21/2016 12:57 PM, Xavier Hernandez wrote:

Looking at the code, I think that the added fd_unref() should only be
called if the fop preparation fails. Otherwise the callback already
unreferences the fd.

Code flow:

* glfs_fsync_async_common() takes an fd ref and calls STACK_WIND passing
that fd.
* Just after that a ref is released.
* When glfs_io_async_cbk() is called another ref is released.

Note that if fop preparation fails, a single fd_unref() is called, but
on success two fd_unref() are called.


Sorry for the inconvenience caused. I think patch #15573 hasn't caused 
the problem itself but has highlighted another ref leak in the code.


From the code I see that glfs_io_async_cbk() does fd_unref (glfd->fd) 
but does not unref the fd passed in STACK_WIND_COOKIE() of the fop.


If I take any fop, for eg.,
glfs_fsync_common() {

   fd = glfs_resolve_fd (glfd->fs, subvol, glfd);


}

Here in glfs_resolve_fd ()

fd_t *
__glfs_resolve_fd (struct glfs *fs, xlator_t *subvol, struct glfs_fd *glfd)
{
fd_t *fd = NULL;

if (glfd->fd->inode->table->xl == subvol)
return fd_ref (glfd->fd);

Here we can see that we are taking an extra ref in addition to 
the ref already taken for glfd->fd. That means the caller of this 
function needs to fd_unref(fd) irrespective of the subsequent fd_unref 
(glfd->fd).


fd = __glfs_migrate_fd (fs, subvol, glfd);
if (!fd)
return NULL;


if (subvol == fs->active_subvol) { 


fd_unref (glfd->fd);
glfd->fd = fd_ref (fd); 


}

I think the issue is here during graph_switch(). You have 
mentioned as well that the crash happens post graph_switch. Maybe here 
we are missing an extra ref to be taken for fd in addition to glfd->fd. 
I need to look through __glfs_migrate_fd() to confirm that, but these 
are my initial thoughts.
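
Just to illustrate the ref balance being described, here is a rough 
sketch of a gfapi fop path -- this is only the pattern as I read it from 
glfs-fops.c (error handling trimmed), not a proposed patch:

/* sketch of the expected ref balance in a gfapi fop path
 * (as inside glfs-fops.c, where the internal headers are available) */
static int
glfs_fop_sketch (struct glfs_fd *glfd)
{
        xlator_t *subvol = NULL;
        fd_t     *fd = NULL;

        subvol = glfs_active_subvol (glfd->fs);
        if (!subvol)
                return -1;

        /* glfs_resolve_fd() hands back an fd carrying one extra ref
         * (fd_ref (glfd->fd), or a migrated fd after a graph switch) */
        fd = glfs_resolve_fd (glfd->fs, subvol, glfd);
        if (!fd)
                goto out;

        /* ... STACK_WIND the fop using 'fd' ... */

        /* the caller must drop exactly that one ref; the callback's
         * fd_unref (glfd->fd) accounts for a different reference */
        fd_unref (fd);
out:
        glfs_subvol_done (glfd->fs, subvol);
        return 0;
}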


Please let me know your comments.

Thanks,
Soumya




Xavi

On 21/10/16 09:03, Xavier Hernandez wrote:

Hi,

I've just tried Gluster 3.8.5 with Proxmox using gfapi and I
consistently see a crash each time an attempt to connect to the volume
is made.

The backtrace of the crash shows this:

#0  pthread_spin_lock () at ../nptl/sysdeps/x86_64/pthread_spin_lock.S:24
#1  0x7fe5345776a5 in fd_unref (fd=0x7fe523f7205c) at fd.c:553
#2  0x7fe53482ba18 in glfs_io_async_cbk (op_ret=,
op_errno=0, frame=, cookie=0x7fe526c67040,
iovec=iovec@entry=0x0, count=count@entry=0)
at glfs-fops.c:839
#3  0x7fe53482beed in glfs_fsync_async_cbk (frame=,
cookie=, this=, op_ret=,
op_errno=,
prebuf=, postbuf=0x7fe5217fe890, xdata=0x0) at
glfs-fops.c:1382
#4  0x7fe520be2eb7 in ?? () from
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.5/xlator/debug/io-stats.so
#5  0x7fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef3ac,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#6  0x7fe5345d118a in default_fsync_cbk (frame=0x7fe52ceef204,
cookie=0x560ef95398e8, this=0x8, op_ret=0, op_errno=0, prebuf=0x1,
postbuf=0x7fe5217fe890, xdata=0x0) at defaults.c:1508
#7  0x7fe525f78219 in dht_fsync_cbk (frame=0x7fe52ceef2d8,
cookie=0x560ef95398e8, this=0x0, op_ret=0, op_errno=0,
prebuf=0x7fe5217fe820, postbuf=0x7fe5217fe890, xdata=0x0)
at dht-inode-read.c:873
#8  0x7fe5261bbc7f in client3_3_fsync_cbk (req=0x7fe525f78030
, iov=0x7fe526c61040, count=8, myframe=0x7fe52ceef130) at
client-rpc-fops.c:975
#9  0x7fe5343201f0 in rpc_clnt_handle_reply (clnt=0x18,
clnt@entry=0x7fe526fafac0, pollin=0x7fe526c3a1c0) at rpc-clnt.c:791
#10 0x7fe53432056c in rpc_clnt_notify (trans=,
mydata=0x7fe526fafaf0, event=, data=0x7fe526c3a1c0) at
rpc-clnt.c:962
#11 0x7fe53431c8a3 in rpc_transport_notify (this=,
event=, data=) at rpc-transport.c:541
#12 0x7fe5283e8d96 in socket_event_poll_in (this=0x7fe526c69440) at
socket.c:2267
#13 0x7fe5283eaf37 in socket_event_handler (fd=,
idx=5, data=0x7fe526c69440, poll_in=1, poll_out=0, poll_err=0) at
socket.c:2397
#14 0x7fe5345ab3f6 in event_dispatch_epoll_handler
(event=0x7fe5217fecc0, event_pool=0x7fe526ca2040) at event-epoll.c:571
#15 event_dispatch_epoll_worker (data=0x7fe527c0f0c0) at
event-epoll.c:674
#16 0x7fe5324140a4 in start_thread (arg=0x7fe5217ff700) at
pthread_create.c:309
#17 0x7fe53214962d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The fd being unreferenced contains this:

(gdb) print *fd
$6 = {
  pid = 97649,
  flags = 2,
  refcount = 0,
  inode_list = {
next = 0x7fe523f7206c,
prev = 0x7fe523f7206c
  },
  inode = 0x0,
  lock = {
spinlock = 1,
mutex = {
  __data = {
__lock = 1,
__count = 0,
__owner = 0,
__nusers = 0,
__kind = 0,
__spins = 0,
__elision = 0,
__list = {
  __prev = 0x0,
  __next = 0x0
}
  },
  __size = "\001", '\000' ,
  __align = 1
}
  },
  _ctx = 0x7fe52ec31c40,
  

Re: [Gluster-devel] Regression caused to gfapi applications with enabling client-io-threads by default

2016-10-06 Thread Soumya Koduri



On 10/05/2016 07:32 PM, Pranith Kumar Karampuri wrote:



On Wed, Oct 5, 2016 at 2:00 PM, Soumya Koduri <skod...@redhat.com
<mailto:skod...@redhat.com>> wrote:

Hi,

With http://review.gluster.org/#/c/15051/
<http://review.gluster.org/#/c/15051/>, performace/client-io-threads
is enabled by default. But with that we see regression caused to
nfs-ganesha application trying to un/re-export any glusterfs volume.
This shall be the same case with any gfapi application using
glfs_fini().

More details and the RCA can be found at [1].

In short, iot-worker threads spawned  (when the above option is
enabled) are not cleaned up as part of io-threads-xlator->fini() and
those threads could end up accessing invalid/freed memory post
glfs_fini().

The actual fix is to address io-threads-xlator->fini() to cleanup
those threads before exiting. But since those threads' IDs are
currently not stored, the fix could be very intricate and take a
while. So till then to avoid all existing applications crash, I
suggest to keep this option disabled by default and update this
known_issue with enabling this option in the release-notes.

I sent a patch to revert the commit -
http://review.gluster.org/#/c/15616/
<http://review.gluster.org/#/c/15616/> [2]


Good catch! I think the correct fix would be to make sure all threads
die as part of PARENT_DOWN then?


From my understanding, I think these threads should be cleaned up as 
part of xlator->fini(). I am not sure if it needs to be handled for 
PARENT_DOWN as well. Do we re-spawn the threads as part of PARENT_UP then?


Till that part gets fixed, can we turn this option back off by default 
to avoid the regressions on the master and release-3.9 branches?


Thanks,
Soumya




Comments/Suggestions are welcome.

Thanks,
Soumya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1380619#c11
<https://bugzilla.redhat.com/show_bug.cgi?id=1380619#c11>
[2] http://review.gluster.org/#/c/15616/
<http://review.gluster.org/#/c/15616/>




--
Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Regression caused to gfapi applications with enabling client-io-threads by default

2016-10-05 Thread Soumya Koduri

Hi,

With http://review.gluster.org/#/c/15051/, performance/client-io-threads 
is enabled by default. But with that we see a regression for the 
nfs-ganesha application when trying to un/re-export any glusterfs 
volume. This shall be the same case with any gfapi application using 
glfs_fini().


More details and the RCA can be found at [1].

In short, the iot-worker threads spawned (when the above option is 
enabled) are not cleaned up as part of the io-threads xlator's fini(), 
and those threads could end up accessing invalid/freed memory post 
glfs_fini().


The actual fix is to make the io-threads xlator's fini() clean up those 
threads before exiting. But since those threads' IDs are currently not 
stored, the fix could be quite intricate and take a while. So till then, 
to avoid crashing all existing applications, I suggest keeping this 
option disabled by default and documenting this known issue with 
enabling the option in the release notes.
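
For reference, the eventual fini() cleanup could look roughly like the 
sketch below, assuming we start recording the worker thread IDs. None of 
these fields exist in iot_conf_t today (which is exactly why the real 
fix is intricate), so treat every name here as hypothetical:

#include <pthread.h>

#define SKETCH_MAX_THREADS 64   /* stand-in for the io-threads limit */

struct iot_conf_sketch {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        int             down;        /* tells idle workers to exit   */
        int             curr_count;  /* number of workers spawned    */
        pthread_t       threads[SKETCH_MAX_THREADS]; /* to be added  */
};

static void
iot_fini_sketch (struct iot_conf_sketch *conf)
{
        int i;

        pthread_mutex_lock (&conf->mutex);
        conf->down = 1;                        /* workers poll this   */
        pthread_cond_broadcast (&conf->cond);  /* wake idle workers   */
        pthread_mutex_unlock (&conf->mutex);

        /* join every spawned worker so nothing outlives glfs_fini() */
        for (i = 0; i < conf->curr_count; i++)
                pthread_join (conf->threads[i], NULL);
}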


I sent a patch to revert the commit - 
http://review.gluster.org/#/c/15616/ [2]


Comments/Suggestions are welcome.

Thanks,
Soumya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1380619#c11
[2] http://review.gluster.org/#/c/15616/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Minutes from today's Gluster Community Bug Triage meeting (Oct 4 2016)

2016-10-04 Thread Soumya Koduri

Hi,

Please find the minutes of today's Gluster Community Bug Triage meeting 
at the links posted below. We had very few participants today as many 
are traveling. Thanks to hgowtham and ankitraj for joining.


Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-10-04/gluster_bug_triage.2016-10-04-12.01.html
Minutes (text): 
https://meetbot.fedoraproject.org/gluster-meeting/2016-10-04/gluster_bug_triage.2016-10-04-12.01.txt
Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-10-04/gluster_bug_triage.2016-10-04-12.01.log.html 



Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] REMINDER: Gluster Community Bug Triage meeting at 12:00 UTC (~in 30 minutes)

2016-10-04 Thread Soumya Koduri

Hi all,

This meeting is scheduled for anyone who is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
 (https://webchat.freenode.net/?channels=gluster-meeting  )
- date: every Tuesday
- time: 12:00 UTC
 (in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last weeks action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.

Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Dht readdir filtering out names

2016-09-30 Thread Soumya Koduri



On 09/30/2016 10:08 AM, Pranith Kumar Karampuri wrote:

Does samba/gfapi/nfs-ganesha have options to disable readdirp?


AFAIK, currently there is no option to disable/enable readdirp in gfapi 
& nfs-ganesha (not sure about samba). But it looks like nfs-ganesha 
always uses readdir, which I plan to change to readdirp in the near 
future to check if it improves the performance of stat on small files. 
Could you please summarize the issues with using readdirp?


Thanks,
Soumya



On Fri, Sep 30, 2016 at 10:04 AM, Pranith Kumar Karampuri
> wrote:

What if the lower xlators want to set the entry->inode to NULL and
clear the entry->d_stat to force a lookup on the name? i.e.
gfid-split-brain/ia_type mismatches.

On Fri, Sep 30, 2016 at 10:00 AM, Raghavendra Gowdappa
> wrote:



- Original Message -
> From: "Raghavendra Gowdappa" >
> To: "Pranith Kumar Karampuri" >
> Cc: "Shyam Ranganathan" >, "Nithya
Balachandran" >, "Gluster Devel"
> >
> Sent: Friday, September 30, 2016 9:58:34 AM
> Subject: Re: Dht readdir filtering out names
>
>
>
> - Original Message -
> > From: "Pranith Kumar Karampuri" >
> > To: "Raghavendra Gowdappa" >
> > Cc: "Shyam Ranganathan" >, "Nithya Balachandran"
> > >, "Gluster
Devel"
> > >
> > Sent: Friday, September 30, 2016 9:53:44 AM
> > Subject: Re: Dht readdir filtering out names
> >
> > On Fri, Sep 30, 2016 at 9:50 AM, Raghavendra Gowdappa
>
> > wrote:
> >
> > >
> > >
> > > - Original Message -
> > > > From: "Pranith Kumar Karampuri" >
> > > > To: "Raghavendra Gowdappa" >
> > > > Cc: "Shyam Ranganathan" >, "Nithya Balachandran" <
> > > nbala...@redhat.com >,
"Gluster Devel"
> > > > >
> > > > Sent: Friday, September 30, 2016 9:15:04 AM
> > > > Subject: Re: Dht readdir filtering out names
> > > >
> > > > On Fri, Sep 30, 2016 at 9:13 AM, Raghavendra Gowdappa <
> > > rgowd...@redhat.com >
> > > > wrote:
> > > >
> > > > > dht_readdirp_cbk has different behaviour for
directories and files.
> > > > >
> > > > > 1. If file, pick the dentry (passed from subvols as
part of readdirp
> > > > > response) if the it corresponds to data file.
> > > > > 2. If directory pick the dentry if readdirp response
is from
> > > hashed-subvol.
> > > > >
> > > > > In all other cases, the dentry is skipped and not
passed to higher
> > > > > layers/application. To elaborate, the dentries which
are ignored are:
> > > > > 1. dentries corresponding to linkto files.
> > > > > 2. dentries from non-hashed subvols corresponding to
directories.
> > > > >
> > > > > Since the behaviour is different for different
filesystem objects,
> > > > > dht
> > > > > needs ia_type to choose its behaviour.
> > > > >
> > > > > - Original Message -
> > > > > > From: "Pranith Kumar Karampuri" >
> > > > > > To: "Shyam Ranganathan" >, "Raghavendra
> > > Gowdappa" <
> > > > > rgowd...@redhat.com >,
"Nithya Balachandran"
> > > > > > >
> > > > > > Cc: "Gluster Devel" >
> > > > > > Sent: Friday, September 30, 2016 8:39:28 AM
> > > > > > Subject: Dht readdir filtering out names
> > > > > >
> > > > > > hi,
> > > > > >In dht_readdirp_cbk() there is a 

Re: [Gluster-devel] Dht readdir filtering out names

2016-09-30 Thread Soumya Koduri



On 09/30/2016 03:02 PM, Poornima Gurusiddaiah wrote:

In gfapi, we pass down readdirp, irrespective of whether the application
called readdir/readdirp.
Hence the behaviour will be same for samba and Ganesha i suppose.


But in gfapi, I see a clear distinction between readdir and readdirp calls:


>>>>
if (plus)
ret = syncop_readdirp (subvol, fd, 131072, glfd->offset,
   , NULL, NULL);
else
ret = syncop_readdir (subvol, fd, 131072, glfd->offset,
  , NULL, NULL);
DECODE_SYNCOP_ERR (ret);
<<<<

And nfs-ganesha doesn't set the 'plus' boolean atm, so I think it 
doesn't get converted to readdirp. Or is it that syncop_readdir is 
converted to readdirp in one of the underlying xlators?
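
For reference, these are the two consumer-facing entry points that end 
up driving the two paths above; a minimal sketch (the glfs.h include 
path may vary depending on how the headers are installed):

#include <glusterfs/api/glfs.h>
#include <dirent.h>
#include <sys/stat.h>
#include <stddef.h>

static void
list_dir_sketch (glfs_fd_t *fd, int want_stat)
{
        struct dirent  de;
        struct dirent *entry = NULL;
        struct stat    st;

        if (want_stat) {
                /* drives the syncop_readdirp()/'plus' path above:
                 * each entry comes back with its stat prefilled */
                while (glfs_readdirplus_r (fd, &st, &de, &entry) == 0 &&
                       entry != NULL) {
                        /* consume entry + st */
                }
        } else {
                /* drives the plain syncop_readdir() path: no
                 * per-entry stat is requested */
                while (glfs_readdir_r (fd, &de, &entry) == 0 &&
                       entry != NULL) {
                        /* consume entry only */
                }
        }
}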


Thanks,
Soumya



Regards,
Poornima



*From: *"Pranith Kumar Karampuri" <pkara...@redhat.com>
*To: *"Raghavendra Gowdappa" <rgowd...@redhat.com>, "Poornima
Gurusiddaiah" <pguru...@redhat.com>, "Raghavendra Talur"
<rta...@redhat.com>, "Soumya Koduri" <skod...@redhat.com>
*Cc: *"Shyam Ranganathan" <srang...@redhat.com>, "Nithya
Balachandran" <nbala...@redhat.com>, "Gluster Devel"
<gluster-devel@gluster.org>
*Sent: *Friday, September 30, 2016 12:38:06 AM
*Subject: *Re: Dht readdir filtering out names

Does samba/gfapi/nfs-ganesha have options to disable readdirp?

On Fri, Sep 30, 2016 at 10:04 AM, Pranith Kumar Karampuri
<pkara...@redhat.com <mailto:pkara...@redhat.com>> wrote:

What if the lower xlators want to set the entry->inode to NULL
and clear the entry->d_stat to force a lookup on the name? i.e.
gfid-split-brain/ia_type mismatches.

On Fri, Sep 30, 2016 at 10:00 AM, Raghavendra Gowdappa
<rgowd...@redhat.com <mailto:rgowd...@redhat.com>> wrote:



- Original Message -
> From: "Raghavendra Gowdappa" <rgowd...@redhat.com 
<mailto:rgowd...@redhat.com>>
> To: "Pranith Kumar Karampuri" <pkara...@redhat.com 
<mailto:pkara...@redhat.com>>
> Cc: "Shyam Ranganathan" <srang...@redhat.com 
<mailto:srang...@redhat.com>>, "Nithya
Balachandran" <nbala...@redhat.com
<mailto:nbala...@redhat.com>>, "Gluster Devel"
> <gluster-devel@gluster.org <mailto:gluster-devel@gluster.org>>
> Sent: Friday, September 30, 2016 9:58:34 AM
> Subject: Re: Dht readdir filtering out names
>
>
>
> - Original Message -
> > From: "Pranith Kumar Karampuri" <pkara...@redhat.com
<mailto:pkara...@redhat.com>>
> > To: "Raghavendra Gowdappa" <rgowd...@redhat.com
<mailto:rgowd...@redhat.com>>
> > Cc: "Shyam Ranganathan" <srang...@redhat.com
<mailto:srang...@redhat.com>>, "Nithya Balachandran"
> > <nbala...@redhat.com <mailto:nbala...@redhat.com>>,
"Gluster Devel"
> > <gluster-devel@gluster.org
<mailto:gluster-devel@gluster.org>>
> > Sent: Friday, September 30, 2016 9:53:44 AM
> > Subject: Re: Dht readdir filtering out names
> >
> > On Fri, Sep 30, 2016 at 9:50 AM, Raghavendra Gowdappa
<rgowd...@redhat.com <mailto:rgowd...@redhat.com>>
> > wrote:
> >
> > >
> > >
> > > - Original Message -
> > > > From: "Pranith Kumar Karampuri" <pkara...@redhat.com
<mailto:pkara...@redhat.com>>
> > > > To: "Raghavendra Gowdappa" <rgowd...@redhat.com
<mailto:rgowd...@redhat.com>>
> > > > Cc: "Shyam Ranganathan" <srang...@redhat.com
<mailto:srang...@redhat.com>>, "Nithya Balachandran" <
> > > nbala...@redhat.com <mailto:nbala...@redhat.com>>,
"Gluster Devel"
> > > > <gluster-devel@gluster.org
<mailto:gluster-devel@gluster.org>>
> > > > Sent: Friday, September 30, 2016 9:15:04 AM
> > > > Subject: Re: Dht readdir filte

Re: [Gluster-devel] [Gluster-users] GlusterFs upstream bugzilla components Fine graining

2016-09-28 Thread Soumya Koduri

Hi,

On 09/28/2016 11:24 AM, Muthu Vigneshwaran wrote:

> +- Component GlusterFS
> |
> |
> |  +Subcomponent nfs

Maybe it's time to change it to 'gluster-NFS/native NFS'. Niels/Kaleb?


+- Component gdeploy

|  |

|  +Subcomponent sambha

|  +Subcomponent hyperconvergence

|  +Subcomponent RHSC 2.0


gdeploy has support for 'ganesha' configuration as well. Also, would it 
help to have an additional subcomponent 'glusterfs', maybe as the 
default one (any new support being added can fall under that category)? 
Request Sac to comment.


Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] review request - Change the way client uuid is built

2016-09-23 Thread Soumya Koduri



On 09/23/2016 11:48 AM, Poornima Gurusiddaiah wrote:



- Original Message -

From: "Niels de Vos" 
To: "Raghavendra Gowdappa" 
Cc: "Gluster Devel" 
Sent: Wednesday, September 21, 2016 3:52:39 AM
Subject: Re: [Gluster-devel] review request - Change the way client uuid is 
built

On Wed, Sep 21, 2016 at 01:47:34AM -0400, Raghavendra Gowdappa wrote:

Hi all,

[1] might have implications across different components in the stack. Your
reviews are requested.



rpc : Change the way client uuid is built

Problem:
Today the main users of client uuid are protocol layers, locks, leases.
Protocolo layers requires each client uuid to be unique, even across
connects and disconnects. Locks and leases on the server side also use
the same client uid which changes across graph switches and across
file migrations. Which makes the graph switch and file migration
tedious for locks and leases.
As of today lock migration across graph switch is client driven,
i.e. when a graph switches, the client reassociates all the locks(which
were associated with the old graph client uid) with the new graphs
client uid. This means flood of fops to get and set locks for each fd.
Also file migration across bricks becomes even more difficult as
client uuid for the same client, is different on the other brick.

The exact set of issues exists for leases as well.

Hence the solution:
Make the migration of locks and leases during graph switch and migration,
server driven instead of client driven. This can be achieved by changing
the format of client uuid.

Client uuid currently:
%s(ctx uuid)-%s(protocol client name)-%d(graph id)%s(setvolume
count/reconnect count)

Proposed Client uuid:
"CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s-PC_NAME:%s-RECON_NO:%s"
-  CTX_ID: This is will be constant per client.
-  GRAPH_ID, PID, HOST, PC_NAME(protocol client name), RECON_NO(setvolume
count)
remains the same.

With this, the first part of the client uuid, CTX_ID+GRAPH_ID remains
constant across file migration, thus the migration is made easier.

Locks and leases store only the first part CTX_ID+GRAPH_ID as their
client identification. This means, when the new graph connects,


Can we assume that CTX_ID+GRAPH_ID shall be unique across clients all 
the time? If not, wouldn't we get into issues of clientB's locks/leases 
not conflicting with clientA's locks/leases?



the locks and leases xlator should walk through their database
to update the client id, to have new GRAPH_ID. Thus the graph switch
is made server driven and saves a lot of network traffic.


What is the plan to have the CTX_ID+GRAPH_ID shared over multiple gfapi
applications? This would be important for NFS-Ganesha failover where one
NFS-Ganesha process is stopped, and the NFS-Clients (by virtual-ip) move
to an other NFS-Ganesha server.


Sharing it across multiple gfapi applications is currently not supported.
Do you mean, setting the CTX_ID+GRAPH_ID at the init of the other client,
or during replay of locks during the failover?
If its the former, we need an api in gfapi to take the CTX_ID+GRAPH_ID as
an argument and other things.

Will there be a way to set CTX_ID(+GRAPH_ID?) through libgfapi? That
would allow us to add a configuration option to NFS-Ganesha and have the
whole NFS-Ganesha cluster use the same locking/leases.

Ah, ok. the whole of cluster will have the same CTX_ID(+GRAPH_ID?), but then
the cleanup logic will not work, as the disconnect cleanup happens as soon as
one of the NFS-Ganesha disconnects?


Yes. If we have a uniform ID (CTX_ID+GRAPH_ID?) across clients, we 
should keep locks/leases as long as even one client is connected, and 
not clean them up as part of fd cleanup during disconnects.
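
Conceptually, something like the sketch below on the server side -- 
purely illustrative, no such per-CTX_ID table exists today, and the dict 
here is only a stand-in for whatever structure we end up using:

static void
on_disconnect_sketch (dict_t *ctx_table, char *ctx_id)
{
        int32_t active = 0;

        dict_get_int32 (ctx_table, ctx_id, &active);
        if (active > 1) {
                /* another connection still shares this CTX_ID */
                dict_set_int32 (ctx_table, ctx_id, active - 1);
                return;
        }

        dict_del (ctx_table, ctx_id);
        /* the last connection for this CTX_ID is gone: only now is it
         * safe to clean up locks/leases keyed on CTX_ID(+GRAPH_ID) */
}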


Thanks,
Soumya



This patch doesn't eliminate the migration that is required during graph switch,
it still is necessary, but it can be server driven instead of client driven.


Thanks,
Niels




Change-Id: Ia81d57a9693207cd325d7b26aee4593fcbd6482c
BUG: 1369028
Signed-off-by: Poornima G 
Signed-off-by: Susant Palai 



[1] http://review.gluster.org/#/c/13901/10/

regards,
Raghavendra
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Fixing setfsuid/gid problems in posix xlator

2016-09-23 Thread Soumya Koduri



On 09/23/2016 08:28 AM, Pranith Kumar Karampuri wrote:

hi,
   Jiffin found an interesting problem in posix xlator where we have
never been using setfsuid/gid (http://review.gluster.org/#/c/15545/),
what I am seeing regressions after this is, if the files are created
using non-root user then the file creation fails because that user
doesn't have permissions to create the gfid-link. So it seems like the
correct way forward for this patch is to write wrappers around
sys_ to do setfsuid/gid do the actual operation requested and
then set it back to old uid/gid and then do the internal operations. I
am planning to write posix_sys_() to do the same, may be a macro?.


Why not the other way around? As in, can we switch to superuser only 
when required, so that we know which internal operations need root 
access and avoid misusing it?
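
Either way the wrapper itself should stay small; something along these 
lines (illustrative only, not the actual patch -- posix_do_as_caller() 
is a made-up name, and the direction of the switch is exactly the open 
question above):

#include <sys/fsuid.h>
#include <sys/types.h>

static int
posix_do_as_caller (uid_t uid, gid_t gid, int (*op) (void *), void *args)
{
        int   ret;
        uid_t old_uid;
        gid_t old_gid;

        old_uid = setfsuid (uid);   /* setfsuid() returns the old fsuid */
        old_gid = setfsgid (gid);

        ret = op (args);            /* the wrapped sys_xxx() operation  */

        setfsuid (old_uid);         /* back to the brick's credentials  */
        setfsgid (old_gid);         /* for the internal gfid-link work  */

        return ret;
}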


Thanks,
Soumya


I need inputs from you guys to let me know if I am on the right path
and if you see any issues with this approach.

--
Pranith


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Upcall details for NLINK

2016-09-19 Thread Soumya Koduri



On 09/19/2016 10:08 AM, Niels de Vos wrote:

Duh, and now with the attachment. I'm going to get some coffee now.


On Mon, Sep 19, 2016 at 06:22:58AM +0200, Niels de Vos wrote:

Hey Soumya,

do we have a description of the different actions that we expect/advise
users of upcall to take? I'm looking at the flags that are listed in
libglusterfs/src/upcall-utils.h and api/src/glfs-handles.h and passed in
the glfs_callback_inode_arg structure from api/src/glfs-handles.h.


Not very detailed, but a minimal description of each of these flags is 
provided alongside their definitions (now moved to upcall-utils.h).




We have a NLINK flag, but that does not seem to carry the stat/iatt
attributes for the changed inode. It seems we send an upcall on file
removal that incudes NLINK, and just after that we send an other one
with FORGET.


From the code, I see that it does seem to send the stat of the inode 
being (un)linked. Maybe if it is the last link being removed, the stat 
structure could have been NULL. Could you please check with files with 
link count >1?


FORGET is not exactly tied to removal/unlink of the file. It is sent 
whenever protocol/server does inode_forget, which could happen for 
various other reasons. But yeah, as you have said, when the last link is 
removed a FORGET definitely gets sent, which invalidates the inode cache 
entry, so there is no point sending the NLINK flag just before that. We 
could add a check to avoid the NLINK upcall if it is the last link of 
the file being removed (sketched below).
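
A rough sketch of that check (the names are illustrative and the 
surrounding upcall plumbing is omitted; 'stbuf' is the post-operation 
iatt of the (un)linked inode):

/* decide whether an NLINK invalidation is worth sending */
static int
should_send_nlink_upcall (struct iatt *stbuf)
{
        if (!stbuf)
                return 0;        /* nothing useful to notify */

        if (stbuf->ia_nlink == 0)
                return 0;        /* last link gone: the FORGET that
                                    follows invalidates the whole
                                    inode anyway */

        return 1;                /* a hardlink was removed: send NLINK
                                    along with this iatt so the client
                                    can refresh the link count */
}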



Thanks,
Soumya



This attachment in Bugzilla shows the behaviour:
  https://bugzilla.redhat.com/attachment.cgi?id=1202190

You'll need https://code.wireshark.org/review/17776 to decode the flags,
so I'll attach the tshark output to this email for your convenience.
  $ tshark -r /tmp/upcall_xid.1474190284.pcap.gz -V 'glusterfs.cbk'

Question: For the NLINK flag, should we not include the stat/iatt of the
modified inode? And only if the iatt->nlink is 0, a FORGET should get
sent? NLINK would then only happen when a (not the last) hardlink is
removed.

Thanks,
Niels





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster Developer Summit 2016 Talk Schedule

2016-09-15 Thread Soumya Koduri



On 09/16/2016 03:48 AM, Amye Scavarda wrote:


On Thu, Sep 15, 2016 at 8:26 AM, Pranith Kumar Karampuri
<pkara...@redhat.com <mailto:pkara...@redhat.com>> wrote:



On Thu, Sep 15, 2016 at 2:37 PM, Soumya Koduri <skod...@redhat.com
<mailto:skod...@redhat.com>> wrote:

Hi Amye,

Is there any plan to record these talks?


I had same question.


There is no planned recording for this, however, what we've done before
is ask people to record one of their practice runs through BlueJeans or
Hangouts.

We'll post those recordings through the Gluster Community channels.


Great. Thanks

-Soumya


- amye


Thanks,
Soumya

On 09/15/2016 03:09 AM, Amye Scavarda wrote:

Thanks to all that submitted talks, and thanks to the
program committee
who helped select this year's content.

This will be posted on the main Summit page as
well: gluster.org/events/summit2016
<http://gluster.org/events/summit2016>
<http://gluster.org/events/summit2016
<http://gluster.org/events/summit2016>>

October 6
9:00am - 9:25amOpening Session
9:30 - 9:55amDHT: current design, (dis)advantages,
challenges - A
perspective- Raghavendra Gowdappa
10:00am - 10:25am  DHT2 - O Brother, Where Art Thou? - Shyam
Ranganathan
10:30am - 10:55am Performance bottlenecks for metadata
workload in
Gluster - Poornima Gurusiddaiah ,  Rajesh Joseph
11:00am - 11:25am The life of a consultant listed on
gluster.org <http://gluster.org>
<http://gluster.org> - Ivan Rossi
11:30am - 11:55am Architecture of the High Availability
Solution for
Ganesha and Samba - Kaleb Keithley
12:00 - 1:00pmLunch
1:00pm - 1:25pmChallenges with Gluster and Persistent Memory
- Dan Lambright
1:25pm - 1:55pmThrottling in gluster  - Ravishankar
Narayanankutty
2:00pm  - 2:25pmGluster: The Ugly Parts - Jeff Darcy
2:30pm  - 2:55pmDeterministic Releases and How to Get There
- Nigel Babu
3:00pm - 3:25pmBreak
3:30pm - 4:00pmBirds of a Feather Sessions
4:00pm - 4:55pmBirds of a Feather Sessions
Evening Reception to be announced


October 7
9:00am - 9:25amGFProxy: Scaling the GlusterFS FUSE Client -
Shreyas Siravara
9:30 - 9:55amSharding in GlusterFS - Past, Present and
Future - Krutika
Dhananjay
10:00am - 10:25amObject Storage with Gluster - Prashanth Pai
10:30am - 10:55am Containers and Perisstent Storage for
Containers. -
Humble Chirammal, Luis Pabon
11:00am - 11:25am Gluster as Block Store in Containers  -
Prasanna Kalever
11:30am - 11:55amAn Update on GlusterD-2.0 - Kaushal Madappa
12:00 - 1:00pmLunch
1:00pm - 1:25pmIntegration of GlusterFS in to Commvault data
platform  -
Ankireddypalle Reddy
1:30-1:55pmBootstrapping Challenge
2:00pm  - 2:25pmPractical Glusto Example - Jonathan Holloway
2:30pm  - 2:55pmState of Gluster Performance - Manoj Pillai
3:00pm - 3:25pmServer side replication - Avra Sengupta
3:30pm - 4:00pmBirds of a Feather Sessions
4:00pm - 4:55pmBirds of a Feather Sessions
5:00pm - 5:30pm Closing

--
Amye Scavarda | a...@redhat.com <mailto:a...@redhat.com>
<mailto:a...@redhat.com <mailto:a...@redhat.com>> | Gluster
Community Lead


___
Gluster-devel mailing list
Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
http://www.gluster.org/mailman/listinfo/gluster-devel
<http://www.gluster.org/mailman/listinfo/gluster-devel>

___
Gluster-devel mailing list
Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
http://www.gluster.org/mailman/listinfo/gluster-devel
<http://www.gluster.org/mailman/listinfo/gluster-devel>




--
Pranith




--
Amye Scavarda | a...@redhat.com <mailto:a...@redhat.com> | Gluster
Community Lead

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster Developer Summit 2016 Talk Schedule

2016-09-15 Thread Soumya Koduri

Hi Amye,

Is there any plan to record these talks?

Thanks,
Soumya

On 09/15/2016 03:09 AM, Amye Scavarda wrote:

Thanks to all that submitted talks, and thanks to the program committee
who helped select this year's content.

This will be posted on the main Summit page as
well: gluster.org/events/summit2016 

October 6
9:00am - 9:25amOpening Session
9:30 - 9:55amDHT: current design, (dis)advantages, challenges - A
perspective- Raghavendra Gowdappa
10:00am - 10:25am  DHT2 - O Brother, Where Art Thou? - Shyam Ranganathan
10:30am - 10:55am Performance bottlenecks for metadata workload in
Gluster - Poornima Gurusiddaiah ,  Rajesh Joseph
11:00am - 11:25am The life of a consultant listed on gluster.org
 - Ivan Rossi
11:30am - 11:55am Architecture of the High Availability Solution for
Ganesha and Samba - Kaleb Keithley
12:00 - 1:00pmLunch
1:00pm - 1:25pmChallenges with Gluster and Persistent Memory - Dan Lambright
1:25pm - 1:55pmThrottling in gluster  - Ravishankar Narayanankutty
2:00pm  - 2:25pmGluster: The Ugly Parts - Jeff Darcy
2:30pm  - 2:55pmDeterministic Releases and How to Get There - Nigel Babu
3:00pm - 3:25pmBreak
3:30pm - 4:00pmBirds of a Feather Sessions
4:00pm - 4:55pmBirds of a Feather Sessions
Evening Reception to be announced


October 7
9:00am - 9:25amGFProxy: Scaling the GlusterFS FUSE Client - Shreyas Siravara
9:30 - 9:55amSharding in GlusterFS - Past, Present and Future - Krutika
Dhananjay
10:00am - 10:25amObject Storage with Gluster - Prashanth Pai
10:30am - 10:55am Containers and Perisstent Storage for Containers. -
Humble Chirammal, Luis Pabon
11:00am - 11:25am Gluster as Block Store in Containers  - Prasanna Kalever
11:30am - 11:55amAn Update on GlusterD-2.0 - Kaushal Madappa
12:00 - 1:00pmLunch
1:00pm - 1:25pmIntegration of GlusterFS in to Commvault data platform  -
Ankireddypalle Reddy
1:30-1:55pmBootstrapping Challenge
2:00pm  - 2:25pmPractical Glusto Example - Jonathan Holloway
2:30pm  - 2:55pmState of Gluster Performance - Manoj Pillai
3:00pm - 3:25pmServer side replication - Avra Sengupta
3:30pm - 4:00pmBirds of a Feather Sessions
4:00pm - 4:55pmBirds of a Feather Sessions
5:00pm - 5:30pm Closing

--
Amye Scavarda | a...@redhat.com  | Gluster
Community Lead


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Checklist for ganesha FSAL plugin integration testing for 3.9

2016-09-06 Thread Soumya Koduri

CCing gluster-devel & users MLs. Somehow they got missed in my earlier reply.

Thanks,
Soumya

On 09/06/2016 12:19 PM, Soumya Koduri wrote:


On 09/03/2016 12:44 AM, Pranith Kumar Karampuri wrote:

hi,
Did you get a chance to decide on the nfs-ganesha integrations
tests that need to be run before doing an upstream gluster release?
Could you let me know who will be providing with the list?



I have added a few basic test cases for the NFS-Ganesha FSAL and Upcall
components in the shared etherpad. Please check and update the tests
which you recommend.

Thanks,
Soumya


I can update it at
https://public.pad.fsfe.org/p/gluster-component-release-checklist
<https://public.pad.fsfe.org/p/gluster-component-release-checklist>

--
Aravinda & Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.9. feature freeze status check

2016-08-29 Thread Soumya Koduri



On 08/26/2016 09:38 PM, Pranith Kumar Karampuri wrote:

hi,
  Now that we are almost near the feature freeze date (31st of Aug),
want to get a sense if any of the status of the features.

Please respond with:
1) Feature already merged
2) Undergoing review will make it by 31st Aug
3) Undergoing review, but may not make it by 31st Aug
4) Feature won't make it for 3.9.

I added the features that were not planned(i.e. not in the 3.9 roadmap
page) but made it to the release and not planned but may make it to
release at the end of this mail.
If you added a feature on master that will be released as part of 3.9.0
but forgot to add it to roadmap page, please let me know I will add it.

Here are the features planned as per the roadmap:
1) Throttling
Feature owner: Ravishankar

2) Trash improvements
Feature owners: Anoop, Jiffin

3) Kerberos for Gluster protocols:
Feature owners: Niels, Csaba

4) SELinux on gluster volumes:
Feature owners: Niels, Manikandan

5) Native sub-directory mounts:
Feature owners: Kaushal, Pranith

6) RichACL support for GlusterFS:
Feature owners: Rajesh Joseph

7) Sharemodes/Share reservations:
Feature owners: Raghavendra Talur, Poornima G, Soumya Koduri, Rajesh
Joseph, Anoop C S

8) Integrate with external resource management software
Feature owners: Kaleb Keithley, Jose Rivera

9) Python Wrappers for Gluster CLI Commands
Feature owners: Aravinda VK

10) Package and ship libgfapi-python
Feature owners: Prashant Pai

11) Management REST APIs
Feature owners: Aravinda VK

12) Events APIs
Feature owners: Aravinda VK

13) CLI to get state representation of a cluster from the local glusterd pov
Feature owners: Samikshan Bairagya

14) Posix-locks Reclaim support
Feature owners: Soumya Koduri


Sorry, this feature will not make it to 3.9. Hopefully we will get it in 
the next release.




15) Deprecate striped volumes
Feature owners: Vijay Bellur, Niels de Vos

16) Improvements in Gluster NFS-Ganesha integration
Feature owners: Jiffin Tony Thottan, Soumya Koduri


This one is already merged.

Thanks,
Soumya



*The following need to be added to the roadmap:*

Features that made it to master already but were not palnned:
1) Multi threaded self-heal in EC
Feature owner: Pranith (Did this because serkan asked for it. He has 9PB
volume, self-healing takes a long time :-/)

2) Lock revocation (Facebook patch)
Feature owner: Richard Wareing

Features that look like will make it to 3.9.0:
1) Hardware extension support for EC
Feature owner: Xavi

2) Reset brick support for replica volumes:
Feature owner: Anuradha

3) Md-cache perf improvements in smb:
Feature owner: Poornima

--
Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Support to reclaim locks (posix) provided lkowner & range matches

2016-08-10 Thread Soumya Koduri
We (folks CC'ed) had a brief discussion on the use-cases and semantics 
of the lock reclamation support. A few of the changes suggested to the 
existing proposal are:


1) Allow reclamation of locks only if there is an existing lock present 
on the file with the same owner (see the sketch below). This is needed 
so that applications cannot misuse the feature, and to maintain data 
integrity. That is, if the lock has already been cleaned up or if there 
is a lock owned by another client, the server cannot guarantee the file 
state, so it should reject the reclamation request and let the 
application know that its previous lock no longer exists.


2) With (1) above, we shall need the server to hold on to the locks for 
a longer time instead of cleaning them up immediately as soon as a 
disconnect event happens with the older client. This could be achieved 
by enabling grace-timer support (as mentioned by Vijay earlier) on the 
server side.
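
To make (1) concrete, the server-side precondition could look roughly 
like the sketch below. The names (pl_inode_t, ext_list, locks_overlap()) 
follow my reading of the locks xlator and are not final:

static int
reclaim_allowed_sketch (pl_inode_t *pl_inode, posix_lock_t *reqlock)
{
        posix_lock_t *l = NULL;

        list_for_each_entry (l, &pl_inode->ext_list, list) {
                /* an existing lock with the same lkowner on an
                 * overlapping range must still be present ... */
                if (is_same_lkowner (&l->owner, &reqlock->owner) &&
                    locks_overlap (l, reqlock))
                        return 1;
        }

        /* ... otherwise reject, so the application learns that its
         * previous lock is gone and cannot assume anything about
         * the file state */
        return 0;
}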


I have updated the feature-spec[1] with the details. Comments are welcome.

Thanks,
Soumya

[1] http://review.gluster.org/#/c/15053/3/under_review/reclaim-locks.md

On 07/28/2016 07:29 PM, Soumya Koduri wrote:



On 07/27/2016 02:38 AM, Vijay Bellur wrote:


On 07/26/2016 05:56 AM, Soumya Koduri wrote:

Hi Vijay,

On 07/26/2016 12:13 AM, Vijay Bellur wrote:

On 07/22/2016 08:44 AM, Soumya Koduri wrote:

Hi,

In certain scenarios (esp.,in highly available environments), the
application may have to fail-over/connect to a different glusterFS
client while the I/O is happening. In such cases until there is a ping
timer expiry and glusterFS server cleans up the locks held by the
older
glusterFS client, the application will not be able to reclaim their
lost
locks. To avoid that we need support in Gluster to let clients reclaim
the existing locks provided lkwoner and the lock range matches.



If the server detects a disconnection, it goes about cleaning up the
locks held by the disconnected client. Only if the failover connection
happens before this server cleanup the outlined scheme would work.Since
there is no ping timer on the server, do you propose to have a grace
timer on the server?


But we are looking for a solution which can work in active-active
configuration as well. We need to handle cases where in the connection
between server and the old-client is still in use, which can happen
during load-balancing or failback.

Different cases which I can outline are:

Application Client - (AC)
Application/GlusterClient 1 - GC1
Application/GlusterClient 2 - GC2
Gluster Server (GS)

1) Active-Passive config  (service gone down)

AC > GC1  > GS (GC2 is not active)

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped and GC2 establishes
connection)

In this case, we can have grace timer to allow reclaims only for certain
time post GC2 (any) rpc connection establishment.

2) Active-Active config  (service gone down)

AC > GC1  > GS
 ^
 |
 GC2  ---

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped)

The grace timer then shall not get triggered in this case. But at-least
the locks from GC1 gets cleaned post its connection cleanup.



grace timer is not required if lock reclamation can happen before the
old connection between GC1 & GS gets dropped. Is this guaranteed to
happen every time?


Not all the time but more likely since failover time is usually lesser
than ping timer / rpc connection expiry time.





3) Active-Active config  (both the services active/load-balancing)
This is the trick one.

AC > GC1  > GS
 ^
 |
 GC2  ---

| (load-balancing/failback)
v

 GC1  > GS
 ^
 |
AC > GC2  ---

The locks taken by GC1 shall end up being on the server for ever unless
we restart either GC1 or the server.



Yes, this is trickier. The behavior is dependent on how the application
performs a failback. How do we handle this with Ganesha today? Since the
connection between nfs client and Ganesha/GC1 is broken, would it not
send cleanup requests on locks it held on behalf of that client?


Yes. I checked within NFS-Ganesha community too. There seems to be a
provision in NFS-Ganesha to trigger an event upon receiving which it can
flush the locks associated with an IP. We could send this event to the
active servers (in this case GC1) while triggering fail-back. So from
NFS-Ganesha perspective, this seems to be taken care of. Unless some
other application (SMB3 handles?) has this use-case, we may for now can
ignore it.




Considering above cases, looks like we may need to allow reclaim of the
locks all the time. Please suggest if I have missed out any details.



I agree that lock reclamation is needed. Grace timeout behavior does
need more thought for all these cases. Given the involved nature of this
problem, it might be better to write down a more detailed spec that
discusses all these cases for a m

Re: [Gluster-devel] Support to reclaim locks (posix) provided lkowner & range matches

2016-07-28 Thread Soumya Koduri



On 07/27/2016 02:38 AM, Vijay Bellur wrote:


On 07/26/2016 05:56 AM, Soumya Koduri wrote:

Hi Vijay,

On 07/26/2016 12:13 AM, Vijay Bellur wrote:

On 07/22/2016 08:44 AM, Soumya Koduri wrote:

Hi,

In certain scenarios (esp.,in highly available environments), the
application may have to fail-over/connect to a different glusterFS
client while the I/O is happening. In such cases until there is a ping
timer expiry and glusterFS server cleans up the locks held by the older
glusterFS client, the application will not be able to reclaim their
lost
locks. To avoid that we need support in Gluster to let clients reclaim
the existing locks provided lkwoner and the lock range matches.



If the server detects a disconnection, it goes about cleaning up the
locks held by the disconnected client. Only if the failover connection
happens before this server cleanup the outlined scheme would work.Since
there is no ping timer on the server, do you propose to have a grace
timer on the server?


But we are looking for a solution which can work in active-active
configuration as well. We need to handle cases where in the connection
between server and the old-client is still in use, which can happen
during load-balancing or failback.

Different cases which I can outline are:

Application Client - (AC)
Application/GlusterClient 1 - GC1
Application/GlusterClient 2 - GC2
Gluster Server (GS)

1) Active-Passive config  (service gone down)

AC > GC1  > GS (GC2 is not active)

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped and GC2 establishes
connection)

In this case, we can have grace timer to allow reclaims only for certain
time post GC2 (any) rpc connection establishment.

2) Active-Active config  (service gone down)

AC > GC1  > GS
 ^
 |
 GC2  ---

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped)

The grace timer then shall not get triggered in this case. But at-least
the locks from GC1 gets cleaned post its connection cleanup.



grace timer is not required if lock reclamation can happen before the
old connection between GC1 & GS gets dropped. Is this guaranteed to
happen every time?


Not all the time, but more likely than not, since the failover time is 
usually less than the ping-timer / rpc connection expiry time.






3) Active-Active config  (both the services active/load-balancing)
This is the trick one.

AC > GC1  > GS
 ^
 |
 GC2  ---

| (load-balancing/failback)
v

 GC1  > GS
 ^
 |
AC > GC2  ---

The locks taken by GC1 shall end up being on the server for ever unless
we restart either GC1 or the server.



Yes, this is trickier. The behavior is dependent on how the application
performs a failback. How do we handle this with Ganesha today? Since the
connection between nfs client and Ganesha/GC1 is broken, would it not
send cleanup requests on locks it held on behalf of that client?

Yes. I checked within the NFS-Ganesha community too. There seems to be a 
provision in NFS-Ganesha to trigger an event, upon receiving which it 
can flush the locks associated with an IP. We could send this event to 
the active servers (in this case GC1) while triggering fail-back. So 
from the NFS-Ganesha perspective, this seems to be taken care of. Unless 
some other application (SMB3 handles?) has this use-case, we can ignore 
it for now.





Considering above cases, looks like we may need to allow reclaim of the
locks all the time. Please suggest if I have missed out any details.



I agree that lock reclamation is needed. Grace timeout behavior does
need more thought for all these cases. Given the involved nature of this
problem, it might be better to write down a more detailed spec that
discusses all these cases for a more thorough review.


Sure. I will open up a spec.

Thanks,
Soumya







For client-side support, I am thinking if we can integrate with the new
lock API being introduced as part of mandatory lock support in gfapi
[2]



Is glfs_file_lock() planned to be used here? If so, how do we specify
that it is a reclaim lock in this api?


Yes. We have been discussing on that patch-set if we can use the same
API. We should either have a separate field to pass reclaim flag or if
we choose not to change its definition, then probably can have
additional lock types -

GLFS_LK_ADVISORY
GLFS_LK_MANDATORY

New lock-types
GLFS_LK_RECLAIM_ADVISORY
GLFS_LK_RECLAIM_MANDATORY



Either approach seems reasonable to me.



We also would need to pass the reclaim_lock flag over rpc.


To avoid new fop/rpc changes, I was considering to take xdata approach
(similar to the way lock mode is passed in xdata for mandatory lock
support) since the processing of reclamation doesn't differ much from
the existing lk fop except for conflicting lock checks.



This looks ok to me.

Thanks,
Vijay




__

Re: [Gluster-devel] Support to reclaim locks (posix) provided lkowner & range matches

2016-07-26 Thread Soumya Koduri

Hi Vijay,

On 07/26/2016 12:13 AM, Vijay Bellur wrote:

On 07/22/2016 08:44 AM, Soumya Koduri wrote:

Hi,

In certain scenarios (esp.,in highly available environments), the
application may have to fail-over/connect to a different glusterFS
client while the I/O is happening. In such cases until there is a ping
timer expiry and glusterFS server cleans up the locks held by the older
glusterFS client, the application will not be able to reclaim their lost
locks. To avoid that we need support in Gluster to let clients reclaim
the existing locks provided lkwoner and the lock range matches.



If the server detects a disconnection, it goes about cleaning up the
locks held by the disconnected client. Only if the failover connection
happens before this server cleanup the outlined scheme would work.Since
there is no ping timer on the server, do you propose to have a grace
timer on the server?


But we are looking for a solution which can work in an active-active 
configuration as well. We need to handle cases wherein the connection 
between the server and the old client is still in use, which can happen 
during load-balancing or failback.


Different cases which I can outline are:

Application Client - (AC)
Application/GlusterClient 1 - GC1
Application/GlusterClient 2 - GC2
Gluster Server (GS)

1) Active-Passive config  (service gone down)

AC > GC1  > GS (GC2 is not active)

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped and GC2 establishes 
connection)


In this case, we can have a grace timer to allow reclaims only for a 
certain time after GC2 establishes any rpc connection.


2) Active-Active config  (service gone down)

AC > GC1  > GS
 ^
 |
 GC2  ---

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped)

The grace timer shall not get triggered in this case, but at least the 
locks from GC1 get cleaned up after its connection cleanup.



3) Active-Active config (both the services active/load-balancing)
This is the tricky one.

AC > GC1  > GS
 ^
 |
 GC2  ---

| (load-balancing/failback)
v

 GC1  > GS
 ^
 |
AC > GC2  ---

The locks taken by GC1 shall end up staying on the server forever unless 
we restart either GC1 or the server.


Considering the above cases, it looks like we may need to allow reclaim 
of the locks all the time. Please suggest if I have missed any details.






For client-side support, I am thinking if we can integrate with the new
lock API being introduced as part of mandatory lock support in gfapi [2]



Is glfs_file_lock() planned to be used here? If so, how do we specify
that it is a reclaim lock in this api?


Yes. We have been discussing on that patch-set whether we can use the 
same API. We should either have a separate field to pass the reclaim 
flag or, if we choose not to change its definition, we could probably 
have additional lock types -


GLFS_LK_ADVISORY
GLFS_LK_MANDATORY

New lock-types
GLFS_LK_RECLAIM_ADVISORY
GLFS_LK_RECLAIM_MANDATORY
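
To make the shape of the API concrete, usage could look roughly like the 
sketch below. glfs_file_lock() is still under review and the 
GLFS_LK_RECLAIM_* values exist only in this proposal, so every name here 
is tentative:

#include <glusterfs/api/glfs.h>   /* include path may vary */
#include <fcntl.h>

static int
relock_after_failover (glfs_fd_t *glfd)
{
        struct flock fl = {
                .l_type   = F_WRLCK,
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 100,
        };

        /* the initial acquisition (on the old client) would have been
         *     glfs_file_lock (glfd, F_SETLK, &fl, GLFS_LK_ADVISORY);
         * after fail-over the new client re-asserts the same range as
         * a reclaim, and the server matches on lkowner + range instead
         * of the client UID: */
        return glfs_file_lock (glfd, F_SETLK, &fl,
                               GLFS_LK_RECLAIM_ADVISORY);
}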



We also would need to pass the reclaim_lock flag over rpc.


To avoid new fop/rpc changes, I was considering taking the xdata 
approach (similar to the way the lock mode is passed in xdata for 
mandatory lock support), since the processing of reclamation doesn't 
differ much from the existing lk fop except for the conflicting-lock 
checks.


http://review.gluster.org/#/c/14986/2/xlators/features/locks/src/posix.c

Please let me know your thoughts.

Thanks,
Soumya



-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Support to reclaim locks (posix) provided lkowner & range matches

2016-07-22 Thread Soumya Koduri

Hi,

In certain scenarios (esp. in highly available environments), the 
application may have to fail over/connect to a different glusterFS 
client while the I/O is happening. In such cases, until there is a 
ping-timer expiry and the glusterFS server cleans up the locks held by 
the older glusterFS client, the application will not be able to reclaim 
its lost locks. To avoid that, we need support in Gluster to let clients 
reclaim the existing locks provided the lkowner and the lock range match.


One of the applications which shall benefit from this support is 
NFS-Ganesha. NFS clients try to reclaim their locks post server reboot.


I have made relevant changes (WIP) on the server side to have this 
support [1]. The changes include -


* A new CLI option is provided "features.locks-reclaim-lock" to enable 
this support.


* Assuming the below is done on the client side (gfapi) - TODO:
While retrying the lock request, the application has to notify the 
glusterFS client that it is a reclaim request. The client, on receiving 
such a request, should set a boolean "reclaim-lock" in the xdata passed 
with the lock request.


* On the server-side -
  - A new field 'reclaim' is added to 'posix_lock_t' to note if the lock 
is to be reclaimed.
  - While processing the LOCK fop, if "reclaim-lock" is set in the xdata 
received, the reclaim field will be enabled in the new posix lock created.
  - While checking for conflicting locks (in 'same_owner()'), if the 
reclaim field is set, the comparison will be done on the lkowner and the 
lock range instead of comparing both the lkowner and the client UID (a 
rough sketch of this check follows below).

  - Later it will fall through '__insert_and_merge' and the old lock 
will be updated with the details of the new lock created (along with 
the client details).
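
A rough sketch of the relaxed check mentioned above (field names follow 
my reading of the locks xlator and may differ slightly from the patch 
[1]; 'reclaim' is the new posix_lock_t field set when "reclaim-lock" 
arrives in the xdata):

static int
same_owner_sketch (posix_lock_t *l1, posix_lock_t *l2)
{
        if (l1->reclaim || l2->reclaim)
                /* reclaim: match on lkowner + byte range only, so a
                 * reconnected client can take over its old lock */
                return (is_same_lkowner (&l1->owner, &l2->owner) &&
                        (l1->fl_start == l2->fl_start) &&
                        (l1->fl_end == l2->fl_end));

        /* existing behaviour: lkowner and client must both match */
        return (is_same_lkowner (&l1->owner, &l2->owner) &&
                (l1->client == l2->client));
}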


For client-side support, I am wondering if we can integrate with the new 
lock API being introduced as part of mandatory lock support in gfapi [2].

Kindly take a look and provide your comments/suggestions.

The changes seemed minimal and hence I haven't added this as a 3.9 
release feature. But if you feel it is a feature candidate, please let 
me know and I shall open up a feature page.


Thanks,
Soumya

[1] http://review.gluster.org/#/c/14986/
[2] http://review.gluster.org/11177
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression failures in last 3 days

2016-07-20 Thread Soumya Koduri



On 07/20/2016 12:41 PM, Soumya Koduri wrote:



On 07/20/2016 12:00 PM, Soumya Koduri wrote:



On 07/20/2016 11:55 AM, Ravishankar N wrote:

On 07/20/2016 11:51 AM, Kotresh Hiremath Ravishankar wrote:

Hi,

Here is the patch for br-stub.t failures.
http://review.gluster.org/14960
Thanks Soumya for root causing this.

Thanks and Regards,
Kotresh H R



arbiter-mount.t has failed despite having this check.:-(


Hmm right. Wrt to your question posted in another mail -


All the tests seem to have failed because the NFS export is not

available (nothing wrong with the .t itself). I've CC'ed the NFS folks.
Maybe we can increase the value of NFS_EXPORT_TIMEOUT?

Increasing "NFS_EXPORT_TIMEOUT" will not help as it determines the
maximum mount of time "showmount' command should take to complete.
Probably we should either  wait/loop for "NFS_EXPORT_TIMEOUT" amount of
time till the NFS server becomes available before executing 'showmount'.



I have submitted below patch to query the "showmount" cmd output in a
loop. Comments are welcome.

- http://review.gluster.org/#/c/14961/


Ah, sorry, I misinterpreted the "EXPECT_WITHIN" keyword. It seems to 
already do the iteration. From the arbiter test logs [1], I see that the 
NFS service is already started by the time the showmount command is 
issued.


[2016-07-19 13:00:40.533847] I [rpc-drc.c:689:rpcsvc_drc_init] 
0-rpc-service: DRC is turned OFF
[2016-07-19 13:00:40.533881] I [MSGID: 112110] [nfs.c:1524:init] 0-nfs: 
NFS service started

..

[2016-07-19 13:00:40.706206]:++ 
G_LOG:./tests/basic/afr/arbiter-mount.t: TEST: 18 1 
is_nfs_export_available ++


Not sure why the command would still fail. Are you able to reproduce 
this issue locally on any test machine? We could add an exit whenever 
this command fails and then examine the service.


Thanks,
Soumya


[1] 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/22354/consoleFull


Thanks,
Soumya


Thanks,
Soumya


-Ravi

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression failures in last 3 days

2016-07-20 Thread Soumya Koduri



On 07/20/2016 12:00 PM, Soumya Koduri wrote:



On 07/20/2016 11:55 AM, Ravishankar N wrote:

On 07/20/2016 11:51 AM, Kotresh Hiremath Ravishankar wrote:

Hi,

Here is the patch for br-stub.t failures.
http://review.gluster.org/14960
Thanks Soumya for root causing this.

Thanks and Regards,
Kotresh H R



arbiter-mount.t has failed despite having this check.:-(


Hmm right. Wrt to your question posted in another mail -


All the tests seem to have failed because the NFS export is not

available (nothing wrong with the .t itself). I've CC'ed the NFS folks.
Maybe we can increase the value of NFS_EXPORT_TIMEOUT?

Increasing "NFS_EXPORT_TIMEOUT" will not help as it determines the
maximum mount of time "showmount' command should take to complete.
Probably we should either  wait/loop for "NFS_EXPORT_TIMEOUT" amount of
time till the NFS server becomes available before executing 'showmount'.



I have submitted below patch to query the "showmount" cmd output in a 
loop. Comments are welcome.


- http://review.gluster.org/#/c/14961/

Thanks,
Soumya


Thanks,
Soumya


-Ravi

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression failures in last 3 days

2016-07-20 Thread Soumya Koduri



On 07/20/2016 11:55 AM, Ravishankar N wrote:

On 07/20/2016 11:51 AM, Kotresh Hiremath Ravishankar wrote:

Hi,

Here is the patch for br-stub.t failures.
http://review.gluster.org/14960
Thanks Soumya for root causing this.

Thanks and Regards,
Kotresh H R



arbiter-mount.t has failed despite having this check.:-(


Hmm right. Wrt to your question posted in another mail -

>>> All the tests seem to have failed because the NFS export is not 
available (nothing wrong with the .t itself). I've CC'ed the NFS folks. 
Maybe we can increase the value of NFS_EXPORT_TIMEOUT?


Increasing "NFS_EXPORT_TIMEOUT" will not help as it determines the 
maximum mount of time "showmount' command should take to complete. 
Probably we should either  wait/loop for "NFS_EXPORT_TIMEOUT" amount of 
time till the NFS server becomes available before executing 'showmount'.


Thanks,
Soumya


-Ravi

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Minutes from today's Gluster Community Bug Triage meeting (July 12 2016)

2016-07-12 Thread Soumya Koduri

Hi,

Thanks to everyone who joined the meeting. Please find the minutes of 
today's Gluster Community Bug Triage meeting at the below links.


Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-07-12/gluster_bug_triage.2016-07-12-12.00.html
Minutes (text): 
https://meetbot.fedoraproject.org/gluster-meeting/2016-07-12/gluster_bug_triage.2016-07-12-12.00.txt 

Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-07-12/gluster_bug_triage.2016-07-12-12.00.log.html



Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] REMINDER: Gluster Community Bug Triage meeting at 12:00 UTC (~in 30 minutes)

2016-07-12 Thread Soumya Koduri

Hi all,

This meeting is scheduled for anyone who is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
 (https://webchat.freenode.net/?channels=gluster-meeting  )
- date: every Tuesday
- time: 12:00 UTC
 (in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last weeks action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.

Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [NFS-ganesha] unlink file remains in ./glusterfs/unlinks after delete file

2016-07-01 Thread Soumya Koduri
FYI - "http://review.gluster.org/#/c/14840" contains the fix for the 3.7 
branch.


Thanks,
Soumya

On 07/01/2016 11:38 AM, Soumya Koduri wrote:

Hi,

On 06/30/2016 11:56 AM, 梁正和 wrote:

Hi,

I'm trying to export gluster-volume by nfs-ganesha.

After create --> Some I/O --> delete file from nfs mount point.
The file has been moved to .glusterfs/unlink.


There was an fd leak when a file is created using gfapi handleops (which
NFS-Ganesha uses) and FWIU, if there is an open fd, glusterfs-server
moves the file being removed to ".glusterfs/unlink" folder unless its
inode entry gets purged when the inode table which it maintains gets
full or the brick process is restarted.

The fix for "glfd" leak is already merged in master -
"http://review.gluster.org/#/c/14532/;

Will backport this patch to 3.7 branch. If 3.7.13 merge window gets
closed, the fix shall be available in 3.7.14. Till then to get past this
issue, request to restart brick process.

Thanks,
Soumya



Excepted result: no files in the unlink folder.

Environment: single gluster server with

nfs-ganesha version: 2.2.0-6
glusterfs version: 3.7.12

# Gluster volume info

Volume Name: for_nfs
Type: Distribute
Volume ID: 5db07be3-0f09-413e-a857-33982c1a41e7
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/export/d1/fs
Options Reconfigured:
performance.readdir-ahead: on


# ganesha_mgr show_exports

Exports:
  Id, path,nfsv3, mnt, nlm4, rquota,nfsv40, nfsv41, nfsv42,
9p, last
 2,  /for_nfs,  0,  0,  0,  0,  1,  0,  0,  0, Thu Jun 30 22:08:36
2016, 346097480 nsecs
 0,  /,  0,  0,  0,  0,  0,  0,  0,  0, Thu Jun 30 22:00:21 2016,
516560773 nsecs



Steps to Reproduce:
1. Create a gluster volume and share it through nfs-ganesha
2. On the nfs mount point, do some file operation (e.g. echo foo >> bar)
3. Delete bar from the nfs mount point

Result: file bar has been moved to the unlink folder

# pwd

/export/d1/fs/.glusterfs/unlink

# ls

cef0380d-8300-44ee-8e78-9e773938c935


# cat cef0380d-8300-44ee-8e78-9e773938c935

foo

Thanks,

--
梁正和 Jheng-He Liang
otira...@gmail.com <mailto:navy...@gmail.com>


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [NFS-ganesha] unlink file remains in ./glusterfs/unlinks after delete file

2016-07-01 Thread Soumya Koduri

Hi,

On 06/30/2016 11:56 AM, 梁正和 wrote:

Hi,

I'm trying to export gluster-volume by nfs-ganesha.

After create --> Some I/O --> delete file from nfs mount point.
The file has been moved to .glusterfs/unlink.


There was an fd leak when a file is created using gfapi handleops (which 
NFS-Ganesha uses), and FWIU, if there is an open fd, the glusterfs server 
moves the file being removed to the ".glusterfs/unlink" folder, unless its 
inode entry gets purged when the inode table it maintains becomes full or 
the brick process is restarted.


The fix for "glfd" leak is already merged in master - 
"http://review.gluster.org/#/c/14532/;


Will backport this patch to the 3.7 branch. If the 3.7.13 merge window has 
closed, the fix shall be available in 3.7.14. Till then, to get past this 
issue, please restart the brick process.
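
For reference, the handle-ops sequence that NFS-Ganesha drives through gfapi
boils down to roughly the sketch below. This is illustrative only: error
handling is omitted, and 'parent' is assumed to be a directory handle obtained
earlier (e.g. via glfs_h_lookupat()). The leak fixed by the patch above was the
glfd opened internally by glfs_h_creat(); with an fd still open on the brick,
an unlinked file ends up under .glusterfs/unlink instead of being removed.

#include <fcntl.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>

/* Sketch only: 'fs' is an initialized glfs_t, 'parent' a directory handle. */
static void
create_and_remove (glfs_t *fs, struct glfs_object *parent)
{
        struct stat         sb;
        struct glfs_object *obj  = NULL;
        glfs_fd_t          *glfd = NULL;

        /* create through handle-ops -- the code path where the internal
         * glfd used to leak before http://review.gluster.org/#/c/14532/ */
        obj = glfs_h_creat (fs, parent, "bar", O_CREAT | O_RDWR, 0644, &sb);

        /* any fd the application opens must be closed explicitly */
        glfd = glfs_h_open (fs, obj, O_RDWR);
        glfs_write (glfd, "foo\n", 4, 0);
        glfs_close (glfd);

        /* release the handle and remove the file; if an fd is still open
         * on the brick, the file lands in .glusterfs/unlink instead of
         * being deleted right away */
        glfs_h_close (obj);
        glfs_h_unlink (fs, parent, "bar");
}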


Thanks,
Soumya



Excepted result: no files in the unlink folder.

Environment: single gluster server with

nfs-ganesha version: 2.2.0-6
glusterfs version: 3.7.12

# Gluster volume info

Volume Name: for_nfs
Type: Distribute
Volume ID: 5db07be3-0f09-413e-a857-33982c1a41e7
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/export/d1/fs
Options Reconfigured:
performance.readdir-ahead: on


# ganesha_mgr show_exports

Exports:
  Id, path,nfsv3, mnt, nlm4, rquota,nfsv40, nfsv41, nfsv42, 9p, last
 2,  /for_nfs,  0,  0,  0,  0,  1,  0,  0,  0, Thu Jun 30 22:08:36
2016, 346097480 nsecs
 0,  /,  0,  0,  0,  0,  0,  0,  0,  0, Thu Jun 30 22:00:21 2016,
516560773 nsecs



Steps to Reproduce:
1. Create a gluster volume and share it through nfs-ganesha
2. On the nfs mount point, do some file operation (e.g. echo foo >> bar)
3. Delete bar from the nfs mount point

Result: file bar has been moved to the unlink folder

# pwd

/export/d1/fs/.glusterfs/unlink

# ls

cef0380d-8300-44ee-8e78-9e773938c935


# cat cef0380d-8300-44ee-8e78-9e773938c935

foo

Thanks,

--
梁正和 Jheng-He Liang
otira...@gmail.com 


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] **Reminder** Triaging and Updating Bug status

2016-06-28 Thread Soumya Koduri

Hi,

We have noticed that many of the bugs (esp., in the recent past the ones 
filed against 'tests' component) which are being actively worked upon do 
not have either 'Triaged' keyword set or bug status(/assignee) updated 
appropriately. Sometimes even many of the active community members fail 
to do so.


Request everyone to follow the Bug triage and status guidelines [1] [2] 
while working on one.


Thanks,
Soumya

[1] http://www.gluster.org/community/documentation/index.php/Bug_triage
[2] https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

2016-06-28 Thread Soumya Koduri

Hi Raghavendra/Venky,

Gentle reminder. Oleksandr (post-factum) confirmed that their 
production setup has been running with this patch included for some 
time now and has had no issues. Shall we consider merging this patch if 
there are no review comments?


Thanks,
Soumya

On 03/09/2016 09:16 PM, Kotresh Hiremath Ravishankar wrote:

Hi All,

The following patch is sent to address changelog rpc mem-leak issues.
The fix is intricate and needs lot of testing before taking in. Please
review the same.

http://review.gluster.org/#/c/13658/1

Thanks and Regards,
Kotresh H R


- Original Message -

From: "Venky Shankar" <vshan...@redhat.com>
To: "Raghavendra G" <raghaven...@gluster.com>
Cc: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>, "Gluster Devel" 
<gluster-devel@gluster.org>
Sent: Monday, March 7, 2016 3:52:13 PM
Subject: Re: [Gluster-devel] Cores generated with 
./tests/geo-rep/georep-basic-dr-tarssh.t

On Fri, Mar 04, 2016 at 02:02:32PM +0530, Raghavendra G wrote:

On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:


Hi,

Yes, with this patch we need not set conn->trans to NULL in
rpc_clnt_disable



While [1] fixes the crash, things can be improved in the way how changelog
is using rpc.

1. In the current code, there is an rpc_clnt object leak during disconnect
event.


Just for the record (as I couldn't find this information while building up
rpc infrastructure in changelog):

Unref rpc-clnt object after calling rpc_clnt_disable() in notify() upon
RPC_CLNT_DISCONNECT. Free up 'mydata' in notify() upon RPC_CLNT_DESTROY.


2. Also, freed "mydata" of changelog is still associated with rpc_clnt
object (corollary of 1), though change log might not get any events with
"mydata" (as connection is dead).

I've discussed with Kotresh about changes needed, offline. So, following
are the action items.
1. Soumya's patch [2] is valid and is needed for 3.7 branch too.
2. [2] can be accepted. However, someone might want to re-use an rpc object
after disabling it, like introducing a new api rpc_clnt_enable_again
(though no of such use-cases is very less). But [2] doesn't allow it. The
point is as long as rpc-clnt object is alive, transport object is alive
(though disconnected) and we can re-use it. So, I would prefer not to
accept it.
3. Kotresh will work on new changes to make sure changelog makes correct
use of rpc-clnt.

[1] http://review.gluster.org/#/c/13592
[2] http://review.gluster.org/#/c/1359

regards,
Raghavendra.
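
As a rough illustration of the notify-side discipline Raghavendra records
above (disable and unref the rpc-clnt on DISCONNECT, free 'mydata' only on
DESTROY), a changelog-style notify callback could be structured as in the
sketch below. This is a hedged sketch against the internal rpc-clnt API, not
the actual changelog code, and it assumes the RPC_CLNT_DESTROY event
introduced by the patches under discussion.

static int
changelog_rpc_notify_sketch (struct rpc_clnt *rpc, void *mydata,
                             rpc_clnt_event_t event, void *data)
{
        switch (event) {
        case RPC_CLNT_CONNECT:
                /* connection (re)established; nothing to release here */
                break;

        case RPC_CLNT_DISCONNECT:
                /* stop reconnect attempts and drop the reference taken
                 * when the connection was set up */
                rpc_clnt_disable (rpc);
                rpc_clnt_unref (rpc);
                break;

        case RPC_CLNT_DESTROY:
                /* last unref has happened; no further events will carry
                 * 'mydata', so only now is it safe to free it */
                GF_FREE (mydata);
                break;

        default:
                break;
        }

        return 0;
}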



Thanks and Regards,
Kotresh H R

- Original Message -

From: "Soumya Koduri" <skod...@redhat.com>
To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>, "Raghavendra

G" <raghaven...@gluster.com>

Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Thursday, March 3, 2016 5:06:00 PM
Subject: Re: [Gluster-devel] Cores generated with

./tests/geo-rep/georep-basic-dr-tarssh.t




On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:

[Replying on top of my own reply]

Hi,

I have submitted the below patch [1] to avoid the issue of
'rpc_clnt_submit'
getting reconnected. But it won't take care of memory leak problem
you

were

trying to fix. That we have to carefully go through all cases and fix

it.

Please have a look at it.


Looks good. IIUC, with this patch, we need not set conn->trans to NULL
in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak
as
the transport object shall then get freed as part of
'rpc_clnt_trigger_destroy'.



http://review.gluster.org/#/c/13592/

Thanks and Regards,
Kotresh H R

- Original Message -

From: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
To: "Soumya Koduri" <skod...@redhat.com>
Cc: "Raghavendra G" <raghaven...@gluster.com>, "Gluster Devel"
<gluster-devel@gluster.org>
Sent: Thursday, March 3, 2016 3:39:11 PM
Subject: Re: [Gluster-devel] Cores generated with
./tests/geo-rep/georep-basic-dr-tarssh.t

Hi Soumya,

I tested the lastes patch [2] on master where your previous patch
[1]

in

merged.
I see crashes at different places.

1. If there are code paths that are holding rpc object without
taking

ref

on
it, all those
 code path will crash on invoking rpc submit on that object as
 rpc
 object
 would have freed
 by last unref on DISCONNECT event. I see this kind of use-case
 in
 chagnelog rpc code.
 Need to check on other users of rpc.

Agree. We should fix all such code-paths. Since this seem to be an
intricate fix, shall we take these patches only in master branch and
not
in 3.7 release for now till we fix all such paths as we encounter?



2. And also we need to take care of reconnect timers that are being

set

and
are re-tried to
 connect back on expiration. In those cases also, we might crash

as rpc

Re: [Gluster-devel] [Gluster-users] Minutes of today's Gluster Community Bug Triage meeting (May 17 2016)

2016-05-17 Thread Soumya Koduri



On 05/17/2016 07:09 PM, M S Vishwanath Bhat wrote:



On 17 May 2016 at 18:51, Soumya Koduri <skod...@redhat.com
<mailto:skod...@redhat.com>> wrote:

Hi,

Please find the minutes of today's Gluster Community Bug Triage
meeting below. Thanks to everyone who have attended the meeting.

Minutes:

https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.html
Minutes (text):

https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.txt
Log:

https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.log.html


#gluster-meeting: Gluster Bug Triage



Meeting started by skoduri at 12:01:05 UTC. The full logs are available
at

https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.log.html
.



Meeting summary
---
* agenda: https://public.pad.fsfe.org/p/gluster-bug-triage  (skoduri,
   12:01:46)
* Roll Call  (skoduri, 12:01:46)

* Last week action items  (skoduri, 12:03:11)

* msvbhat will look into lalatenduM's automated Coverity setup in
   Jenkins  which need assistance from an admin with more
permissions
   (skoduri, 12:03:29)
   * ACTION: msvbhat will look into lalatenduM's automated Coverity
 setup in   Jenkins  which need assistance from an admin with
 more permissions  (skoduri, 12:05:57)

* ndevos need to decide on how to provide/use debug builds  (skoduri,
   12:06:13)
   * ACTION: ndevos need to decide on how to provide/use debug builds
 (skoduri, 12:07:07)

* Manikandan to followup with kashlm to get access to gluster-infra
   (skoduri, 12:07:22)
   * ACTION: Manikandan to followup with kshlm to get access to
 gluster-infra  (skoduri, 12:08:29)

* Manikandan and Nandaja will update on bug automation  (skoduri,
   12:08:47)
   * gem (Nandaja) making progress on bug automation  (skoduri,
12:09:18)
   * LINK: https://github.com/nandajavarma/gerrit-hooks   (skoduri,
 12:09:30)
   * ACTION: gem  to followup with kshlm to get access to gluster-infra
 (skoduri, 12:10:45)

* ndevos need to decide on how to provide/use debug builds  (skoduri,
   12:11:16)

* ndevos to propose some test-cases for minimal libgfapi test  (skoduri,
   12:11:54)
   * ACTION: ndevos to propose some test-cases for minimal libgfapi test
 (skoduri, 12:12:14)

* ndevos need to discuss about writing a script to update bug assignee
   from gerrit patch  (skoduri, 12:12:26)
   * Updating  bug assignee from gerrit patch is taken care as part of
 the bugzilla updates automation scripts being worked upon by
gem and
 Manikandan  (skoduri, 12:15:00)

* msvbhat  provide a simple step/walk-through on how to provide
   testcases for the nightly rpm tests  (skoduri, 12:15:12)
   * ACTION: msvbhat  provide a simple step/walk-through on how to
 provide testcases for the nightly rpm tests  (skoduri, 12:16:16)


I have done this long back (I think 2 months ago) but somehow missed in
updating the meeting. The info is available in README
<https://github.com/gluster/distaf/blob/master/README.md> and HOWTO
<https://github.com/gluster/distaf/blob/master/docs/HOWTO.md>.

Someone please update.


Done :)

Thanks,
Soumya


Best Regards,
Vishwanath



* rafi needs to followup on #bug 1323895  (skoduri, 12:16:31)

* Scheduling moderators for Gluster Community Bug Triage meeting for a
   month  (skoduri, 12:20:32)
   * Saravanakmr will host the meeting  (skoduri, 12:21:30)
   * jiffin shall host the meeting on May 31  (skoduri, 12:22:06)

* Group triage  (skoduri, 12:22:26)
   * LINK: https://public.pad.fsfe.org/p/gluster-bugs-to-triage
 (skoduri, 12:22:38)

* Open Floor  (skoduri, 12:38:56)

Meeting ended at 12:41:31 UTC.




Action Items

* msvbhat will look into lalatenduM's automated Coverity setup in
   Jenkins  which need assistance from an admin with more
permissions
* ndevos need to decide on how to provide/use debug builds
* Manikandan to followup with kshlm to get access to gluster-infra
* gem  to followup with kshlm to get access to gluster-infra
* ndevos to propose some test-cases for minimal libgfapi test
* msvbhat  provide a simple step/walk-through on how to provide
   testcases for the nightly rpm tests




Action Items, by person
---
* gem
   * gem  to followup with kshlm to get access to gluster-infra
* msvbhat
   * msvbhat will look into lalatenduM's automated Coverity setup in
 Jenkins  which need assistance

[Gluster-devel] Minutes of today's Gluster Community Bug Triage meeting (May 17 2016)

2016-05-17 Thread Soumya Koduri

Hi,

Please find the minutes of today's Gluster Community Bug Triage meeting 
below. Thanks to everyone who have attended the meeting.


Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.html
Minutes (text): 
https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.txt
Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.log.html



#gluster-meeting: Gluster Bug Triage



Meeting started by skoduri at 12:01:05 UTC. The full logs are available
at
https://meetbot.fedoraproject.org/gluster-meeting/2016-05-17/gluster_bug_triage.2016-05-17-12.01.log.html
.



Meeting summary
---
* agenda: https://public.pad.fsfe.org/p/gluster-bug-triage  (skoduri,
  12:01:46)
* Roll Call  (skoduri, 12:01:46)

* Last week action items  (skoduri, 12:03:11)

* msvbhat will look into lalatenduM's automated Coverity setup in
  Jenkins  which need assistance from an admin with more permissions
  (skoduri, 12:03:29)
  * ACTION: msvbhat will look into lalatenduM's automated Coverity
setup in   Jenkins  which need assistance from an admin with
more permissions  (skoduri, 12:05:57)

* ndevos need to decide on how to provide/use debug builds  (skoduri,
  12:06:13)
  * ACTION: ndevos need to decide on how to provide/use debug builds
(skoduri, 12:07:07)

* Manikandan to followup with kashlm to get access to gluster-infra
  (skoduri, 12:07:22)
  * ACTION: Manikandan to followup with kshlm to get access to
gluster-infra  (skoduri, 12:08:29)

* Manikandan and Nandaja will update on bug automation  (skoduri,
  12:08:47)
  * gem (Nandaja) making progress on bug automation  (skoduri, 12:09:18)
  * LINK: https://github.com/nandajavarma/gerrit-hooks   (skoduri,
12:09:30)
  * ACTION: gem  to followup with kshlm to get access to gluster-infra
(skoduri, 12:10:45)

* ndevos need to decide on how to provide/use debug builds  (skoduri,
  12:11:16)

* ndevos to propose some test-cases for minimal libgfapi test  (skoduri,
  12:11:54)
  * ACTION: ndevos to propose some test-cases for minimal libgfapi test
(skoduri, 12:12:14)

* ndevos need to discuss about writing a script to update bug assignee
  from gerrit patch  (skoduri, 12:12:26)
  * Updating  bug assignee from gerrit patch is taken care as part of
the bugzilla updates automation scripts being worked upon by gem and
Manikandan  (skoduri, 12:15:00)

* msvbhat  provide a simple step/walk-through on how to provide
  testcases for the nightly rpm tests  (skoduri, 12:15:12)
  * ACTION: msvbhat  provide a simple step/walk-through on how to
provide testcases for the nightly rpm tests  (skoduri, 12:16:16)

* rafi needs to followup on #bug 1323895  (skoduri, 12:16:31)

* Scheduling moderators for Gluster Community Bug Triage meeting for a
  month  (skoduri, 12:20:32)
  * Saravanakmr will host the meeting  (skoduri, 12:21:30)
  * jiffin shall host the meeting on May 31  (skoduri, 12:22:06)

* Group triage  (skoduri, 12:22:26)
  * LINK: https://public.pad.fsfe.org/p/gluster-bugs-to-triage
(skoduri, 12:22:38)

* Open Floor  (skoduri, 12:38:56)

Meeting ended at 12:41:31 UTC.




Action Items

* msvbhat will look into lalatenduM's automated Coverity setup in
  Jenkins  which need assistance from an admin with more permissions
* ndevos need to decide on how to provide/use debug builds
* Manikandan to followup with kshlm to get access to gluster-infra
* gem  to followup with kshlm to get access to gluster-infra
* ndevos to propose some test-cases for minimal libgfapi test
* msvbhat  provide a simple step/walk-through on how to provide
  testcases for the nightly rpm tests




Action Items, by person
---
* gem
  * gem  to followup with kshlm to get access to gluster-infra
* msvbhat
  * msvbhat will look into lalatenduM's automated Coverity setup in
Jenkins  which need assistance from an admin with more
permissions
  * msvbhat  provide a simple step/walk-through on how to provide
testcases for the nightly rpm tests
* ndevos
  * ndevos need to decide on how to provide/use debug builds
  * ndevos to propose some test-cases for minimal libgfapi test
* **UNASSIGNED**
  * Manikandan to followup with kshlm to get access to gluster-infra




People Present (lines said)
---
* skoduri (78)
* gem (12)
* ndevos (10)
* Saravanakmr (7)
* zodbot (4)
* msvbhat (3)
* kkeithley (3)
* jiffin (2)
* glusterbot (2)


Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] REMINDER: Gluster Community Bug Triage meeting at 12:00 UTC ~(in 2.5 hours)

2016-05-17 Thread Soumya Koduri

Hi,

This meeting is scheduled for anyone, who is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
(https://webchat.freenode.net/?channels=gluster-meeting )
- date: every Tuesday
- time: 12:00 UTC
 (in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last weeks action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.
Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] gfapi, readdirplus and forced lookup after inode_link

2016-05-11 Thread Soumya Koduri



On 05/11/2016 10:17 PM, Soumya Koduri wrote:



On 05/11/2016 06:12 PM, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Soumya Koduri" <skod...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Wednesday, May 11, 2016 4:28:28 PM
Subject: Re: [Gluster-devel] gfapi,readdirplus and forced lookup
after inode_link



----- Original Message -

From: "Soumya Koduri" <skod...@redhat.com>
To: "Mohammed Rafi K C" <rkavu...@redhat.com>, "Raghavendra Gowdappa"
<rgowd...@redhat.com>, "Niels de Vos"
<nde...@redhat.com>, "Raghavendra Talur" <rta...@redhat.com>, "Poornima
Gurusiddaiah" <pguru...@redhat.com>
Cc: "+rhs-zteam" <rhs-zt...@redhat.com>, "Rajesh Joseph"
<rjos...@redhat.com>, "jtho >> Jiffin Thottan"
<jthot...@redhat.com>
Sent: Wednesday, May 11, 2016 3:55:05 PM
Subject: Re: gfapi, readdirplus and forced lookup after inode_link



On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote:



On 05/11/2016 12:28 PM, Soumya Koduri wrote:

Hi Raghavendra,



On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote:

Hi all,

There are certain code-paths where the layers managing inodes
(gfapi,
fuse, nfsv3 etc) need to do a lookup even though the inode is found
in inode-table. readdirplus is one such codepath (but not only one).
The reason for doing this is that
1. not all xlators have enough information in readdirp_cbk to make
inode usable (for eg., dht cannot build layout for directory
inodes).
2. There are operations (like dht directory self-healing) which are
needed for maintaining internal consistency and these operations
cannot be done in readdirp.

This forcing of lookup on a linked inode is normally achieved in two
ways:
1. lower layers (like dht) setting entry->inode to NULL (without
entry->inode, interface layers cannot link the inode).


Rafi (CC'ed) had made changes to fix readdirp specific issue
(required
for tiered volumes) as part of
http://review.gluster.org/#/c/14109/ to
do explicit lookup if either entry->inode is set to NULL or inode_ctx
is NULL in gfapi. And I think he had made similar changes for
gluster-NFS as well to provide support for tiered volumes.  I am not
sure if it is handled in common resolver code-path. Have to look at
the code. Rafi shall be able to confirm it.


The changes I made in the three access layers are for inodes which was
linked from lower layers. Which means the inodes linked from lower
layer
won't have inode ctx set in upper xlators, ie, during resolving we
will
send explicit lookup.

With this changes during resolve if inode_ctx is not set then it will
send a lookup + if set_need_lookup flag is set in inode_ctx, then also
we will send a lookup


That's correct. I think gfapi and fuse-bridge are handling this
properly i.e., sending a lookup before resuming fop if:
1. No context of xlator (fuse/gfapi) is present in inode.
Or
2. Context is set and it says resolution is necessary.

Note that case 1 is necessary as inode_linking is done in dht during
directory healing. So, other fops might encounter an inode on which
resolution is still in progress and not complete yet. As inode-context
is set in fuse-bridge/gfapi only after a successful lookup, absence of
context can be used as a hint for resolution being in progress.

I am not sure NFSv3 server is doing this.


Case (1) is definitely handled in gluster-NFS. In fact, it looks like all
the required changes were made in these layers (fuse, gNFS, gfapi) as part
of a single BZ# 1297311 [1] to handle the cases you had mentioned above.
However, at least when comparing the patches [2], [3] & [4], I do not see
the need_lookup changes in gluster-NFS.

Rafi, do you recall why that is so? Was it intentional?


I think the reason is that, in gluster-NFS, "nfs_fix_generation()" (where 
the inode_ctx is set) is called only at selective places. So at least 
in readdirp_cbk(), we seem to be just linking inodes but not setting the 
inode_ctx, which would result in a forced lookup the next time any fop is 
performed on that inode.
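
As an illustration of the check being discussed (resume the fop directly only
when this layer has already looked the inode up, otherwise force a fresh
LOOKUP), the resolver-side logic is roughly the sketch below. NEED_LOOKUP_FLAG
is a made-up name standing in for the per-xlator need_lookup bit set in
readdirp_cbk; this is not the actual fuse/gfapi/gNFS code.

#define NEED_LOOKUP_FLAG  0x1   /* hypothetical flag, for illustration only */

static gf_boolean_t
needs_fresh_lookup (xlator_t *this, inode_t *inode)
{
        uint64_t ctx = 0;

        /* 1. No context at this layer: the inode was linked by a lower
         *    layer (dht heal, readdirp, ...) and has never been looked
         *    up here, so resolution is still pending. */
        if (inode_ctx_get (inode, this, &ctx) != 0)
                return _gf_true;

        /* 2. Context present but explicitly flagged for re-lookup. */
        if (ctx & NEED_LOOKUP_FLAG)
                return _gf_true;

        return _gf_false;
}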


Rafi,
 Could you please confirm if that was indeed the case?

Thanks,
Soumya



Thanks,
Soumya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1297311
[2] http://review.gluster.org/#/c/13224/
[3] http://review.gluster.org/#/c/13225
[4] http://review.gluster.org/#/c/13226



Also, on bricks too quota-enforcer and bit-rot does inode-linking. So,
protocol/server also needs to do similar things.



As Du mentioned, readdirp set need_lookup everytime for entries in
readdirp, I saw that code in fuse, and gfapi. But I don't remember
such
code in gNFS.


There are checks for "entry->inode == NULL" in gNFS case as well. Looks
like it was Jiffin who made those changes (again wrt to tiered volumes)
- http://review.gl

Re: [Gluster-devel] gfapi, readdirplus and forced lookup after inode_link

2016-05-11 Thread Soumya Koduri



On 05/11/2016 06:12 PM, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Soumya Koduri" <skod...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Wednesday, May 11, 2016 4:28:28 PM
Subject: Re: [Gluster-devel] gfapi, readdirplus and forced lookup after 
inode_link



----- Original Message -

From: "Soumya Koduri" <skod...@redhat.com>
To: "Mohammed Rafi K C" <rkavu...@redhat.com>, "Raghavendra Gowdappa"
<rgowd...@redhat.com>, "Niels de Vos"
<nde...@redhat.com>, "Raghavendra Talur" <rta...@redhat.com>, "Poornima
Gurusiddaiah" <pguru...@redhat.com>
Cc: "+rhs-zteam" <rhs-zt...@redhat.com>, "Rajesh Joseph"
<rjos...@redhat.com>, "jtho >> Jiffin Thottan"
<jthot...@redhat.com>
Sent: Wednesday, May 11, 2016 3:55:05 PM
Subject: Re: gfapi, readdirplus and forced lookup after inode_link



On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote:



On 05/11/2016 12:28 PM, Soumya Koduri wrote:

Hi Raghavendra,



On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote:

Hi all,

There are certain code-paths where the layers managing inodes (gfapi,
fuse, nfsv3 etc) need to do a lookup even though the inode is found
in inode-table. readdirplus is one such codepath (but not only one).
The reason for doing this is that
1. not all xlators have enough information in readdirp_cbk to make
inode usable (for eg., dht cannot build layout for directory inodes).
2. There are operations (like dht directory self-healing) which are
needed for maintaining internal consistency and these operations
cannot be done in readdirp.

This forcing of lookup on a linked inode is normally achieved in two
ways:
1. lower layers (like dht) setting entry->inode to NULL (without
entry->inode, interface layers cannot link the inode).


Rafi (CC'ed) had made changes to fix readdirp specific issue (required
for tiered volumes) as part of http://review.gluster.org/#/c/14109/ to
do explicit lookup if either entry->inode is set to NULL or inode_ctx
is NULL in gfapi. And I think he had made similar changes for
gluster-NFS as well to provide support for tiered volumes.  I am not
sure if it is handled in common resolver code-path. Have to look at
the code. Rafi shall be able to confirm it.


The changes I made in the three access layers are for inodes which was
linked from lower layers. Which means the inodes linked from lower layer
won't have inode ctx set in upper xlators, ie, during resolving we will
send explicit lookup.

With this changes during resolve if inode_ctx is not set then it will
send a lookup + if set_need_lookup flag is set in inode_ctx, then also
we will send a lookup


That's correct. I think gfapi and fuse-bridge are handling this properly i.e., 
sending a lookup before resuming fop if:
1. No context of xlator (fuse/gfapi) is present in inode.
Or
2. Context is set and it says resolution is necessary.

Note that case 1 is necessary as inode_linking is done in dht during directory 
healing. So, other fops might encounter an inode on which resolution is still 
in progress and not complete yet. As inode-context is set in fuse-bridge/gfapi 
only after a successful lookup, absence of context can be used as a hint for 
resolution being in progress.

I am not sure NFSv3 server is doing this.


Case (1) is definitely handled in gluster-NFS. In fact, it looks like all 
the required changes were made in these layers (fuse, gNFS, gfapi) as part 
of a single BZ# 1297311 [1] to handle the cases you had mentioned above. 
However, at least when comparing the patches [2], [3] & [4], I do not see 
the need_lookup changes in gluster-NFS.


Rafi, do you recall why that is so? Was it intentional?

Thanks,
Soumya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1297311
[2] http://review.gluster.org/#/c/13224/
[3] http://review.gluster.org/#/c/13225
[4] http://review.gluster.org/#/c/13226



Also, on bricks too quota-enforcer and bit-rot does inode-linking. So, 
protocol/server also needs to do similar things.



As Du mentioned, readdirp set need_lookup everytime for entries in
readdirp, I saw that code in fuse, and gfapi. But I don't remember such
code in gNFS.


There are checks for "entry->inode == NULL" in gNFS case as well. Looks
like it was Jiffin who made those changes (again wrt to tiered volumes)
- http://review.gluster.org/#/c/12960/

But all these checks seem to be in only readdirp_cbk codepath where
directory entries are filled. What are other fops which need such
special handling?


There are some codepaths, where linking is done by xlators who don't do
resolution. A rough search shows following components:
1. quota enforcer
2. bitrot
3. dht/tier (needed, but currently not doing).
4. trash (for .trash I suppose)

However, none of these are e

Re: [Gluster-devel] [Gluster-users] Exporting Gluster Volume

2016-05-04 Thread Soumya Koduri

Hi Abhishek,

The 'rpcinfo' output below doesn't list the 'nfsacl' protocol. That must be the 
reason the client is not able to set ACLs. Could you please check the log file 
'/var/log/glusterfs/nfs.log' for any errors logged with respect to 
protocol registration failures.


Thanks,
Soumya

On 05/04/2016 11:15 AM, ABHISHEK PALIWAL wrote:

Hi Niels,

Please reply it is really urgent.

Regards,
Abhishek

On Tue, May 3, 2016 at 11:36 AM, ABHISHEK PALIWAL wrote:

Hi Niels,

Do you require more logs...

Regards,
Abhishek

On Mon, May 2, 2016 at 4:58 PM, ABHISHEK PALIWAL wrote:

Hi Niels,


Here is the output of rpcinfo -p $NFS_SERVER

root@128:/# rpcinfo -p $NFS_SERVER
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   tcp  38465  mountd
    100005    1   tcp  38465  mountd
    100003    3   tcp  38465  nfs
    100227    3   tcp  38465


out of mount command

#mount -vvv -t nfs -o acl,vers=3 128.224.95.140:/gv0 /tmp/e
mount: fstab path: "/etc/fstab"
mount: mtab path:  "/etc/mtab"
mount: lock path:  "/etc/mtab~"
mount: temp path:  "/etc/mtab.tmp"
mount: UID:0
mount: eUID:   0
mount: spec:  "128.224.95.140:/gv0"
mount: node:  "/tmp/e"
mount: types: "nfs"
mount: opts:  "acl,vers=3"
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "128.224.95.140:/gv0"
mount: external mount: argv[2] = "/tmp/e"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw,acl,vers=3"
mount.nfs: timeout set for Mon May  2 16:58:58 2016
mount.nfs: trying text-based options
'acl,vers=3,addr=128.224.95.140'
mount.nfs: prog 13, trying vers=3, prot=6
mount.nfs: trying 128.224.95.140 prog 13 vers 3 prot TCP
port 38465
mount.nfs: prog 15, trying vers=3, prot=17
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 15, trying vers=3, prot=6
mount.nfs: trying 128.224.95.140 prog 15 vers 3 prot TCP
port 38465


On Mon, May 2, 2016 at 4:36 PM, Niels de Vos wrote:

On Mon, May 02, 2016 at 04:14:01PM +0530, ABHISHEK PALIWAL
wrote:
 > HI Team,
 >
 > I am exporting gluster volume using GlusterNFS  with ACL
support but at NFS
 > client while running 'setfacl' command getting "setfacl:
/tmp/e: Remote I/O
 > error"
 >
 >
 > Following is the NFS option status for Volume:
 >
 > nfs.enable-ino32
 > no
 > nfs.mem-factor
 > 15
 > nfs.export-dirs
 > on
 > nfs.export-volumes
 > on
 > nfs.addr-namelookup
 > off
 > nfs.dynamic-volumes
 > off
 > nfs.register-with-portmap
 > on
 > nfs.outstanding-rpc-limit
 > 16
 > nfs.port
 > 38465
 > nfs.rpc-auth-unix
 > on
 > nfs.rpc-auth-null
 > on
 > nfs.rpc-auth-allow
 > all
 > nfs.rpc-auth-reject
 > none
 > nfs.ports-insecure
 > off
 > nfs.trusted-sync
 > off
 > nfs.trusted-write
 > off
 > nfs.volume-access
 > read-write
 > nfs.export-dir
 >
 > nfs.disable
 > off
 > nfs.nlm
 > on
 > nfs.acl
 > on
 > nfs.mount-udp
 > off
 > nfs.mount-rmtab
 > /var/lib/glusterd/nfs/rmtab
 > nfs.rpc-statd
 > /sbin/rpc.statd
 > nfs.server-aux-gids
 > off
 > nfs.drc
 > off
 > nfs.drc-size
 > 0x2
 > nfs.read-size   (1 *
 > 1048576ULL)
 > nfs.write-size  (1 *
 > 1048576ULL)
 > nfs.readdir-size(1 *
 > 1048576ULL)
 > nfs.exports-auth-enable
 > (null)
   

Re: [Gluster-devel] Review request for leases patches

2016-03-08 Thread Soumya Koduri

Hi Poornima,

On 03/07/2016 11:24 AM, Poornima Gurusiddaiah wrote:

Hi All,

Here is the link to feature page: http://review.gluster.org/#/c/11980/

Patches can be found @:
http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:leases


This link displays only one patch [1]. Probably other patches are not 
marked under topic:leases. Please verify the same.


Also, please confirm whether the list is complete enough to be consumed by an 
application, or whether there are still any pending patches (apart from the 
open items mentioned in the design doc) yet to be worked upon.


Thanks,
Soumya

[1] http://review.gluster.org/11721





Regards,
Poornima


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

2016-03-04 Thread Soumya Koduri

Hi Raghavendra,

On 03/04/2016 03:09 PM, Raghavendra G wrote:



On Fri, Mar 4, 2016 at 2:02 PM, Raghavendra G <raghaven...@gluster.com
<mailto:raghaven...@gluster.com>> wrote:



On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar
<khire...@redhat.com <mailto:khire...@redhat.com>> wrote:

Hi,

Yes, with this patch we need not set conn->trans to NULL in
rpc_clnt_disable


While [1] fixes the crash, things can be improved in the way how
changelog is using rpc.

1. In the current code, there is an rpc_clnt object leak during
disconnect event.
2. Also, freed "mydata" of changelog is still associated with
rpc_clnt object (corollary of 1), though change log might not get
any events with "mydata" (as connection is dead).

I've discussed with Kotresh about changes needed, offline. So,
following are the action items.
1. Soumya's patch [2] is valid and is needed for 3.7 branch too.
2. [2] can be accepted. However, someone might want to re-use an rpc
object after disabling it, like introducing a new api
rpc_clnt_enable_again (though no of such use-cases is very less).
But [2] doesn't allow it. The point is as long as rpc-clnt object is
alive, transport object is alive (though disconnected) and we can
re-use it. So, I would prefer not to accept it.


[2] will be accepted now.

The link mentioned seems to be invalid. Just to be clear, we have 3 
patches in question here -


[1] Original patch (merged in master but not in release-3.7) 
http://review.gluster.org/13456


There are two patches proposed to fix the regression
[2] http://review.gluster.org/#/c/13587/
[3] http://review.gluster.org/#/c/13592/

Since the patch by Kotresh [3] completely fixes the regression, we have 
decided to abandon [2]. In addition, since these fixes look very 
intricate, till we are sure that the code is stable, we thought it may 
be better not to back-port these patches to stable release branch.


Please confirm whether you are suggesting that we revive [2] and back-port all 
three of these patches to the release-3.7 branch.


Thanks,
Soumya


3. Kotresh will work on new changes to make sure changelog makes
correct use of rpc-clnt.

[1] http://review.gluster.org/#/c/13592
[2] http://review.gluster.org/#/c/1359

regards,
Raghavendra.


Thanks and Regards,
Kotresh H R

    - Original Message -
> From: "Soumya Koduri" <skod...@redhat.com <mailto:skod...@redhat.com>>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com 
<mailto:khire...@redhat.com>>, "Raghavendra
G" <raghaven...@gluster.com <mailto:raghaven...@gluster.com>>
> Cc: "Gluster Devel" <gluster-devel@gluster.org 
<mailto:gluster-devel@gluster.org>>
 > Sent: Thursday, March 3, 2016 5:06:00 PM
 > Subject: Re: [Gluster-devel] Cores generated with
./tests/geo-rep/georep-basic-dr-tarssh.t
 >
 >
 >
 > On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
 > > [Replying on top of my own reply]
 > >
 > > Hi,
 > >
 > > I have submitted the below patch [1] to avoid the issue of
 > > 'rpc_clnt_submit'
 > > getting reconnected. But it won't take care of memory leak
problem you were
 > > trying to fix. That we have to carefully go through all
cases and fix it.
 > > Please have a look at it.
 > >
 > Looks good. IIUC, with this patch, we need not set
conn->trans to NULL
 > in 'rpc_clnt_disable()'. Right? If yes, then it takes care of
memleak as
 > the transport object shall then get freed as part of
 > 'rpc_clnt_trigger_destroy'.
 >
 >
 > > http://review.gluster.org/#/c/13592/
     > >
 > > Thanks and Regards,
 > > Kotresh H R
 > >
 > > - Original Message -
 > >> From: "Kotresh Hiremath Ravishankar" <khire...@redhat.com
<mailto:khire...@redhat.com>>
 > >> To: "Soumya Koduri" <skod...@redhat.com
<mailto:skod...@redhat.com>>
 > >> Cc: "Raghavendra G" <raghaven...@gluster.com
<mailto:raghaven...@gluster.com>>, "Gluster Devel"
 > >> <gluster-devel@gluster.org <mailto:gluster-devel@gluster.org>>
 > >> Sent: Thursday, March 3, 2016 3:39:11 PM
 > >> Subject: Re: [Gluster-devel] Cores generated with
 > >> ./tests/geo-rep/georep-basic-dr-tarssh.t
 

Re: [Gluster-devel] Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

2016-03-03 Thread Soumya Koduri



On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:

[Replying on top of my own reply]

Hi,

I have submitted the below patch [1] to avoid the issue of 'rpc_clnt_submit'
getting reconnected. But it won't take care of memory leak problem you were
trying to fix. That we have to carefully go through all cases and fix it.
Please have a look at it.

Looks good. IIUC, with this patch, we need not set conn->trans to NULL 
in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak as 
the transport object shall then get freed as part of 
'rpc_clnt_trigger_destroy'.




http://review.gluster.org/#/c/13592/

Thanks and Regards,
Kotresh H R

- Original Message -

From: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
To: "Soumya Koduri" <skod...@redhat.com>
Cc: "Raghavendra G" <raghaven...@gluster.com>, "Gluster Devel" 
<gluster-devel@gluster.org>
Sent: Thursday, March 3, 2016 3:39:11 PM
Subject: Re: [Gluster-devel] Cores generated with 
./tests/geo-rep/georep-basic-dr-tarssh.t

Hi Soumya,

I tested the latest patch [2] on master, where your previous patch [1] is
merged.
I see crashes at different places.

1. If there are code paths that are holding rpc object without taking ref on
it, all those
code path will crash on invoking rpc submit on that object as rpc object
would have freed
by last unref on DISCONNECT event. I see this kind of use-case in
chagnelog rpc code.
Need to check on other users of rpc.
Agree. We should fix all such code paths. Since this seems to be an 
intricate fix, shall we take these patches only in the master branch and not 
in the 3.7 release for now, till we fix all such paths as we encounter them?




2. And also we need to take care of reconnect timers that are being set and
are re-tried to
connect back on expiration. In those cases also, we might crash as rpc
object would have freed.

Your patch addresses this..right?

Thanks,
Soumya




[1] http://review.gluster.org/#/c/13507/
[2] http://review.gluster.org/#/c/13587/

Thanks and Regards,
Kotresh H R

- Original Message -

From: "Soumya Koduri" <skod...@redhat.com>
To: "Raghavendra G" <raghaven...@gluster.com>, "Kotresh Hiremath
Ravishankar" <khire...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Thursday, March 3, 2016 12:24:00 PM
Subject: Re: [Gluster-devel] Cores generated with
./tests/geo-rep/georep-basic-dr-tarssh.t

Thanks a lot Kotresh.

On 03/03/2016 08:47 AM, Raghavendra G wrote:

Hi Soumya,

Can you send a fix to this regression on upstream master too? This patch
is merged there.


I have submitted below patch.
http://review.gluster.org/#/c/13587/

Kindly review the same.

Thanks,
Soumya


regards,
Raghavendra

On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
<khire...@redhat.com <mailto:khire...@redhat.com>> wrote:

 Hi Soumya,

 I analysed the issue and found out that crash has happened because
 of the patch [1].

 The patch doesn't set transport object to NULL in 'rpc_clnt_disable'
 but instead does it on
 'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations
 on the rpc object that
 is disabled (those instances are possible as happening now in
 changelog), it will trigger a
 CONNECT notify again with 'mydata' that is freed causing a crash.
 This happens because
 'rpc_clnt_submit' reconnects if rpc is not connected.

  rpc_clnt_submit (...) {
          ...
          if (conn->connected == 0) {
                  ret = rpc_transport_connect (conn->trans,
                                               conn->config.remote_port);
          }
          ...
  }

 Without your patch, conn->trans was set NULL and hence CONNECT fails
 not resulting with
 CONNECT notify call. And also the cleanup happens in failure path.

 So the memory leak can happen, if there is no try for rpc invocation
 after DISCONNECT.
 It will be cleaned up otherwise.


 [1] http://review.gluster.org/#/c/13507/

 Thanks and Regards,
 Kotresh H R

 - Original Message -
  > From: "Kotresh Hiremath Ravishankar" <khire...@redhat.com
 <mailto:khire...@redhat.com>>
  > To: "Soumya Koduri" <skod...@redhat.com
  > <mailto:skod...@redhat.com>>
  > Cc: avish...@redhat.com <mailto:avish...@redhat.com>, "Gluster
 Devel" <gluster-devel@gluster.org <mailto:gluster-devel@gluster.org>>
  > Sent: Monday, February 29, 2016 4:15:22 PM
  > Subject: Re: Cores generated with
 ./tests/geo-rep/georep-basic-dr-tarssh.t
  >
  > Hi Soumya,
  >
  > I just tested that it is reproducible only with your patch both
 in master and
  > 3.76 branch.
  > The ge

Re: [Gluster-devel] Regarding default_forget/releasedir/release() fops

2016-02-23 Thread Soumya Koduri



On 02/23/2016 05:02 PM, Jeff Darcy wrote:

Recently while doing some tests (which involved lots of inode_forget()),
I have noticed that my log file got flooded with below messages -

[2016-02-22 08:57:44.025565] W [defaults.c:2889:default_forget] (-->
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x231)[0x7fd00f63c15d]
(-->
/usr/local/lib/libglusterfs.so.0(default_forget+0x44)[0x7fd00f6cda2b]
(--> /usr/local/lib/libglusterfs.so.0(+0x39706)[0x7fd00f64b706] (-->
/usr/local/lib/libglusterfs.so.0(+0x397d2)[0x7fd00f64b7d2] (-->
/usr/local/lib/libglusterfs.so.0(+0x3be08)[0x7fd00f64de08] )
0-gfapi: xlator does not implement forget_cbk

  From the code, looks like we throw a warning in default-tmpl.c if any
xlator hasn't implemented forget(), releasedir() and release().

Though I agree it warns us about possible leaks which may happen if
these fops are not supported, it is annoying to have these messages
flooded in the log file which grew >1GB within few minutes.

Could you please confirm if it was intentional to throw this warning so
that all xlators shall have these fops implemented or if we can change
the log level to DEBUG?


It is intentional, and I would prefer that it be resolved by having
translators implement these calls, but it doesn't need to be a warning.
DEBUG would be fine.


Thanks for the confirmation. I have posted below patches -

http://review.gluster.org/#/q/status:open+project:glusterfs+branch:master+topic:bug-1311124
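
For translators that eventually take the route Jeff prefers and implement
these cbks, the skeleton is roughly the following sketch (function names are
illustrative; the actual cleanup depends on what the xlator stores in its
inode/fd contexts):

int32_t
sample_forget (xlator_t *this, inode_t *inode)
{
        /* free whatever per-inode context this xlator stored */
        return 0;
}

int32_t
sample_release (xlator_t *this, fd_t *fd)
{
        /* free whatever per-fd context this xlator stored */
        return 0;
}

int32_t
sample_releasedir (xlator_t *this, fd_t *fd)
{
        return 0;
}

struct xlator_cbks cbks = {
        .forget     = sample_forget,
        .release    = sample_release,
        .releasedir = sample_releasedir,
};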

Thanks,
Soumya




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Regarding default_forget/releasedir/release() fops

2016-02-22 Thread Soumya Koduri

Hi Jeff,

Recently while doing some tests (which involved lots of inode_forget()), 
I have noticed that my log file got flooded with below messages -


[2016-02-22 08:57:44.025565] W [defaults.c:2889:default_forget] (--> 
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x231)[0x7fd00f63c15d] (--> 
/usr/local/lib/libglusterfs.so.0(default_forget+0x44)[0x7fd00f6cda2b] 
(--> /usr/local/lib/libglusterfs.so.0(+0x39706)[0x7fd00f64b706] (--> 
/usr/local/lib/libglusterfs.so.0(+0x397d2)[0x7fd00f64b7d2] (--> 
/usr/local/lib/libglusterfs.so.0(+0x3be08)[0x7fd00f64de08] ) 
0-gfapi: xlator does not implement forget_cbk


From the code, it looks like we throw a warning in default-tmpl.c if any 
xlator hasn't implemented forget(), releasedir() or release().


Though I agree it warns us about possible leaks which may happen if 
these fops are not implemented, it is annoying to have these messages 
flooding the log file, which grew to >1GB within a few minutes.


Could you please confirm whether it was intentional to throw this warning so 
that all xlators would have these fops implemented, or whether we can change 
the log level to DEBUG?


Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] libgfapi 3.7.8 still memory leak

2016-02-17 Thread Soumya Koduri

Hi Piotr,

On 02/17/2016 08:20 PM, Piotr Rybicki wrote:

Hi all.

I'm trying hard to diagnose memory leaks in libgfapi access.

gluster 3.7.8

For this purpose, i've created simplest C code (basically only calling
glfs_new() and glfs_fini() ):


#include <glusterfs/api/glfs.h>

int main (int argc, char** argv) {
 glfs_t *fs = NULL;

 fs = glfs_new ("pool");

 glfs_fini (fs);
 return 0;
}


CFLAGS="-O0 -g -pipe -fomit-frame-pointer -fpeel-loops"
CXXFLAGS="${CFLAGS}"

# gcc -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.9.3/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.3/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with:
/var/tmp/portage/sys-devel/gcc-4.9.3/work/gcc-4.9.3/configure
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr
--bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.9.3
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.3
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.3/man
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.9.3/info
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4
--with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.9.3/python
--enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --enable-nls
--without-included-gettext --enable-checking=release
--with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.9.3
p1.4, pie-0.6.4' --enable-libstdcxx-time --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--enable-multilib --with-multilib-list=m32,m64 --disable-altivec
--disable-fixed-point --enable-targets=all --disable-libgcj
--enable-libgomp --disable-libmudflap --disable-libssp
--disable-libcilkrts --enable-lto --with-cloog
--disable-isl-version-check --enable-libsanitizer
Thread model: posix
gcc version 4.9.3 (Gentoo 4.9.3 p1.4, pie-0.6.4)

# gcc hellogluster.c -lgfapi

I've patched (client) with:
http://review.gluster.org/#/c/13456/
http://review.gluster.org/#/c/13125/
http://review.gluster.org/#/c/13096/

Latest patchset versions.

Still leaks...

running valgrind on this code, produces:

# valgrind --leak-check=full --show-reachable=yes ./a.out
==20881== Memcheck, a memory error detector
==20881== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==20881== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright
info
==20881== Command: ./a.out
==20881==
==20881==
==20881== HEAP SUMMARY:
==20881== in use at exit: 35,938 bytes in 10 blocks
==20881==   total heap usage: 94 allocs, 84 frees, 9,048,615 bytes
allocated
==20881==
==20881== 8 bytes in 1 blocks are still reachable in loss record 1 of 9
==20881==at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==20881==by 0x5A8A806: __gf_default_calloc (mem-pool.h:118)
==20881==by 0x5A8A806: __glusterfs_this_location (globals.c:141)
==20881==by 0x4E3D1F4: glfs_new@@GFAPI_3.4.0 (glfs.c:650)
==20881==by 0x400746: main (in /root/gf-test2/a.out)
==20881==
==20881== 82 bytes in 1 blocks are definitely lost in loss record 2 of 9
==20881==at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==20881==by 0x5A862D9: __gf_calloc (mem-pool.c:117)
==20881==by 0x5A54224: gf_strdup (mem-pool.h:185)
==20881==by 0x5A54224: gf_log_init (logging.c:736)
==20881==by 0x4E3D7D1: glfs_set_logging@@GFAPI_3.4.0 (glfs.c:786)
==20881==by 0x4E3D429: glfs_new@@GFAPI_3.4.0 (glfs.c:665)
==20881==by 0x400746: main (in /root/gf-test2/a.out)
==20881==
==20881== 240 bytes in 1 blocks are definitely lost in loss record 3 of 9
==20881==at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==20881==by 0x5A862D9: __gf_calloc (mem-pool.c:117)
==20881==by 0x5A733D2: gf_timer_registry_init (timer.c:244)
==20881==by 0x5A734A6: gf_timer_call_after (timer.c:52)
==20881==by 0x5A557B2: __gf_log_inject_timer_event (logging.c:1791)
==20881==by 0x5A557B2: gf_log_inject_timer_event (logging.c:1813)
==20881==by 0x4E3D7F8: glfs_set_logging@@GFAPI_3.4.0 (glfs.c:790)
==20881==by 0x4E3D429: glfs_new@@GFAPI_3.4.0 (glfs.c:665)
==20881==by 0x400746: main (in /root/gf-test2/a.out)
==20881==
==20881== 280 bytes in 1 blocks are definitely lost in loss record 4 of 9
==20881==at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==20881==by 0x5A862D9: __gf_calloc (mem-pool.c:117)
==20881==by 0x5A88A6A: iobuf_create_stdalloc_arena (iobuf.c:367)
==20881==by 0x5A88A6A: iobuf_pool_new (iobuf.c:431)
==20881==by 0x4E3D257: glusterfs_ctx_defaults_init (glfs.c:95)
==20881==by 0x4E3D257: glfs_new@@GFAPI_3.4.0 (glfs.c:659)
==20881==by 0x400746: main (in /root/gf-test2/a.out)
==20881==
==20881== 288 bytes in 1 blocks are possibly lost in loss record 5 of 9
==20881==at 0x4C2C0D0: calloc (vg_replace_malloc.c:711)
==20881==by 0x4011741: allocate_dtv (in /lib64/ld-2.21.so)
==20881==by 0x401206D: _dl_allocate_tls (in 

Re: [Gluster-devel] [Gluster-users] GlusterFS v3.7.8 client leaks summary — part II

2016-02-16 Thread Soumya Koduri



On 02/16/2016 08:06 PM, Oleksandr Natalenko wrote:

Hmm, OK. I've rechecked 3.7.8 with the following patches (latest
revisions):

===
Soumya Koduri (3):
   gfapi: Use inode_forget in case of handle objects
   inode: Retire the inodes from the lru list in inode_table_destroy
   rpc: Fix for rpc_transport_t leak
===

Here is Valgrind output: [1]

It seems that all leaks are gone, and that is very nice.


At least the major chunk of the leaks seems to have gone. Many thanks to you too 
for the very detailed tests and analysis :)


-Soumya



Many thanks to all devs.

[1] https://gist.github.com/anonymous/eddfdaf3eb7bff458326

16.02.2016 15:30, Soumya Koduri wrote:

I have tested using your API app (I/Os done - create,write and stat).
I still do not see any inode related leaks. However I posted another
fix for rpc_transport object related leak [1].

I request you to re-check if you have the latest patch of [2] applied
in your build.

[1] http://review.gluster.org/#/c/13456/
[2] http://review.gluster.org/#/c/13125/

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] GlusterFS v3.7.8 client leaks summary — part II

2016-02-16 Thread Soumya Koduri



On 02/12/2016 11:27 AM, Soumya Koduri wrote:



On 02/11/2016 08:33 PM, Oleksandr Natalenko wrote:

And "API" test.

I used custom API app [1] and did brief file manipulations through it
(create/remove/stat).

Then I performed drop_caches, finished API [2] and got the following
Valgrind log [3].

I believe there are still some leaks occurring in glfs_lresolve() call
chain.


glfs_fini() should have ideally destroyed all the inodes in the inode
table. I shall try to use your app and check if anything is missed out.




I have tested using your API app (I/Os done: create, write and stat). I 
still do not see any inode-related leaks. However, I posted another fix 
for an rpc_transport object related leak [1].


I request you to re-check that you have the latest patch set of [2] applied in 
your build.


[1] http://review.gluster.org/#/c/13456/
[2] http://review.gluster.org/#/c/13125/
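
For reference, the create/write/stat I/O mentioned above reduces to roughly the 
following gfapi sequence (a sketch only; the volume name, host and path are 
placeholders and error handling is omitted):

#include <fcntl.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

int main (void) {
        struct stat  sb;
        glfs_t      *fs   = glfs_new ("testvol");               /* placeholder volume */
        glfs_fd_t   *glfd = NULL;

        glfs_set_volfile_server (fs, "tcp", "server1", 24007);  /* placeholder host */
        glfs_set_logging (fs, "/tmp/gfapi-test.log", 7);
        glfs_init (fs);

        glfd = glfs_creat (fs, "/leak-test-file", O_CREAT | O_RDWR, 0644);
        glfs_write (glfd, "some data\n", 10, 0);
        glfs_close (glfd);

        glfs_stat (fs, "/leak-test-file", &sb);

        /* glfs_fini() is expected to release the inode table and other
         * resources -- this is what the valgrind runs check */
        glfs_fini (fs);
        return 0;
}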

Thanks,
Soumya


Thanks,
Soumya



Soumya?

[1] https://github.com/pfactum/xglfs
[2] https://github.com/pfactum/xglfs/blob/master/xglfs_destroy.c#L30
[3] https://gist.github.com/aec72b6164a695cf2d44

11.02.2016 10:12, Oleksandr Natalenko wrote:

And here goes "rsync" test results (v3.7.8 + two patches by Soumya).

2 volumes involved: source and target.

=== Common indicators ===

slabtop before drop_caches: [1]
slabtop after drop_caches: [2]

=== Source volume (less interesting part) ===

RAM usage before drop_caches: [3]
statedump before drop_caches: [4]
RAM usage after drop_caches: [5]
statedump after drop_caches: [6]

=== Target volume (most interesting part) ===

RAM usage before drop_caches: [7]
statedump before drop_caches: [8]
RAM usage after drop_caches: [9]
statedump after drop_caches: [10]
Valgrind output: [11]

=== Conclusion ===

Again, see no obvious leaks.

[1] https://gist.github.com/e72fd30a4198dd630299
[2] https://gist.github.com/78ef9eae3dc16fd79c1b
[3] https://gist.github.com/4ed75e8d6cb40a1369d8
[4] https://gist.github.com/20a75d32db76795b90d4
[5] https://gist.github.com/0772959834610dfdaf2d
[6] https://gist.github.com/a71684bd3745c77c41eb
[7] https://gist.github.com/2c9be083cfe3bffe6cec
[8] https://gist.github.com/0102a16c94d3d8eb82e3
[9] https://gist.github.com/23f057dc8e4b2902bba1
[10] https://gist.github.com/385bbb95ca910ec9766f
[11] https://gist.github.com/685c4d3e13d31f597722

10.02.2016 15:37, Oleksandr Natalenko wrote:

Hi, folks.

Here go new test results regarding client memory leak.

I use v3.7.8 with the following patches:

===
Soumya Koduri (2):
  inode: Retire the inodes from the lru list in inode_table_destroy
  gfapi: Use inode_forget in case of handle objects
===

Those are the only 2 not merged yet.

So far, I've performed only "find" test, and here are the results:

RAM usage before drop_caches: [1]
statedump before drop_caches: [2]
slabtop before drop_caches: [3]
RAM usage after drop_caches: [4]
statedump after drop_caches: [5]
slabtop after drop_caches: [6]
Valgrind output: [7]

No leaks either via statedump or via valgrind. However, statedump
stats still suffer from integer overflow.

Next steps I'm going to take:

1) "rsync" test;
2) API test.

[1] https://gist.github.com/88d2fa95c28baeb2543f
[2] https://gist.github.com/4f3e93ff2db6e3cf4081
[3] https://gist.github.com/62791a2c4258041ba821
[4] https://gist.github.com/1d3ce95a493d054bbac2
[5] https://gist.github.com/fa855a2752d3691365a7
[6] https://gist.github.com/84e9e27d2a2e5ff5dc33
[7] https://gist.github.com/f35bd32a5159d3571d3a
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] libgfapi libvirt memory leak version 3.7.8

2016-02-11 Thread Soumya Koduri

Hi Piotr,

Could you apply below gfAPI patch and check the valgrind output -

   http://review.gluster.org/13125

Thanks,
Soumya


On 02/11/2016 09:40 PM, Piotr Rybicki wrote:

Hi All

I have to report that there is a mem leak in the latest version of gluster

gluster: 3.7.8
libvirt 1.3.1

The mem leak occurs when starting a domain (virsh start DOMAIN) which accesses
its drive via libgfapi (although the leak is much smaller than with gluster
3.5.X).

I believe libvirt itself uses libgfapi only to check the existence of a disk.
Libvirt calls glfs_init and glfs_fini when doing this check.

When using the drive via a file (gluster fuse mount), there is no mem leak
when starting the domain.

my drive definition (libgfapi):

[The <disk> XML definition was stripped by the list archive; it used the
gluster (libgfapi) protocol and carried the inline comment: "# connection is
still via tcp. Defining 'tcp' here doesn't make any difference."]

I first reported this to the libvirt developers, but they blame gluster.

valgrind details (libgfapi):

# valgrind --leak-check=full --show-reachable=yes
--child-silent-after-fork=yes libvirtd --listen 2> libvirt-gfapi.log

On the other console:
virsh start DOMAIN
...wait...
virsh shutdown DOMAIN
...wait and stop valgrind/libvirtd

valgrind log:

==5767== LEAK SUMMARY:
==5767==definitely lost: 19,666 bytes in 96 blocks
==5767==indirectly lost: 21,194 bytes in 123 blocks
==5767==  possibly lost: 2,699,140 bytes in 68 blocks
==5767==still reachable: 986,951 bytes in 15,038 blocks
==5767== suppressed: 0 bytes in 0 blocks
==5767==
==5767== For counts of detected and suppressed errors, rerun with: -v
==5767== ERROR SUMMARY: 96 errors from 96 contexts (suppressed: 0 from 0)

full log:
http://195.191.233.1/libvirt-gfapi.log
http://195.191.233.1/libvirt-gfapi.log.bz2

Best regards
Piotr Rybicki
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] GlusterFS v3.7.8 client leaks summary — part II

2016-02-11 Thread Soumya Koduri



On 02/11/2016 08:33 PM, Oleksandr Natalenko wrote:

And "API" test.

I used custom API app [1] and did brief file manipulations through it
(create/remove/stat).

Then I performed drop_caches, finished API [2] and got the following
Valgrind log [3].

I believe there are still some leaks occurring in glfs_lresolve() call
chain.


glfs_fini() should have ideally destroyed all the inodes in the inode 
table. I shall try to use your app and check if anything is missed out.


Thanks,
Soumya



Soumya?

[1] https://github.com/pfactum/xglfs
[2] https://github.com/pfactum/xglfs/blob/master/xglfs_destroy.c#L30
[3] https://gist.github.com/aec72b6164a695cf2d44

11.02.2016 10:12, Oleksandr Natalenko wrote:

And here goes "rsync" test results (v3.7.8 + two patches by Soumya).

2 volumes involved: source and target.

=== Common indicators ===

slabtop before drop_caches: [1]
slabtop after drop_caches: [2]

=== Source volume (less interesting part) ===

RAM usage before drop_caches: [3]
statedump before drop_caches: [4]
RAM usage after drop_caches: [5]
statedump after drop_caches: [6]

=== Target volume (most interesting part) ===

RAM usage before drop_caches: [7]
statedump before drop_caches: [8]
RAM usage after drop_caches: [9]
statedump after drop_caches: [10]
Valgrind output: [11]

=== Conclusion ===

Again, see no obvious leaks.

[1] https://gist.github.com/e72fd30a4198dd630299
[2] https://gist.github.com/78ef9eae3dc16fd79c1b
[3] https://gist.github.com/4ed75e8d6cb40a1369d8
[4] https://gist.github.com/20a75d32db76795b90d4
[5] https://gist.github.com/0772959834610dfdaf2d
[6] https://gist.github.com/a71684bd3745c77c41eb
[7] https://gist.github.com/2c9be083cfe3bffe6cec
[8] https://gist.github.com/0102a16c94d3d8eb82e3
[9] https://gist.github.com/23f057dc8e4b2902bba1
[10] https://gist.github.com/385bbb95ca910ec9766f
[11] https://gist.github.com/685c4d3e13d31f597722

10.02.2016 15:37, Oleksandr Natalenko wrote:

Hi, folks.

Here go new test results regarding client memory leak.

I use v3.7.8 with the following patches:

===
Soumya Koduri (2):
  inode: Retire the inodes from the lru list in inode_table_destroy
  gfapi: Use inode_forget in case of handle objects
===

Those are the only 2 not merged yet.

So far, I've performed only "find" test, and here are the results:

RAM usage before drop_caches: [1]
statedump before drop_caches: [2]
slabtop before drop_caches: [3]
RAM usage after drop_caches: [4]
statedump after drop_caches: [5]
slabtop after drop_caches: [6]
Valgrind output: [7]

No leaks either via statedump or via valgrind. However, statedump
stats still suffer from integer overflow.

Next steps I'm going to take:

1) "rsync" test;
2) API test.

[1] https://gist.github.com/88d2fa95c28baeb2543f
[2] https://gist.github.com/4f3e93ff2db6e3cf4081
[3] https://gist.github.com/62791a2c4258041ba821
[4] https://gist.github.com/1d3ce95a493d054bbac2
[5] https://gist.github.com/fa855a2752d3691365a7
[6] https://gist.github.com/84e9e27d2a2e5ff5dc33
[7] https://gist.github.com/f35bd32a5159d3571d3a
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] glusterfsd core on NetBSD (https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14139/consoleFull)

2016-02-10 Thread Soumya Koduri

Thanks Manu.

Kotresh,

Is this issue related to bug1221629 as well?

Thanks,
Soumya

On 02/10/2016 02:10 PM, Emmanuel Dreyfus wrote:

On Wed, Feb 10, 2016 at 12:17:23PM +0530, Soumya Koduri wrote:

I see a core generated in this regression run though all the tests seem to
have passed. I do not have a netbsd machine to analyze the core.
Could you please take a look and let me know what the issue could have been?


changelog bug. I am not sure how this could become NULL after it has been
checked at the beginning of gf_history_changelog().

I note this uses readdir() which is not thread-safe. readdir_r() should
probably be used instead.
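For illustration, a minimal sketch of the readdir_r() pattern (the buffer
sizing below follows the usual convention; this is not taken from the
changelog code itself):

#include <dirent.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

static int
scan_dir (const char *path)
{
        DIR           *dirp   = opendir (path);
        struct dirent *entry  = NULL;
        struct dirent *result = NULL;
        size_t         len    = offsetof (struct dirent, d_name) + NAME_MAX + 1;

        if (!dirp)
                return -1;

        entry = malloc (len);
        if (!entry) {
                closedir (dirp);
                return -1;
        }

        /* readdir_r() fills the caller-supplied entry, so concurrent threads
         * scanning directories do not race on readdir()'s static buffer. */
        while (readdir_r (dirp, entry, &result) == 0 && result) {
                /* process result->d_name ... */
        }

        free (entry);
        closedir (dirp);
        return 0;
}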

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xb99912b4 in gf_history_changelog (changelog_dir=0xb7b160f0 "\003",
 start=3081873456, end=0, n_parallel=-1217773520, actual_end=0xb7b05310)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-history-changelog.c:834
834 gf_log (this->name, GF_LOG_ERROR,
(gdb) print this
$1 = (xlator_t *) 0x0
#0  0xb99912b4 in gf_history_changelog (changelog_dir=0xb7b160f0 "\003",
 start=3081873456, end=0, n_parallel=-1217773520, actual_end=0xb7b05310)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-history-changelog.c:834
#1  0xbb6fec17 in rpcsvc_record_build_header (recordstart=0x0,
 rlen=3077193776, reply=..., payload=3081855216)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:857
#2  0xbb6fec95 in rpcsvc_record_build_header (recordstart=0xb7b10030 "",
 rlen=3077193776, reply=..., payload=3081855216)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:874
#3  0xbb6ffa81 in rpcsvc_submit_generic (req=0xb7b10030, proghdr=0xb7b160f0,
 hdrcount=0, payload=0xb76a4030, payloadcount=1, iobref=0x0)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:1316
#4  0xbb70506c in xdr_to_rpc_reply (msgbuf=0xb7b10030 "", len=0,
 reply=0xb76a4030, payload=0xb76a4030,
 verfbytes=0x1 )
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/xdr-rpcclnt.c:40
#5  0xbb26cbb5 in socket_server_event_handler (fd=16, idx=3, data=0xb7b10030,
 poll_in=1, poll_out=0, poll_err=0)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-transport/socket/src/socket.c:2765
#6  0xbb7908da in syncop_rename (subvol=0xbb143030, oldloc=0xba45b4b0,
 newloc=0x3, xdata_in=0x75, xdata_out=0xbb7e8000)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/syncop.c:2225
#7  0xbb790c21 in syncop_ftruncate (subvol=0xbb143030, fd=0x8062cc0 ,
 offset=-4647738537632864458, xdata_in=0xbb7efe75 <_rtld_bind_start+17>,
 xdata_out=0xbb7e8000)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/syncop.c:2265
#8  0xbb75f6d1 in inode_table_dump (itable=0xbb143030,
 prefix=0x2 )
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/inode.c:2352
#9  0x08050e20 in main (argc=12, argv=0xbf7feaac)
 at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/glusterfsd/src/glusterfsd.c:2345


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfsd core on NetBSD (https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14139/consoleFull)

2016-02-09 Thread Soumya Koduri

Hi Emmanuel,

I see a core generated in this regression run even though all the tests seem 
to have passed. I do not have a NetBSD machine to analyze the core.

Could you please take a look and let me know what the issue could have been?

Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Rebalance data migration and corruption

2016-02-09 Thread Soumya Koduri



On 02/09/2016 12:30 PM, Raghavendra G wrote:

    Right. But if there are simultaneous accesses to the same file from any
    other client and the rebalance process, delegations shall not be granted,
    or shall be revoked if granted, even though they are operating at
    different offsets. So if you rely only on delegations, migration may not
    proceed if an application has held a lock or is doing any I/O.


Does the brick process wait for the response of the delegation holder
(the rebalance process here) before it wipes out the delegation/locks? If
that's the case, the rebalance process can complete one transaction of
(read, src) and (write, dst) before responding to a delegation recall.
That way there is no starvation for either applications or the rebalance
process (though this makes both of them slower, but that cannot be helped,
I think).


Yes. The brick process should wait for a certain period before forcefully
revoking the delegation if it is not returned by the client. Also, if
required (as done by NFS servers), we can choose to increase this timeout
value at run time if the client is diligently flushing the data.


Hmm.. I would prefer an infinite timeout. The only scenario where the brick
process can forcefully flush leases would be connection loss with the
rebalance process. The more scenarios in which the brick can flush leases
without the rebalance process's knowledge, the more race windows we open up
for this bug to occur.

In fact at least in theory to be correct, rebalance process should
replay all the transactions that happened during the lease which got
flushed out by brick (after re-acquiring that lease). So, we would like
to avoid any such scenarios.

Btw, what is the necessity of timeouts? Is it an insurance against rogue
clients who won't respond back to lease recalls?
Yes. It is to protect against rogue clients and prevent starvation of other 
clients.


In the current design, every lease is associated with a lease-id (like the 
lockowner in the case of locks) and all further fops (I/Os) have to be 
done using this lease-id. So if any fop comes to the brick process with the 
lease-id of a lease which the brick process has already flushed, we can send 
a special error and the rebalance process can then replay all those fops. 
Will that be sufficient?
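As a rough sketch of that replay idea (the lease helpers, types and the
LEASE_FLUSHED error below are hypothetical placeholders, not an existing
GlusterFS interface):

#include <stdint.h>

typedef uint64_t lease_id_t;
#define LEASE_FLUSHED 1000      /* hypothetical "lease was flushed" error */

int acquire_read_lease (void *file, lease_id_t *id);    /* hypothetical */
int migrate_next_chunk (void *file, lease_id_t id);     /* hypothetical: >0 while
                                                           chunks remain, 0 when
                                                           done, <0 on error */

static int
migrate_with_lease (void *file)
{
        lease_id_t id;
        int        ret;

        if (acquire_read_lease (file, &id) != 0)
                return -1;

        for (;;) {
                ret = migrate_next_chunk (file, id);
                if (ret == -LEASE_FLUSHED) {
                        /* the brick flushed our lease: re-acquire it and
                         * replay the fops issued under the old lease-id */
                        if (acquire_read_lease (file, &id) != 0)
                                return -1;
                        continue;
                }
                if (ret <= 0)
                        return ret;     /* 0 == migration complete */
        }
}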


CCin Poornima who has been implementing it.


Thanks,
Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Rebalance data migration and corruption

2016-02-08 Thread Soumya Koduri



On 02/08/2016 09:13 AM, Shyam wrote:

On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" 
To: "Sakshi Bansal" , "Susant Palai"

Cc: "Gluster Devel" , "Nithya
Balachandran" , "Shyamsundar
Ranganathan" 
Sent: Friday, February 5, 2016 4:32:40 PM
Subject: Re: Rebalance data migration and corruption

+gluster-devel



Hi Sakshi/Susant,

- There is a data corruption issue in migration code. Rebalance
process,
   1. Reads data from src
   2. Writes (say w1) it to dst

   However, 1 and 2 are not atomic, so another write (say w2) to the same
   region can happen between 1 and 2. But these two writes can reach dst in
   the order (w2, w1), resulting in a subtle corruption. This issue is not
   fixed yet and can cause subtle data corruptions. The fix is simple and
   involves the rebalance process acquiring a mandatory lock to make 1 and 2
   atomic.


We can make use of compound fop framework to make sure we don't suffer a
significant performance hit. Following will be the sequence of
operations
done by rebalance process:

1. issues a compound (mandatory lock, read) operation on src.
2. writes this data to dst.
3. issues unlock of lock acquired in 1.

Please co-ordinate with Anuradha for implementation of this compound
fop.
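
For illustration only, the three-step sequence above could look roughly like
the sketch below; all of the helpers declared here are hypothetical
placeholders, not an existing GlusterFS API:

#include <stddef.h>
#include <sys/types.h>

/* hypothetical: compound fop that takes a mandatory lock on
 * [off, off + len) of the source and reads the locked range */
ssize_t compound_mlock_read (void *src, char *buf, size_t len, off_t off);
/* hypothetical: plain write to the destination subvolume */
ssize_t write_dst (void *dst, const char *buf, size_t len, off_t off);
/* hypothetical: release the mandatory lock taken by the compound fop */
int unlock_src (void *src, off_t off, size_t len);

static int
migrate_block (void *src, void *dst, char *buf, size_t len, off_t off)
{
        ssize_t nread;
        int     ret;

        /* 1. compound (mandatory lock, read) on src */
        nread = compound_mlock_read (src, buf, len, off);
        if (nread < 0)
                return -1;

        /* 2. write the locked range to dst */
        ret = (write_dst (dst, buf, (size_t)nread, off) < 0) ? -1 : 0;

        /* 3. unlock the range locked in step 1 */
        if (unlock_src (src, off, len) < 0)
                ret = -1;

        return ret;
}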

Following are the issues I see with this approach:
1. features/locks provides mandatory lock functionality only for
posix-locks
(flock and fcntl based locks). So, mandatory locks will be
posix-locks which
will conflict with locks held by application. So, if an application
has held
an fcntl/flock, migration cannot proceed.


What if the file is opened with O_NONBLOCK? Can't the rebalance process skip 
the file and continue if mandatory lock acquisition fails?




We can implement a "special" domain for mandatory internal locks.
These locks will behave similar to posix mandatory locks in that
conflicting fops (like write, read) are blocked/failed if they are
done while a lock is held.


So is the only difference between mandatory internal locks and posix 
mandatory locks that internal locks shall not conflict with other 
application locks (advisory/mandatory)?





2. data migration will be less efficient because of an extra unlock
(with
compound lock + read) or extra lock and unlock (for non-compound fop
based
implementation) for every read it does from src.


Can we use delegations here? Rebalance process can acquire a
mandatory-write-delegation (an exclusive lock with a functionality
that delegation is recalled when a write operation happens). In that
case rebalance process, can do something like:

1. Acquire a read delegation for entire file.
2. Migrate the entire file.
3. Remove/unlock/give-back the delegation it has acquired.

If a recall is issued from brick (when a write happens from mount), it
completes the current write to dst (or throws away the read from src)
to maintain atomicity. Before doing next set of (read, src) and
(write, dst) tries to reacquire lock.


With delegations this simplifies the normal path, when a file is
exclusively handled by rebalance. It also improves the case where a
client and rebalance are conflicting on a file, to degrade to mandatory
locks by either parties.

I would prefer we take the delegation route for such needs in the future.

Right. But if there are simultaneous accesses to the same file from any 
other client and the rebalance process, delegations shall not be granted, or 
shall be revoked if granted, even though they are operating at different 
offsets. So if you rely only on delegations, migration may not proceed if an 
application has held a lock or is doing any I/O.


Also, ideally the rebalance process has to take a write delegation, as it 
would end up writing the data on the destination brick, which shall affect 
READ I/Os (though of course we can have special checks/hacks for internally 
generated fops).


That said, having delegations shall definitely ensure correctness with 
respect to exclusive file access.


Thanks,
Soumya



@Soumyak, can something like this be done with delegations?

@Pranith,
Afr does transactions for writing to its subvols. Can you suggest any
optimizations here so that rebalance process can have a transaction
for (read, src) and (write, dst) with minimal performance overhead?

regards,
Raghavendra.



Comments?



regards,
Raghavendra.



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Rebalance data migration and corruption

2016-02-08 Thread Soumya Koduri



On 02/09/2016 10:27 AM, Raghavendra G wrote:



On Mon, Feb 8, 2016 at 4:31 PM, Soumya Koduri <skod...@redhat.com> wrote:



On 02/08/2016 09:13 AM, Shyam wrote:

On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" <rgowd...@redhat.com
<mailto:rgowd...@redhat.com>>
To: "Sakshi Bansal" <saban...@redhat.com
<mailto:saban...@redhat.com>>, "Susant Palai"
<spa...@redhat.com <mailto:spa...@redhat.com>>
Cc: "Gluster Devel" <gluster-devel@gluster.org
<mailto:gluster-devel@gluster.org>>, "Nithya
Balachandran" <nbala...@redhat.com
<mailto:nbala...@redhat.com>>, "Shyamsundar
Ranganathan" <srang...@redhat.com
<mailto:srang...@redhat.com>>
Sent: Friday, February 5, 2016 4:32:40 PM
Subject: Re: Rebalance data migration and corruption

+gluster-devel


Hi Sakshi/Susant,

- There is a data corruption issue in migration
code. Rebalance
process,
1. Reads data from src
2. Writes (say w1) it to dst

However, 1 and 2 are not atomic, so another
write (say w2) to
same region
can happen between 1. But these two writes can
reach dst in the
order
(w2,
w1) resulting in a subtle corruption. This issue
is not fixed yet
and can
cause subtle data corruptions. The fix is simple
and involves
rebalance
process acquiring a mandatory lock to make 1 and
2 atomic.


We can make use of compound fop framework to make sure
we don't suffer a
significant performance hit. Following will be the
sequence of
operations
done by rebalance process:

1. issues a compound (mandatory lock, read) operation on
src.
2. writes this data to dst.
3. issues unlock of lock acquired in 1.

Please co-ordinate with Anuradha for implementation of
this compound
fop.

Following are the issues I see with this approach:
1. features/locks provides mandatory lock functionality
only for
posix-locks
(flock and fcntl based locks). So, mandatory locks will be
posix-locks which
will conflict with locks held by application. So, if an
application
has held
an fcntl/flock, migration cannot proceed.


What if the file is opened with O_NONBLOCK? Can't the rebalance process
skip the file and continue if mandatory lock acquisition fails?


Similar functionality can be achieved by acquiring a non-blocking inodelk,
like SETLK (as opposed to SETLKW). However, whether the rebalance process
should block or not depends on the use case. In some use-cases (like
remove-brick) the rebalance process _has_ to migrate all the files. Even for
other scenarios, skipping too many files is not a good idea, as it beats
the purpose of running rebalance. So one of the design goals is to
migrate as many files as possible without making the design too complex.
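
As an analogy only, the SETLK/SETLKW distinction for plain POSIX fcntl()
byte-range locks is sketched below (GlusterFS inodelk is its own RPC-level
protocol; this just illustrates non-blocking versus blocking acquisition):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int
try_lock_range (int fd, off_t off, off_t len)
{
        struct flock fl = {
                .l_type   = F_WRLCK,
                .l_whence = SEEK_SET,
                .l_start  = off,
                .l_len    = len,
        };

        /* F_SETLK returns immediately with EAGAIN/EACCES if the range is
         * already held; F_SETLKW would block until the holder releases it. */
        if (fcntl (fd, F_SETLK, &fl) == -1)
                return (errno == EAGAIN || errno == EACCES) ? 1 : -1;

        return 0;       /* lock acquired */
}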




We can implement a "special" domain for mandatory internal
locks.
These locks will behave similar to posix mandatory locks in that
conflicting fops (like write, read) are blocked/failed if
they are
done while a lock is held.


So is the only difference between mandatory internal locks and posix
mandatory locks that internal locks shall not conflict with other
application locks (advisory/mandatory)?


Yes. Mandatory internal locks (aka Mandatory inodelk for this
discussion) will conflict only in their domain. They also conflict with
any fops that might change the file (primarily write here, but different
fops can be added based on requirement). So in a fop like writev we need
to check in two lists - external lock (posix lock) list _and_ mandatory
inodelk list.

The reason (if not clear) for using mandatory locks by rebalance process
is that clients need not be bothered with acquiring a lock (which will
unnecessarily degrade performance of I/O when there is no rebalance
going on). Th

Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I

2016-02-01 Thread Soumya Koduri



On 02/01/2016 01:39 PM, Oleksandr Natalenko wrote:

Wait. It seems to be my bad.

Before unmounting I do drop_caches (2), and glusterfs process CPU usage
goes to 100% for a while. I haven't waited for it to drop to 0%, and
instead perform unmount. It seems glusterfs is purging inodes and that's
why it uses 100% of CPU. I've re-tested it, waiting for CPU usage to
become normal, and got no leaks.

Will verify this once again and report more.

BTW, if that works, how could I limit inode cache for FUSE client? I do
not want it to go beyond 1G, for example, even if I have 48G of RAM on
my server.


It's hard-coded for now. For fuse, the lru limit (of the inodes which are 
not active) is 32*1024.
One of the ways to address this (which we were discussing earlier) is to 
have an option to configure the inode cache limit. If that sounds good, we 
can then check whether it has to be global/volume-level and client/server/both.


Thanks,
Soumya



01.02.2016 09:54, Soumya Koduri wrote:

On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote:

Unfortunately, this patch doesn't help.

RAM usage on "find" finish is ~9G.

Here is statedump before drop_caches: https://gist.github.com/
fc1647de0982ab447e20


[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=706766688
num_allocs=2454051



And after drop_caches: https://gist.github.com/5eab63bc13f78787ed19


[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=550996416
num_allocs=1913182

There isn't much significant drop in inode contexts. One of the
reasons could be because of dentrys holding a refcount on the inodes
which shall result in inodes not getting purged even after
fuse_forget.


pool-name=fuse:dentry_t
hot-count=32761

if  '32761' is the current active dentry count, it still doesn't seem
to match up to inode count.

Thanks,
Soumya


And here is Valgrind output:
https://gist.github.com/2490aeac448320d98596

On Saturday, 30 January 2016 at 22:56:37 EET, Xavier Hernandez wrote:

There's another inode leak caused by an incorrect counting of
lookups on directory reads.

Here's a patch that solves the problem for
3.7:

http://review.gluster.org/13324

Hopefully with this patch the
memory leaks should disapear.

Xavi

On 29.01.2016 19:09, Oleksandr Natalenko wrote:

Here is intermediate summary of current memory leaks in FUSE client
investigation.

I use GlusterFS v3.7.6 release with the following patches:

===
Kaleb S KEITHLEY (1):
      fuse: use-after-free fix in fuse-bridge, revisited

Pranith Kumar K (1):
      mount/fuse: Fix use-after-free crash

Soumya Koduri (3):
      gfapi: Fix inode nlookup counts
      inode: Retire the inodes from the lru list in inode_table_destroy
      upcall: free the xdr* allocations
===

With those patches we got API leaks fixed (I hope, brief tests show that) and
got rid of "kernel notifier loop terminated" message. Nevertheless, FUSE
client still leaks.

I have several test volumes with several million of small files (100K…2M in
average). I do 2 types of FUSE client testing:

1) find /mnt/volume -type d
2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/

And most up-to-date results are shown below:

=== find /mnt/volume -type d ===

Memory consumption: ~4G
Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a
Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d

I guess, fuse-bridge/fuse-resolve related.

=== rsync -av -H /mnt/source_volume/* /mnt/target_volume/ ===

Memory consumption: ~3.3...4G
Statedump (target volume): https://gist.github.com/31e43110eaa4da663435
Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a

I guess, DHT-related.

Give me more patches to test :).

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client leaks summary — part I

2016-02-01 Thread Soumya Koduri



On 02/01/2016 02:48 PM, Xavier Hernandez wrote:

Hi,

On 01/02/16 09:54, Soumya Koduri wrote:



On 02/01/2016 01:39 PM, Oleksandr Natalenko wrote:

Wait. It seems to be my bad.

Before unmounting I do drop_caches (2), and glusterfs process CPU usage
goes to 100% for a while. I haven't waited for it to drop to 0%, and
instead perform unmount. It seems glusterfs is purging inodes and that's
why it uses 100% of CPU. I've re-tested it, waiting for CPU usage to
become normal, and got no leaks.

Will verify this once again and report more.

BTW, if that works, how could I limit inode cache for FUSE client? I do
not want it to go beyond 1G, for example, even if I have 48G of RAM on
my server.


Its hard-coded for now. For fuse the lru limit (of the inodes which are
not active) is (32*1024).


This is not exact for the current implementation. The inode memory pool is
configured with 32*1024 entries, but the lru limit is set to infinite:
currently inode_table_prune() treats lru_limit == 0 as infinite, and the
inode table created by fuse is initialized with 0.
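
A simplified paraphrase of that check (the struct and helper below are
stand-ins, not the actual libglusterfs inode.c code):

struct itable {                 /* stand-in for inode_table_t */
        unsigned int lru_limit;
        unsigned int lru_size;
};

void retire_oldest_lru_inode (struct itable *table);    /* hypothetical */

static void
prune_lru (struct itable *table)
{
        /* lru_limit == 0 is treated as "unlimited", so a table created with
         * 0 (as the fuse bridge does today) never trims its lru list on its
         * own; it only shrinks when the kernel sends forgets. */
        if (table->lru_limit == 0)
                return;

        while (table->lru_size > table->lru_limit)
                retire_oldest_lru_inode (table);
}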

Anyway this should not be a big problem in normal conditions. After
having fixed the incorrect nlookup count for "." and ".." directory
entries, when the kernel detects memory pressure and sends inode
forgets, the memory will be released.


One of the ways to address this (which we were discussing earlier) is to
have an option to configure inode cache limit.


I think this will need more thought. I've made a quick test forcing
lru_limit to a small value and weird errors appeared (probably from
inodes being expected to exist when the kernel sends new requests). Anyway,
I haven't spent time on this. I haven't tested it on master either.


Oh okay. Thanks for checking.

-Soumya



Xavi


If that sounds good, we
can then check on if it has to be global/volume-level,
client/server/both.

Thanks,
Soumya



01.02.2016 09:54, Soumya Koduri wrote:

On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote:

Unfortunately, this patch doesn't help.

RAM usage on "find" finish is ~9G.

Here is statedump before drop_caches: https://gist.github.com/
fc1647de0982ab447e20


[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=706766688
num_allocs=2454051



And after drop_caches: https://gist.github.com/5eab63bc13f78787ed19


[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=550996416
num_allocs=1913182

There isn't much significant drop in inode contexts. One of the
reasons could be because of dentrys holding a refcount on the inodes
which shall result in inodes not getting purged even after
fuse_forget.


pool-name=fuse:dentry_t
hot-count=32761

if  '32761' is the current active dentry count, it still doesn't seem
to match up to inode count.

Thanks,
Soumya


And here is Valgrind output:
https://gist.github.com/2490aeac448320d98596

On Saturday, 30 January 2016 at 22:56:37 EET, Xavier Hernandez wrote:

There's another inode leak caused by an incorrect counting of
lookups on directory reads.

Here's a patch that solves the problem for
3.7:

http://review.gluster.org/13324

Hopefully with this patch the
memory leaks should disapear.

Xavi

On 29.01.2016 19:09, Oleksandr Natalenko wrote:

Here is intermediate summary of current memory leaks in FUSE client
investigation.

I use GlusterFS v3.7.6 release with the following patches:

===
Kaleb S KEITHLEY (1):
      fuse: use-after-free fix in fuse-bridge, revisited

Pranith Kumar K (1):
      mount/fuse: Fix use-after-free crash

Soumya Koduri (3):
      gfapi: Fix inode nlookup counts
      inode: Retire the inodes from the lru list in inode_table_destroy
      upcall: free the xdr* allocations
===

With those patches we got API leaks fixed (I hope, brief tests show that) and
got rid of "kernel notifier loop terminated" message. Nevertheless, FUSE
client still leaks.

I have several test volumes with several million of small files (100K…2M in
average). I do 2 types of FUSE client testing:

1) find /mnt/volume -type d
2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/

And most up-to-date results are shown below:

=== find /mnt/volume -type d ===

Memory consumption: ~4G
Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a
Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d

I guess, fuse-bridge/fuse-resolve related.

=== rsync -av -H /mnt/source_volume/* /mnt/target_volume/ ===

Memory consumption: ~3.3...4G
Statedump (target volume): https://gist.github.com/31e43110eaa4da663435
Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a

I guess, DHT-related.

Give me more patches to test :).

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gl

Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I

2016-01-31 Thread Soumya Koduri



On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote:

Unfortunately, this patch doesn't help.

RAM usage on "find" finish is ~9G.

Here is statedump before drop_caches: https://gist.github.com/
fc1647de0982ab447e20


[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=706766688
num_allocs=2454051



And after drop_caches: https://gist.github.com/5eab63bc13f78787ed19


[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=550996416
num_allocs=1913182

There isn't a very significant drop in inode contexts. One of the reasons 
could be dentries holding a refcount on the inodes, which shall result in 
the inodes not getting purged even after fuse_forget.



pool-name=fuse:dentry_t
hot-count=32761

If '32761' is the current active dentry count, it still doesn't seem to 
match up to the inode count.


Thanks,
Soumya


And here is Valgrind output: https://gist.github.com/2490aeac448320d98596

On Saturday, 30 January 2016 at 22:56:37 EET, Xavier Hernandez wrote:

There's another inode leak caused by an incorrect counting of
lookups on directory reads.

Here's a patch that solves the problem for
3.7:

http://review.gluster.org/13324

Hopefully with this patch the
memory leaks should disapear.

Xavi

On 29.01.2016 19:09, Oleksandr Natalenko wrote:

Here is intermediate summary of current memory leaks in FUSE client
investigation.

I use GlusterFS v3.7.6 release with the following patches:

===
Kaleb S KEITHLEY (1):
      fuse: use-after-free fix in fuse-bridge, revisited

Pranith Kumar K (1):
      mount/fuse: Fix use-after-free crash

Soumya Koduri (3):
      gfapi: Fix inode nlookup counts
      inode: Retire the inodes from the lru list in inode_table_destroy
      upcall: free the xdr* allocations
===

With those patches we got API leaks fixed (I hope, brief tests show that) and
got rid of "kernel notifier loop terminated" message. Nevertheless, FUSE
client still leaks.

I have several test volumes with several million of small files (100K…2M in
average). I do 2 types of FUSE client testing:

1) find /mnt/volume -type d
2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/

And most up-to-date results are shown below:

=== find /mnt/volume -type d ===

Memory consumption: ~4G
Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a
Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d

I guess, fuse-bridge/fuse-resolve related.

=== rsync -av -H /mnt/source_volume/* /mnt/target_volume/ ===

Memory consumption: ~3.3...4G
Statedump (target volume): https://gist.github.com/31e43110eaa4da663435
Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a

I guess, DHT-related.

Give me more patches to test :).

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Core from gNFS process

2016-01-15 Thread Soumya Koduri



On 01/14/2016 08:41 PM, Vijay Bellur wrote:

On 01/14/2016 04:11 AM, Jiffin Tony Thottan wrote:



On 14/01/16 14:28, Jiffin Tony Thottan wrote:

Hi,

The core generated when encryption xlator is enabled

[2016-01-14 08:13:15.740835] E
[crypt.c:4298:master_set_master_vol_key] 0-test1-crypt: FATAL: missing
master key
[2016-01-14 08:13:15.740859] E [MSGID: 101019]
[xlator.c:429:xlator_init] 0-test1-crypt: Initialization of volume
'test1-crypt' failed, review your volfile again
[2016-01-14 08:13:15.740890] E [MSGID: 101066]
[graph.c:324:glusterfs_graph_init] 0-test1-crypt: initializing
translator failed
[2016-01-14 08:13:15.740904] E [MSGID: 101176]
[graph.c:670:glusterfs_graph_activate] 0-graph: init failed
[2016-01-14 08:13:15.741676] W [glusterfsd.c:1231:cleanup_and_exit]
(-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x307) [0x40d287]
-->/usr/sbin/glusterfs(glusterfs_process_volfp+0x117) [0x4086c7]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x4d) [0x407e1d] ) 0-:
received signum (0), shutting down




Forgot to mention this in the last mail: the crypt xlator needs a master key
before the translator is enabled, which causes the issue.
--


Irrespective of the problem, the nfs process should not crash. Can we
check why there is a memory corruption during cleanup_and_exit()?

That's right. This issue was reported quite a few times earlier on 
gluster-devel and it is not specific to the gluster-nfs process. As updated 
in [1], we have raised bug 1293594 [2] against the lib-gcc team to further 
investigate this.


As requested in [1], kindly upload the core to the bug along with a backtrace 
taken with the gcc debuginfo packages installed. It might help to get their 
attention and reach closure on this issue sooner.


Thanks,
Soumya
[1] http://article.gmane.org/gmane.comp.file-systems.gluster.devel/13298


-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Core from gNFS process

2016-01-15 Thread Soumya Koduri




On 01/15/2016 06:52 PM, Soumya Koduri wrote:



On 01/14/2016 08:41 PM, Vijay Bellur wrote:

On 01/14/2016 04:11 AM, Jiffin Tony Thottan wrote:



On 14/01/16 14:28, Jiffin Tony Thottan wrote:

Hi,

The core generated when encryption xlator is enabled

[2016-01-14 08:13:15.740835] E
[crypt.c:4298:master_set_master_vol_key] 0-test1-crypt: FATAL: missing
master key
[2016-01-14 08:13:15.740859] E [MSGID: 101019]
[xlator.c:429:xlator_init] 0-test1-crypt: Initialization of volume
'test1-crypt' failed, review your volfile again
[2016-01-14 08:13:15.740890] E [MSGID: 101066]
[graph.c:324:glusterfs_graph_init] 0-test1-crypt: initializing
translator failed
[2016-01-14 08:13:15.740904] E [MSGID: 101176]
[graph.c:670:glusterfs_graph_activate] 0-graph: init failed
[2016-01-14 08:13:15.741676] W [glusterfsd.c:1231:cleanup_and_exit]
(-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x307) [0x40d287]
-->/usr/sbin/glusterfs(glusterfs_process_volfp+0x117) [0x4086c7]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x4d) [0x407e1d] ) 0-:
received signum (0), shutting down




Forgot to mention this last mail,  for crypt xlator needs master key
before enabling the translator which cause the issue
--


Irrespective of the problem, the nfs process should not crash. Can we
check why there is a memory corruption during cleanup_and_exit()?


That's right. This issue was reported quite a few times earlier in
gluster-devel and it is not specific to gluster-nfs process. As updated
in [1], we have raised bug1293594[2] against lib-gcc team to further
investigate this.

As requested in [1], kindly upload the core in the bug along with bt
taken with gcc debuginfo packages installed. Might help to get their
attention and get a closure on this issue sooner.


Here is the bug link -
https://bugzilla.redhat.com/show_bug.cgi?id=1293594

Request Raghavendra/Ravi to update it.

Thanks,
Soumya


Thanks,
Soumya
[1] http://article.gmane.org/gmane.comp.file-systems.gluster.devel/13298


-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-13 Thread Soumya Koduri



On 01/13/2016 04:08 PM, Soumya Koduri wrote:



On 01/12/2016 12:46 PM, Oleksandr Natalenko wrote:

Just in case, here is Valgrind output on FUSE client with 3.7.6 +
API-related patches we discussed before:

https://gist.github.com/cd6605ca19734c1496a4



Thanks for sharing the results. I made changes to fix one leak reported
there wrt 'client_cbk_cache_invalidation' -
 - http://review.gluster.org/#/c/13232/

The other inode*-related memory reported as lost is mainly (maybe)
because the fuse client process doesn't clean up its memory (doesn't call
fini()) while exiting. Hence the majority of those allocations are listed as
lost. But most of the inodes should have been purged when we drop the vfs
cache. Did you drop the vfs cache before exiting the process?

I shall add some log statements and check that part.


Also, please take a statedump of the fuse mount process (after dropping the 
vfs cache) when you see high memory usage, by issuing the following command -

'kill -USR1 '

The statedump will be copied to a 'glusterdump..dump.timestamp' file in
/var/run/gluster or /usr/local/var/run/gluster.
Please refer to [1] for more information.

Thanks,
Soumya
[1] http://review.gluster.org/#/c/8288/1/doc/debugging/statedump.md



Thanks,
Soumya


12.01.2016 08:24, Soumya Koduri wrote:

For the fuse client, I tried vfs drop_caches as suggested by Vijay in an
earlier mail. Though all the inodes get purged, I still don't see
much difference in the memory footprint drop. I need to investigate what
else is consuming so much memory here.

___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-13 Thread Soumya Koduri



On 01/12/2016 12:17 PM, Mathieu Chateau wrote:

I tried as suggested:

echo 3 > /proc/sys/vm/drop_caches
sync


It lowered usage a bit:

before:

[inline screenshot of memory usage]

after:

[inline screenshot of memory usage]



Thanks Mathieu. There is a drop in memory usage after dropping the vfs cache, 
but it doesn't seem significant. I am not sure at this point what else may be 
consuming most of the memory.
Maybe after dropping the vfs cache, you could also collect valgrind 
results and see if there are huge chunks reported as lost from inode_new*.


I too shall look into it further and update.

-Soumya



Regards,

Mathieu CHATEAU
http://www.lotp.fr


2016-01-12 7:34 GMT+01:00 Mathieu Chateau <mathieu.chat...@lotp.fr>:

Hello,

I also experience high memory usage on my gluster clients. Sample:
[inline screenshot of client memory usage]

Can I help with testing/debugging?



Regards,
Mathieu CHATEAU
http://www.lotp.fr

2016-01-12 7:24 GMT+01:00 Soumya Koduri <skod...@redhat.com>:



On 01/11/2016 05:11 PM, Oleksandr Natalenko wrote:

Brief test shows that Ganesha stopped leaking and crashing,
so it seems
to be good for me.

Thanks for checking.

Nevertheless, back to my original question: what about FUSE
client? It
is still leaking despite all the fixes applied. Should it be
considered
another issue?


        For the fuse client, I tried vfs drop_caches as suggested by Vijay
        in an earlier mail. Though all the inodes get purged, I still
        don't see much difference in the memory footprint drop. I need
        to investigate what else is consuming so much memory here.

Thanks,
Soumya



        11.01.2016 12:26, Soumya Koduri wrote:

I have made changes to fix the lookup leak in a
different way (as
discussed with Pranith) and uploaded them in the latest
patch set #4
 - http://review.gluster.org/#/c/13096/

Please check if it resolves the mem leak and hopefully
doesn't result
in any assertion :)

Thanks,
Soumya

    On 01/08/2016 05:04 PM, Soumya Koduri wrote:

I could reproduce while testing deep directories
with in the mount
point. I root caus'ed the issue & had discussion
with Pranith to
understand the purpose and recommended way of taking
nlookup on inodes.

I shall make changes to my existing fix and post the
patch soon.
Thanks for your patience!

-Soumya

On 01/07/2016 07:34 PM, Oleksandr Natalenko wrote:

OK, I've patched GlusterFS v3.7.6 with 43570a01
and 5cffb56b (the most
recent
revisions) and NFS-Ganesha v2.3.0 with 8685abfc
(most recent revision
too).

On traversing GlusterFS volume with many files
in one folder via NFS
mount I
get an assertion:

===
ganesha.nfsd: inode.c:716: __inode_forget:
Assertion `inode->nlookup >=
nlookup' failed.
===

I used GDB on NFS-Ganesha process to get
appropriate stacktraces:

1. short stacktrace of failed thread:

https://gist.github.com/7f63bb99c530d26ded18

2. full stacktrace of failed thread:

https://gist.github.com/d9bc7bc8f6a0bbff9e86

3. short stacktrace of all threads:

https://gist.github.com/f31da7725306854c719f

4. full stacktrace of all threads:

https://gist.github.com/65cbc562b01211ea5612

GlusterFS volume configuration:

https://gist.github.com/30f0129d16e25d4a5a52

ganesha.conf:

https://gist.github.com/9b5e59b8d6d8cb84c85d

How I mount NFS share:

===
mount -t nfs4 127.0.0.1:/mail_boxes /mnt/tmp -o

defaults,_netdev,minorversion=2,noac,noacl,lookupcache=none,timeo=100
===

                On Thursday, 7 January 2016 at 12:06:42 EET, Soumya Koduri wrote:

   

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-11 Thread Soumya Koduri
I have made changes to fix the lookup leak in a different way (as 
discussed with Pranith) and uploaded them in the latest patch set #4

- http://review.gluster.org/#/c/13096/

Please check if it resolves the mem leak and hopefully doesn't result in 
any assertion :)


Thanks,
Soumya

On 01/08/2016 05:04 PM, Soumya Koduri wrote:

I could reproduce while testing deep directories with in the mount
point. I root caus'ed the issue & had discussion with Pranith to
understand the purpose and recommended way of taking nlookup on inodes.

I shall make changes to my existing fix and post the patch soon.
Thanks for your patience!

-Soumya

On 01/07/2016 07:34 PM, Oleksandr Natalenko wrote:

OK, I've patched GlusterFS v3.7.6 with 43570a01 and 5cffb56b (the most
recent
revisions) and NFS-Ganesha v2.3.0 with 8685abfc (most recent revision
too).

On traversing GlusterFS volume with many files in one folder via NFS
mount I
get an assertion:

===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >=
nlookup' failed.
===

I used GDB on NFS-Ganesha process to get appropriate stacktraces:

1. short stacktrace of failed thread:

https://gist.github.com/7f63bb99c530d26ded18

2. full stacktrace of failed thread:

https://gist.github.com/d9bc7bc8f6a0bbff9e86

3. short stacktrace of all threads:

https://gist.github.com/f31da7725306854c719f

4. full stacktrace of all threads:

https://gist.github.com/65cbc562b01211ea5612

GlusterFS volume configuration:

https://gist.github.com/30f0129d16e25d4a5a52

ganesha.conf:

https://gist.github.com/9b5e59b8d6d8cb84c85d

How I mount NFS share:

===
mount -t nfs4 127.0.0.1:/mail_boxes /mnt/tmp -o
defaults,_netdev,minorversion=2,noac,noacl,lookupcache=none,timeo=100
===

On Thursday, 7 January 2016 at 12:06:42 EET, Soumya Koduri wrote:

Entries_HWMark = 500;




___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-11 Thread Soumya Koduri



On 01/11/2016 05:11 PM, Oleksandr Natalenko wrote:

Brief test shows that Ganesha stopped leaking and crashing, so it seems
to be good for me.


Thanks for checking.


Nevertheless, back to my original question: what about FUSE client? It
is still leaking despite all the fixes applied. Should it be considered
another issue?


For the fuse client, I tried vfs drop_caches as suggested by Vijay in an 
earlier mail. Though all the inodes get purged, I still don't see much 
difference in the memory footprint drop. I need to investigate what else 
is consuming so much memory here.


Thanks,
Soumya



11.01.2016 12:26, Soumya Koduri wrote:

I have made changes to fix the lookup leak in a different way (as
discussed with Pranith) and uploaded them in the latest patch set #4
- http://review.gluster.org/#/c/13096/

Please check if it resolves the mem leak and hopefully doesn't result
in any assertion :)

Thanks,
Soumya

On 01/08/2016 05:04 PM, Soumya Koduri wrote:

I could reproduce while testing deep directories with in the mount
point. I root caus'ed the issue & had discussion with Pranith to
understand the purpose and recommended way of taking nlookup on inodes.

I shall make changes to my existing fix and post the patch soon.
Thanks for your patience!

-Soumya

On 01/07/2016 07:34 PM, Oleksandr Natalenko wrote:

OK, I've patched GlusterFS v3.7.6 with 43570a01 and 5cffb56b (the most
recent
revisions) and NFS-Ganesha v2.3.0 with 8685abfc (most recent revision
too).

On traversing GlusterFS volume with many files in one folder via NFS
mount I
get an assertion:

===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >=
nlookup' failed.
===

I used GDB on NFS-Ganesha process to get appropriate stacktraces:

1. short stacktrace of failed thread:

https://gist.github.com/7f63bb99c530d26ded18

2. full stacktrace of failed thread:

https://gist.github.com/d9bc7bc8f6a0bbff9e86

3. short stacktrace of all threads:

https://gist.github.com/f31da7725306854c719f

4. full stacktrace of all threads:

https://gist.github.com/65cbc562b01211ea5612

GlusterFS volume configuration:

https://gist.github.com/30f0129d16e25d4a5a52

ganesha.conf:

https://gist.github.com/9b5e59b8d6d8cb84c85d

How I mount NFS share:

===
mount -t nfs4 127.0.0.1:/mail_boxes /mnt/tmp -o
defaults,_netdev,minorversion=2,noac,noacl,lookupcache=none,timeo=100
===

On Thursday, 7 January 2016 at 12:06:42 EET, Soumya Koduri wrote:

Entries_HWMark = 500;




___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-08 Thread Soumya Koduri
I could reproduce it while testing deep directories within the mount 
point. I root-caused the issue & had a discussion with Pranith to 
understand the purpose and the recommended way of taking nlookup on inodes.


I shall make changes to my existing fix and post the patch soon.
Thanks for your patience!

-Soumya

On 01/07/2016 07:34 PM, Oleksandr Natalenko wrote:

OK, I've patched GlusterFS v3.7.6 with 43570a01 and 5cffb56b (the most recent
revisions) and NFS-Ganesha v2.3.0 with 8685abfc (most recent revision too).

On traversing GlusterFS volume with many files in one folder via NFS mount I
get an assertion:

===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >=
nlookup' failed.
===

I used GDB on NFS-Ganesha process to get appropriate stacktraces:

1. short stacktrace of failed thread:

https://gist.github.com/7f63bb99c530d26ded18

2. full stacktrace of failed thread:

https://gist.github.com/d9bc7bc8f6a0bbff9e86

3. short stacktrace of all threads:

https://gist.github.com/f31da7725306854c719f

4. full stacktrace of all threads:

https://gist.github.com/65cbc562b01211ea5612

GlusterFS volume configuration:

https://gist.github.com/30f0129d16e25d4a5a52

ganesha.conf:

https://gist.github.com/9b5e59b8d6d8cb84c85d

How I mount NFS share:

===
mount -t nfs4 127.0.0.1:/mail_boxes /mnt/tmp -o
defaults,_netdev,minorversion=2,noac,noacl,lookupcache=none,timeo=100
===

On Thursday, 7 January 2016 at 12:06:42 EET, Soumya Koduri wrote:

Entries_HWMark = 500;




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-06 Thread Soumya Koduri



On 01/06/2016 01:58 PM, Oleksandr Natalenko wrote:

OK, here is valgrind log of patched Ganesha (I took recent version of
your patchset, 8685abfc6d) with Entries_HWMARK set to 500.

https://gist.github.com/5397c152a259b9600af0

See no huge runtime leaks now.


Glad to hear this :)

However, I've repeated this test with

another volume in replica and got the following Ganesha error:

===
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >=
nlookup' failed.
===


I repeated the tests on a replica volume as well, but haven't hit any 
assert. Could you confirm whether you have taken the latest gluster patch set 
#3?

 - http://review.gluster.org/#/c/13096/3

If you are hitting the issue even then, please provide the core if possible.

Thanks,
Soumya



06.01.2016 08:40, Soumya Koduri wrote:

On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote:

OK, I've repeated the same traversing test with patched GlusterFS
API, and
here is new Valgrind log:

https://gist.github.com/17ecb16a11c9aed957f5


A fuse mount doesn't use the gfapi helper. Does your GlusterFS API
application above call glfs_fini() during exit? glfs_fini() is responsible
for freeing the memory consumed by gfAPI applications.
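For reference, a minimal gfapi lifecycle looks like the sketch below (the
volume name, host and log path are placeholders); the final glfs_fini() is
what releases the inode table and the other per-instance allocations that
valgrind otherwise reports as leaked at exit:

#include <stdio.h>
#include <glusterfs/api/glfs.h>

int
main (void)
{
        glfs_t *fs = glfs_new ("testvol");                      /* placeholder */
        if (!fs)
                return 1;

        glfs_set_volfile_server (fs, "tcp", "server1", 24007);  /* placeholder */
        glfs_set_logging (fs, "/tmp/gfapi.log", 7);

        if (glfs_init (fs) != 0) {
                fprintf (stderr, "glfs_init failed\n");
                glfs_fini (fs);
                return 1;
        }

        /* ... application I/O through glfs_* calls ... */

        glfs_fini (fs);         /* tears down the inode table and caches */
        return 0;
}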

Could you repeat the test with nfs-ganesha (which for sure calls
glfs_fini() and purges inodes if it exceeds its inode cache limit), if
possible?

Thanks,
Soumya


Still leaks.

On Tuesday, 5 January 2016 at 22:52:25 EET, Soumya Koduri wrote:

On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:

Unfortunately, both patches didn't make any difference for me.

I've patched 3.7.6 with both patches, recompiled and installed patched
GlusterFS package on client side and mounted volume with ~2M of files.
The I performed usual tree traverse with simple "find".

Memory RES value went from ~130M at the moment of mounting to ~1.5G
after traversing the volume for ~40 mins. Valgrind log still shows
lots
of leaks. Here it is:

https://gist.github.com/56906ca6e657c4ffa4a1


Looks like you had done fuse mount. The patches which I have pasted
below apply to gfapi/nfs-ganesha applications.

Also, to resolve the nfs-ganesha issue which I had mentioned below (in
case if Entries_HWMARK option gets changed), I have posted below fix -
https://review.gerrithub.io/#/c/258687

Thanks,
Soumya


Ideas?

05.01.2016 12:31, Soumya Koduri wrote:

I tried to debug the inode* related leaks and seen some improvements
after applying the below patches when ran the same test (but will
smaller load). Could you please apply those patches & confirm the
same?

a) http://review.gluster.org/13125

This will fix the inodes & their ctx related leaks during unexport
and
the program exit. Please check the valgrind output after applying the
patch. It should not list any inodes related memory as lost.

b) http://review.gluster.org/13096

The reason the change in Entries_HWMARK (in your earlier mail) dint
have much effect is that the inode_nlookup count doesn't become zero
for those handles/inodes being closed by ganesha. Hence those inodes
shall get added to inode lru list instead of purge list which shall
get forcefully purged only when the number of gfapi inode table
entries reaches its limit (which is 137012).

This patch fixes those 'nlookup' counts. Please apply this patch and
reduce 'Entries_HWMARK' to much lower value and check if it decreases
the in-memory being consumed by ganesha process while being active.

CACHEINODE {

 Entries_HWMark = 500;

}


Note: I see an issue with nfs-ganesha during exit when the option
'Entries_HWMARK' gets changed. This is not related to any of the
above
patches (or rather Gluster) and I am currently debugging it.

Thanks,
Soumya

On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:

1. test with Cache_Size = 256 and Entries_HWMark = 4096

Before find . -type f:

root  3120  0.6 11.0 879120 208408 ?   Ssl  17:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root  3120 11.4 24.3 1170076 458168 ?  Ssl  17:39  13:39
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~250M leak.

2. test with default values (after ganesha restart)

Before:

root 24937  1.3 10.4 875016 197808 ?   Ssl  19:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root 24937  3.5 18.9 1022544 356340 ?  Ssl  19:39   0:40
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~159M leak.

No reasonable correlation detected. Second test was finished much
faster than
first (I guess, server-side GlusterFS cache or server kernel page
cache is the
cause).

There are ~1.8M files on this test volume.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes hug

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-05 Thread Soumya Koduri
I tried to debug the inode* related leaks and saw some improvements 
after applying the below patches when I ran the same test (but with a 
smaller load). Could you please apply those patches & confirm the same?


a) http://review.gluster.org/13125

This will fix the inodes & their ctx related leaks during unexport and 
the program exit. Please check the valgrind output after applying the 
patch. It should not list any inodes related memory as lost.


b) http://review.gluster.org/13096

The reason the change in Entries_HWMARK (in your earlier mail) didn't have 
much effect is that the inode_nlookup count doesn't become zero for 
those handles/inodes being closed by ganesha. Hence those inodes shall 
get added to inode lru list instead of purge list which shall get 
forcefully purged only when the number of gfapi inode table entries 
reaches its limit (which is 137012).


This patch fixes those 'nlookup' counts. Please apply this patch and 
reduce 'Entries_HWMARK' to a much lower value, and check if it decreases 
the memory consumed by the ganesha process while it is active.


CACHEINODE {
Entries_HWMark = 500;
}


Note: I see an issue with nfs-ganesha during exit when the option 
'Entries_HWMARK' gets changed. This is not related to any of the above 
patches (or rather Gluster) and I am currently debugging it.


Thanks,
Soumya


On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:

1. test with Cache_Size = 256 and Entries_HWMark = 4096

Before find . -type f:

root  3120  0.6 11.0 879120 208408 ?   Ssl  17:39   0:00 /usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

After:

root  3120 11.4 24.3 1170076 458168 ?  Ssl  17:39  13:39 /usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

~250M leak.

2. test with default values (after ganesha restart)

Before:

root 24937  1.3 10.4 875016 197808 ?   Ssl  19:39   0:00 /usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

After:

root 24937  3.5 18.9 1022544 356340 ?  Ssl  19:39   0:40 /usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

~159M leak.

No reasonable correlation detected. Second test was finished much faster than
first (I guess, server-side GlusterFS cache or server kernel page cache is the
cause).

There are ~1.8M files on this test volume.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes huge amount of memory while doing
ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
usage:

===
root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N NIV_EVENT
===

1.4G is too much for simple stat() :(.

Ideas?


nfs-ganesha also has cache layer which can scale to millions of entries
depending on the number of files/directories being looked upon. However
there are parameters to tune it. So either try stat with few entries or
add below block in nfs-ganesha.conf file, set low limits and check the
difference. That may help us narrow down how much memory actually
consumed by core nfs-ganesha and gfAPI.

CACHEINODE {
Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max no.
of entries in the cache.
}

Thanks,
Soumya


24.12.2015 16:32, Oleksandr Natalenko wrote:

Still actual issue for 3.7.6. Any suggestions?

24.09.2015 10:14, Oleksandr Natalenko wrote:

In our GlusterFS deployment we've encountered something like memory
leak in GlusterFS FUSE client.

We use replicated (×2) GlusterFS volume to store mail (exim+dovecot,
maildir format). Here is inode stats for both bricks and mountpoint:

===
Brick 1 (Server 1):

Filesystem InodesIUsed

  IFree IUse% Mounted on

/dev/mapper/vg_vd1_misc-lv08_mail   578768144 10954918

  5678132262% /bricks/r6sdLV08_vd1_mail

Brick 2 (Server 2):

Filesystem InodesIUsed

  IFree IUse% Mounted on

/dev/mapper/vg_vd0_misc-lv07_mail   578767984 10954913

  5678130712% /bricks/r6sdLV07_vd0_mail

Mountpoint (Server 3):

Filesystem  InodesIUsed  IFree
IUse% Mounted on
glusterfs.xxx:mail   578767760 10954915  567812845
2% /var/spool/mail/virtual
===

glusterfs.xxx domain has two A records for both Server 1 and Server 2.

Here is volume info:

===
Volume Name: mail
Type: Replicate
Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
Brick2: s

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-05 Thread Soumya Koduri



On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:

Unfortunately, both patches didn't make any difference for me.

I've patched 3.7.6 with both patches, recompiled and installed patched
GlusterFS package on client side and mounted volume with ~2M of files.
Then I performed the usual tree traversal with a simple "find".

Memory RES value went from ~130M at the moment of mounting to ~1.5G
after traversing the volume for ~40 mins. Valgrind log still shows lots
of leaks. Here it is:

https://gist.github.com/56906ca6e657c4ffa4a1


Looks like you had done fuse mount. The patches which I have pasted 
below apply to gfapi/nfs-ganesha applications.


Also, to resolve the nfs-ganesha issue which I had mentioned below (in 
case the Entries_HWMARK option gets changed), I have posted the below fix -

https://review.gerrithub.io/#/c/258687

Thanks,
Soumya



Ideas?

05.01.2016 12:31, Soumya Koduri написав:

I tried to debug the inode* related leaks and saw some improvements
after applying the below patches when I ran the same test (but with a
smaller load). Could you please apply those patches & confirm the
same?

a) http://review.gluster.org/13125

This will fix the inodes & their ctx related leaks during unexport and
the program exit. Please check the valgrind output after applying the
patch. It should not list any inodes related memory as lost.

b) http://review.gluster.org/13096

The reason the change in Entries_HWMARK (in your earlier mail) didn't
have much effect is that the inode_nlookup count doesn't become zero
for those handles/inodes being closed by ganesha. Hence those inodes
shall get added to inode lru list instead of purge list which shall
get forcefully purged only when the number of gfapi inode table
entries reaches its limit (which is 137012).

This patch fixes those 'nlookup' counts. Please apply this patch and
reduce 'Entries_HWMARK' to much lower value and check if it decreases
the in-memory being consumed by ganesha process while being active.

CACHEINODE {
Entries_HWMark = 500;
}


Note: I see an issue with nfs-ganesha during exit when the option
'Entries_HWMARK' gets changed. This is not related to any of the above
patches (or rather Gluster) and I am currently debugging it.

Thanks,
Soumya


On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:

1. test with Cache_Size = 256 and Entries_HWMark = 4096

Before find . -type f:

root  3120  0.6 11.0 879120 208408 ?   Ssl  17:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root  3120 11.4 24.3 1170076 458168 ?  Ssl  17:39  13:39
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~250M leak.

2. test with default values (after ganesha restart)

Before:

root 24937  1.3 10.4 875016 197808 ?   Ssl  19:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root 24937  3.5 18.9 1022544 356340 ?  Ssl  19:39   0:40
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~159M leak.

No reasonable correlation detected. Second test was finished much
faster than
first (I guess, server-side GlusterFS cache or server kernel page
cache is the
cause).

There are ~1.8M files on this test volume.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes huge amount of memory while doing
ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
usage:

===
root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N NIV_EVENT
===

1.4G is too much for simple stat() :(.

Ideas?


nfs-ganesha also has cache layer which can scale to millions of entries
depending on the number of files/directories being looked upon. However
there are parameters to tune it. So either try stat with few entries or
add below block in nfs-ganesha.conf file, set low limits and check the
difference. That may help us narrow down how much memory actually
consumed by core nfs-ganesha and gfAPI.

CACHEINODE {
Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); #
cache size
Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10);
#Max no.
of entries in the cache.
}

Thanks,
Soumya


24.12.2015 16:32, Oleksandr Natalenko wrote:

Still actual issue for 3.7.6. Any suggestions?

24.09.2015 10:14, Oleksandr Natalenko wrote:

In our GlusterFS deployment we've encountered something like memory
leak in GlusterFS FUSE client.

We use replicated (×2) GlusterFS volume to store mail (exim+dovecot,
maildir format). Here is inode stats for both bricks and mountpoint:

===
Brick 1 (Server 1):

Filesystem Inodes IUsed

  IFree IUse% Mounted on

/dev/

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-05 Thread Soumya Koduri



On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote:

OK, I've repeated the same traversing test with patched GlusterFS API, and
here is new Valgrind log:

https://gist.github.com/17ecb16a11c9aed957f5

A FUSE mount doesn't use the gfapi helper. Does your GlusterFS API 
application above call glfs_fini() on exit? glfs_fini() is responsible for 
freeing the memory consumed by gfAPI applications.


Could you repeat the test with nfs-ganesha (which for sure calls 
glfs_fini() and purges inodes if it exceeds its inode cache limit), if possible?
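
For reference, below is a minimal sketch of a gfapi program that calls 
glfs_fini() on exit. The volume name, server and path are placeholders 
(my assumptions, not taken from this thread); without that final 
glfs_fini(), valgrind reports everything still held by the glfs instance 
as leaked.

/* Minimal gfapi sketch that cleans up with glfs_fini().
 * Volume name, server and path are placeholders.
 * Build (roughly): gcc gfapi-stat.c -o gfapi-stat $(pkg-config --cflags --libs glusterfs-api)
 */
#include <glusterfs/api/glfs.h>   /* <api/glfs.h> on older installs */
#include <sys/stat.h>
#include <stdio.h>

int
main (void)
{
        struct stat st;
        glfs_t *fs = glfs_new ("mail");   /* volume name (placeholder) */

        if (!fs)
                return 1;

        glfs_set_volfile_server (fs, "tcp", "glusterfs.example.com", 24007);
        glfs_set_logging (fs, "/tmp/gfapi.log", 7);

        if (glfs_init (fs) != 0) {
                glfs_fini (fs);
                return 1;
        }

        if (glfs_stat (fs, "/some/file", &st) == 0)
                printf ("size: %lld\n", (long long) st.st_size);

        /* glfs_fini() tears down the graph and the gfapi inode table;
         * skipping it leaves all of that memory reported as leaked. */
        glfs_fini (fs);
        return 0;
}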


Thanks,
Soumya


Still leaks.

On Tuesday, 5 January 2016 22:52:25 EET Soumya Koduri wrote:

On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:

Unfortunately, both patches didn't make any difference for me.

I've patched 3.7.6 with both patches, recompiled and installed patched
GlusterFS package on client side and mounted volume with ~2M of files.
Then I performed the usual tree traversal with a simple "find".

Memory RES value went from ~130M at the moment of mounting to ~1.5G
after traversing the volume for ~40 mins. Valgrind log still shows lots
of leaks. Here it is:

https://gist.github.com/56906ca6e657c4ffa4a1


Looks like you had done fuse mount. The patches which I have pasted
below apply to gfapi/nfs-ganesha applications.

Also, to resolve the nfs-ganesha issue which I had mentioned below (in
case the Entries_HWMARK option gets changed), I have posted the below fix -
https://review.gerrithub.io/#/c/258687

Thanks,
Soumya


Ideas?

05.01.2016 12:31, Soumya Koduri wrote:

I tried to debug the inode* related leaks and saw some improvements
after applying the below patches when I ran the same test (but with a
smaller load). Could you please apply those patches & confirm the
same?

a) http://review.gluster.org/13125

This will fix the inodes & their ctx related leaks during unexport and
the program exit. Please check the valgrind output after applying the
patch. It should not list any inodes related memory as lost.

b) http://review.gluster.org/13096

The reason the change in Entries_HWMARK (in your earlier mail) didn't
have much effect is that the inode_nlookup count doesn't become zero
for those handles/inodes being closed by ganesha. Hence those inodes
shall get added to inode lru list instead of purge list which shall
get forcefully purged only when the number of gfapi inode table
entries reaches its limit (which is 137012).

This patch fixes those 'nlookup' counts. Please apply this patch and
reduce 'Entries_HWMARK' to much lower value and check if it decreases
the in-memory being consumed by ganesha process while being active.

CACHEINODE {

 Entries_HWMark = 500;

}


Note: I see an issue with nfs-ganesha during exit when the option
'Entries_HWMARK' gets changed. This is not related to any of the above
patches (or rather Gluster) and I am currently debugging it.

Thanks,
Soumya

On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:

1. test with Cache_Size = 256 and Entries_HWMark = 4096

Before find . -type f:

root  3120  0.6 11.0 879120 208408 ?   Ssl  17:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root  3120 11.4 24.3 1170076 458168 ?  Ssl  17:39  13:39
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~250M leak.

2. test with default values (after ganesha restart)

Before:

root 24937  1.3 10.4 875016 197808 ?   Ssl  19:39   0:00
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

After:

root 24937  3.5 18.9 1022544 356340 ?  Ssl  19:39   0:40
/usr/bin/
ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N
NIV_EVENT

~159M leak.

No reasonable correlation detected. Second test was finished much
faster than
first (I guess, server-side GlusterFS cache or server kernel page
cache is the
cause).

There are ~1.8M files on this test volume.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes huge amount of memory while doing
ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
usage:

===
root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N NIV_EVENT
===

1.4G is too much for simple stat() :(.

Ideas?


nfs-ganesha also has cache layer which can scale to millions of entries
depending on the number of files/directories being looked upon. However
there are parameters to tune it. So either try stat with few entries or
add below block in nfs-ganesha.conf file, set low limits and check the
difference. That may help us narrow down how much memory actually
consumed by core nfs-ganesha and gfAPI.

CACHEINODE {

 Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); #

cac

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2015-12-28 Thread Soumya Koduri


- Original Message -
> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> To: "Oleksandr Natalenko" <oleksa...@natalenko.name>, "Soumya Koduri" 
> <skod...@redhat.com>
> Cc: gluster-us...@gluster.org, gluster-devel@gluster.org
> Sent: Monday, December 28, 2015 9:32:07 AM
> Subject: Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE 
> client
> 
> 
> 
> On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote:
> > Also, here is valgrind output with our custom tool, that does GlusterFS
> > volume
> > traversing (with simple stats) just like find tool. In this case
> > NFS-Ganesha
> > is not used.
> >
> > https://gist.github.com/e4602a50d3c98f7a2766
> hi Oleksandr,
>I went through the code. Both NFS Ganesha and the custom tool use
> gfapi and the leak is stemming from that. I am not very familiar with
> this part of code but there seems to be one inode_unref() that is
> missing in failure path of resolution. Not sure if that is corresponding
> to the leaks.
> 
> Soumya,
> Could this be the issue? review.gluster.org seems to be down. So
> couldn't send the patch. Please ping me on IRC.
> diff --git a/api/src/glfs-resolve.c b/api/src/glfs-resolve.c
> index b5efcba..52b538b 100644
> --- a/api/src/glfs-resolve.c
> +++ b/api/src/glfs-resolve.c
> @@ -467,9 +467,11 @@ priv_glfs_resolve_at (struct glfs *fs, xlator_t
> *subvol, inode_t *at,
>  }
>  }
> 
> -   if (parent && next_component)
> +   if (parent && next_component) {
> +   inode_unref (parent);
> +   parent = NULL;
>  /* resolution failed mid-way */
>  goto out;
> +}
> 
>  /* At this point, all components up to the last parent directory
> have been resolved successfully (@parent). Resolution of
> basename
> 
Yes, this could be one of the reasons. There are a few leaks with respect to 
inode references in gfAPI. See below.


On the GlusterFS side, it looks like the majority of the leaks are related to 
inodes and their contexts. Possible reasons which I can think of are:

1) When there is a graph switch, the old inode table and its entries are not 
purged (this is a known issue). There was an effort to fix this issue, but 
I think it had other side-effects and hence has not been applied. Maybe we 
should revive those changes again.

2) With regard to the above, old entries can be purged when any request comes 
in with a reference to an old inode (as part of 'glfs_resolve_inode'), provided 
their reference counts are properly decremented. But this is not happening at 
the moment in gfapi.

3) Applications should hold and release their references as needed. There are 
certain fixes needed in this area as well (including the fix provided by 
Pranith above); a rough sketch of the expected application-side pattern 
follows below.
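
As an illustration of (3), here is a minimal, hedged sketch of the
handle-based gfapi pattern that applications such as NFS-Ganesha are
expected to follow: every glfs_object obtained from a lookup must
eventually be released with glfs_h_close(). The path is a placeholder,
and the exact glfs_h_lookupat() signature (the trailing 'follow' flag)
differs between gfapi releases, so treat this as an assumption rather
than a drop-in example.

/* Hedged sketch: release every handle you acquire. */
#include <glusterfs/api/glfs.h>          /* <api/glfs.h> on older installs */
#include <glusterfs/api/glfs-handles.h>  /* <api/glfs-handles.h> on older installs */
#include <sys/stat.h>

int
stat_by_handle (glfs_t *fs, const char *path, struct stat *st)
{
        struct glfs_object *obj = NULL;
        int                 ret = -1;

        /* Resolves 'path' from the volume root and returns a referenced handle. */
        obj = glfs_h_lookupat (fs, NULL, path, st, 0 /* follow */);
        if (!obj)
                return -1;

        ret = glfs_h_stat (fs, obj, st);

        /* Releasing the handle lets gfapi drop its reference so the inode can
         * eventually be purged; holding handles forever is exactly the kind of
         * application-side leak described in point (3). */
        glfs_h_close (obj);
        return ret;
}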

From code inspection, I have made changes to fix a few leaks of cases (2) & (3) 
with respect to gfAPI:
http://review.gluster.org/#/c/13096 (yet to test the changes)

I haven't yet narrowed down any suspects pertaining to only NFS-Ganesha. Will 
re-check and update.

Thanks,
Soumya


> Pranith
> >
> > One may see GlusterFS-related leaks here as well.
> >
> >> On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:
> >> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> >>> Another addition: it seems to be GlusterFS API library memory leak
> >>> because NFS-Ganesha also consumes huge amount of memory while doing
> >>> ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
> >>> usage:
> >>>
> >>> ===
> >>> root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
> >>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
> >>> /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>> ===
> >>>
> >>> 1.4G is too much for simple stat() :(.
> >>>
> >>> Ideas?
> >> nfs-ganesha also has cache layer which can scale to millions of entries
> >> depending on the number of files/directories being looked upon. However
> >> there are parameters to tune it. So either try stat with few entries or
> >> add below block in nfs-ganesha.conf file, set low limits and check the
> >> difference. That may help us narrow down how much memory actually
> >> consumed by core nfs-ganesha and gfAPI.
> >>
> >> CACHEINODE {
> >>Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
> >>Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max no.
> >> of entries in the cache.
> >>

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2015-12-26 Thread Soumya Koduri

Thanks for sharing the results. I shall look at the leaks and update.

-Soumya

On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote:

Also, here is valgrind output with our custom tool, that does GlusterFS volume
traversing (with simple stats) just like find tool. In this case NFS-Ganesha
is not used.

https://gist.github.com/e4602a50d3c98f7a2766

One may see GlusterFS-related leaks here as well.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes huge amount of memory while doing
ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
usage:

===
root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N NIV_EVENT
===

1.4G is too much for simple stat() :(.

Ideas?


nfs-ganesha also has cache layer which can scale to millions of entries
depending on the number of files/directories being looked upon. However
there are parameters to tune it. So either try stat with few entries or
add below block in nfs-ganesha.conf file, set low limits and check the
difference. That may help us narrow down how much memory actually
consumed by core nfs-ganesha and gfAPI.

CACHEINODE {
Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max no.
of entries in the cache.
}

Thanks,
Soumya


24.12.2015 16:32, Oleksandr Natalenko wrote:

Still actual issue for 3.7.6. Any suggestions?

24.09.2015 10:14, Oleksandr Natalenko wrote:

In our GlusterFS deployment we've encountered something like memory
leak in GlusterFS FUSE client.

We use replicated (×2) GlusterFS volume to store mail (exim+dovecot,
maildir format). Here is inode stats for both bricks and mountpoint:

===
Brick 1 (Server 1):

Filesystem InodesIUsed

  IFree IUse% Mounted on

/dev/mapper/vg_vd1_misc-lv08_mail   578768144 10954918

  5678132262% /bricks/r6sdLV08_vd1_mail

Brick 2 (Server 2):

Filesystem InodesIUsed

  IFree IUse% Mounted on

/dev/mapper/vg_vd0_misc-lv07_mail   578767984 10954913

  5678130712% /bricks/r6sdLV07_vd0_mail

Mountpoint (Server 3):

Filesystem  InodesIUsed  IFree
IUse% Mounted on
glusterfs.xxx:mail   578767760 10954915  567812845
2% /var/spool/mail/virtual
===

glusterfs.xxx domain has two A records for both Server 1 and Server 2.

Here is volume info:

===
Volume Name: mail
Type: Replicate
Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
Options Reconfigured:
nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
features.cache-invalidation-timeout: 10
performance.stat-prefetch: off
performance.quick-read: on
performance.read-ahead: off
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 4
performance.cache-max-file-size: 1048576
performance.cache-size: 67108864
performance.readdir-ahead: off
===

Soon enough after mounting and exim/dovecot start, glusterfs client
process begins to consume huge amount of RAM:

===
user@server3 ~$ ps aux | grep glusterfs | grep mail
root 28895 14.4 15.0 15510324 14908868 ?   Ssl  Sep03 4310:05
/usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
--volfile-server=glusterfs.xxx --volfile-id=mail
/var/spool/mail/virtual
===

That is, ~15 GiB of RAM.

Also we've tried to use the mountpoint within a separate KVM VM with 2 or 3
GiB of RAM, and soon after starting mail daemons got OOM killer for
glusterfs client process.

Mounting same share via NFS works just fine. Also, we have much less
iowait and loadavg on client side with NFS.

Also, we've tried to change IO threads count and cache size in order
to limit memory usage with no luck. As you can see, total cache size
is 4×64==256 MiB (compare to 15 GiB).

Enabling-disabling stat-prefetch, read-ahead and readdir-ahead didn't
help as well.

Here are volume memory stats:

===
Memory status for volume : mail
--
Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
Mallinfo

Arena: 36859904
Ordblks  : 10357
Smblks   : 519
Hblks: 21
Hblkhd   : 30515200
Usmblks  : 0
Fsmblks  : 53440
Uordblks : 18604144
Fordblks : 18255760
Keepcost : 114112

Mempool Stats
-
NameHotCount ColdCount PaddedSizeof
AllocCount MaxAlloc   Misses Max-StdAlloc
 - 
--   
mail-server:fd_t   0  1024  108
3077312

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2015-12-25 Thread Soumya Koduri



On 12/25/2015 08:56 PM, Oleksandr Natalenko wrote:

What units Cache_Size is measured in? Bytes?

It's actually (Cache_Size * sizeof_ptr) bytes. If possible, could you 
please run the ganesha process under valgrind? It will help in detecting leaks.


Thanks,
Soumya


25.12.2015 16:58, Soumya Koduri wrote:

On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes huge amount of memory while doing
ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
usage:

===
root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N NIV_EVENT
===

1.4G is too much for simple stat() :(.

Ideas?

nfs-ganesha also has cache layer which can scale to millions of
entries depending on the number of files/directories being looked
upon. However there are parameters to tune it. So either try stat with
few entries or add below block in nfs-ganesha.conf file, set low
limits and check the difference. That may help us narrow down how much
memory actually consumed by core nfs-ganesha and gfAPI.

CACHEINODE {
Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache
size
Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max
no. of entries in the cache.
}

Thanks,
Soumya



24.12.2015 16:32, Oleksandr Natalenko wrote:

Still actual issue for 3.7.6. Any suggestions?

24.09.2015 10:14, Oleksandr Natalenko wrote:

In our GlusterFS deployment we've encountered something like memory
leak in GlusterFS FUSE client.

We use replicated (×2) GlusterFS volume to store mail (exim+dovecot,
maildir format). Here is inode stats for both bricks and mountpoint:

===
Brick 1 (Server 1):

Filesystem Inodes IUsed
 IFree IUse% Mounted on
/dev/mapper/vg_vd1_misc-lv08_mail   578768144 10954918
 5678132262% /bricks/r6sdLV08_vd1_mail

Brick 2 (Server 2):

Filesystem Inodes IUsed
 IFree IUse% Mounted on
/dev/mapper/vg_vd0_misc-lv07_mail   578767984 10954913
 5678130712% /bricks/r6sdLV07_vd0_mail

Mountpoint (Server 3):

Filesystem  InodesIUsed  IFree
IUse% Mounted on
glusterfs.xxx:mail   578767760 10954915  567812845
2% /var/spool/mail/virtual
===

glusterfs.xxx domain has two A records for both Server 1 and Server 2.

Here is volume info:

===
Volume Name: mail
Type: Replicate
Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
Options Reconfigured:
nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
features.cache-invalidation-timeout: 10
performance.stat-prefetch: off
performance.quick-read: on
performance.read-ahead: off
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 4
performance.cache-max-file-size: 1048576
performance.cache-size: 67108864
performance.readdir-ahead: off
===

Soon enough after mounting and exim/dovecot start, glusterfs client
process begins to consume huge amount of RAM:

===
user@server3 ~$ ps aux | grep glusterfs | grep mail
root 28895 14.4 15.0 15510324 14908868 ?   Ssl  Sep03 4310:05
/usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
--volfile-server=glusterfs.xxx --volfile-id=mail
/var/spool/mail/virtual
===

That is, ~15 GiB of RAM.

Also we've tried to use the mountpoint within a separate KVM VM with 2 or 3
GiB of RAM, and soon after starting mail daemons got OOM killer for
glusterfs client process.

Mounting same share via NFS works just fine. Also, we have much less
iowait and loadavg on client side with NFS.

Also, we've tried to change IO threads count and cache size in order
to limit memory usage with no luck. As you can see, total cache size
is 4×64==256 MiB (compare to 15 GiB).

Enabling-disabling stat-prefetch, read-ahead and readdir-ahead didn't
help as well.

Here are volume memory stats:

===
Memory status for volume : mail
--
Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
Mallinfo

Arena: 36859904
Ordblks  : 10357
Smblks   : 519
Hblks: 21
Hblkhd   : 30515200
Usmblks  : 0
Fsmblks  : 53440
Uordblks : 18604144
Fordblks : 18255760
Keepcost : 114112

Mempool Stats
-
NameHotCount ColdCount PaddedSizeof
AllocCount MaxAlloc   Misses Max-StdAlloc
 - 
--   
mail-server:fd_t   0  1024  108
30773120  13700
mail-server:dentry_t   16110   274   84
23567614816384  1106499 1152
mail-server:inode_t16363

Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2015-12-25 Thread Soumya Koduri



On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:

Another addition: it seems to be GlusterFS API library memory leak
because NFS-Ganesha also consumes huge amount of memory while doing
ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
usage:

===
root  5416 34.2 78.5 2047176 1480552 ? Ssl  12:02 117:54
/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N NIV_EVENT
===

1.4G is too much for simple stat() :(.

Ideas?
nfs-ganesha also has a cache layer which can scale to millions of entries 
depending on the number of files/directories being looked up. However, 
there are parameters to tune it. So either try stat with few entries, or 
add the below block to the nfs-ganesha.conf file, set low limits and check 
the difference. That may help us narrow down how much memory is actually 
consumed by core nfs-ganesha and gfAPI.


CACHEINODE {
	Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
	Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); # Max no. of entries in the cache.
}

Thanks,
Soumya



24.12.2015 16:32, Oleksandr Natalenko wrote:

Still actual issue for 3.7.6. Any suggestions?

24.09.2015 10:14, Oleksandr Natalenko wrote:

In our GlusterFS deployment we've encountered something like memory
leak in GlusterFS FUSE client.

We use replicated (×2) GlusterFS volume to store mail (exim+dovecot,
maildir format). Here is inode stats for both bricks and mountpoint:

===
Brick 1 (Server 1):

Filesystem InodesIUsed
 IFree IUse% Mounted on
/dev/mapper/vg_vd1_misc-lv08_mail   578768144 10954918
 5678132262% /bricks/r6sdLV08_vd1_mail

Brick 2 (Server 2):

Filesystem InodesIUsed
 IFree IUse% Mounted on
/dev/mapper/vg_vd0_misc-lv07_mail   578767984 10954913
 5678130712% /bricks/r6sdLV07_vd0_mail

Mountpoint (Server 3):

Filesystem  InodesIUsed  IFree
IUse% Mounted on
glusterfs.xxx:mail   578767760 10954915  567812845
2% /var/spool/mail/virtual
===

glusterfs.xxx domain has two A records for both Server 1 and Server 2.

Here is volume info:

===
Volume Name: mail
Type: Replicate
Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
Options Reconfigured:
nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
features.cache-invalidation-timeout: 10
performance.stat-prefetch: off
performance.quick-read: on
performance.read-ahead: off
performance.flush-behind: on
performance.write-behind: on
performance.io-thread-count: 4
performance.cache-max-file-size: 1048576
performance.cache-size: 67108864
performance.readdir-ahead: off
===

Soon enough after mounting and exim/dovecot start, glusterfs client
process begins to consume huge amount of RAM:

===
user@server3 ~$ ps aux | grep glusterfs | grep mail
root 28895 14.4 15.0 15510324 14908868 ?   Ssl  Sep03 4310:05
/usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
--volfile-server=glusterfs.xxx --volfile-id=mail
/var/spool/mail/virtual
===

That is, ~15 GiB of RAM.

Also we've tried to use the mountpoint within a separate KVM VM with 2 or 3
GiB of RAM, and soon after starting mail daemons got OOM killer for
glusterfs client process.

Mounting same share via NFS works just fine. Also, we have much less
iowait and loadavg on client side with NFS.

Also, we've tried to change IO threads count and cache size in order
to limit memory usage with no luck. As you can see, total cache size
is 4×64==256 MiB (compare to 15 GiB).

Enabling-disabling stat-prefetch, read-ahead and readdir-ahead didn't
help as well.

Here are volume memory stats:

===
Memory status for volume : mail
--
Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
Mallinfo

Arena: 36859904
Ordblks  : 10357
Smblks   : 519
Hblks: 21
Hblkhd   : 30515200
Usmblks  : 0
Fsmblks  : 53440
Uordblks : 18604144
Fordblks : 18255760
Keepcost : 114112

Mempool Stats
-
NameHotCount ColdCount PaddedSizeof
AllocCount MaxAlloc   Misses Max-StdAlloc
 - 
--   
mail-server:fd_t   0  1024  108
30773120  13700
mail-server:dentry_t   16110   274   84
23567614816384  1106499 1152
mail-server:inode_t1636321  156
23721687616384  1876651 1169
mail-trash:fd_t0  1024  108
  0000
mail-trash:dentry_t0 32768   84
  0000
mail-trash:inode_t 4

[Gluster-devel] REMINDER: Gluster Community Bug Triage meeting at 12:00 UTC (~in 90 minutes)

2015-12-22 Thread Soumya Koduri

Hi all,

This meeting is scheduled for anyone that is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
 (https://webchat.freenode.net/?channels=gluster-meeting  )
- date: every Tuesday
- time: 12:00 UTC
 (in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last weeks action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.

Thank you!

-Soumya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] REMINDER: Gluster Bug Triage timing-poll

2015-12-22 Thread Soumya Koduri

+gluster-users

On 12/22/2015 06:03 PM, Hari Gowtham wrote:

Hi all,

There was a poll conducted to find the timing that best suits the people 
who want to participate in the weekly Gluster bug triage meeting. The result 
of the poll is yet to be announced, but we would like to get more responses. 
So participants who haven't voted yet can vote for the timing that suits 
them best. Please do vote soon so we can proceed accordingly.

The link to take part 
:https://doodle.com/poll/tsywtwfngfk4ssr8?tmail=poll_invitecontact_participant_invitation_with_message=pollbtn




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Minutes of today's Gluster Community Bug Triage meeting (22nd Dec 2015)

2015-12-22 Thread Soumya Koduri

Hi,

Please find the minutes of today's Gluster Community Bug Triage meeting 
below. Thanks to everyone who have attend the meeting.


Minutes: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-12-22/gluster_bug_triage.2015-12-22-12.00.html
Minutes (text): 
http://meetbot.fedoraproject.org/gluster-meeting/2015-12-22/gluster_bug_triage.2015-12-22-12.00.txt
Log: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-12-22/gluster_bug_triage.2015-12-22-12.00.log.html




#gluster-meeting: Gluster Bug Triage



Meeting started by skoduri at 12:00:13 UTC. The full logs are available
at
http://meetbot.fedoraproject.org/gluster-meeting/2015-12-22/gluster_bug_triage.2015-12-22-12.00.log.html
.



Meeting summary
---
* Roll Call  (skoduri, 12:00:25)
  * agenda: https://public.pad.fsfe.org/p/gluster-bug-triage  (skoduri,
12:02:56)

* Last week's action items  (skoduri, 12:03:05)

* kkeithley_ will come up with a proposal to reduce the number of bugs
  against "mainline" in NEW state  (skoduri, 12:03:30)

* skoduri and ndevos will document how people can get bug notifications
  for specific components  (skoduri, 12:04:36)
  * LINK:

https://github.com/gluster/glusterdocs/blob/master/Developer-guide/Bugzilla%20Notifications.md
(skoduri, 12:04:54)
  * ACTION: kkeithley_ will come up with a proposal to reduce the number
of bugs against "mainline" in NEW state  (skoduri, 12:07:03)

* hagarth start/sync email on regular (nightly) automated tests
  (skoduri, 12:07:20)
  * ACTION: hagarth start/sync email on regular (nightly) automated
tests  (skoduri, 12:07:58)

* lalatenduM 's automated Coverity setup in Jenkins need assistance from
  an admin with more permissions  (skoduri, 12:08:25)
  * ACTION: lalatenduM 's automated Coverity setup in Jenkins need
assistance from an admin with more permissions  (skoduri, 12:09:30)

* ndevos needs to look into building nightly debug rpms that can be used
  for testing  (skoduri, 12:09:57)
  * ACTION: lalatenduM and ndevos need to think about and decide how to
provide/use debug builds  (skoduri, 12:12:29)

* hgowtham to send a reminder about the poll regarding the timing for
  the triage meeting.  (skoduri, 12:12:45)
  * ACTION: hgowtham to send a reminder about the poll regarding the
timing for the triage meeting.  (skoduri, 12:13:51)

* lalatenduM provide a simple step/walk-through on how to provide
  testcases for the nightly rpm tests  (skoduri, 12:14:02)

* ndevos to propose some test-cases for minimal libgfapi test  (skoduri,
  12:14:10)
  * ACTION: ndevos to propose some test-cases for minimal libgfapi test
(skoduri, 12:15:34)
  * ACTION: msvbhat  will look into using nightly builds for automated
testing, and will report issues/success to the mailinglist
(skoduri, 12:20:49)
  * ACTION: lalatenduM 's automated Coverity setup in Jenkins need
assistance from an admin with more permissions .. msvbhat shall look
into it  (skoduri, 12:21:17)
  * ACTION: msvbhat  and ndevos need to think about and decide how to
provide/use debug builds  (skoduri, 12:21:31)
  * ACTION: msvbhat  provide a simple step/walk-through on how to
provide testcases for the nightly rpm tests  (skoduri, 12:21:56)

* Group Triage  (skoduri, 12:22:31)
  * LINK: https://public.pad.fsfe.org/p/gluster-bugs-to-triage
(skoduri, 12:22:44)

* Open Floor  (skoduri, 12:46:21)

Meeting ended at 12:57:02 UTC.




Action Items

* kkeithley_ will come up with a proposal to reduce the number of bugs
  against "mainline" in NEW state
* hagarth start/sync email on regular (nightly) automated tests
* lalatenduM 's automated Coverity setup in Jenkins need assistance from
  an admin with more permissions
* lalatenduM and ndevos need to think about and decide how to
  provide/use debug builds
* hgowtham to send a reminder about the poll regarding the timing for
  the triage meeting.
* ndevos to propose some test-cases for minimal libgfapi test
* msvbhat  will look into using nightly builds for automated testing,
  and will report issues/success to the mailinglist
* lalatenduM 's automated Coverity setup in Jenkins need assistance from
  an admin with more permissions .. msvbhat shall look into it
* msvbhat  and ndevos need to think about and decide how to provide/use
  debug builds
* msvbhat  provide a simple step/walk-through on how to provide
  testcases for the nightly rpm tests




Action Items, by person
---
* hgowtham
  * hgowtham to send a reminder about the poll regarding the timing for
the triage meeting.
* kkeithley_
  * kkeithley_ will come up with a proposal to reduce the number of bugs
against "mainline" in NEW state
* lalatenduM
  * lalatenduM 's automated Coverity setup in Jenkins need assistance
from an admin with more permissions
  * lalatenduM and ndevos need to think about and decide how to
provide/use debug builds
  * 

[Gluster-devel] crash in '_Unwind_Backtrace () from ./lib64/libgcc_s.so.1'

2015-12-22 Thread Soumya Koduri

Hi,

I have raised BZ#1293594 to get inputs from the gcc team to further debug 
the crash we have been seeing (especially with the test 
./tests/bugs/snapshot/bug-1140162-file-snapshot-features-encrypt-opts-validation.t).
If anyone happens to run into this issue again, kindly install the 
gcc-debuginfo package on that machine to get a full backtrace (as 
requested in [1]) and update the bug with the details.


Thanks,
Soumya

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1293594#c4

On 11/26/2015 03:07 PM, Soumya Koduri wrote:

Below are the findings from the core and the logs

1) [2015-11-25 19:06:41.592905] E
[crypt.c:4298:master_set_master_vol_key] 0-patchy-crypt: FATAL: missing
master key

xlator_init() of crypt xlator fails, which I assume gets loaded when
features.encryption is on (which the below mentioned .t test does)

Request anyone familiar with crypt xlator to take a look at it.

This results in the shutdown of the NFS (glusterfsd) process, which calls
cleanup_and_exit()


2) There is a crash in libgcc

Thread 1 (LWP 11584):
#0  0x7f922ad34867 in ?? () from ./lib64/libgcc_s.so.1
#1  0x7f922ad35119 in _Unwind_Backtrace () from ./lib64/libgcc_s.so.1
#2  0x7f923660b936 in backtrace () from ./lib64/libc.so.6
#3  0x7f92379abf73 in _gf_msg_backtrace_nomem (level=GF_LOG_ALERT,
stacksize=200)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/logging.c:1090

#4  0x7f92379b1d38 in gf_print_trace (signum=11, ctx=0x2331010)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/common-utils.c:740

#5  0x004098d6 in glusterfsd_print_trace (signum=11)
 at
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:2033

#6  <signal handler called>
#7  0x7f9229eb8cec in ?? ()
#8  0x7f9236c8ba51 in start_thread () from ./lib64/libpthread.so.0
#9  0x7f92365f593d in clone () from ./lib64/libc.so.6
(gdb)

This looks similar to the issue pointed earlier by Kotresh [1]

As mentioned in the last mail of that thread, [2] doesn't seem to have
fixed it completely.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10866
[2] http://review.gluster.org/#/c/10417/


Thanks,
Soumya

On 11/26/2015 10:43 AM, Nithya Balachandran wrote:

Hi,


The test
tests/bugs/snapshot/bug-1140162-file-snapshot-features-encrypt-opts-validation.t
has failed with a core.Can you please take a look?


The NFS log says:
gluster/02c803ff8630bd12cc8dc9dc043a6103.socket)
[2015-11-25 19:06:41.561937] I [MSGID: 101190]
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started
thread with index 1
[2015-11-25 19:06:41.566469] I
[rpcsvc.c:2211:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service:
Configured rpc.outstanding-rpc-limit with value 16
[2015-11-25 19:06:41.568896] I [MSGID: 112153]
[mount3.c:3924:mnt3svc_init] 0-nfs-mount: Exports auth has been disabled!
[2015-11-25 19:06:41.592393] I [rpc-drc.c:689:rpcsvc_drc_init]
0-rpc-service: DRC is turned OFF
[2015-11-25 19:06:41.592441] I [MSGID: 112110] [nfs.c:1513:init]
0-nfs: NFS service started
[2015-11-25 19:06:41.592905] E
[crypt.c:4298:master_set_master_vol_key] 0-patchy-crypt: FATAL:
missing master key
[2015-11-25 19:06:41.592928] E [MSGID: 101019]
[xlator.c:429:xlator_init] 0-patchy-crypt: Initialization of volume
'patchy-crypt' failed, review your volfile again
[2015-11-25 19:06:41.592953] E [MSGID: 101066]
[graph.c:324:glusterfs_graph_init] 0-patchy-crypt: initializing
translator failed
[2015-11-25 19:06:41.592976] E [MSGID: 101176]
[graph.c:670:glusterfs_graph_activate] 0-graph: init failed
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-11-25 19:06:41
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-11-25 19:06:41
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8dev
[2015-11-25 19:06:41.594542] W [glusterfsd.c:1231:cleanup_and_exit]
(-->/build/install/sbin/glusterfs(mgmt_getspec_cbk+0x34d) [0x40e71d]
-->/b


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Storing pNFS related state on GlusterFS

2015-12-09 Thread Soumya Koduri

Hi,

pNFS is a feature introduced as part of the NFSv4.1 protocol to allow direct 
client access to the storage devices containing file data (in short, parallel 
I/O). Clients request the layout of an entire file or of a specific range. 
On receiving the layout information, they directly contact the server 
containing the data for the I/O.


In the case of a cluster of (NFS) servers,
* Meta-data servers (MDS) are responsible for providing layouts of the
  file and recalling them in case of any change in the layout.
* Data servers (DS) contain the actual data and process the I/O.

For more information, kindly refer to [1].

Currently, with NFS-Ganesha+GlusterFS, we support FILE_LAYOUTs but with a 
single MDS.


So,
* to avoid a single point of failure and be able to support multiple MDSs, and
* to recall the layout in the case of a cluster of (NFS) servers,
we need to store the layouts on the back-end filesystem (GlusterFS) and 
recall them in case of any conflicting access which may change the file 
layout.


Since this is along similar lines to storing and recalling lease state (with 
slightly different semantics), we are planning to store and process layouts 
as a special type of lease ('LAYOUT') in the lease xlator being worked 
on as part of [2].


More details are captured in the below spec [3] :
http://review.gluster.org/#/c/12367

Kindly review the same and provide your inputs/comments.

Thanks,
Soumya

[1] https://tools.ietf.org/rfc/rfc5661.txt (Section 12. Parallel NFS (pNFS))

[2] http://review.gluster.org/#/c/11980/

[3] http://review.gluster.org/#/c/12367
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] compound fop design first cut

2015-12-08 Thread Soumya Koduri



On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:



On 12/09/2015 06:37 AM, Vijay Bellur wrote:

On 12/08/2015 03:45 PM, Jeff Darcy wrote:




On December 8, 2015 at 12:53:04 PM, Ira Cooper (i...@redhat.com) wrote:

Raghavendra Gowdappa writes:
I propose that we define a "compound op" that contains ops.

Within each op, there are fields that can be "inherited" from the
previous op, via use of a sentinel value.

Sentinel is -1, for all of these examples.

So:

LOOKUP (1, "foo") (Sets the gfid value to be picked up by
compounding, 1
is the root directory, as a gfid, by convention.)
OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.)
WRITE(-1, "foo", 3) (Uses the glfd compound value.)
CLOSE(-1) (Uses the glfd compound value)


So, basically, what the programming-language types would call futures
and promises.  It’s a good and well studied concept, which is necessary
to solve the second-order problem of how to specify an argument in
sub-operation N+1 that’s not known until sub-operation N completes.

To be honest, some of the highly general approaches suggested here scare
me too.  Wrapping up the arguments for one sub-operation in xdata for
another would get pretty hairy if we ever try to go beyond two
sub-operations and have to nest sub-operation #3’s args within
sub-operation #2’s xdata which is itself encoded within sub-operation
#1’s xdata.  There’s also not much clarity about how to handle errors in
that model.  Encoding N sub-operations’ arguments in a linear structure
as Shyam proposes seems a bit cleaner that way.  If I were to continue
down that route I’d suggest just having start_compound and end-compound
fops, plus an extra field (or by-convention xdata key) that either the
client-side or server-side translator could use to build whatever
structure it wants and schedule sub-operations however it wants.

However, I’d be even more comfortable with an even simpler approach that
avoids the need to solve what the database folks (who have dealt with
complex transactions for years) would tell us is a really hard problem.
Instead of designing for every case we can imagine, let’s design for the
cases that we know would be useful for improving performance. Open plus
read/write plus close is an obvious one.  Raghavendra mentions
create+inodelk as well.  For each of those, we can easily define a
structure that contains the necessary fields, we don’t need a
client-side translator, and the server-side translator can take care of
“forwarding” results from one sub-operation to the next.  We could even
use GF_FOP_IPC to prototype this.  If we later find that the number of
“one-off” compound requests is growing too large, then at least we’ll
have some experience to guide our design of a more general alternative.
Right now, I think we’re trying to look further ahead than we can see
clearly.

Yes Agree. This makes implementation on the client side simpler as well.
So it is welcome.

Just updating the solution.
1) New RPCs are going to be implemented.
2) client stack will use these new fops.
3) On the server side we have server xlator implementing these new fops
to decode the RPC request then resolve_resume and
compound-op-receiver(Better name for this is welcome) which sends one op
after other and send compound fop response.

List of compound fops identified so far:
Swift/S3:
PUT: creat(), write()s, setxattr(), fsync(), close(), rename()

Dht:
mkdir + inodelk

Afr:
xattrop+writev, xattrop+unlock to begin with.

Could everyone who needs compound fops add to this list?

I see that Niels is back on 14th. Does anyone else know the list of
compound fops he has in mind?

From the discussions we had with Niels regarding Kerberos support 
in GlusterFS, I think the below is the set of compound fops which are required.


set_uid +
set_gid +
set_lkowner (or kerberos principal name) +
actual_fop

Also, gfapi does a lookup (the first time, or to refresh the inode) before 
performing the actual fop most of the time. It may really help if we can club 
such fops -


LOOKUP + FOP (OPEN etc)

Coming to the proposed design, I agree with Shyam, Ira and Jeff's 
thoughts. Defining different compound fops for each specific set of 
operations and wrapping up those arguments in xdata seems rather complex 
and difficult to maintain going forward. Having worked with NFS, 
may I suggest that we follow (or do something along similar lines to) the 
approach taken by the NFS protocol to define and implement compound procedures.


   The basic structure of the NFS COMPOUND procedure is:

   +-----+--------------+--------+-----------+-----------+-----------+--
   | tag | minorversion | numops | op + args | op + args | op + args |
   +-----+--------------+--------+-----------+-----------+-----------+--

   and the reply's structure is:

   +------------+-----+--------+-----------------------+--
   |last status | tag | numres | status + op + results |
   +------------+-----+--------+-----------------------+--
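
To make the idea concrete, below is a rough C sketch of how such a
request/reply pairing could be modelled. It is purely illustrative: the
op set, field names and types are made up for the example and are not
the actual GlusterFS RPC/XDR definition.

/* Illustrative sketch only -- not the real GlusterFS RPC/XDR. */
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>

typedef enum {
        COMPOUND_OP_LOOKUP,
        COMPOUND_OP_OPEN,
        COMPOUND_OP_WRITE,
        COMPOUND_OP_CLOSE,
} compound_op_type_t;

typedef struct {
        compound_op_type_t op;
        union {
                struct { char   name[256]; }            lookup;
                struct { int    flags; }                open;
                struct { off_t  offset; size_t size; }  write;
        } args;                      /* per-op arguments, like NFS "op + args" */
} compound_op_t;

typedef struct {
        uint32_t       numops;       /* like NFS "numops"                      */
        compound_op_t *ops;          /* executed in order, stop on first error */
} compound_req_t;

typedef struct {
        uint32_t numres;             /* how many ops were actually executed    */
        int      last_status;        /* status of the op that ended processing */
} compound_rsp_t;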

Each compound procedure will 

Re: [Gluster-devel] Upstream regression crash : https://build.gluster.org/job/rackspace-regression-2GB-triggered/16191/consoleFull

2015-11-26 Thread Soumya Koduri

Below are the findings from the core and the logs

1) [2015-11-25 19:06:41.592905] E 
[crypt.c:4298:master_set_master_vol_key] 0-patchy-crypt: FATAL: missing 
master key


xlator_init() of the crypt xlator fails; I assume the xlator gets loaded when 
features.encryption is on (which the below-mentioned .t test does).


I request anyone familiar with the crypt xlator to take a look at it.

This results in the shutdown of the NFS (glusterfsd) process, which calls 
cleanup_and_exit()



2) There is a crash in libgcc

Thread 1 (LWP 11584):
#0  0x7f922ad34867 in ?? () from ./lib64/libgcc_s.so.1
#1  0x7f922ad35119 in _Unwind_Backtrace () from ./lib64/libgcc_s.so.1
#2  0x7f923660b936 in backtrace () from ./lib64/libc.so.6
#3  0x7f92379abf73 in _gf_msg_backtrace_nomem (level=GF_LOG_ALERT, 
stacksize=200)
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/logging.c:1090

#4  0x7f92379b1d38 in gf_print_trace (signum=11, ctx=0x2331010)
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/common-utils.c:740

#5  0x004098d6 in glusterfsd_print_trace (signum=11)
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:2033

#6  <signal handler called>
#7  0x7f9229eb8cec in ?? ()
#8  0x7f9236c8ba51 in start_thread () from ./lib64/libpthread.so.0
#9  0x7f92365f593d in clone () from ./lib64/libc.so.6
(gdb)

This looks similar to the issue pointed out earlier by Kotresh [1].

As mentioned in the last mail of that thread, [2] doesn't seem to have 
fixed it completely.


[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10866
[2] http://review.gluster.org/#/c/10417/


Thanks,
Soumya

On 11/26/2015 10:43 AM, Nithya Balachandran wrote:

Hi,


The test 
tests/bugs/snapshot/bug-1140162-file-snapshot-features-encrypt-opts-validation.t
 has failed with a core.Can you please take a look?


The NFS log says:
gluster/02c803ff8630bd12cc8dc9dc043a6103.socket)
[2015-11-25 19:06:41.561937] I [MSGID: 101190] 
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1
[2015-11-25 19:06:41.566469] I [rpcsvc.c:2211:rpcsvc_set_outstanding_rpc_limit] 
0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16
[2015-11-25 19:06:41.568896] I [MSGID: 112153] [mount3.c:3924:mnt3svc_init] 
0-nfs-mount: Exports auth has been disabled!
[2015-11-25 19:06:41.592393] I [rpc-drc.c:689:rpcsvc_drc_init] 0-rpc-service: 
DRC is turned OFF
[2015-11-25 19:06:41.592441] I [MSGID: 112110] [nfs.c:1513:init] 0-nfs: NFS 
service started
[2015-11-25 19:06:41.592905] E [crypt.c:4298:master_set_master_vol_key] 
0-patchy-crypt: FATAL: missing master key
[2015-11-25 19:06:41.592928] E [MSGID: 101019] [xlator.c:429:xlator_init] 
0-patchy-crypt: Initialization of volume 'patchy-crypt' failed, review your 
volfile again
[2015-11-25 19:06:41.592953] E [MSGID: 101066] 
[graph.c:324:glusterfs_graph_init] 0-patchy-crypt: initializing translator 
failed
[2015-11-25 19:06:41.592976] E [MSGID: 101176] 
[graph.c:670:glusterfs_graph_activate] 0-graph: init failed
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-11-25 19:06:41
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-11-25 19:06:41
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8dev
[2015-11-25 19:06:41.594542] W [glusterfsd.c:1231:cleanup_and_exit] 
(-->/build/install/sbin/glusterfs(mgmt_getspec_cbk+0x34d) [0x40e71d] -->/b


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Caching support in glusterfs

2015-11-26 Thread Soumya Koduri



On 11/26/2015 03:35 PM, Avik Sil wrote:



On Tuesday 24 November 2015 09:58 PM, Vijay Bellur wrote:



- Original Message -

From: "Avik Sil" 
To: gluster-devel@gluster.org
Sent: Tuesday, November 24, 2015 6:47:44 AM
Subject: [Gluster-devel] Caching support in glusterfs

While searching for caching support in glusterfs I stumbled upon this
link:
http://www.gluster.org/community/documentation/index.php/Features/caching


But I didn't get much info from it. What is plan ahead?



What aspects of caching support would you be interested in?


We'll be interested in client-side caching support. As Jeff mentioned,
can you please elaborate more on supporting caching in Ganesha and Samba?


As you may be aware, the latest versions of the NFS/SMB protocols have added 
support for Delegations/Leases, which guarantee certain semantics to their 
clients with respect to file sharing. Protocol clients are thus guaranteed 
cache consistency and can do aggressive data caching.


There is an ongoing effort to support such Leases in Gluster.

Apart from that, the NFS-Ganesha server currently does attribute caching of 
the files and directories being accessed. I am not sure whether any caching 
is done by the SMB server; I request others to comment.


Thanks,
Soumya



Regards,
Avik
HGST E-mail Confidentiality Notice & Disclaimer:
This e-mail and any files transmitted with it may contain confidential
or legally privileged information of HGST and are intended solely for
the use of the individual or entity to which they are addressed. If you
are not the intended recipient, any disclosure, copying, distribution or
any action taken or omitted to be taken in reliance on it, is
prohibited.  If you have received this e-mail in error, please notify
the sender immediately and delete the e-mail in its entirety from your
system.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Logging improvements needed in nfs-ganesha

2015-11-24 Thread Soumya Koduri

Hi Sac,

While we understand the intent of this mail, please note that most of 
the operations performed by the ganesha-related CLI are executed by the 
runner threads. AFAIK, apart from the return status, we cannot read any 
error messages from those threads (I request the glusterd team to confirm that).


In addition, since ganesha setup related operations are performed by 
multiple tools or services, the error messages also get logged in 
multiple places. For example:


HA script logs at - /var/log/messages
nfs-ganesha service logs at - /var/log/ganesha.log
pacemaker logs - /var/log/pacemaker.log
pcsd logs - /var/log/pcsd.log

We have already documented steps for troubleshooting NFS-Ganesha related 
issues, but they haven't been ported upstream yet.


IMO, it may be good to post that FAQ upstream as well and maybe 
provide a link to it in the log messages. Any thoughts?


Thanks,
Soumya

On 11/23/2015 05:25 PM, Sachidananda URS wrote:

Hi,

Recently while we came across some messages in log files which said
"Please see log files for details". We encountered these while trying to
setup NFS-Ganesha, the messages are very vague and doesn't add any value.

I've raised a bug for this:

https://bugzilla.redhat.com/show_bug.cgi?id=1284449

We need quite a bit of improvement in the logs. For example, on console
message is printed:

msg: volume set: failed: Failed to create NFS-Ganesha export config file.

It would be great if the logs mentioned why it failed to create export
config file.

-sac.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

