Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Vijay Bellur

On 05/19/2015 12:21 PM, Raghavendra Gowdappa wrote:






Yes, this is a possible scenario. There is a finite time window between:

1. Querying the size of a directory, i.e., checking whether the current
write can be allowed
2. The effect of this write getting reflected in the sizes of all the
parent directories of the file up to the root

If 1 and 2 were atomic, another parallel write that would have exceeded the
quota-limit could not have slipped through. Unfortunately, in the current
scheme of things they are not atomic. Now, there can be parallel writes in
this test case because of nfs-client and/or glusterfs write-back (even though
we have a single single-threaded application - dd - running). One way of
testing this hypothesis is to disable nfs and glusterfs write-back and run
the same (unmodified) test; the test should then always succeed (dd should
fail). To disable write-back in nfs you can use the noac option while
mounting.
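For example, something along these lines (volume and mount names are
illustrative; performance.write-behind and noac are the options meant above):

  # disable glusterfs write-back on the volume
  gluster volume set patchy performance.write-behind off
  # mount over NFS v3 with client-side caching disabled
  mount -t nfs -o vers=3,noac server:/patchy /mnt/nfs
  # re-run the unmodified test; dd should now always fail with EDQUOT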

The situation becomes worse in real-life scenarios because of the parallelism
involved at many layers:

1. Multiple applications, each possibly multithreaded, writing to one or
many files in a quota subtree
2. Write-back in the NFS client and in glusterfs
3. Multiple bricks holding the files of a quota subtree, each brick
processing many write requests simultaneously through io-threads.


4. Background accounting of directory sizes _after_ a write is complete.



I've tried in the past to fix the issue, though unsuccessfully. It seems to me
that one effective strategy is to make enforcement and the update of parent
directory sizes atomic. But if we do that, we end up adding the latency of
accounting to the latency of the fop. Other options can be explored. However,
our Quota functionality requirements allow a buffer of 10% while enforcing
limits, so this issue has not been high on our priority list till now. Our
tests should therefore also expect failures, allowing for this 10% buffer.
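For instance, a test could assert usage against the limit plus that buffer,
along these lines (the limit and path are placeholders):

  LIMIT=$((20 * 1024 * 1024))                    # quota hard limit, in bytes
  USED=$(du -sb /mnt/nfs/dir | awk '{print $1}') # bytes currently used
  # pass as long as usage stays within limit + the 10% enforcement buffer
  [ "$USED" -le $((LIMIT * 110 / 100)) ]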


Since most of our tests run a single instance of single-threaded dd on 
a single mount, if the hypothesis turns out to be true, we can turn off 
nfs-client and glusterfs write-back in all tests related to Quota. Comments?



Even with write-behind enabled, dd should get a failure upon close() if 
quota were to return EDQUOT for any of the writes. I suspect that 
flush-behind being enabled by default in write-behind can mask a failure 
for close(). Disabling flush-behind in the tests might take care of 
fixing the tests.
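Something like the following in such a test should do it
(performance.flush-behind is the option in question; $CLI and $V0 are
assumed here to be the usual test-framework variables):

  TEST $CLI volume set $V0 performance.flush-behind off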


It would be good to have nfs + quota coverage in the tests. So let us 
not disable nfs tests for quota.


Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Vijaikumar M



On Tuesday 19 May 2015 06:13 AM, Shyam wrote:

On 05/18/2015 07:05 PM, Shyam wrote:

On 05/18/2015 03:49 PM, Shyam wrote:

On 05/18/2015 10:33 AM, Vijay Bellur wrote:

The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
which did not have an owner, so I took a stab at it; below are
the results.

I also think the failure in ./tests/bugs/quota/bug-1038598.t is the same as
the observation below.

NOTE: Anyone with better knowledge of Quota can possibly chip in as to
what we should expect in this case and how to correct the expectations
of these test cases.

(Details of ./tests/bugs/distribute/bug-1161156.t)
1) Failure is in TEST #20
Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k
count=10240 conv=fdatasync

2) The above line is expected to fail (i.e. dd is expected to fail) as
the set quota is 20MB and we are attempting to exceed it by another 5MB
at this point in the test case.

3) The failure is easily reproducible on my laptop, 2/10 times

4) On debugging, I see that when the above dd succeeds (or the test
fails, which means dd succeeded in writing more than the set quota),
there are no write errors from the bricks or any errors on the final
COMMIT RPC call to NFS.

As a result, the expectation of this test fails.

NOTE: Sometimes there is a write failure from one of the bricks (the
above test uses AFR as well), but AFR self-healing kicks in and fixes
the problem, as expected, since the write succeeded on one of the
replicas.

I add this observation because the failed regression run logs have some
EDQUOT errors reported in the client xlator, but only from one of the
client bricks, and there are further AFR self-heal entries noted in the
logs.

5) When the test case succeeds, the writes fail with EDQUOT as expected.
There are times when the quota is exceeded by, say, 1MB - 4.8MB, but the
test case still passes. This means that if we were to try to exceed the
quota by 1MB (instead of the 5MB in the test case), this test case
might always fail.


Here is why I think this gets past quota sometimes and not others, making
this and the other test case mentioned below spurious:
- Each write is 256K from the client (that is what is sent over the
wire)

- If more IO is queued by io-threads after passing the quota checks, which
in this 5MB case requires 20 IOs to be queued (16 IOs could be active
in io-threads itself), we could end up writing more than the quota
amount


So, if quota checks whether a write violates the quota, lets it through,
and updates the space used for future checks only on the UNWIND, we could
have more IO outstanding than the quota allows, and as a result let such
a larger write pass through, considering the io-threads queue and active
IOs as well. Would this be a fair assumption of how quota works?

I believe this is what is happening in this case. I am checking a fix on my
machine, and will post it if it proves to help the situation.
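A rough way to picture the outstanding-IO window (sizes, paths and the
limit are illustrative, not from the test itself):

  # 20MB quota set on /mnt/nfs/dir; two writers race past the check
  dd if=/dev/zero of=/mnt/nfs/dir/f1 bs=256k count=60 &
  dd if=/dev/zero of=/mnt/nfs/dir/f2 bs=256k count=60 &
  wait
  du -sh /mnt/nfs/dir   # can land well above 20MB before EDQUOT shows up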


Posted a patch to fix the problem: http://review.gluster.org/#/c/10811/

There are arguably other ways to fix/overcome this; that one seemed 
apt for this test case though.






6) Note on dd with conv=fdatasync
As one of the fixes attempts to overcome this issue with the addition of
conv=fdatasync, I wanted to cover that behavior here.

What this parameter does is send an NFS_COMMIT (which internally
becomes a flush FOP) at the end of writing the blocks to the NFS share.
This commit in turn triggers any pending writes for this file and
sends the flush to the brick, all of which succeeds at times, resulting
in the failure of the test case.

NOTE: In the TC ./tests/bugs/quota/bug-1038598.t the failing line is
pretty much in the same context (LINE 26: TEST ! dd if=/dev/zero
of=$M0/test_dir/file1.txt bs=1024k count=15). The hard limit is expected
to be exceeded, yet there are no write failures in the logs (which would
be expected with EDQUOT (122)).


Currently we are not accounting for in-progress writes (it is a bit 
complicated to account for in-progress writes).
When a write is successful, the accounting for it is done asynchronously 
by marker. We can get other writes before marker completes accounting 
the previously written size.
So there is a small window where we can exceed the quota limit. In the 
testcase we are attempting to write 5MB more; we may need to change this 
to write a few more MBs.
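A simplified timeline of that window (illustrative only):

  # t0: write W1 passes the quota check (accounted usage < limit)
  # t1: write W2 passes the quota check (marker has not yet accounted W1)
  # t2: W1 completes; marker updates the parent sizes asynchronously
  # t3: W2 completes; accounted usage now exceeds the limit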


Thanks,
Vijay



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Raghavendra Gowdappa


- Original Message -
 From: Shyam srang...@redhat.com
 To: gluster-devel@gluster.org
 Sent: Tuesday, May 19, 2015 6:13:06 AM
 Subject: Re: [Gluster-devel] Moratorium on new patch acceptance
 
 On 05/18/2015 07:05 PM, Shyam wrote:
  On 05/18/2015 03:49 PM, Shyam wrote:
  On 05/18/2015 10:33 AM, Vijay Bellur wrote:
 
  The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
  which did not have an owner, so I took a stab at it; below are
  the results.
 
  I also think the failure in ./tests/bugs/quota/bug-1038598.t is the same as
  the observation below.
 
  NOTE: Anyone with better knowledge of Quota can possibly chip in as to
  what we should expect in this case and how to correct the expectations
  of these test cases.
 
  (Details of ./tests/bugs/distribute/bug-1161156.t)
  1) Failure is in TEST #20
  Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k
  count=10240 conv=fdatasync
 
  2) The above line is expected to fail (i.e. dd is expected to fail) as
  the set quota is 20MB and we are attempting to exceed it by another 5MB
  at this point in the test case.
 
  3) The failure is easily reproducible on my laptop, 2/10 times
 
  4) On debugging, I see that when the above dd succeeds (or the test
  fails, which means dd succeeded in writing more than the set quota),
  there are no write errors from the bricks or any errors on the final
  COMMIT RPC call to NFS.
 
  As a result the expectation of this test fails.
 
  NOTE: Sometimes there is a write failure from one of the bricks (the
  above test uses AFR as well), but AFR self-healing kicks in and fixes
  the problem, as expected, since the write succeeded on one of the replicas.
  I add this observation because the failed regression run logs have some
  EDQUOT errors reported in the client xlator, but only from one of the
  client bricks, and there are further AFR self-heal entries noted in the
  logs.
 
  5) When the test case succeeds, the writes fail with EDQUOT as expected.
  There are times when the quota is exceeded by, say, 1MB - 4.8MB, but the
  test case still passes. This means that if we were to try to exceed
  the quota by 1MB (instead of the 5MB in the test case), this test
  case might always fail.
 
  Here is why I think this gets past quota sometimes and not others, making
  this and the other test case mentioned below spurious.
  - Each write is 256K from the client (that is what is sent over the wire)
  - If more IO was queued by io-threads after passing quota checks, which
  in this 5MB case requires 20 IOs to be queued (16 IOs could be active
  in io-threads itself), we could end up writing more than the quota amount
 
  So, if quota checks to see if a write is violating the quota, and lets
  it through, and updates on the UNWIND the space used for future checks,
  we could have more IO outstanding than what the quota allows, and as a
  result allow such a larger write to pass through, considering the io-threads
  queue and active IOs as well. Would this be a fair assumption of how
  quota works?

Yes, this is a possible scenario. There is a finite time window between:

1. Querying the size of a directory, i.e., checking whether the current write 
can be allowed
2. The effect of this write getting reflected in the sizes of all the parent 
directories of the file up to the root

If 1 and 2 were atomic, another parallel write that would have exceeded the 
quota-limit could not have slipped through. Unfortunately, in the current 
scheme of things they are not atomic. Now, there can be parallel writes in this 
test case because of nfs-client and/or glusterfs write-back (even though we 
have a single single-threaded application - dd - running). One way of testing 
this hypothesis is to disable nfs and glusterfs write-back and run the same 
(unmodified) test; the test should then always succeed (dd should fail). To 
disable write-back in nfs you can use the noac option while mounting.

The situation becomes worse in real-life scenarios because of the parallelism 
involved at many layers:

1. Multiple applications, each possibly multithreaded, writing to one or many 
files in a quota subtree
2. Write-back in the NFS client and in glusterfs
3. Multiple bricks holding the files of a quota subtree, each brick processing 
many write requests simultaneously through io-threads.

I've tried in the past to fix the issue, though unsuccessfully. It seems to me 
that one effective strategy is to make enforcement and the update of parent 
directory sizes atomic. But if we do that, we end up adding the latency of 
accounting to the latency of the fop. Other options can be explored. However, 
our Quota functionality requirements allow a buffer of 10% while enforcing 
limits, so this issue has not been high on our priority list till now. Our 
tests should therefore also expect failures, allowing for this 10% buffer.

 
  I believe this is what is happening in this case. I am checking a fix on my
  machine, and will post it if it proves to help the situation.
 
 Posted a patch to fix the problem: http://review.gluster.org/#/c/10811/

Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Raghavendra Gowdappa


- Original Message -
 From: Raghavendra Gowdappa rgowd...@redhat.com
 To: Shyam srang...@redhat.com
 Cc: gluster-devel@gluster.org
 Sent: Tuesday, May 19, 2015 11:46:19 AM
 Subject: Re: [Gluster-devel] Moratorium on new patch acceptance
 
 
 
 - Original Message -
  From: Shyam srang...@redhat.com
  To: gluster-devel@gluster.org
  Sent: Tuesday, May 19, 2015 6:13:06 AM
  Subject: Re: [Gluster-devel] Moratorium on new patch acceptance
  
  On 05/18/2015 07:05 PM, Shyam wrote:
   On 05/18/2015 03:49 PM, Shyam wrote:
   On 05/18/2015 10:33 AM, Vijay Bellur wrote:
  
   The etherpad did not call out ./tests/bugs/distribute/bug-1161156.t,
   which did not have an owner, so I took a stab at it; below are
   the results.
  
   I also think the failure in ./tests/bugs/quota/bug-1038598.t is the same as
   the observation below.
  
   NOTE: Anyone with better knowledge of Quota can possibly chip in as to
   what we should expect in this case and how to correct the expectations
   of these test cases.
  
   (Details of ./tests/bugs/distribute/bug-1161156.t)
   1) Failure is in TEST #20
   Failed line: TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k
   count=10240 conv=fdatasync
  
   2) The above line is expected to fail (i.e. dd is expected to fail) as
   the set quota is 20MB and we are attempting to exceed it by another 5MB
   at this point in the test case.
  
   3) The failure is easily reproducible on my laptop, 2/10 times
  
   4) On debugging, I see that when the above dd succeeds (or the test
   fails, which means dd succeeded in writing more than the set quota),
   there are no write errors from the bricks or any errors on the final
   COMMIT RPC call to NFS.
  
   As a result the expectation of this test fails.
  
   NOTE: Sometimes there is a write failure from one of the bricks (the
   above test uses AFR as well), but AFR self-healing kicks in and fixes
   the problem, as expected, since the write succeeded on one of the replicas.
   I add this observation because the failed regression run logs have some
   EDQUOT errors reported in the client xlator, but only from one of the
   client bricks, and there are further AFR self-heal entries noted in the
   logs.
  
   5) When the test case succeeds, the writes fail with EDQUOT as expected.
   There are times when the quota is exceeded by, say, 1MB - 4.8MB, but the
   test case still passes. This means that if we were to try to exceed
   the quota by 1MB (instead of the 5MB in the test case), this test
   case might always fail.
  
   Here is why I think this gets past quota sometimes and not others, making
   this and the other test case mentioned below spurious.
   - Each write is 256K from the client (that is what is sent over the wire)
   - If more IO was queued by io-threads after passing quota checks, which
   in this 5MB case requires 20 IOs to be queued (16 IOs could be active
   in io-threads itself), we could end up writing more than the quota amount
  
   So, if quota checks to see if a write is violating the quota, and lets
   it through, and updates on the UNWIND the space used for future checks,
   we could have more IO outstanding than what the quota allows, and as a
   result allow such a larger write to pass through, considering the io-threads
   queue and active IOs as well. Would this be a fair assumption of how
   quota works?
 
 Yes, this is a possible scenario. There is a finite time window between:
 
 1. Querying the size of a directory, i.e., checking whether the current
 write can be allowed
 2. The effect of this write getting reflected in the sizes of all the parent
 directories of the file up to the root
 
 If 1 and 2 were atomic, another parallel write that would have exceeded the
 quota-limit could not have slipped through. Unfortunately, in the current
 scheme of things they are not atomic. Now, there can be parallel writes in
 this test case because of nfs-client and/or glusterfs write-back (even though
 we have a single single-threaded application - dd - running). One way of
 testing this hypothesis is to disable nfs and glusterfs write-back and run
 the same (unmodified) test; the test should then always succeed (dd should
 fail). To disable write-back in nfs you can use the noac option while
 mounting.
 
 The situation becomes worse in real-life scenarios because of the parallelism
 involved at many layers:
 
 1. Multiple applications, each possibly multithreaded, writing to one or
 many files in a quota subtree
 2. Write-back in the NFS client and in glusterfs
 3. Multiple bricks holding the files of a quota subtree, each brick
 processing many write requests simultaneously through io-threads.

4. Background accounting of directory sizes _after_ a write is complete.

 
 I've tried in the past to fix the issue, though unsuccessfully. It seems to me
 that one effective strategy is to make enforcement and the update of parent
 directory sizes atomic. But if we do that, we end up adding the latency of
 accounting to the latency of the fop. Other options can be explored.

Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Raghavendra Gowdappa


- Original Message -
 From: Vijay Bellur vbel...@redhat.com
 To: Raghavendra Gowdappa rgowd...@redhat.com, Shyam 
 srang...@redhat.com
 Cc: gluster-devel@gluster.org
 Sent: Tuesday, May 19, 2015 1:29:57 PM
 Subject: Re: [Gluster-devel] Moratorium on new patch acceptance
 
 On 05/19/2015 12:21 PM, Raghavendra Gowdappa wrote:
 
 
 
  Yes, this is a possible scenario. There is a finite time window between:
 
  1. Querying the size of a directory, i.e., checking whether the current
  write can be allowed
  2. The effect of this write getting reflected in the sizes of all the parent
  directories of the file up to the root
 
  If 1 and 2 were atomic, another parallel write that would have exceeded the
  quota-limit could not have slipped through. Unfortunately, in the current
  scheme of things they are not atomic. Now, there can be parallel writes in
  this test case because of nfs-client and/or glusterfs write-back (even
  though we have a single single-threaded application - dd - running). One way
  of testing this hypothesis is to disable nfs and glusterfs write-back and
  run the same (unmodified) test; the test should then always succeed (dd
  should fail). To disable write-back in nfs you can use the noac option
  while mounting.
 
  The situation becomes worse in real-life scenarios because of the
  parallelism involved at many layers:
 
  1. Multiple applications, each possibly multithreaded, writing to one or
  many files in a quota subtree
  2. Write-back in the NFS client and in glusterfs
  3. Multiple bricks holding the files of a quota subtree, each brick
  processing many write requests simultaneously through io-threads.
 
  4. Background accounting of directory sizes _after_ a write is complete.
 
 
  I've tried in the past to fix the issue, though unsuccessfully. It seems to
  me that one effective strategy is to make enforcement and the update of
  parent directory sizes atomic. But if we do that, we end up adding the
  latency of accounting to the latency of the fop. Other options can be
  explored. However, our Quota functionality requirements allow a buffer of
  10% while enforcing limits, so this issue has not been high on our priority
  list till now. Our tests should therefore also expect failures, allowing
  for this 10% buffer.
 
  Since most of our tests run a single instance of single-threaded dd
  on a single mount, if the hypothesis turns out to be true, we can turn off
  nfs-client and glusterfs write-back in all tests related to Quota.
  Comments?
 
 
 Even with write-behind enabled, dd should get a failure upon close() if
 quota were to return EDQUOT for any of the writes. I suspect that
 flush-behind being enabled by default in write-behind can mask a failure
 for close(). Disabling flush-behind in the tests might take care of
 fixing the tests.

No, my suggestion was aimed at not having parallel writes. In this case quota 
won't even fail the writes with EDQUOT, for the reasons explained above. Yes, 
we need to disable flush-behind along with this so that errors are delivered 
to the application.

 
 It would be good to have nfs + quota coverage in the tests. So let us
 not disable nfs tests for quota.

The suggestion was to continue using nfs, but to prevent nfs-clients from 
using a write-back cache.

 
 Thanks,
 Vijay
 
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Regarding the size parameter in readdir(p) fops

2015-05-19 Thread Krutika Dhananjay
Hi, 

The following patch fixes an issue with readdir(p) in the shard xlator: 
http://review.gluster.org/#/c/10809/; the details can be found in the commit 
message. 

One side effect of this is that the size of the dirents list that shard 
xlator returns to the translators above it could be greater than the 
size requested in the wind path (thanks to Pranith for pointing this out during 
the review of this patch), with the worst-case scenario returning (2 * 
requested_size) worth of entries. 
For example, if fuse requests readdirp with 128k as the size, in the worst 
case 256k worth of entries could be unwound in return. 
How important is it to strictly adhere to this size limit in each iteration of 
readdir(p)? And what are the repercussions of such behavior? 

Note: 
I tried my hand at simulating this issue on my volume, but I have so far been 
unsuccessful at hitting this case. 
Creating a large number of files in the root of a sharded volume, triggering 
readdirp on it until .shard becomes the last entry read in a given iteration, 
winding the next readdirp from the shard xlator, and then concatenating the 
results of the two readdirps into one is proving to be an exercise in futility. 
Therefore, I am asking this question here to learn what could happen in theory 
in such situations. 

-Krutika 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Jeff Darcy
 No, my suggestion was aimed at not having parallel writes. In this case quota
 won't even fail the writes with EDQUOT because of reasons explained above.
 Yes, we need to disable flush-behind along with this so that errors are
 delivered to the application.

Would conv=sync help here?  That should prevent any kind of write parallelism.
If it doesn't, I'd say that's a true test failure somewhere in our stack.  A
similar possibility would be to invoke dd multiple times with oflag=append.
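A sketch of that repeated-dd idea (path and sizes are placeholders;
conv=notrunc keeps dd from truncating the file between invocations):

  for i in 1 2 3 4; do
    dd if=/dev/zero of=/mnt/nfs/dir/file bs=1M count=5 \
       oflag=append conv=notrunc || break
  done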
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Jenkins will now abort regression-tests (on Linux) after 4 hours runtime

2015-05-19 Thread Niels de Vos
I just installed the Build-timeout Plugin in our Jenkins environment.
This plugin can be used to configure a timeout for jobs that take too
long to complete. There is at least one test that seems to take much
more time on occasion, and after several hours it would still not
complete:

http://build.gluster.org/job/rackspace-regression-2GB-triggered/9241/console

The regression job for Linux tests now has a timeout of 4 hours. Other
jobs can be configured with a timeout like this too:

 - open the job configuration screen
 - scroll to Build Environment
 - [x] Abort the build if it's stuck
 - pick the wanted options; the time-out action should be set to fail
 - click the [save] button at the bottom

Cheers,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] REMINDER: Gluster Community Bug Triage meeting today at 12:00 UTC (~in 20 minutes)

2015-05-19 Thread Niels de Vos
Hi all,

This meeting is scheduled for anyone that is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
( https://webchat.freenode.net/?channels=gluster-meeting )
- date: every Tuesday
- time: 12:00 UTC
(in your terminal, run: date -d 12:00 UTC)
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last week's action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.

Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Raghavendra G
On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy jda...@redhat.com wrote:

  No, my suggestion was aimed at not having parallel writes. In this case
 quota
  won't even fail the writes with EDQUOT because of reasons explained
 above.
  Yes, we need to disable flush-behind along with this so that errors are
  delivered to the application.

 Would conv=sync help here?  That should prevent any kind of write
 parallelism.


An strace of dd shows that

* fdatasync is issued only once at the end of all writes when conv=fdatasync
* for some strange reason no fsync or fdatasync is issued at all when
conv=sync

So, using conv=fdatasync in the test cannot prevent write-parallelism
induced by write-behind. Parallelism would've been prevented only if dd had
issued fdatasync after each write or opened the file with O_SYNC.

If it doesn't, I'd say that's a true test failure somewhere in our stack.  A
 similar possibility would be to invoke dd multiple times with oflag=append.


Yes, appending writes curb parallelism (at least in glusterfs, but not sure
how the nfs client behaves) and hence can be used as an alternative solution.

On a slightly unrelated note, flush-behind is immaterial in this test since
fdatasync anyway acts as a barrier.
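The observation above can be reproduced with something like this (the
output file is a placeholder):

  strace -f -e trace=fsync,fdatasync \
    dd if=/dev/zero of=/tmp/ddtest bs=1M count=8 conv=fdatasync
  # shows a single fdatasync at the very end; with conv=sync, none at all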





-- 
Raghavendra G
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Raghavendra G
After discussion with Vijaykumar Mallikarjuna and other inputs in this
thread, we are proposing that all quota tests comply with the following
criteria:

* always use dd with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any, are delivered to
the application; turning off flush-behind is optional since fdatasync acts as
a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.
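Spelled out, the two options would look something like this (a sketch; the
volume options are the ones named in this thread, names are illustrative):

  # option 1: serialized writes, errors surfaced to the application
  TEST ! dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=1k count=10240 \
         oflag=append conv=fdatasync

  # option 2: disable write-back instead
  mount -t nfs -o vers=3,noac server:/patchy /mnt/nfs
  gluster volume set patchy performance.write-behind off
  gluster volume set patchy performance.nfs.write-behind off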

What do you people think is a better test scenario?

Also, we don't yet have confirmation of the RCA that parallel writes are indeed
the culprits. We are trying to reproduce the issue locally. @Shyam, it
would be helpful if you could confirm the hypothesis :).

regards,
Raghavendra.

On Tue, May 19, 2015 at 5:27 PM, Raghavendra G raghaven...@gluster.com
wrote:



 On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy jda...@redhat.com wrote:

  No, my suggestion was aimed at not having parallel writes. In this case
 quota
  won't even fail the writes with EDQUOT because of reasons explained
 above.
  Yes, we need to disable flush-behind along with this so that errors are
  delivered to the application.

 Would conv=sync help here?  That should prevent any kind of write
 parallelism.


 An strace of dd shows that

 * fdatasync is issued only once at the end of all writes when
 conv=fdatasync
 * for some strange reason no fsync or fdatasync is issued at all when
 conv=sync

 So, using conv=fdatasync in the test cannot prevent write-parallelism
 induced by write-behind. Parallelism would've been prevented only if dd had
 issued fdatasync after each write or opened the file with O_SYNC.

 If it doesn't, I'd say that's a true test failure somewhere in our stack.
 A
 similar possibility would be to invoke dd multiple times with
 oflag=append.


 Yes, appending writes curb parallelism (at least in glusterfs, but not
 sure how the nfs client behaves) and hence can be used as an alternative
 solution.
 
 On a slightly unrelated note, flush-behind is immaterial in this test since
 fdatasync anyway acts as a barrier.





 --
 Raghavendra G




-- 
Raghavendra G
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Raghavendra G
On Tue, May 19, 2015 at 5:40 PM, Raghavendra G raghaven...@gluster.com
wrote:

 After discussion with Vijaykumar Mallikarjuna and other inputs in this
 thread, we are proposing that all quota tests comply with the following
 criteria:
 
 * always use dd with oflag=append (to make sure there are no parallel
 writes) and conv=fdatasync (to make sure errors, if any, are delivered to
 the application; turning off flush-behind is optional since fdatasync acts as a
 barrier)

 OR

 * turn off write-behind in nfs client and glusterfs server.


s/glusterfs server/glusterfs nfs server.



 What do you people think is a better test scenario?

 Also, we don't have confirmation on the RCA that parallel writes are
 indeed the culprits. We are trying to reproduce the issue locally. @Shyam,
 it would be helpful if you can confirm the hypothesis :).

 regards,
 Raghavendra.

 On Tue, May 19, 2015 at 5:27 PM, Raghavendra G raghaven...@gluster.com
 wrote:



 On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy jda...@redhat.com wrote:

  No, my suggestion was aimed at not having parallel writes. In this
 case quota
  won't even fail the writes with EDQUOT because of reasons explained
 above.
  Yes, we need to disable flush-behind along with this so that errors are
   delivered to the application.

 Would conv=sync help here?  That should prevent any kind of write
 parallelism.


 An strace of dd shows that

 * fdatasync is issued only once at the end of all writes when
 conv=fdatasync
 * for some strange reason no fsync or fdatasync is issued at all when
 conv=sync

 So, using conv=fdatasync in the test cannot prevent write-parallelism
 induced by write-behind. Parallelism would've been prevented only if dd had
 issued fdatasync after each write or opened the file with O_SYNC.

 If it doesn't, I'd say that's a true test failure somewhere in our
 stack.  A
 similar possibility would be to invoke dd multiple times with
 oflag=append.


  Yes, appending writes curb parallelism (at least in glusterfs, but not
  sure how the nfs client behaves) and hence can be used as an alternative
  solution.
 
  On a slightly unrelated note, flush-behind is immaterial in this test
  since fdatasync anyway acts as a barrier.





 --
 Raghavendra G




 --
 Raghavendra G




-- 
Raghavendra G
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Jeff Darcy
 * fdatasync is issued only once at the end of all writes when conv=fdatasync
 * for some strange reason no fsync or fdatasync is issued at all when
 conv=sync

That's because of my typo.  I meant oflag=sync, not conv=sync.  Sorry.

 So, using conv=fdatasync in the test cannot prevent write-parallelism induced
 by write-behind. Parallelism would've been prevented only if dd had issued
 fdatasync after each write or opened the file with O_SYNC.

See above.  I just checked with strace, and oflag=sync does cause the output
file to be opened with O_SYNC.
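For reference, the check was along these lines (the file name is a
placeholder):

  strace -f -e trace=open,openat \
    dd if=/dev/zero of=/tmp/ddtest bs=1M count=4 oflag=sync 2>&1 | grep O_SYNC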
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Meeting minutes of todays Gluster Community Bug Triage meeting

2015-05-19 Thread Raghavendra Talur

On Tuesday 19 May 2015 05:08 PM, Niels de Vos wrote:

Hi all,

This meeting is scheduled for anyone that is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
 ( https://webchat.freenode.net/?channels=gluster-meeting )
- date: every Tuesday
- time: 12:00 UTC
 (in your terminal, run: date -d 12:00 UTC)
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last week's action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.

Niels




Minutes: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-05-19/gluster-meeting.2015-05-19-12.01.html
Minutes (text): 
http://meetbot.fedoraproject.org/gluster-meeting/2015-05-19/gluster-meeting.2015-05-19-12.01.txt
Log: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-05-19/gluster-meeting.2015-05-19-12.01.log.html


Meeting summary

Agenda: https://public.pad.fsfe.org/p/gluster-bug-triage (ndevos, 12:01:58)

Roll Call (ndevos, 12:02:03)
Group Triage (ndevos, 12:06:44)
Bugs for 3.4 got a comment, asking for retesting and updating of the 
version in case the problem exists on newer versions (ndevos, 12:07:51)
Bugs for 3.4 will get closed at the end of this month if there are no 
updates/corrections (ndevos, 12:08:24)

111 bugs were updated with the 3.4 re-confirm note (ndevos, 12:08:52)
there are 40 untriaged bugs since the last meeting: http://goo.gl/WuDQun 
(ndevos, 12:13:07)



Meeting ended at 13:34:56 UTC (full logs).

Action items

(none)


People present (lines said)

ndevos (70)
rafi (41)
kkeithley_ (24)
RaSTar (24)
zodbot (3)
rafi1 (1)





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Vijaikumar M



On Tuesday 19 May 2015 08:36 PM, Shyam wrote:

On 05/19/2015 08:10 AM, Raghavendra G wrote:

After discussion with Vijaykumar Mallikarjuna and other inputs in this
thread, we are proposing that all quota tests comply with the following 
criteria:


* always use dd with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any, are delivered to
the application; turning off flush-behind is optional since fdatasync acts
as a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.

What do you people think is a better test scenario?

Also, we don't have confirmation on the RCA that parallel writes are
indeed the culprits. We are trying to reproduce the issue locally.
@Shyam, it would be helpful if you can confirm the hypothesis :).


Ummm... I thought we acknowledged that quota checks are done during the 
WIND and updated during UNWIND, and we have io threads doing in flight 
IOs (as well as possible IOs in io threads queue) and we have 256K 
writes in the case mentioned. Put together, in my head this forms a 
good RCA that we write more than needed due to the in flight IOs on 
the brick. We need to control the in flight IOs as a resolution for 
this from the application.


In terms of actual proof, we would need to instrument the code and 
check. When you say it does not fail for you, does the file stop once 
quota is reached or is a random size greater than quota? Which itself 
may explain or point to the RCA.


The basic thing needed from an application is,
- Sync IOs, so that there aren't too many in flight IOs and the 
application waits for each IO to complete
- Based on tests below if we keep block size in dd lower and use 
oflag=sync we can achieve the same, if we use higher block sizes we 
cannot


Test results:
1) noac:
  - NFS sends a COMMIT (internally translates to a flush) post each IO 
request (NFS WRITES are still with the UNSTABLE flag)
  - Ensures prior IO is complete before next IO request is sent (due 
to waiting on the COMMIT)
  - Fails if IO size is large, i.e. in the test case being discussed I 
changed the dd line that was failing as TEST ! dd if=/dev/zero 
of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync and this 
fails at times, as the writes here are sent as 256k chunks to the 
server and we still see the same behavior
  - noac + performance.nfs.flush-behind: off + 
performance.flush-behind: off + performance.nfs.strict-write-ordering: 
on + performance.strict-write-ordering: on + 
performance.nfs.write-behind: off + performance.write-behind: off
- Still see similar failures, i.e at times 10MB file is created 
successfully in the modified dd command above


Overall, the switch works, but not always. If we are to use this 
variant then we need to announce that all quota tests using dd not try 
to go beyond the quota limit set in a single IO from dd.


2) oflag=sync:
  - Exactly the same behavior as above.

3) Added all (and possibly the kitchen sink) to the test case, as 
attached, and still see failures,
  - Yes, I have made the test fail intentionally (of sorts) by using 
3M per dd IO and 2 IOs to go beyond the quota limit.
  - The intention is to demonstrate that we still get parallel IOs 
from NFS client
  - The test would work if we reduce the block size per IO (reliably 
is a border condition here, and we need specific rules like block size 
and how many blocks before we state quota is exceeded etc.)
  - The test would work if we just go beyond the quota, and then check 
a separate dd instance as being able to *not* exceed the quota. Which 
is why I put up that patch.


What next?


Hi Shyam,

I tried running the test with the dd option 'oflag=append' and didn't see 
the issue. Can you please try this option and see if it works?


Thanks,
Vijay



regards,
Raghavendra.

On Tue, May 19, 2015 at 5:27 PM, Raghavendra G raghaven...@gluster.com
mailto:raghaven...@gluster.com wrote:



On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy jda...@redhat.com
mailto:jda...@redhat.com wrote:

 No, my suggestion was aimed at not having parallel writes. 
In this case quota
 won't even fail the writes with EDQUOT because of reasons 
explained above.
 Yes, we need to disable flush-behind along with this so 
that errors are

 delivered to the application.

Would conv=sync help here?  That should prevent any kind of
write parallelism.


An strace of dd shows that

* fdatasync is issued only once at the end of all writes when
conv=fdatasync
* for some strange reason no fsync or fdatasync is issued at all
when conv=sync

So, using conv=fdatasync in the test cannot prevent
write-parallelism induced by write-behind. Parallelism would've been
prevented only if dd had issued fdatasync after each write or opened
the file with O_SYNC.

If it doesn't, I'd say that's a true test failure somewhere in
our stack.  A
similar possibility would be to invoke dd multiple times with oflag=append.

Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Shyam

On 05/19/2015 08:10 AM, Raghavendra G wrote:

After discussion with Vijaykumar Mallikarjuna and other inputs in this
thread, we are proposing that all quota tests comply with the following criteria:

* always use dd with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any, are delivered to
the application; turning off flush-behind is optional since fdatasync acts
as a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.

What do you people think is a better test scenario?

Also, we don't have confirmation on the RCA that parallel writes are
indeed the culprits. We are trying to reproduce the issue locally.
@Shyam, it would be helpful if you can confirm the hypothesis :).


Ummm... I thought we acknowledged that quota checks are done during the 
WIND and updated during the UNWIND, that we have io-threads doing in-flight 
IOs (as well as possible IOs in the io-threads queue), and that we have 256K 
writes in the case mentioned. Put together, in my head this forms a good 
RCA that we write more than needed due to the in-flight IOs on the 
brick. We need to control the in-flight IOs as a resolution for this 
from the application.


In terms of actual proof, we would need to instrument the code and 
check. When you say it does not fail for you, does the file stop growing once 
the quota is reached, or is it a random size greater than the quota? That 
itself may explain or point to the RCA.


The basic things needed from the application are:
- Sync IOs, so that there aren't too many in-flight IOs and the 
application waits for each IO to complete
- Based on the tests below, if we keep the block size in dd lower and use 
oflag=sync we can achieve this; if we use higher block sizes we cannot


Test results:
1) noac:
  - NFS sends a COMMIT (internally translated to a flush) after each IO 
request (NFS WRITEs are still sent with the UNSTABLE flag)
  - Ensures the prior IO is complete before the next IO request is sent (due to 
waiting on the COMMIT)
  - Fails if the IO size is large, i.e. in the test case being discussed I 
changed the failing dd line to TEST ! dd if=/dev/zero 
of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync and this fails 
at times, as the writes here are sent as 256k chunks to the server and 
we still see the same behavior
  - noac + performance.nfs.flush-behind: off + 
performance.flush-behind: off + performance.nfs.strict-write-ordering: 
on + performance.strict-write-ordering: on + 
performance.nfs.write-behind: off + performance.write-behind: off
  - Still see similar failures, i.e. at times the 10MB file is created 
successfully with the modified dd command above
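Spelled out as commands, that option combination is (the volume name is a
placeholder):

  mount -t nfs -o vers=3,noac server:/patchy /mnt/nfs
  gluster volume set patchy performance.nfs.flush-behind off
  gluster volume set patchy performance.flush-behind off
  gluster volume set patchy performance.nfs.strict-write-ordering on
  gluster volume set patchy performance.strict-write-ordering on
  gluster volume set patchy performance.nfs.write-behind off
  gluster volume set patchy performance.write-behind off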


Overall, the switch works, but not always. If we are to use this variant, 
then we need to announce that all quota tests using dd must not try to go 
beyond the set quota limit in a single IO from dd.


2) oflag=sync:
  - Exactly the same behavior as above.

3) Added everything (and possibly the kitchen sink) to the test case, as 
attached, and still see failures,
  - Yes, I have made the test fail intentionally (of sorts) by using 3M 
per dd IO and 2 IOs to go beyond the quota limit.
  - The intention is to demonstrate that we still get parallel IOs from 
the NFS client
  - The test would work if we reduce the block size per IO (reliability is 
a border condition here, and we need specific rules like block size and 
how many blocks before we state quota is exceeded etc.)
  - The test would work if we just go beyond the quota, and then check 
that a separate dd instance is *not* able to exceed the quota. Which is 
why I put up that patch.


What next?



regards,
Raghavendra.

On Tue, May 19, 2015 at 5:27 PM, Raghavendra G raghaven...@gluster.com
mailto:raghaven...@gluster.com wrote:



On Tue, May 19, 2015 at 4:26 PM, Jeff Darcy jda...@redhat.com
mailto:jda...@redhat.com wrote:

 No, my suggestion was aimed at not having parallel writes. In this 
case quota
 won't even fail the writes with EDQUOT because of reasons explained 
above.
 Yes, we need to disable flush-behind along with this so that errors 
are
 delivered to the application.

Would conv=sync help here?  That should prevent any kind of
write parallelism.


An strace of dd shows that

* fdatasync is issued only once at the end of all writes when
conv=fdatasync
* for some strange reason no fsync or fdatasync is issued at all
when conv=sync

So, using conv=fdatasync in the test cannot prevent
write-parallelism induced by write-behind. Parallelism would've been
prevented only if dd had issued fdatasync after each write or opened
the file with O_SYNC.

If it doesn't, I'd say that's a true test failure somewhere in
our stack.  A
similar possibility would be to invoke dd multiple times with
oflag=append.


Yes, appending writes curb parallelism (at least in glusterfs, but
not sure how the nfs client behaves) and hence can be used as an
alternative solution.

[Gluster-devel] Requesting reviews

2015-05-19 Thread Hari Gowtham
Hi,

Requesting someone to review the patch:
http://review.gluster.org/#/c/9893/

Regards,
Hari.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Vijay Bellur

On 05/18/2015 08:03 PM, Vijay Bellur wrote:

On 05/16/2015 03:34 PM, Vijay Bellur wrote:



I will send daily status updates from Monday (05/18) about this so that
we are clear about where we are and what needs to be done to remove this
moratorium. Appreciate your help in having a clean set of regression
tests going forward!



We have made some progress since Saturday. The problem with glupy.t has
been fixed - thanks to Niels! All but the following tests have developers
looking into them:

 ./tests/basic/afr/entry-self-heal.t

 ./tests/bugs/replicate/bug-976800.t

 ./tests/bugs/replicate/bug-1015990.t

 ./tests/bugs/quota/bug-1038598.t

 ./tests/basic/ec/quota.t

 ./tests/basic/quota-nfs.t

 ./tests/bugs/glusterd/bug-974007.t

Can submitters of these test cases or current feature owners pick these
up and start looking into the failures please? Do update the spurious
failures etherpad [1] once you pick up a particular test.


[1] https://public.pad.fsfe.org/p/gluster-spurious-failures



Update for today - all tests that are known to fail have owners. Thanks 
everyone for chipping in! I think we should be able to lift this 
moratorium and resume normal patch acceptance shortly.


Cheers,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Gluster Summit recordings

2015-05-19 Thread Justin Clift
(Didn't see this mentioned elsewhere)

The video recordings (using a tablet resting on the desk) for
the Gluster Summit sessions in Barcelona are here:

  https://www.youtube.com/channel/UCngUyL3KPYz8M2n7rDJWU0w

Thanks to Spot for providing the tablet for most of them, and
uploading them too. :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-19 Thread Shyam

On 05/19/2015 11:23 AM, Vijaikumar M wrote:



On Tuesday 19 May 2015 08:36 PM, Shyam wrote:

On 05/19/2015 08:10 AM, Raghavendra G wrote:

After discussion with Vijaykumar Mallikarjuna and other inputs in this
thread, we are proposing that all quota tests comply with the following
criteria:

* always use dd with oflag=append (to make sure there are no parallel
writes) and conv=fdatasync (to make sure errors, if any, are delivered to
the application; turning off flush-behind is optional since fdatasync acts
as a barrier)

OR

* turn off write-behind in nfs client and glusterfs server.

What do you people think is a better test scenario?

Also, we don't have confirmation on the RCA that parallel writes are
indeed the culprits. We are trying to reproduce the issue locally.
@Shyam, it would be helpful if you can confirm the hypothesis :).


Ummm... I thought we acknowledged that quota checks are done during the
WIND and updated during UNWIND, and we have io threads doing in flight
IOs (as well as possible IOs in io threads queue) and we have 256K
writes in the case mentioned. Put together, in my head this forms a
good RCA that we write more than needed due to the in flight IOs on
the brick. We need to control the in flight IOs as a resolution for
this from the application.

In terms of actual proof, we would need to instrument the code and
check. When you say it does not fail for you, does the file stop once
quota is reached or is a random size greater than quota? Which itself
may explain or point to the RCA.

The basic thing needed from an application is,
- Sync IOs, so that there aren't too many in flight IOs and the
application waits for each IO to complete
- Based on tests below if we keep block size in dd lower and use
oflag=sync we can achieve the same, if we use higher block sizes we
cannot

Test results:
1) noac:
  - NFS sends a COMMIT (internally translates to a flush) post each IO
request (NFS WRITES are still with the UNSTABLE flag)
  - Ensures prior IO is complete before next IO request is sent (due
to waiting on the COMMIT)
  - Fails if IO size is large, i.e. in the test case being discussed I
changed the dd line that was failing as TEST ! dd if=/dev/zero
of=$N0/$mydir/newfile_2 *bs=10M* count=1 conv=fdatasync and this
fails at times, as the writes here are sent as 256k chunks to the
server and we still see the same behavior
  - noac + performance.nfs.flush-behind: off +
performance.flush-behind: off + performance.nfs.strict-write-ordering:
on + performance.strict-write-ordering: on +
performance.nfs.write-behind: off + performance.write-behind: off
- Still see similar failures, i.e at times 10MB file is created
successfully in the modified dd command above

Overall, the switch works, but not always. If we are to use this
variant then we need to announce that all quota tests using dd not try
to go beyond the quota limit set in a single IO from dd.

2) oflag=sync:
  - Exactly the same behavior as above.

3) Added all (and possibly the kitchen sink) to the test case, as
attached, and still see failures,
  - Yes, I have made the test fail intentionally (of sorts) by using
3M per dd IO and 2 IOs to go beyond the quota limit.
  - The intention is to demonstrate that we still get parallel IOs
from NFS client
  - The test would work if we reduce the block size per IO (reliably
is a border condition here, and we need specific rules like block size
and how many blocks before we state quota is exceeded etc.)
  - The test would work if we just go beyond the quota, and then check
a separate dd instance as being able to *not* exceed the quota. Which
is why I put up that patch.

What next?


Hi Shyam,

I tried running the test with dd option 'oflag=append' and didn't see
the issue.Can you please try this option and see if it works?


Did that (in the attached script that I sent) and it still failed.

Please note:
- This dd command passes the test (i.e. dd fails with EDQUOT)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=512 count=10240 
oflag=append oflag=sync conv=fdatasync
  - We can even drop append and fdatasync, as sync sends a commit per 
block written, which is better for the test and for quota enforcement, 
whereas fdatasync does one at the end and sometimes fails (with larger 
block sizes, say 1M)

  - We can change bs to anything in [512 - 256k]

- This dd command fails the test (i.e. dd writes all the data)
  - dd if=/dev/zero of=$N0/$mydir/newfile_2 bs=3M count=2 oflag=append 
oflag=sync conv=fdatasync


The reasoning is that when we write with a larger block size, NFS sends 
the data in multiple 256k chunks and then sends the commit before the next 
block. As a result, if we exceed quota in the *last block* that we are 
writing, we *may* fail. If we exceed quota in the last-but-one block, we 
will pass.
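A worked example of that chunking (numbers illustrative):

  # usage before the write: 18MB; quota limit: 20MB
  # dd bs=3M count=1  ->  12 NFS WRITEs of 256k each, then one COMMIT
  # the 8th chunk crosses the 20MB limit, but chunks 9-12 may already
  # be in flight, so the full 3MB can land before enforcement catches up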


Hope this shorter version explains it better.

(VijayM is educating me on quota (over IM), and it looks like the quota 
update happens as a synctask in the background, so even after the flush (NFS 
commit) we may still have a race)


Post education solution:
-