[Gluster-devel] failures in tests/basic/tier/tier.t

2015-05-07 Thread Pranith Kumar Karampuri

Dan/Joseph,
 Could you look into it please.
[22:04:31] ./tests/basic/tier/tier.t ..
not ok 25 Got 1 instead of 0
not ok 26 Got 1 instead of 0
Failed 2/34 subtests
[22:04:31]

Test Summary Report
---
./tests/basic/tier/tier.t (Wstat: 0 Tests: 34 Failed: 2)
  Failed tests:  25-26
Files=1, Tests=34, 72 wallclock secs ( 0.02 usr  0.00 sys +  1.68 cusr  0.81 
csys =  2.51 CPU)
Result: FAIL
./tests/basic/tier/tier.t: bad status 1
./tests/basic/tier/tier.t: 1 new core files

http://build.gluster.org/job/rackspace-regression-2GB-triggered/8588/consoleFull

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] failure in tests/bugs/glusterd/bug-974007.t

2015-05-07 Thread Pranith Kumar Karampuri

Nitya,
  It seems like rebalance is not completing in this test. Could you take a look?

http://build.gluster.org/job/rackspace-regression-2GB-triggered/8595/consoleFull

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t

2015-05-07 Thread Pranith Kumar Karampuri


On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote:

Pranith,

The above snippet says that the volume has to be stopped before it can be deleted. It
also says that volume-stop failed. I would look into the glusterd logs to see why
volume-stop failed; cmd-history.log tells us only so much.
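As a concrete way to do that correlation on an archived run, one can grep the glusterd log around the timestamp that cmd_history.log recorded for the failed volume-stop. A sketch only; the paths assume the default /var/log/glusterfs log names used on the build slaves:

# cmd_history.log gives the timestamp of the failed command...
grep 'volume stop patchy : FAILED' /var/log/glusterfs/cmd_history.log

# ...and the glusterd log around that second usually carries the real reason.
grep '2015-05-06 13:09:58' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log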


http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull
 has the logs. I didn't find much information there. Please feel free to take a look.
What can we add to the code so that this failure can be debugged more easily in the
future? Please add at least that much for now.

Pranith



HTH,
KP

- Original Message -

hi,
  Volume delete is failing without logging much about why it is
failing. Know anything about this?
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
1 [2015-05-06 13:09:58.311519]  : volume heal patchy statistics
heal-count : SUCCESS
0 [2015-05-06 13:09:58.534917]  : volume stop patchy : FAILED :
1 [2015-05-06 13:09:58.904333]  : volume delete patchy : FAILED :
Volume patchy has been started.Volume needs to be stopped before deletion.

Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failure in tests/basic/quota-nfs.t

2015-05-07 Thread Pranith Kumar Karampuri

hi Du,
   Please help with this one?

 * tests/basic/quota-nfs.t

 * Happens in: master

 * Being investigated by: ?

 * Tried to re-create it for more than an hour and it is not failing.

 * 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8625/consoleFull

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] failures in tests/bugs/distribute/bug-1161156.t

2015-05-07 Thread Pranith Kumar Karampuri

Du,
   This seems like a quota issue as well. Could you look into this one.
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8582/consoleFull

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious test failure in tests/basic/quota-anon-fd-nfs.t

2015-05-07 Thread Pranith Kumar Karampuri

hi,
  I compared the logs from a failed run and a successful run of
the test, based on the timestamps. It seems like it is not able to find the
parent on which the quota contribution is supposed to be updated, as per the
following logs:
[2015-05-04 04:02:13.537672] E 
[marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: 
contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)
[2015-05-04 04:02:14.904655] E 
[marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: 
contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)
[2015-05-04 04:02:16.228797] E 
[marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: 
contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)


Does that help? Seems like a bug in quota?

This is the test that fails in the file: TEST ! $(dirname $0)/quota 
$N0/$deep/new_file_2 1048576
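For context, that check expects a 1 MB write over the NFS mount to be rejected once the quota limit is crossed. Ignoring the test's own quota helper binary, it is roughly equivalent to the sketch below (dd is used purely for illustration; TEST, $N0 and $deep come from the .t file's usual include.rc preamble):

# A 1048576-byte write into the quota-limited directory must fail, so the
# check is wrapped in "TEST !" (expect a non-zero exit status). If marker
# cannot update the parent's contribution, the write wrongly succeeds and
# the assertion fails.
TEST ! dd if=/dev/zero of=$N0/$deep/new_file_2 bs=1048576 count=1 conv=fsync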


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t

2015-05-07 Thread Pranith Kumar Karampuri


On 05/07/2015 02:53 PM, Krishnan Parthasarathi wrote:


- Original Message -

On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote:

Pranith,

The above snippet says that the volume has to be stopped before deleted. It
also says that
volume-stop failed. I would look into glusterd logs to see why volume-stop
failed,
cmd-history.log tells us only so much.

http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull
has the logs. I didn't find much information. Please feel free to take a
look. What can we add to the code so that this failure can be debugged
better in future? Please at least add that much for now?

Atin is already looking into this. Without the root cause, it's not useful to
speculate about how we could help debugging this. As we root-cause it, I am sure we will
find things that we could have logged to reduce the time to root cause. Does that
make sense?
Cool. Could you please update the pad:
https://public.pad.fsfe.org/p/gluster-spurious-failures with the latest info
on this issue.


Pranith



Pranith


HTH,
KP

- Original Message -

hi,
   Volume delete is failing without logging much about why it is
failing. Know anything about this?
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
 1 [2015-05-06 13:09:58.311519]  : volume heal patchy statistics
heal-count : SUCCESS
 0 [2015-05-06 13:09:58.534917]  : volume stop patchy : FAILED :
 1 [2015-05-06 13:09:58.904333]  : volume delete patchy : FAILED :
Volume patchy has been started.Volume needs to be stopped before deletion.

Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] regarding ./tests/bugs/replicate/bug-1015990.t

2015-05-07 Thread Pranith Kumar Karampuri
Sorry, wrong test. The correct test is: tests/bugs/quota/bug-1035576.t 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/8329/consoleFull)


Pranith
On 05/07/2015 01:53 PM, Pranith Kumar Karampuri wrote:

It seems like the file $M0/a/f is not healed, based on the execution log.

Ravi,
 Could you please help? Build log: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t

2015-05-07 Thread Pranith Kumar Karampuri

hi,
Volume delete is failing without logging much about why it is 
failing. Know anything about this? 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
  1 [2015-05-06 13:09:58.311519]  : volume heal patchy statistics 
heal-count : SUCCESS

  0 [2015-05-06 13:09:58.534917]  : volume stop patchy : FAILED :
  1 [2015-05-06 13:09:58.904333]  : volume delete patchy : FAILED : 
Volume patchy has been started.Volume needs to be stopped before deletion.
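The error string itself spells out the required ordering; in harness terms, the cleanup that this test depends on looks roughly like this (an illustrative sketch using the usual $CLI/$V0 conventions, not the actual test body):

# Delete is only valid on a stopped volume, so the teardown has to be ordered:
TEST $CLI volume stop $V0      # the step that failed above, which is why
TEST $CLI volume delete $V0    # this one then failed with "Volume ... needs
                               # to be stopped before deletion."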


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] quota test failures

2015-05-07 Thread Pranith Kumar Karampuri

hi,
 It seems like the test failures in quota are happening because of 
feature bugs.

Sachin/Du,
 Please feel free to update the status of the problems, what your 
recommendations are for the release, etc.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] good job on fixing heavy hitters in spurious regressions

2015-05-07 Thread Pranith Kumar Karampuri

hi,
   I think we fixed quite a few heavy hitters in the past week and 
a reasonable number of regression runs are passing, which is a good sign. 
Most of the new heavy hitters among the regression failures seem to be code 
problems in quota/afr/ec; I am not sure about tier.t (we need to get more info 
about arbiter.t, read-subvol.t, etc.). Do you have any ideas for 
keeping the regression failures under control?


Here are some of the things that I can think of:
0) Maintainers should also maintain tests that are in their component.
1) If you see a spurious failure that has not been seen before, please 
add it to https://public.pad.fsfe.org/p/gluster-spurious-failures and 
send a mail to gluster-devel with the relevant info. CC the component owner.
2) If the same test fails on different patches more than 'x' times, we 
should do something drastic. Let us decide on 'x' and on what the 
drastic measure is.
3) Tests that fail with too little information should at least be 
fixed by adding more info to the test or by improving the logs in the code, so 
that when the failure happens the next time we have more information (see the 
sketch below). The other option is to enable DEBUG logs; I am not a big fan of 
this, because when users report problems we should also have just enough 
information at the default log level to debug the problem, and users are not 
going to enable DEBUG logs.
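To make point 3 concrete, here is the kind of thing that could be added. The function name and log destination are made up for illustration and are not existing include.rc helpers:

# Hypothetical helper: capture extra state around a flaky section, so a
# spurious failure leaves something to debug.
dump_debug_info ()
{
        local tag=$1
        {
                echo "==== $tag: $(date -u) ===="
                $CLI volume status $V0
                $CLI volume heal $V0 info
        } >> /tmp/$(basename $0).debug.log 2>&1
}

# Usage inside a .t file, around a check that fails spuriously:
dump_debug_info "before-flaky-check"
TEST some_flaky_check          # placeholder for the real assertion
dump_debug_info "after-flaky-check"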



Some good things I found this time around, compared to the 3.6.0 release:
1) Failing the regression on the first failure helps locate the 
failure logs really fast.
2) More people chipped in to fix tests that are not at all their 
responsibility, which is always great to see.


I think we should, at some point, remove the 'if it is a known bad test, treat 
it as success' code, and never add it back in the future.
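For reference, the behaviour in question boils down to a lookup of the following shape (the function and variable names are made up for illustration; this is not the actual run-tests.sh code, and the two tests listed are just examples from this thread):

# Known-bad tests are allowed to fail without failing the whole run.
KNOWN_BAD_TESTS="tests/basic/quota-anon-fd-nfs.t tests/bugs/distribute/bug-1161156.t"

run_one_test ()
{
        local t=$1
        prove -v "$t"
        local ret=$?
        if [ $ret -ne 0 ] && echo " $KNOWN_BAD_TESTS " | grep -q " $t "; then
                echo "Known bad test $t failed; treating it as success."
                ret=0
        fi
        return $ret
}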


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t

2015-05-07 Thread Pranith Kumar Karampuri


On 05/08/2015 10:02 AM, Atin Mukherjee wrote:


On 05/07/2015 03:00 PM, Krishnan Parthasarathi wrote:

Atin would be doing this, since he is looking into it.

HTH,
KP

- Original Message -

On 05/07/2015 02:53 PM, Krishnan Parthasarathi wrote:

- Original Message -

On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote:

Pranith,

The above snippet says that the volume has to be stopped before deleted.
It
also says that
volume-stop failed. I would look into glusterd logs to see why
volume-stop
failed,
cmd-history.log tells us only so much.

http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull
has the logs. I didn't find much information. Please feel free to take a
look. What can we add to the code so that this failure can be debugged
better in future? Please at least add that much for now?

Atin is already looking into this. Without the root cause, it's not useful
to
speculate how we could help debugging this. As we root cause, I am sure we
will
find things that we could have logged to reduce time to root cause. Does
that make sense?

Cool. Could you please update the pad:
https://public.pad.fsfe.org/p/gluster-spurious-failures with latest info
on this issue.

glusterd did log the following failure when volume stop was executed:

[2015-05-06 13:09:58.534114] I [socket.c:3358:socket_submit_request]
0-management: not connected (priv-connected = 0)
[2015-05-06 13:09:58.534137] W [rpc-clnt.c:1566:rpc_clnt_submit]
0-management: failed to submit rpc-request (XID: 0x1 Program: brick
operations, ProgVers: 2, Proc: 1) to rpc-transport (management)

This indicates the underlying transport connection was broken and
glusterd failed to send the rpc request to the brick. For this case,
glusterd didn't populate errstr, which is why volume stop was logged in
cmd_history.log with a failure and a blank error message. I've
sent patch [1] to populate errstr for this failure.
Thanks Atin! Please move this test to the resolved section in the pad if it 
isn't there already.


Pranith


[1] http://review.gluster.org/10659

~Atin

Pranith

Pranith


HTH,
KP

- Original Message -

hi,
Volume delete is failing without logging much about why it is
failing. Know anything about this?
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
  1 [2015-05-06 13:09:58.311519]  : volume heal patchy statistics
heal-count : SUCCESS
  0 [2015-05-06 13:09:58.534917]  : volume stop patchy : FAILED :
  1 [2015-05-06 13:09:58.904333]  : volume delete patchy : FAILED :
Volume patchy has been started.Volume needs to be stopped before
deletion.

Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] failure in tests/basic/afr/arbiter.t

2015-05-07 Thread Pranith Kumar Karampuri

hi Ravi,
  Could you look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8723/consoleFull


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression status upate

2015-05-07 Thread Pranith Kumar Karampuri


On 05/08/2015 07:45 AM, Emmanuel Dreyfus wrote:

Emmanuel Dreyfus m...@netbsd.org wrote:


- tests/basic/ec/
This worked, but with rare spurious failures. They are the same as on
Linux and work has been done on them, hence I think I should probably enable
it again, but it may have rotted a lot. I have to give it a try.

It is rather grim. NetBSD ec tests went from rare spurious failures a
few weeks ago to completely reproducible failure (see below). Anyone
interested in looking at it? A lot of the errors are preceded by Input/Output
error messages, which suggests a common root.
I just sent a mail about the known issues we found in ec :-). We have a 
fix for one, submitted by Xavi, but the other one will take a bit of time. 
These bugs were there in 3.6.0 as well, so they are not really 
regressions; it is just that they are failing more often.


Pranith


Test Summary Report
---
./tests/basic/ec/ec-3-1.t     (Wstat: 0 Tests: 217 Failed: 4)
  Failed tests:  133-134, 138-139
./tests/basic/ec/ec-4-1.t     (Wstat: 0 Tests: 253 Failed: 6)
  Failed tests:  152-153, 157-158, 162-163
./tests/basic/ec/ec-5-1.t     (Wstat: 0 Tests: 289 Failed: 8)
  Failed tests:  171-172, 176-177, 181-182, 186-187
./tests/basic/ec/ec-readdir.t (Wstat: 0 Tests: 9 Failed: 1)
  Failed test:  9
./tests/basic/ec/quota.t      (Wstat: 0 Tests: 24 Failed: 1)
  Failed test:  24
./tests/basic/ec/self-heal.t  (Wstat: 0 Tests: 257 Failed: 5)
  Failed tests:  184, 195, 206, 217, 228
Files=15, Tests=2711, 3306 wallclock secs




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] failure in tests/basic/afr/read-subvol-entry.t

2015-05-07 Thread Pranith Kumar Karampuri

Ravi,
  Please look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8735/consoleFull


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] failure in tests/bugs/snapshot/bug-1166197.t

2015-05-07 Thread Pranith Kumar Karampuri

hi,
Could you look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8734/consoleFull


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious ec failures

2015-05-07 Thread Pranith Kumar Karampuri


On 05/08/2015 09:46 AM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri pkara...@redhat.com wrote:


1) Fops failing with EIO when locks are failing with errno other than
EAGAIN (mostly ESTALE at the moment). http://review.gluster.com/9407
should fix it.
2) Fop failing with EIO because of race with lookup and version update
code which leads to less than 'ec-fragments' number of bricks agreeing
on the version of the file. We are still working on this issue

On NetBSD, I have EIO because of this; does it fall into the second case?

[2015-05-08 03:15:41.046889] W [socket.c:642:__socket_rwv] 0-patchy-client-1: 
readv on 23.253.160.60:49153 failed (No message available)
[2015-05-08 03:15:41.047012] I [client.c:2086:client_rpc_notify] 
0-patchy-client-1: disconnected from patchy-client-1. Client process will keep 
trying to connect to glusterd until brick's port is available
[2015-05-08 03:15:41.095988] W [ec-common.c:412:ec_child_select] 
0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2)
[2015-05-08 03:15:41.218894] W [ec-combine.c:811:ec_combine_check] 
0-patchy-disperse-0: Mismatching xdata in answers of 'LOOKUP'

Yes, the versions are obtained via xdata.

Pranith

[2015-05-08 03:15:41.219466] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 
0-fuse: ----0001/dir1: failed to resolve 
(Input/output error)
[2015-05-08 03:15:41.219624] W [ec-common.c:412:ec_child_select] 
0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2)
[2015-05-08 03:15:41.223435] W [ec-common.c:412:ec_child_select] 
0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2)
[2015-05-08 03:15:41.227372] W [ec-common.c:412:ec_child_select] 
0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2)
[2015-05-08 03:15:41.232227] W [ec-combine.c:811:ec_combine_check] 
0-patchy-disperse-0: Mismatching xdata in answers of 'LOOKUP'
[2015-05-08 03:15:41.232770] W [fuse-bridge.c:484:fuse_entry_cbk] 
0-glusterfs-fuse: 2123: LOOKUP() /dir1/small = -1 (Input/output error)



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious ec failures

2015-05-07 Thread Pranith Kumar Karampuri
We have come to a point where the spurious failures in ec are because of 
bugs in the code. There are two problems that need to be solved:
1) Fops failing with EIO when locks fail with an errno other than 
EAGAIN (mostly ESTALE at the moment). http://review.gluster.com/9407 
should fix it.
2) Fops failing with EIO because of a race between the lookup and version-update 
code, which leads to fewer than 'ec-fragments' bricks agreeing 
on the version of the file. We are still working on this issue.
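For anyone trying to confirm the second case on a failed run, the per-file version that the bricks have to agree on is stored in an extended attribute, so it can be compared directly on the backend. A debugging sketch, assuming the usual $B0/${V0}0..5 six-brick layout from the ec tests; dir1/small is just a placeholder path:

# Compare the ec version xattr of one file across all the bricks; in the
# race described above, fewer than the required number of bricks agree on
# this value and the fop is failed with EIO.
for b in $B0/${V0}{0..5}; do
        echo "== $b =="
        getfattr -d -m. -e hex $b/dir1/small 2>/dev/null | grep trusted.ec.version
done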


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] New regression failure

2015-05-08 Thread Pranith Kumar Karampuri


On 05/08/2015 03:47 PM, Atin Mukherjee wrote:

http://build.gluster.org/job/rackspace-regression-2GB-triggered/8782/consoleFull

Failed test case : tests/bugs/replicate/bug-976800.t

I've added it in the etherpad as well.
Thanks Atin! I see that the test doesn't disable flush-behind, which can 
lead to delayed closing of the file. For now I am adding that, to see if the 
failure still shows up in the future.

http://review.gluster.org/10666
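For reference, the kind of change being talked about is a one-line addition to the test's setup, along these lines (a sketch, not necessarily the exact content of the patch above):

# Turn off flush-behind so the flush/close from the mount reaches the brick
# immediately instead of being delayed by write-behind; otherwise the lock
# release the test waits for can arrive late and the check races.
TEST $CLI volume set $V0 performance.flush-behind off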

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression failure release-3.7 for tests/basic/afr/entry-self-heal.t

2015-05-08 Thread Pranith Kumar Karampuri


On 05/08/2015 10:53 PM, Justin Clift wrote:

Seems like a new one, so it's been added to the Etherpad.

   http://build.gluster.org/job/regression-test-burn-in/23/console
This looks a lot like the data-self-heal.t test, where healing 
fails to happen because both threads end up not getting enough locks 
in the self-heal domain to perform the heal. Taking blocking locks seems like an 
easy solution, but that would decrease self-heal throughput, so Ravi and 
I are still thinking about the best way to solve this problem. It will take 
some time. I can add this and data-self-heal.t to the bad tests for now, if 
that helps.


Pranith


It's on a new slave VM (slave1), which has been disconnected in
Jenkins so it can be investigated.  It's using our standard
Jenkins auth.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Proposal for improving throughput for regression test

2015-05-08 Thread Pranith Kumar Karampuri


On 05/08/2015 08:54 PM, Justin Clift wrote:

On 8 May 2015, at 10:02, Mohammed Rafi K C rkavu...@redhat.com wrote:

Hi All,

As we all know, our regression tests are killing us. On average, one
regression run takes approximately two and a half hours to complete.
So I guess this is the right time to think about improving our
regression setup.

Proposal 1:

Create a new option for the daemons to specify that they are running in
test mode; then we can skip the fsync calls used for data durability.

Proposal 2:

Use IP addresses instead of host names, because host-name resolution takes
a good amount of time and sometimes even causes spurious failures.
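As an illustration of what proposal 2 would change in a .t file (a sketch using the usual harness variables; resolving once with getent is just one possible way of doing it):

# Today: $H0 is typically a host name, so brick specs and mounts can end up
# paying for name resolution somewhere down the line.
TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}0 $H0:$B0/${V0}1

# Proposal 2: resolve once up front and reuse the address everywhere.
H0_ADDR=$(getent hosts "$H0" | awk '{print $1; exit}')
TEST $CLI volume create $V0 replica 2 $H0_ADDR:$B0/${V0}0 $H0_ADDR:$B0/${V0}1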


Proposal 3:
Each component has a lot of .t files and there is redundancy in the tests.
We can rework them to reduce the number of .t files, keeping a smaller set of
tests that covers unit testing for a component, and run the full regression
once a day (nightly).

Please provide your input on the proposed ideas, and feel free to add
new ones.

Proposal 4:

Break the regression tests into parts that can be run in parallel.

So, instead of the regression testing for a particular CR going from the
first test to the last in a serial sequence, we break it up into a number
of chunks (dir based?) and make each of these a task.

That won't reduce the overall number of tests, but it should get the time
down for the result to be finished.

Caveat: We're going to need more VMs, as once things start queueing up
it's not going to help. :/
Raghavendra Talur (CCed) did some work on this earlier, using multiple 
Docker instances on a single VM to get the running time under an hour.
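A rough prototype of the chunked approach is easy to put together (a sketch only; the chunks would still need per-chunk isolation via containers or VMs, since the .t files assume exclusive use of the machine's ports, mounts and /var/lib/glusterd):

#!/bin/bash
# Run each top-level directory under tests/ as its own prove job and collect
# the results. The parallelism here is illustrative and is not safe on a
# single, unisolated node.
i=0
for chunk in tests/*/; do
        prove -r "$chunk" > "/tmp/regression-chunk-$i.log" 2>&1 &
        i=$((i + 1))
done
wait
# prove prints "Result: FAIL" in the summary of a failed chunk.
grep -l '^Result: FAIL' /tmp/regression-chunk-*.log && exit 1
exit 0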


Pranith


+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec spurious regression failures

2015-05-05 Thread Pranith Kumar Karampuri


On 05/05/2015 01:35 PM, Vijay Bellur wrote:

On 05/05/2015 11:40 AM, Emmanuel Dreyfus wrote:

Emmanuel Dreyfus m...@netbsd.org wrote:


I sent http://review.gluster.org/10540 to address it completely. Not
sure if it works on NetBSD. Emmanuel, help!


I launched test runs in a loop on nbslave70. More later.


Failed on first pass:
Test Summary Report
---
./tests/basic/ec/ec-3-1.t     (Wstat: 0 Tests: 217 Failed: 4)
  Failed tests:  133-134, 138-139
./tests/basic/ec/ec-4-1.t     (Wstat: 0 Tests: 253 Failed: 6)
  Failed tests:  152-153, 157-158, 162-163
./tests/basic/ec/ec-5-1.t     (Wstat: 0 Tests: 289 Failed: 8)
  Failed tests:  171-172, 176-177, 181-182, 186-187
./tests/basic/ec/ec-readdir.t (Wstat: 0 Tests: 9 Failed: 1)
  Failed test:  9
./tests/basic/ec/quota.t      (Wstat: 0 Tests: 24 Failed: 1)
  Failed test:  24





In addition, ec-12-4.t has started failing again [1]. I have added a note 
about this to the etherpad.
I already updated the status on this in an earlier mail; 
http://review.gluster.org/10539 is the fix.


-Vijay

[1] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8312/consoleFull


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec spurious regression failures

2015-05-05 Thread Pranith Kumar Karampuri


On 05/05/2015 01:54 PM, Emmanuel Dreyfus wrote:

On Tue, May 05, 2015 at 01:45:03PM +0530, Pranith Kumar Karampuri wrote:

Already updated the status about this in the earlier mail.
http://review.gluster.org/10539 is the fix.

That one only touches bug-1202244-support-inode-quota.t ...

RCA: http://www.gluster.org/pipermail/gluster-devel/2015-May/044799.html

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions

2015-05-08 Thread Pranith Kumar Karampuri


On 05/08/2015 09:14 AM, Krishnan Parthasarathi wrote:


- Original Message -

hi,
 I think we fixed quite a few heavy hitters in the past week and
reasonable number of regression runs are passing which is a good sign.
Most of the new heavy hitters in regression failures seem to be code
problems in quota/afr/ec, not sure about tier.t (Need to get more info
about arbiter.t, read-subvol.t etc). Do you guys have any ideas in
keeping the regression failures under control?

The deluge of regression failures is a direct consequence of last-minute
merges during the (extended) feature freeze. We did well to contain this. Great 
stuff!
If we want to avoid this, we should not accept (large) feature merges just 
before the feature freeze.
Hmm... I am not sure; most of the fixes I saw in the last week were for bugs 
in tests or .rc files. The failures in afr and ec were problems that 
existed even in 3.6. They are showing up more now, probably because 3.7 
is a bit more parallel.


Pranith



Here are some of the things that I can think of:
0) Maintainers should also maintain tests that are in their component.

It is not possible for me as glusterd co-maintainer to 'maintain' tests that 
are added
under tests/bugs/glusterd. Most of them don't test core glusterd functionality.
They are almost always tied to a particular feature whose implementation had 
bugs
in its glusterd code. I would expect the test authors (esp. the more recent 
ones) to chip in.
Thoughts/Suggestions?
How about moving these tests to their respective components, and not 
accepting tests for other components under tests/bugs/glusterd in the 
future?


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions

2015-05-08 Thread Pranith Kumar Karampuri


On 05/08/2015 04:45 PM, Ravishankar N wrote:



On 05/08/2015 08:45 AM, Pranith Kumar Karampuri wrote:
Do you guys have any ideas in keeping the regression failures under 
control? 


I sent a patch to append the commands being run in the .t files to the 
gluster logs: http://review.gluster.org/#/c/10667/
While it certainly doesn't help keep regression failures in check, I think it 
makes log analysis a bit easier. Comments welcome. :-)

Neat :-). Do you think we can also add the .t file name before the test?
From:
[2015-05-08 11:02:43.062108594]:++ TEST: 47 abc cat 
/mnt/glusterfs/0/b ++

To:
[2015-05-08 11:02:43.062108594]:++ test-name.t:TEST: 47 abc 
cat /mnt/glusterfs/0/b ++
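A minimal sketch of how that could look in the wrapper (the function and variable names are illustrative, not the actual content of the patch under review):

# Prefix each logged command with the .t file name, as suggested above.
TESTFILE=$(basename "$0")
TESTLOG=${TESTLOG:-/var/log/glusterfs/gluster-tests.log}

log_test_line ()
{
        # $1 = line number in the .t file; the remaining args are the command.
        local lineno=$1; shift
        echo "[$(date -u '+%Y-%m-%d %H:%M:%S.%N')]:++ ${TESTFILE}:TEST: $lineno $* ++" >> "$TESTLOG"
}

# e.g. invoked from the harness's TEST wrapper as: log_test_line $LINENO "$@"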


Pranith


-Ravi

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions

2015-05-08 Thread Pranith Kumar Karampuri


On 05/09/2015 12:33 AM, Jeff Darcy wrote:

Say I submit a patch for a new component, or one changing the log level of one
of the logs for which there is not a single caller after it was moved from INFO
to DEBUG, so that code is not going to be executed at all. Yet the regression
fails, and I am 100% sure it has nothing to do with my patch. I have neither the
time nor the expertise to debug a test that I have no clue about, so the least I
can do is inform the people who may be able to do something about it, i.e. the
owner of the test or the maintainer of the module. You feel we should ask the
owner of the test what the problem is, but the owner of the test has moved on to
a different component and is busy with their own work. So you are left with going
to the maintainer, who tells you exactly what the problem and the reason are as
soon as you show them the test number, and you end up feeling you should have
asked him/her first.

What you describe sounds more like a problem than a solution.  The
component maintainers shouldn't be the only ones who have this
information.

I think this is already solved by having the public pad.

Both patch submitters and test owners should be able
to find it on a public test-status page.

Yes, they are already referring to the pad.

The test owner should be
*very* well aware of the problem, because it should be at or near the
top of their priority list.
What is so special about 'test' code? It is still code; if maintainers 
maintain feature code and are held responsible for it, why not test code? It 
is not that the maintainer is the only one who fixes all the problems in the 
code they maintain, but they are still responsible for the quality of that 
code. Why shouldn't they do the same for the quality of the tests 
that cover the component they maintain?

By putting the onus on the test owner,
we achieve two positive things: we lessen the burden on component
(or release) maintainers, and we give other people a strong incentive
to fix problems in their own (test) code.
This has been successful only when the people who wrote the tests are still 
working on the same component.

Assigning primary
responsibility to maintainers has the exact opposite effects.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)

2015-05-09 Thread Pranith Kumar Karampuri


On 05/09/2015 03:19 PM, Krishnan Parthasarathi wrote:

Why not break glusterd into small parts and distribute the load to
different people? Did you guys plan anything for 4.0 for breaking glusterd?
It is going to be a maintenance hell if we don't break it sooner.

Good idea. We have thought about it. Just re-architecting glusterd doesn't
(and will not) solve the division of responsibility issue that is being 
discussed here.
It's already difficult to maintain glusterd. I have already explained the 
reasons
in the previous thread.
I was thinking the *-cli xlators could be maintained by the respective fs 
teams themselves. It is easier to maintain things this way because each of those 
xlators can be put under xlators/cluster/afr/cli, xlators/cluster/dht/cli, etc. 
My feeling is that this gives a clear demarcation of who owns what. 
Even the tests can be organized into tests/afr-cli, tests/dht-cli, and so on.





Glusterd does a lot of things: Lets see how we can break things up one
thing at a time. I would love to spend some quality time thinking about
this problem once I am done with ec work, but this is a rough idea I
have for glusterd.

1) CLI handling:
Glusterd-cli-xlator should act something like fuse in fs. It just gets
the commands and passes it down, just like fuse gets the fops and passes
it down. In glusterd process there should be snapshot.so, afr-cli.so,
ec-cli.so, dht-cli.so loaded as management-xlators.
Just like we have fops lets have mops (management operations).
LOCK/STAGE/BRICK-OP/COMMIT-OP if there are more add them as well. Every
time the top xlator in glusterd receives commands from cli, it converts
the params into the arguments (req, op, dict etc) which are needed to
carryout the cli. Now it winds the fop to all its children. One of the
children is going to handle it locally, while the other child will send
the cli to different glusterds that are in cluster. Second child of
gluster-cli-xlator (give it a better name, but for now lets call it:
mgmtcluster) will collate the responses and give the list of responses
to glusterd-cli-xlator, it will call COLLATE mop on the first-child(lets
call it local-handler) to collate the responses, i.e. logic for
collating responses should also be in snapshot.so, afr-cli.so,
dht-cli.so etc etc. Once the top translator does LOCK, STAGE, BRICK-OP,
COMMIT-OP send response to CLI.

2) Volinfo should become more like inode_t in fs where each *-cli xlator
can store their own ctx like snapshot-cli can store all snapshot related
info for that volume in that context and afr can store afr-related info
in the ctx. Volinfo data strcuture should have very minimal information.
Maybe name, bricks etc.

3) Daemon handling:
   Daemon-manager xlator should have MOPS like START/STOP/INFO and
this xlator should be accessible for all the -cli xlators which want to
do their own management of the daemons. i.e. ec-cli/afr-cli should do
self-heal-daemon handling. dht should do rebalance process handling etc.
to give an example:
while winding START mop it has to specify the daemon as
self-heal-daemon and give enough info etc.

4) Peer handling:
  mgmtcluster(second child of top-xlator) should have MOPS like
PEER_ADD/PEER_DEL/PEER_UPDATE etc to do the needful. top xlator is going
to wind these operations based on the peer-cli-commands to this xlator.

5) volgen:
  top xlator is going to wind MOP called GET_NODE_LINKS, which takes
the type of volfile (i.e. mount/nfs/shd/brick etc) on which each *-cli
will construct its node(s), stuff options and tell the parent xl-name to
which it needs to be linked to. Top xlator is going to just link the
nodes to construct the graph and does graph_print to generate the volfile.

I am pretty sure I forgot some more aspects of what glusterd does but
you get the picture right? Break each aspect into different xlator and
have MOPS to solve them.

We have some initial ideas on how glusterd for 4.0 would look like. We won't be
continuing with glusterd is also a translator model. The above model would
work well only if we stuck with the stack of translators approach.
Oh nice, I might have missed the mails. Do you mind sharing the plan for 
4.0? Any reason why you guys do not want to continue glusterd as 
translator model?


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)

2015-05-09 Thread Pranith Kumar Karampuri


On 05/09/2015 11:08 AM, Krishnan Parthasarathi wrote:

Ah! now I understood the confusion. I never said maintainer should fix
all the bugs in tests. I am only saying that they maintain tests, just
like we maintain code. Whether you personally work on it or not, you at
least have an idea of what is the problem and what is the solution so
someone can come and ask you and you know the status of it. Expectation
is not to fix every test failure that comes maintainer's way by
maintainer alone. But he/she would know about problem/solution because
he/she at least reviews it and merges it. We want to make sure that the
tests are in good quality as well just like we make sure code is of good
quality. Core is a special case. We will handle it separately.

Glusterd is also a 'special' case. As a glusterd maintainer, I am _not_ 
maintaining
insert-your-favourite-gluster-command-here's implementation. So, I don't
'know'/'understand' how it has been implemented and by extension I wouldn't be 
able
to fix it (forget maintaining it :-) ). Given the no. of gluster commands, I 
won't be
surprised if I didn't have an inkling on how your-favourite-gluster-command 
worked ;-)
I hope this encourages other contributors, i.e, any gluster (feature) 
contributor,
to join Kaushal and me in maintaining glusterd.
I understand the frustration, KP :-). The human brain can only take so much. 
I think we are solving the wrong problem by putting more people on the code. 
Why not break glusterd into small parts and distribute the load to 
different people? Did you plan anything for 4.0 around breaking up glusterd?

It is going to be maintenance hell if we don't break it up sooner.

Glusterd does a lot of things; let's see how we can break them up one 
thing at a time. I would love to spend some quality time thinking about 
this problem once I am done with the ec work, but here is a rough idea I 
have for glusterd.


1) CLI handling:
The Glusterd-cli-xlator should act something like fuse does in the fs: it just 
gets the commands and passes them down, just like fuse gets the fops and passes 
them down. In the glusterd process there should be snapshot.so, afr-cli.so, 
ec-cli.so, dht-cli.so, etc. loaded as management xlators.
Just like we have fops, let's have mops (management operations): 
LOCK/STAGE/BRICK-OP/COMMIT-OP, and if there are more, add them as well. Every 
time the top xlator in glusterd receives a command from the CLI, it converts 
the params into the arguments (req, op, dict, etc.) which are needed to 
carry out the command. It then winds the mop to all its children. One child 
handles it locally, while the other child sends the command to the different 
glusterds in the cluster. The second child of the Glusterd-cli-xlator (give it 
a better name, but for now let's call it mgmtcluster) will collate the 
responses and give the list of responses to the Glusterd-cli-xlator, which will 
call a COLLATE mop on the first child (let's call it local-handler) to collate 
the responses; i.e. the logic for collating responses should also live in 
snapshot.so, afr-cli.so, dht-cli.so, and so on. Once the top translator has 
done LOCK, STAGE, BRICK-OP and COMMIT-OP, it sends the response to the CLI.


2) Volinfo should become more like inode_t in the fs, where each *-cli xlator 
can store its own ctx: snapshot-cli can store all the snapshot-related 
info for that volume in that context, and afr can store afr-related info 
in its ctx. The volinfo data structure itself should hold very minimal 
information, maybe just the name, bricks, etc.


3) Daemon handling:
A daemon-manager xlator should have MOPS like START/STOP/INFO, and 
this xlator should be accessible to all the *-cli xlators which want to 
do their own management of the daemons; i.e. ec-cli/afr-cli should handle the 
self-heal daemon, dht should handle the rebalance process, and so on. 
To give an example: while winding the START mop, the caller has to specify the 
daemon as self-heal-daemon and give enough info, etc.


4) Peer handling:
mgmtcluster (the second child of the top xlator) should have MOPS like 
PEER_ADD/PEER_DEL/PEER_UPDATE, etc. to do the needful. The top xlator winds 
these operations to this xlator based on the peer CLI commands.


5) volgen:
The top xlator winds a MOP called GET_NODE_LINKS, which takes 
the type of volfile (i.e. mount/nfs/shd/brick, etc.); for that type, each *-cli 
constructs its node(s), stuffs in its options and tells the top xlator the 
parent xl-name it needs to be linked to. The top xlator then just links the 
nodes to construct the graph and does graph_print to generate the volfile.


I am pretty sure I forgot some aspects of what glusterd does, but 
you get the picture, right? Break each aspect into a different xlator and 
have MOPS to handle it.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)

2015-05-09 Thread Pranith Kumar Karampuri


On 05/09/2015 02:21 PM, Atin Mukherjee wrote:


On 05/09/2015 01:36 PM, Pranith Kumar Karampuri wrote:

On 05/09/2015 11:08 AM, Krishnan Parthasarathi wrote:

Ah! now I understood the confusion. I never said maintainer should fix
all the bugs in tests. I am only saying that they maintain tests, just
like we maintain code. Whether you personally work on it or not, you at
least have an idea of what is the problem and what is the solution so
someone can come and ask you and you know the status of it. Expectation
is not to fix every test failure that comes maintainer's way by
maintainer alone. But he/she would know about problem/solution because
he/she at least reviews it and merges it. We want to make sure that the
tests are in good quality as well just like we make sure code is of good
quality. Core is a special case. We will handle it separately.

Glusterd is also a 'special' case. As a glusterd maintainer, I am
_not_ maintaining
insert-your-favourite-gluster-command-here's implementation. So, I
don't
'know'/'understand' how it has been implemented and by extension I
wouldn't be able
to fix it (forget maintaining it :-) ). Given the no. of gluster
commands, I won't be
surprised if I didn't have an inkling on how
your-favourite-gluster-command worked ;-)
I hope this encourages other contributors, i.e, any gluster (feature)
contributor,
to join Kaushal and me in maintaining glusterd.

I understand the frustration kp :-). Human brain can only take so much.
I think we are solving wrong problem by putting more people on the code.
Why not break glusterd into small parts and distribute the load to
different people? Did you guys plan anything for 4.0 for breaking glusterd?
It is going to be a maintenance hell if we don't break it sooner.

Glusterd does a lot of things: Lets see how we can break things up one
thing at a time. I would love to spend some quality time thinking about
this problem once I am done with ec work, but this is a rough idea I
have for glusterd.

1) CLI handling:
Glusterd-cli-xlator should act something like fuse in fs. It just gets
the commands and passes it down, just like fuse gets the fops and passes
it down. In glusterd process there should be snapshot.so, afr-cli.so,
ec-cli.so, dht-cli.so loaded as management-xlators.
Just like we have fops lets have mops (management operations).
LOCK/STAGE/BRICK-OP/COMMIT-OP if there are more add them as well. Every
time the top xlator in glusterd receives commands from cli, it converts
the params into the arguments (req, op, dict etc) which are needed to
carryout the cli. Now it winds the fop to all its children. One of the
children is going to handle it locally, while the other child will send
the cli to different glusterds that are in cluster. Second child of
gluster-cli-xlator (give it a better name, but for now lets call it:
mgmtcluster) will collate the responses and give the list of responses
to glusterd-cli-xlator, it will call COLLATE mop on the first-child(lets
call it local-handler) to collate the responses, i.e. logic for
collating responses should also be in snapshot.so, afr-cli.so,
dht-cli.so etc etc. Once the top translator does LOCK, STAGE, BRICK-OP,
COMMIT-OP send response to CLI.

2) Volinfo should become more like inode_t in fs where each *-cli xlator
can store their own ctx like snapshot-cli can store all snapshot related
info for that volume in that context and afr can store afr-related info
in the ctx. Volinfo data strcuture should have very minimal information.
Maybe name, bricks etc.

3) Daemon handling:
  Daemon-manager xlator should have MOPS like START/STOP/INFO and
this xlator should be accessible for all the -cli xlators which want to
do their own management of the daemons. i.e. ec-cli/afr-cli should do
self-heal-daemon handling. dht should do rebalance process handling etc.
to give an example:
while winding START mop it has to specify the daemon as
self-heal-daemon and give enough info etc.

4) Peer handling:
 mgmtcluster(second child of top-xlator) should have MOPS like
PEER_ADD/PEER_DEL/PEER_UPDATE etc to do the needful. top xlator is going
to wind these operations based on the peer-cli-commands to this xlator.

5) volgen:
 top xlator is going to wind MOP called GET_NODE_LINKS, which takes
the type of volfile (i.e. mount/nfs/shd/brick etc) on which each *-cli
will construct its node(s), stuff options and tell the parent xl-name to
which it needs to be linked to. Top xlator is going to just link the
nodes to construct the graph and does graph_print to generate the volfile.

I am pretty sure I forgot some more aspects of what glusterd does but
you get the picture right? Break each aspect into different xlator and
have MOPS to solve them.

Sounds interesting, but it needs to be thought out in detail. For 4.0, we
do have a plan to make the core glusterd algorithms work as a glusterd
engine, and other features will have interfaces to connect to it.
Your proposal looks like another alternative. I would like to hear from

Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)

2015-05-09 Thread Pranith Kumar Karampuri


On 05/09/2015 03:04 PM, Kaushal M wrote:

Modularising GlusterD is something we plan to do. As of now, it's
just that: a plan. We don't have a design to achieve it yet.

What Atin mentioned and what you've mentioned seem to be the same at a
high level. The core of GlusterD will be a co-ordinating engine, which
defines an interface for commands to use to do their work. The
commands will each be a separate module implementing this interface.
Depending on how we implement, the actual names will be different.
Yes, this is a nice approach. It would be nice if there is a clear 
demarcation for the code as well, so there won't be any dependency between 
merging dht changes and, say, afr changes in the CLI. That is why I was 
suggesting an xlator-based solution. But any other way of doing it where there 
is a clear demarcation is welcome as well. I would love to know more about 
the other approaches :-).


Pranith



On Sat, May 9, 2015 at 2:24 PM, Pranith Kumar Karampuri
pkara...@redhat.com wrote:

On 05/09/2015 02:21 PM, Atin Mukherjee wrote:


On 05/09/2015 01:36 PM, Pranith Kumar Karampuri wrote:

On 05/09/2015 11:08 AM, Krishnan Parthasarathi wrote:

Ah! now I understood the confusion. I never said maintainer should fix
all the bugs in tests. I am only saying that they maintain tests, just
like we maintain code. Whether you personally work on it or not, you at
least have an idea of what is the problem and what is the solution so
someone can come and ask you and you know the status of it. Expectation
is not to fix every test failure that comes maintainer's way by
maintainer alone. But he/she would know about problem/solution because
he/she at least reviews it and merges it. We want to make sure that the
tests are in good quality as well just like we make sure code is of
good
quality. Core is a special case. We will handle it separately.

Glusterd is also a 'special' case. As a glusterd maintainer, I am
_not_ maintaining
insert-your-favourite-gluster-command-here's implementation. So, I
don't
'know'/'understand' how it has been implemented and by extension I
wouldn't be able
to fix it (forget maintaining it :-) ). Given the no. of gluster
commands, I won't be
surprised if I didn't have an inkling on how
your-favourite-gluster-command worked ;-)
I hope this encourages other contributors, i.e, any gluster (feature)
contributor,
to join Kaushal and me in maintaining glusterd.

I understand the frustration kp :-). Human brain can only take so much.
I think we are solving wrong problem by putting more people on the code.
Why not break glusterd into small parts and distribute the load to
different people? Did you guys plan anything for 4.0 for breaking
glusterd?
It is going to be a maintenance hell if we don't break it sooner.

Glusterd does a lot of things: Lets see how we can break things up one
thing at a time. I would love to spend some quality time thinking about
this problem once I am done with ec work, but this is a rough idea I
have for glusterd.

1) CLI handling:
Glusterd-cli-xlator should act something like fuse in fs. It just gets
the commands and passes it down, just like fuse gets the fops and passes
it down. In glusterd process there should be snapshot.so, afr-cli.so,
ec-cli.so, dht-cli.so loaded as management-xlators.
Just like we have fops lets have mops (management operations).
LOCK/STAGE/BRICK-OP/COMMIT-OP if there are more add them as well. Every
time the top xlator in glusterd receives commands from cli, it converts
the params into the arguments (req, op, dict etc) which are needed to
carryout the cli. Now it winds the fop to all its children. One of the
children is going to handle it locally, while the other child will send
the cli to different glusterds that are in cluster. Second child of
gluster-cli-xlator (give it a better name, but for now lets call it:
mgmtcluster) will collate the responses and give the list of responses
to glusterd-cli-xlator, it will call COLLATE mop on the first-child(lets
call it local-handler) to collate the responses, i.e. logic for
collating responses should also be in snapshot.so, afr-cli.so,
dht-cli.so etc etc. Once the top translator does LOCK, STAGE, BRICK-OP,
COMMIT-OP send response to CLI.

2) Volinfo should become more like inode_t in fs where each *-cli xlator
can store their own ctx like snapshot-cli can store all snapshot related
info for that volume in that context and afr can store afr-related info
in the ctx. Volinfo data strcuture should have very minimal information.
Maybe name, bricks etc.

3) Daemon handling:
   Daemon-manager xlator should have MOPS like START/STOP/INFO and
this xlator should be accessible for all the -cli xlators which want to
do their own management of the daemons. i.e. ec-cli/afr-cli should do
self-heal-daemon handling. dht should do rebalance process handling etc.
to give an example:
while winding START mop it has to specify the daemon as
self-heal-daemon and give enough info etc.

4) Peer handling:
  mgmtcluster(second child of top

Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)

2015-05-09 Thread Pranith Kumar Karampuri


On 05/09/2015 04:23 PM, Krishnan Parthasarathi wrote:

Oh nice, I might have missed the mails. Do you mind sharing the plan for
4.0? Any reason why you guys do not want to continue glusterd as
translator model?

I don't understand why we are using the translator model in the first place.
I guess it was to reuse rpc code. You should be able to shed more light here.

Even I am not sure :-). It was a translator by the time I got in.

A quick Google search for glusterd 2.0 gluster-users gave me this:
http://www.gluster.org/pipermail/gluster-users/2014-September/018639.html.
Interestingly, you asked us to consider AFR/NSR for distributed configuration
management, which led to
http://www.gluster.org/pipermail/gluster-devel/2014-November/042944.html
This proposal didn't go in the expected direction.

I don't want to get into the 'why not use translators' question now. We are 
currently heading in the direction visible in the above threads. If glusterd 
can't be a translator anymore, so be it.
Kaushal's response gave the answers I was looking for. We should 
probably discuss it more once you come up with the interface that the 
CLI-handling code needs to follow. I was thinking it would be great if you 
come up with a model where the handler code is separate from the core 
glusterd code, which is what you seem to be targeting. 
The translator model is one way of achieving that; I personally love it on the 
FS side, which is why I was curious why it was not used. But any other 
way that meets the above requirements is welcome.

Really excited to see what will come up :-).

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions

2015-05-08 Thread Pranith Kumar Karampuri


On 05/09/2015 02:31 AM, Jeff Darcy wrote:

What is so special about 'test' code?

A broken test blocks everybody's progress in a way that an incomplete
feature does not.


It is still code, if maintainers
are maintaining feature code and held responsible, why not test code? It
is not that maintainer is the only one who fixes all the problems in the
code they maintain, but they are still responsible for maintaining
quality of code. Why shouldn't they do the same for quality of tests
that test the component they maintain?

You said it yourself: the maintainer isn't the only one who fixes all of
the problems.  I would certainly hope that people working on a component
would keep that component's maintainer informed about what they're doing,
but that's not the same as making the component maintainer *directly*
responsible for every fix.  That especially doesn't work for core which
is a huge grab-bag full of different things best understood by different
people.  To turn your own question around, what's so special about test
code that we should short-circuit bugs to the maintainer right away?
Ah! Now I understand the confusion. I never said the maintainer should fix 
all the bugs in the tests. I am only saying that they maintain the tests, just 
like we maintain code. Whether you personally work on it or not, you at 
least have an idea of what the problem is and what the solution is, so 
someone can come and ask you, and you know the status of it. The expectation 
is not that the maintainer alone fixes every test failure that comes their 
way. But he/she would know about the problem/solution because he/she at least 
reviews and merges the fix. We want to make sure that the tests are of good 
quality as well, just like we make sure the code is of good quality. Core is 
a special case; we will handle it separately.


Pranith



By putting the onus on the test owner,
we achieve two positive things: we lessen the burden on component
(or release) maintainers, and we give other people a strong incentive
to fix problems in their own (test) code.

This has been successful only when people who wrote the tests are still
working on same component.

Owner and original author are not necessarily the same thing.  If
someone is unavailable (e.g. new job), or has forgotten too much to be
effective, then ownership should already have been reassigned.  Then
owner first or maintainer first doesn't matter, because they're
the same person.  The only really tricky case is when the original
author and person most qualified to work on a test is still around but
unable/unwilling to work on fixing a test, e.g. because their employer
insists they work on something else.  Perhaps those issues are best
addressed on a different mailing list.  ;)  As far as the project is
concerned, what can we do?  Our only practical option might be to
have someone else fix the test.  If that's the case then so be it, but
that should be a case-by-case decision and not a default.  In the more
common cases, responsibility for fixing tests should rest with the
same person who's responsible for the associated production code.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] regression: brick crashed because of changelog xlator init failure

2015-05-08 Thread Pranith Kumar Karampuri

hi Kotresh/Aravinda,
 Do you know anything about the following core, which comes from a 
changelog xlator init failure? It just failed the regression on one of my 
patches: http://review.gluster.org/#/c/10688


 24 [2015-05-08 21:34:47.750460] E [xlator.c:426:xlator_init] 
0-patchy-changelog: Initialization of volume 'patchy-changelog' failed, 
review your volfile again
 23 [2015-05-08 21:34:47.750485] E [graph.c:322:glusterfs_graph_init] 
0-patchy-changelog: initializing translator failed
 22 [2015-05-08 21:34:47.750497] E 
[graph.c:661:glusterfs_graph_activate] 0-graph: init failed
 21 [2015-05-08 21:34:47.749020] I 
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 2

 20 pending frames:
 19 frame : type(0) op(0)
 18 patchset: git://git.gluster.com/glusterfs.git
 17 signal received: 11
 16 time of crash:
 15 2015-05-08 21:34:47
 14 configuration details:
 13 argp 1
 12 backtrace 1
 11 dlfcn 1
 10 libpthread 1
  9 llistxattr 1
  8 setfsid 1
  7 spinlock 1
  6 epoll.h 1
  5 xattr.h 1
  4 st_atim.tv_nsec 1
  3 package-string: glusterfs 3.7.0beta1
  2 pending frames:
  1 frame : type(0) op(0)
  0 patchset: git://git.gluster.com/glusterfs.git

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] regression: brick crashed because of changelog xlator init failure

2015-05-08 Thread Pranith Kumar Karampuri


On 05/09/2015 03:26 AM, Pranith Kumar Karampuri wrote:

hi Kotresh/Aravinda,
 Do you guys know anything about following core which comes 
because of changelog xlator init failure? It just failed regression on 
one of my patches: http://review.gluster.org/#/c/10688
Sorry, wrong URL; this is the correct one: 
http://review.gluster.com/#/c/10693/


Pranith


 24 [2015-05-08 21:34:47.750460] E [xlator.c:426:xlator_init] 
0-patchy-changelog: Initialization of volume 'patchy-changelog' 
failed, review your volfile again
 23 [2015-05-08 21:34:47.750485] E [graph.c:322:glusterfs_graph_init] 
0-patchy-changelog: initializing translator failed
 22 [2015-05-08 21:34:47.750497] E 
[graph.c:661:glusterfs_graph_activate] 0-graph: init failed
 21 [2015-05-08 21:34:47.749020] I 
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started 
thread with index 2

 20 pending frames:
 19 frame : type(0) op(0)
 18 patchset: git://git.gluster.com/glusterfs.git
 17 signal received: 11
 16 time of crash:
 15 2015-05-08 21:34:47
 14 configuration details:
 13 argp 1
 12 backtrace 1
 11 dlfcn 1
 10 libpthread 1
  9 llistxattr 1
  8 setfsid 1
  7 spinlock 1
  6 epoll.h 1
  5 xattr.h 1
  4 st_atim.tv_nsec 1
  3 package-string: glusterfs 3.7.0beta1
  2 pending frames:
  1 frame : type(0) op(0)
  0 patchset: git://git.gluster.com/glusterfs.git

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Possibly root cause for the Gluster regression test cores?

2015-04-09 Thread Pranith Kumar Karampuri


On 04/08/2015 07:08 PM, Justin Clift wrote:

On 8 Apr 2015, at 14:13, Pranith Kumar Karampuri pkara...@redhat.com wrote:

On 04/08/2015 06:20 PM, Justin Clift wrote:

snip

Hagarth mentioned in the weekly IRC meeting that you have an
idea what might be causing the regression tests to generate
cores?

Can you outline that quickly, as Jeff has some time and might
be able to help narrow it down further. :)

(and these core files are really annoying :/)

I feel it is a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1184417. 
The clear-locks command is not handled properly after we did the client_t refactor. 
I believe that is the reason for the crashes, but I could be wrong. After 
looking at the code I feel there is a high probability that this is the issue. I 
didn't find it easy to fix; we will need to change the lock structure list 
maintenance heavily. An easier thing would be to disable the clear-locks functionality 
tests in the regression, as it is not something that is used by users IMO, 
and see if it indeed is the same issue. There are 2 tests using this command:
18:34:00 :) ⚡ git grep clear-locks tests
tests/bugs/disperse/bug-1179050.t:TEST $CLI volume clear-locks $V0 / kind all 
inode
tests/bugs/glusterd/bug-824753-file-locker.c: gluster volume clear-locks %s /%s 
kind all posix 0,7-1 |

If it still fails even after disabling these two tests, then we will need to look 
again. I think Jeff's patch, which will find the test that triggered the core, 
should help here.

Thanks Pranith. :)

Is this other problem when disconnecting BZ possibly related, or is that a
different thing?

   https://bugzilla.redhat.com/show_bug.cgi?id=1195415

I feel 1195415 could be a duplicate of 1184417.

Pranith


+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] regarding sharding

2015-04-05 Thread Pranith Kumar Karampuri

hi,
  As I am not able to spend much time on sharding, Kritika is 
handling it completely now. I am only doing reviews. Just letting 
everyone know so that future communication can happen directly with the 
active developer :-).


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Possibly root cause for the Gluster regression test cores?

2015-04-08 Thread Pranith Kumar Karampuri


On 04/08/2015 06:20 PM, Justin Clift wrote:

Hi Pranith,

Hagarth mentioned in the weekly IRC meeting that you have an
idea what might be causing the regression tests to generate
cores?

Can you outline that quickly, as Jeff has some time and might
be able to help narrow it down further. :)

(and these core files are really annoying :/)
I feel it is a lot like 
https://bugzilla.redhat.com/show_bug.cgi?id=1184417. The clear-locks command 
is not handled properly after we did the client_t refactor. I believe 
that is the reason for the crashes, but I could be wrong. After 
looking at the code I feel there is a high probability that this is the 
issue. I didn't find it easy to fix; we will need to change the lock 
structure list maintenance heavily. An easier thing would be to disable the 
clear-locks functionality tests in the regression, as it is not something 
that is used by users IMO, and see if it indeed is the same issue. 
There are 2 tests using this command:

18:34:00 :) ⚡ git grep clear-locks tests
tests/bugs/disperse/bug-1179050.t:TEST $CLI volume clear-locks $V0 / 
kind all inode
tests/bugs/glusterd/bug-824753-file-locker.c: gluster volume 
clear-locks %s /%s kind all posix 0,7-1 |


If it still fails even after disabling these two tests, then we will need to 
look again. I think Jeff's patch, which will find the test that 
triggered the core, should help here.


Pranith


Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Moratorium on new patch acceptance

2015-05-20 Thread Pranith Kumar Karampuri



On 05/21/2015 12:07 AM, Vijay Bellur wrote:

On 05/19/2015 11:56 PM, Vijay Bellur wrote:

On 05/18/2015 08:03 PM, Vijay Bellur wrote:

On 05/16/2015 03:34 PM, Vijay Bellur wrote:



I will send daily status updates from Monday (05/18) about this so 
that
we are clear about where we are and what needs to be done to remove 
this

moratorium. Appreciate your help in having a clean set of regression
tests going forward!



We have made some progress since Saturday. The problem with glupy.t has
been fixed - thanks to Niels! All but the following tests have developers
looking into them:

 ./tests/basic/afr/entry-self-heal.t

 ./tests/bugs/replicate/bug-976800.t

 ./tests/bugs/replicate/bug-1015990.t

 ./tests/bugs/quota/bug-1038598.t

 ./tests/basic/ec/quota.t

 ./tests/basic/quota-nfs.t

 ./tests/bugs/glusterd/bug-974007.t

Can submitters of these test cases or current feature owners pick these
up and start looking into the failures please? Do update the spurious
failures etherpad [1] once you pick up a particular test.


[1] https://public.pad.fsfe.org/p/gluster-spurious-failures



Update for today - all tests that are known to fail have owners. Thanks
everyone for chipping in! I think we should be able to lift this
moratorium and resume normal patch acceptance shortly.



Today's update - Pranith fixed a bunch of failures in erasure coding 
and Avra removed a test that was not relevant anymore - thanks for that!

Xavi and I each sent a patch for fixing these. But...
I ran the regression 4 times before merging and it succeeded 3 times and failed 
once on xml.t; I thought these were the last fixes for this 
problem. Ashish found a way to recreate the same EIO errors, so all is 
not well yet. Xavi is sending one more patch tomorrow which addresses 
that problem as well. While testing another patch on master I found that 
there is a use-after-free issue in ec :-(. I am not able to send the fix 
for it because gerrit ran out of space?


Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 1.10 KiB | 0 bytes/s, done.
Total 9 (delta 7), reused 0 (delta 0)
fatal: Unpack error, check server log
error: unpack failed: error No space left on device --


PS: Since valgrind is giving so much pain, I used AddressSanitizer for 
debugging this memory corruption. It is amazing! I followed 
http://tsdgeos.blogspot.in/2014/03/asan-and-gcc-how-to-get-line-numbers-in.html 
for getting the backtrace with line numbers. It doesn't generate a core 
with gcc-4.8 though (I had to use the -N flag when starting the mount process 
to get the output on stderr). I think with future versions of gcc we won't 
need to do all this. I will try to post my experience once I upgrade to 
Fedora 22, which has gcc 5.


Pranith


Quota, afr, snapshot  tiering tests are being looked into. Will 
provide an update on where we are with these tomorrow.


Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures? (master)

2015-06-04 Thread Pranith Kumar Karampuri



On 06/05/2015 02:12 AM, Shyam wrote:

Just checking,

This review request: http://review.gluster.org/#/c/11073/

Failed in the following tests:

1) Linux
[20:20:16] ./tests/bugs/replicate/bug-880898.t ..
not ok 4
This seems to be the same RC as in self-heald.t, where heal info sometimes 
does not fail when the brick is down.

Failed 1/4 subtests
[20:20:16]

http://build.gluster.org/job/rackspace-regression-2GB-triggered/10088/consoleFull 



2) NetBSD (Du seems to have faced the same)
[11:56:45] ./tests/basic/afr/sparse-file-self-heal.t ..
not ok 52 Got  instead of 1
not ok 53 Got  instead of 1
not ok 54
not ok 55 Got 2 instead of 0
not ok 56 Got d41d8cd98f00b204e9800998ecf8427e instead of 
b6d81b360a5672d80c27430f39153e2c

not ok 60 Got 0 instead of 1
Failed 6/64 subtests
[11:56:45]

http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/6233/consoleFull 

There is a bug in the statedump code path: if it races with STACK_RESET then 
shd seems to crash. I see the following output indicating that the process died.


kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill 
-l [sigspec]



I have not done any analysis, and also the change request should not 
affect the paths that this test is failing on.


Checking the logs for Linux did not throw any more light on the cause, 
although the brick logs are not updated(?) to reflect the volume 
create and start as per the TC in (1).


Anyone know anything (more) about this?

Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failure with sparse-file-heal.t test

2015-06-07 Thread Pranith Kumar Karampuri



On 06/05/2015 09:10 AM, Krishnan Parthasarathi wrote:


- Original Message -

This seems to happen because of a race between STACK_RESET and stack
statedump. Still thinking how to fix it without taking locks around
writing to the file.

Why should we still keep the stack being reset as part of pending pool of
frames? Even we if we had to (can't guess why?), when we remove we should do
the following to prevent gf_proc_dump_pending_frames from crashing.

...

call_frame_t *toreset = NULL;

LOCK (stack->pool->lock)
{
   toreset = stack->frames;
   stack->frames = NULL;
}
UNLOCK (stack->pool->lock);

...

Now, perform all operations that are done on stack->frames on toreset
instead. Thoughts?

Is there a reason you want to avoid locks here? STACK_DESTROY uses the
call_pool lock to remove the stack from the list of pending frames.
It is always better to avoid holding spin-locks while doing a slow operation 
like a write. That is the only reasoning behind it.
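
Here is a rough sketch of that approach (illustrative only; this is not the 
patch that was eventually written, and it treats stack->frames as a simple 
detachable pointer, which simplifies the real call_stack_t layout):

        call_frame_t *toreset = NULL;
        call_frame_t *next    = NULL;

        /* Detach the frames while holding the pool lock, so a concurrent
         * gf_proc_dump_pending_frames() never walks half-destroyed frames. */
        LOCK (&stack->pool->lock);
        {
                toreset       = stack->frames;
                stack->frames = NULL;
        }
        UNLOCK (&stack->pool->lock);

        /* Do the slow part outside the spinlock, which addresses the concern
         * about holding it across slow operations like writes. */
        while (toreset) {
                next = toreset->next;
                FRAME_DESTROY (toreset);   /* per-frame destroy helper */
                toreset = next;
        }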


Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failure with sparse-file-heal.t test

2015-06-07 Thread Pranith Kumar Karampuri



On 06/07/2015 05:40 PM, Pranith Kumar Karampuri wrote:



On 06/05/2015 09:10 AM, Krishnan Parthasarathi wrote:


- Original Message -

This seems to happen because of a race between STACK_RESET and stack
statedump. Still thinking how to fix it without taking locks around
writing to the file.
Why should we still keep the stack being reset as part of pending 
pool of
frames? Even we if we had to (can't guess why?), when we remove we 
should do

the following to prevent gf_proc_dump_pending_frames from crashing.

...

call_frame_t *toreset = NULL;

LOCK (stack->pool->lock)
{
   toreset = stack->frames;
   stack->frames = NULL;
}
UNLOCK (stack->pool->lock);

...

Now, perform all operations that are done on stack->frames on toreset
instead. Thoughts?

Is there a reason you want to avoid locks here? STACK_DESTROY uses the
call_pool lock to remove the stack from the list of pending frames.
It is always better to avoid holding spin-locks while doing a slow operation 
like a write. That is the only reasoning behind it.
Seems like we are already inside pool->lock while doing statedump, which 
does writes to files, so maybe I shouldn't think too much :-/. I will 
take a look at your patch once.


Pranith


Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failure with sparse-file-heal.t test

2015-06-07 Thread Pranith Kumar Karampuri



On 06/05/2015 09:01 AM, Krishnan Parthasarathi wrote:

This seems to happen because of a race between STACK_RESET and stack
statedump. Still thinking how to fix it without taking locks around
writing to the file.

Why should we still keep the stack being reset as part of pending pool of
frames? Even we if we had to (can't guess why?), when we remove we should do
the following to prevent gf_proc_dump_pending_frames from crashing.
The C stack actually gives up the memory it takes when the function call 
returns, but there was no such mechanism for gluster stacks before 
STACK_RESET. So for long-running operations like big-file self-heal, 
big-directory read etc., we can keep RESETting the stack to prevent it 
from growing to a large size (a schematic example follows below).
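
A schematic of that pattern (purely illustrative: heal_one_chunk() and 
more_chunks_to_heal() are hypothetical names, not real AFR functions; 
STACK_RESET is the macro discussed in this thread):

        /* A long-running operation periodically resets its call stack so the
         * frames wound/unwound per chunk do not keep accumulating. */
        while (more_chunks_to_heal (healer)) {
                heal_one_chunk (frame, healer);  /* winds/unwinds several fops */
                STACK_RESET (frame->root);       /* release accumulated frames */
        }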


Pranith


...

call_frame_t *toreset = NULL;

LOCK (stack->pool->lock)
{
   toreset = stack->frames;
   stack->frames = NULL;
}
UNLOCK (stack->pool->lock);

...

Now, perform all operations that are done on stack->frames on toreset
instead. Thoughts?


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] self-heald.t failures

2015-06-08 Thread Pranith Kumar Karampuri



On 06/05/2015 04:01 PM, Anuradha Talur wrote:

gluster volume heal volname info doesn't seem to fail because
the process is crashing in afr_notify when invoked by glfs_fini.
As a result proper error codes are not being propagated.

Pranith had recently sent a patch : http://review.gluster.org/#/c/11001/
to not invoke glfs_fini in non-debug builds. Given that regression is
run on debug builds, we are observing the failure. I will send a patch
to temporarily not invoke glfs_fini in glfs-heal.c.
Sorry for the delayed response. I see that your patch is already merged. 
Did you get a chance to find why afr_notify is crashing? I would love to 
keep executing glfs_fini for DEBUG builds so that bugs are found as soon 
as possible in that code path.


Pranith


- Original Message -

From: Pranith Kumar Karampuri pkara...@redhat.com
To: Vijay Bellur vbel...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Thursday, June 4, 2015 3:03:02 PM
Subject: Re: [Gluster-devel] self-heald.t failures

Yeah, I am looking into it. Basically gluster volume heal volname info
must fail after volume stop. But sometimes it doesn't seem to :-(. Will
need some time to RC. Will update the list.

Pranith
On 06/04/2015 02:19 PM, Vijay Bellur wrote:

On 06/03/2015 10:30 AM, Vijay Bellur wrote:

self-heald.t seems to fail intermittently.

One such instance was seen recently [1]. Can somebody look into this
please?

./tests/basic/afr/self-heald.t (Wstat: 0 Tests: 83 Failed: 1) Failed
test: 78

Thanks,
Vijay

http://build.gluster.org/job/rackspace-regression-2GB-triggered/10029/consoleFull



One more failure of self-heald.t:

http://build.gluster.org/job/rackspace-regression-2GB-triggered/10092/consoleFull


-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression failures

2015-06-03 Thread Pranith Kumar Karampuri



On 06/03/2015 04:36 PM, Sachin Pandit wrote:

Hi,

http://review.gluster.org/#/c/11024/ failed in
tests/basic/volume-snapshot-clone.t testcase.
http://build.gluster.org/job/rackspace-regression-2GB-triggered/10057/consoleFull


http://review.gluster.org/#/c/11000/ failed in
tests/bugs/replicate/bug-979365.t testcase.
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9985/consoleFull

It failed in gluster volume stop test:
volume stop: patchy: failed
volume start: patchy: failed: Volume patchy already started
./tests/bugs/replicate/../../volume.rc: line 201: kill: (18684) - No 
such process

umount: /mnt/glusterfs/0: not mounted
[08:04:56] ./tests/bugs/replicate/bug-979365.t ..

Atin,
 Could you help please.

Pranith



Seems like a spurious failure. Can anyone please
have a look at this.

Regards,
Sachin Pandit.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] answer_list in EC xlator

2015-06-03 Thread Pranith Kumar Karampuri



On 06/03/2015 09:21 PM, fanghuang.d...@yahoo.com wrote:

On Wednesday, 3 June 2015, 19:43, fanghuang.d...@yahoo.com 
fanghuang.d...@yahoo.com wrote:

On Wednesday, 3 June 2015, 15:22, Xavier Hernandez xhernan...@datalab.es

wrote:


On 06/03/2015 05:40 AM, Pranith Kumar Karampuri wrote:

  On 06/02/2015 08:08 PM, fanghuang.d...@yahoo.com wrote:

  Hi all,

  As I reading the source codes of EC xlator, I am confused by the
  cbk_list and answer_list defined in struct _ec_fop_data. Why do we
  need two lists to combine the results of callback?

  Especially for the answer_list, it is initialized
  in ec_fop_data_allocate, then the nodes are added
  in ec_cbk_data_allocate. Without being any accessed during the
  lifetime of fop, the whole list finally is released in

ec_fop_cleanup.

  Do I miss something for the answer_list?

  +Xavi.

  hi,
   The only reason I found is that It is easier to cleanup cbks using
  answers_list. You can check ec_fop_cleanup() function on latest master
  to check how this is.

You are right. Currently answer_list is only used to clean up all the cbks
received, while cbk_list is used to track groups of consistent answers.
Although it currently doesn't happen, if error coercing or special
attribute handling is implemented, it could be possible that one cbk
gets referenced more than once in cbk_list, making answer_list
absolutely necessary.


That's a good point, putting all the cbks into one list and those with
consistent answers into the other. But this design policy cannot
be understood easily from the comments, the source code or the list names
(cbk_list, answer_list). Could we rename cbk_list to consist_list or something else
easier to follow?


  Combining of cbks is a bit involved until you
  understand it but once you do, it is amazing. I tried to add comments
  for this part of code and sent a patch, but we forgot to merge it :-)
  http://review.gluster.org/9982. If you think we can add more
  comments/change this part of code in a way it makes it easier, let us
  know. We would love your feedback :-). Wait for Xavi's response as

well.
This patch is much clearer. For the function ec_combine_update_groups,
since we only operate on one list, should we use ec_combine_update_group? The
word groups is confusing for readers who may think there are two or
more groups.



I finally got it. The cbk_list actually maintains multiple groups of the same 
answer, sorted by the count. As Xavi said, one cbk may exist in different groups, 
so we need an answer_list to do the cleanup job. Pranith's patch explains it 
clearly. Well, it is really amazing.
Told ya! :-). I will resend the patch with updated comments about how 
the groups work.
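
For anyone following the thread, a trimmed, illustrative sketch of the 
relationship between the two lists (the real definitions live in ec-data.h 
and carry many more fields; the names below are deliberately suffixed to 
show they are not the real structs):

        #include <stdint.h>

        struct list_head { struct list_head *next, *prev; };

        /* Every answer is linked on the fop's answer_list so that
         * ec_fop_cleanup() can free it, while cbk_list holds one entry per
         * group of matching answers, sorted by how many bricks returned
         * that same answer. */
        struct ec_cbk_data_sketch {
                struct list_head list;          /* group membership on cbk_list   */
                struct list_head answer_list;   /* membership on fop->answer_list */
                int32_t          count;         /* bricks giving this answer      */
                int32_t          op_ret;
                int32_t          op_errno;
        };

        struct ec_fop_data_sketch {
                struct list_head cbk_list;      /* groups of consistent answers   */
                struct list_head answer_list;   /* all answers, for cleanup       */
        };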


Pranith


--
Fang Huang


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failure with sparse-file-heal.t test

2015-06-04 Thread Pranith Kumar Karampuri
I see that statedump is generating core because of which this test 
spuriously fails. I am looking into it.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Valgrind + glusterfs

2015-06-25 Thread Pranith Kumar Karampuri



On 06/25/2015 12:53 PM, Venky Shankar wrote:

On Thu, Jun 25, 2015 at 9:57 AM, Pranith Kumar Karampuri
pkara...@redhat.com wrote:

hi,
Does anyone know why glusterfs hangs with valgrind?

/proc/pid/stack ?

That was giving futex_wait(). CPU shoots up to 100%.

Pranith



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] build failure after merging http://review.gluster.com/10448

2015-06-25 Thread Pranith Kumar Karampuri

hi,
 I merged a patch before a dependent patch by mistake, which led to a 
build failure. Merged http://review.gluster.com/11413 to fix the same.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GF_FOP_IPC changes

2015-06-24 Thread Pranith Kumar Karampuri



On 06/24/2015 07:44 PM, Soumya Koduri wrote:



On 06/24/2015 10:14 AM, Krishnan Parthasarathi wrote:



- Original Message -
I've been looking at the recent patches to redirect GF_FOP_IPC to an 
active

subvolume instead of always to the first.  Specifically, these:

http://review.gluster.org/11346 for DHT
http://review.gluster.org/11347 for EC
http://review.gluster.org/11348 for AFR

I can't help but wonder if there's a simpler and more generic way to 
do this,
instead of having to do this in a translator-specific way each time 
- then
again for NSR, or for a separate tiering translator, and so on.  For 
example

what if each translator had a first_active_child callback?

xlator_t * (*first_active_child) (xlator_t *parent);

Then default_ipc could invoke this, if it exists, where it currently 
invokes
FIRST_CHILD.  Each translator could implement a bare minimum to 
select a

child, then step out of the way for a fop it really wasn't all that
interested in to begin with.  Any thoughts?


We should do this right away. This change doesn't affect external 
interfaces. We should be bold and implement the first solution. Over time 
we could improve on this.


+1. It would definitely ease the implementation of many such fops 
which have to default to first active child. We need not keep track of 
all the fops which may get affected with new clustering xlators being 
added.
I haven't seen the patches yet. Failures can happen just at the time of 
winding, leading to the same failures. It at least needs to have the logic 
of picking a next_active_child. EC needs to lock+xattrop the bricks to 
find the bricks with good copies. AFR needs to perform a getxattr to find good 
copies. Just giving more information to see if it helps.
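
A rough sketch of the generic hook Jeff describes above (illustrative only: 
first_active_child is the proposed callback, not an existing xlator_t field, 
and this is not the real default_ipc()/default_ipc_cbk from defaults.c):

        int32_t
        default_ipc_sketch (call_frame_t *frame, xlator_t *this, int32_t op,
                            dict_t *xdata)
        {
                xlator_t *child = NULL;

                /* Proposed hook: let the cluster xlator pick a usable child. */
                if (this->first_active_child)
                        child = this->first_active_child (this);

                /* Fall back to today's behaviour if the hook is absent. */
                if (!child)
                        child = FIRST_CHILD (this);

                STACK_WIND (frame, default_ipc_cbk, child, child->fops->ipc,
                            op, xdata);
                return 0;
        }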


Pranith


Thanks,
Soumya






___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GF_FOP_IPC changes

2015-06-24 Thread Pranith Kumar Karampuri



On 06/24/2015 08:26 PM, Jeff Darcy wrote:

I haven't seen the patches yet. Failures can happen just at the time of
winding, leading to same failures. It at least needs to have the logic
of picking next_active_child. EC needs to lock+xattrop the bricks to
find bricks with good copies. AFR needs to perform getxattr to find good
copies.

Is that really true?  I thought they each had a readily-available idea of
which children are up or down, which they already use e.g. for reads.
It knows which bricks are up/down, but that information may not be the latest. Will 
that matter?


Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regresssion Failure (3.7 branch): afr-quota-xattr-mdata-heal.t

2015-06-25 Thread Pranith Kumar Karampuri

This is a known spurious failure.

Pranith
On 06/25/2015 11:14 AM, Kotresh Hiremath Ravishankar wrote:

Hi,

I see the above test case failing for my patch which is not related.
Could some one from AFR team look into it?
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11332/consoleFull


Thanks and Regards,
Kotresh H R



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GF_FOP_IPC changes

2015-06-24 Thread Pranith Kumar Karampuri



On 06/25/2015 02:49 AM, Jeff Darcy wrote:

It knows which bricks are up/down. But they may not be the latest. Will
that matter?

AFAIK it's sufficient at this point to know which are up/down.
In that case, we need two functions which give first active child and 
next_active_child in case of failure.


Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GF_FOP_IPC changes

2015-06-25 Thread Pranith Kumar Karampuri



On 06/25/2015 12:10 PM, Soumya Koduri wrote:



On 06/25/2015 09:00 AM, Pranith Kumar Karampuri wrote:



On 06/25/2015 02:49 AM, Jeff Darcy wrote:
It knows which bricks are up/down. But they may not be the latest. 
Will

that matter?

AFAIK it's sufficient at this point to know which are up/down.

In that case, we need two functions which give first active child and
next_active_child in case of failure.


Do you suggest then in all default_*_cbk(), on receiving ENOTCONN 
failure, we re-send fop to next_active_child?
Yeah, I think that would be more generic than depending on the up-subvols of 
the cluster xlator:

1) In default_ipc(), wind it to the first subvol.
2) If it gives ENOTCONN, wind to the next child as long as it is not the last 
child (a rough sketch follows below).
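
A minimal sketch of that fallback, assuming a hypothetical ipc_local_t to 
remember the remaining children and the original fop arguments (this is not 
the actual defaults.c code):

        typedef struct {
                xlator_list_t *next;    /* children still left to try */
                int32_t        op;
                dict_t        *xdata;
        } ipc_local_t;                  /* hypothetical helper struct */

        static int32_t
        ipc_retry_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                       int32_t op_ret, int32_t op_errno, dict_t *xdata)
        {
                ipc_local_t *local = frame->local;

                /* On ENOTCONN, re-wind to the next child until none are left. */
                if (op_ret < 0 && op_errno == ENOTCONN && local && local->next) {
                        xlator_t *child = local->next->xlator;

                        local->next = local->next->next;
                        STACK_WIND (frame, ipc_retry_cbk, child,
                                    child->fops->ipc, local->op, local->xdata);
                        return 0;
                }

                STACK_UNWIND_STRICT (ipc, frame, op_ret, op_errno, xdata);
                return 0;
        }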


Pranith


Thanks,
Soumya



Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Valgrind + glusterfs

2015-06-25 Thread Pranith Kumar Karampuri
I tried EC volume with 2+1 config. dd of=a.txt if=/dev/urandom bs=128k 
count=1024 worked fine. When I increased bs=1M it hung. This is on my 
laptop.


Pranith

On 06/25/2015 10:32 AM, Krishnan Parthasarathi wrote:


- Original Message -

hi,
 Does anyone know why glusterfs hangs with valgrind?

When do you observe the hang? I started a single brick volume,
enabled valgrind on bricks and mounted it via fuse. I didn't
observe the mount hang. Could you share the set of steps which
lead to the hang?


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Valgrind + glusterfs

2015-06-24 Thread Pranith Kumar Karampuri

hi,
   Does anyone know why glusterfs hangs with valgrind?

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Public key problem on new vms for NetBSD

2015-06-19 Thread Pranith Kumar Karampuri

hi,
I see that NetBSD regressions are passing but not able to give +1 
because of following problem:
+ ssh 'nb7bu...@review.gluster.org' gerrit review --message 
''\''http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7046/consoleFull 
: SUCCESS'\''' --project=glusterfs --code-review=0 '--verified=+1' 
276ba2dbd076a2c4b86e8afd0eaf2db7376ea2a8

Permission denied (publickey).

I saw it happened for 2 of my patches:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7046/console
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7047/console

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t

2015-06-15 Thread Pranith Kumar Karampuri

Emmanuel,
   I am not sure of the feasibility, but just wanted to ask you: do 
you think there is a possibility of erroring out operations on the mount 
when the mount crashes, instead of hanging? That would prevent a lot of 
manual intervention in the future as well.


Pranith.
On 06/15/2015 01:35 PM, Niels de Vos wrote:

Hi,

sometimes the NetBSD regression tests hang with messages like this:

 [12:29:07] ./tests/basic/mgmt_v3-locks.t
 ... ok79867 ms
 No volumes present
 mount_nfs: can't access /patchy: Permission denied
 mount_nfs: can't access /patchy: Permission denied
 mount_nfs: can't access /patchy: Permission denied

Most (if not all) of these hangs are caused by a crashing Gluster/NFS
process. Once the Gluster/NFS server is not reachable anymore,
unmounting fails.

The only way to recover is to reboot the VM and retrigger the test. For
rebooting, the http://build.gluster.org/job/reboot-vm job can be used,
and retriggering works by clicking the retrigger link in the left menu
once the test has been marked as failed/aborted.

When logging in on the NetBSD system that hangs, you can verify with
these steps:

1. check if there is a /glusterfsd.core file
2. run gdb on the core:

 # cd /build/install
 # gdb --core=/glusterfsd.core sbin/glusterfs
 ...
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8,
 host_addr=0xb900e400 "104.130.205.187", timestamp=0xbf7fd900,
 can_write=0xbf7fd8fc)
 at
 /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
 164 *can_write = lookup_res->item->opts->rw;

3. verify the lookup_res structure:

 (gdb) p *lookup_res
 $1 = {timestamp = 1434284981, item = 0xb901e3b0}
 (gdb) p *lookup_res->item
 $2 = {name = 0xff00 <error: Cannot access memory at address
 0xff00>, opts = 0x}


A fix for this has been sent; it is currently waiting for an update to
the proposed reference counting (a rough illustration follows after the list):

   - http://review.gluster.org/11022
 core: add gf_ref_t for common refcounting structures
   - http://review.gluster.org/11023
 nfs: refcount each auth_cache_entry and related data_t
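
The illustration below shows the general idea behind those two patches, not 
their actual code or the gf_ref_t API: keep a per-item reference count so the 
exported-item data stays alive while any cache entry (or an in-flight 
auth_cache_lookup()) still points at it. All names here are hypothetical.

        #include <stdlib.h>

        struct exp_item_sketch {
                int   refcount;     /* manipulated only via the helpers below */
                char *name;
                void *opts;
        };

        static struct exp_item_sketch *
        item_ref (struct exp_item_sketch *item)
        {
                __sync_add_and_fetch (&item->refcount, 1);
                return item;
        }

        static void
        item_unref (struct exp_item_sketch *item)
        {
                /* Free only when the last holder (cache entry or lookup in
                 * progress) drops its reference. */
                if (__sync_sub_and_fetch (&item->refcount, 1) == 0) {
                        free (item->name);
                        free (item);
                }
        }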

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failure in tests/bugs/glusterd/bug-963541.t

2015-06-10 Thread Pranith Kumar Karampuri


+gluster-devel
On 06/11/2015 10:22 AM, Pranith Kumar Karampuri wrote:

hi,
 Could you guys help in finding RCA for 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/10449/consoleFull 
failures in tests/bugs/glusterd/bug-963541.t


Pranith


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Unable to send patches to gerrit

2015-06-10 Thread Pranith Kumar Karampuri

The last time this happened, Kaushal/Vijay fixed it, if I remember correctly.
+kaushal +Vijay

Pranith
On 06/11/2015 10:38 AM, Anoop C S wrote:


On 06/11/2015 10:33 AM, Ravishankar N wrote:

I'm unable to push a patch on release-3.6, getting different
errors every time:


This happens for master too. I continuously get the following error:

error: unpack failed: error No space left on device


[ravi@tuxpad glusterfs]$ ./rfc.sh
[detached HEAD a59646a] afr: honour selfheal enable/disable volume set options
 Date: Sat May 30 10:23:33 2015 +0530
 3 files changed, 108 insertions(+), 4 deletions(-)
 create mode 100644 tests/basic/afr/client-side-heal.t
Successfully rebased and updated refs/heads/3.6_honour_heal_options.
Counting objects: 11, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (11/11), 1.77 KiB | 0 bytes/s, done.
Total 11 (delta 9), reused 0 (delta 0)
error: unpack failed: error No space left on device
fatal: Unpack error, check server log
To ssh://itisr...@git.gluster.org/glusterfs.git
 ! [remote rejected] HEAD -> refs/for/release-3.6/bug-1230259 (n/a (unpacker error))
error: failed to push some refs to 'ssh://itisr...@git.gluster.org/glusterfs.git'
[ravi@tuxpad glusterfs]$


[ravi@tuxpad glusterfs]$ ./rfc.sh
[detached HEAD 8b28efd] afr: honour selfheal enable/disable volume set options
 Date: Sat May 30 10:23:33 2015 +0530
 3 files changed, 108 insertions(+), 4 deletions(-)
 create mode 100644 tests/basic/afr/client-side-heal.t
Successfully rebased and updated refs/heads/3.6_honour_heal_options.
fatal: internal server error
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


Anybody else facing problems? -Ravi



___ Gluster-devel
mailing list Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] answer_list in EC xlator

2015-06-02 Thread Pranith Kumar Karampuri



On 06/02/2015 08:08 PM, fanghuang.d...@yahoo.com wrote:

Hi all,

As I read the source code of the EC xlator, I am confused by the 
cbk_list and answer_list defined in struct _ec_fop_data. Why do we 
need two lists to combine the results of callbacks?


Especially for the answer_list: it is initialized 
in ec_fop_data_allocate, then the nodes are added 
in ec_cbk_data_allocate. Without being accessed at all during the 
lifetime of the fop, the whole list is finally released in ec_fop_cleanup. 
Am I missing something about the answer_list?

+Xavi.

hi,
The only reason I found is that it is easier to clean up cbks using 
answer_list. You can check the ec_fop_cleanup() function on latest master 
to see how this is done. Combining of cbks is a bit involved until you 
understand it, but once you do, it is amazing. I tried to add comments 
for this part of the code and sent a patch, but we forgot to merge it :-) 
http://review.gluster.org/9982. If you think we can add more 
comments/change this part of the code in a way that makes it easier, let us 
know. We would love your feedback :-). Wait for Xavi's response as well.


Pranith

Regards,
Fang Huang


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] DHTv2 design discussion

2015-06-02 Thread Pranith Kumar Karampuri



On 06/03/2015 01:14 AM, Jeff Darcy wrote:

I've put together a document which I hope captures the most recent discussions 
I've had, particularly those in Barcelona.  Commenting should be open to 
anyone, so please feel free to weigh in before too much code is written.  ;)

https://docs.google.com/document/d/1nJuG1KHtzU99HU9BK9Qxoo1ib9VXf2vwVuHzVQc_lKg/edit?usp=sharing

Jeff,
 Do you guys have a date before which the comments need to be 
given? It helps in prioritizing against the other work I have. I would love to 
make time to go through this in detail and ask questions.


Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to find total number of gluster mounts?

2015-06-02 Thread Pranith Kumar Karampuri



On 06/01/2015 11:07 AM, Bipin Kunal wrote:

Hi All,

  Is there a way to find the total number of gluster mounts?

  If not, what would be the complexity for this RFE?

  As far as I understand, finding the number of fuse mounts should be possible, 
but it seems infeasible for nfs and samba mounts.
True. Bricks have connections from each of the clients. Each of 
fuse/nfs/glustershd/quotad/glfsapi-based clients (samba/glfsheal) would 
have a separate client-context set on the bricks, so we can get this 
information. But like you said, I am not sure how it can be done for the nfs 
server/samba. Adding more people.


Pranith


  Please let me know your precious thoughts on this.

Thanks,
Bipin Kunal
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] tests/bugs/glusterd/bug-948686.t gave a core

2015-06-04 Thread Pranith Kumar Karampuri
Glustershd is crashing because afr wound an xattrop with a null gfid in the loc. 
Could one of you look into this failure? 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/10095/consoleFull


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failure with sparse-file-heal.t test

2015-06-04 Thread Pranith Kumar Karampuri
This seems to happen because of a race between STACK_RESET and stack 
statedump. Still thinking how to fix it without taking locks around 
writing to the file.


Pranith
On 06/04/2015 02:13 PM, Pranith Kumar Karampuri wrote:
I see that statedump is generating core because of which this test 
spuriously fails. I am looking into it.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Only netbsd regressions seem to be triggered

2015-06-02 Thread Pranith Kumar Karampuri



On 06/03/2015 10:26 AM, Raghavendra Gowdappa wrote:

All,

It seems only netbsd regressions are triggered. Linux based regressions seems 
to be not triggered. I've observed this with two patches [1][2]. Pranith also 
feels same. Have any of you seen similar issue?
I saw it happen in reverse. I think the netbsd jobs on my patches failed 
more often because they couldn't fetch the patch from gerrit. It does happen 
quite a bit though.


Pranith


[1]http://review.gluster.org/#/c/10943/
[2]http://review.gluster.org/#/c/10834/

regards,
Raghavendra
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] git fetch is failing

2015-05-26 Thread Pranith Kumar Karampuri

hi,
 git fetch on my local repo fails with the following error. I asked on 
#gluster-dev; some of the people online now face the same error.


pk1@localhost - ~/workspace/gerrit-repo (cooperative-locking-3.7)
08:54:14 :( ⚡ git fetch
ssh_exchange_identification: Connection closed by remote host
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Automated bug workflow

2015-05-29 Thread Pranith Kumar Karampuri



On 05/29/2015 10:41 PM, Nagaprasad Sathyanarayana wrote:

When similar automation was discussed, somebody had raised a concern about the case where 
more than one patch is associated with a BZ. Either we keep a 1:1 mapping between BZ and 
patch, or the workflow needs to be improved to inform gerrit when the 
last patch is submitted for a BZ so that the state can be changed appropriately.

Thoughts?


rfc.sh will ask if this patch is the last one for the bug, or if more patches 
are expected. Based on this input it acts on bugzilla.

Pranith



Thanks
Naga


On 29-May-2015, at 10:21 pm, Niels de Vos nde...@redhat.com wrote:

Hi all,

today we had a discussion about how to get the status of reported bugs
more correct and up to date. It is something that has come up several
times already, but now we have a BIG solution as Pranith calls it.

The goal is rather simple, but is requires some thinking about rules and
components that can actually take care of the automation.

The general user-visible results would be:

* rfc.sh will ask if this patch is the last one for the bug, or if more
   patches are expected
* Gerrit will receive the patch with the answer, and modify the status
   of the bug to POST
* when the patch is merged, Gerrit will change (or not) the status of
   the bug to MODIFIED
* when a nightly build is made, all bugs that have patches included and
   the status of the bug is MODIFIED, the build script will change the
   status to ON_QA and set a fixed in version

This is a simplified view, there are some other cases that we need to
take care of. These are documented in the etherpad linked below.

We value any input for this, Kaleb and Rafi already gave some, thanks!
Please let us know over email or IRC and we'll update the etherpad.

Thanks,
Pranith  Niels


Etherpad with detailed step by step actions to take:

https://public.pad.fsfe.org/p/gluster-automated-bug-workflow

IRC log, where the discussion started:

https://botbot.me/freenode/gluster-dev/2015-05-29/?msg=40450336page=2

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Automated bug workflow

2015-05-29 Thread Pranith Kumar Karampuri



On 05/29/2015 11:23 PM, Shyam wrote:

On 05/29/2015 12:51 PM, Niels de Vos wrote:

Hi all,

today we had a discussion about how to get the status of reported bugs
more correct and up to date. It is something that has come up several
times already, but now we have a BIG solution as Pranith calls it.

The goal is rather simple, but is requires some thinking about rules and
components that can actually take care of the automation.

The general user-visible results would be:

  * rfc.sh will ask if this patch is the last one for the bug, or if more
    patches are expected
  * Gerrit will receive the patch with the answer, and modify the status
of the bug to POST


I like to do this manually.
Instead of just yes/no, maybe we should also let it accept an input 
'disable' so that no automated BUG state modifications are done.



  * when the patch is merged, Gerrit will change (or not) the status of
the bug to MODIFIED


I like to do this manually too... but automation does not hurt, esp. 
when I control when the bug moves to POST.
Hmm... if we have the 'marker' to say 'disabled', even this part won't be 
automatically done when the patch is merged. ./rfc.sh needs to take more 
input about what kind of automation is needed and act accordingly, i.e. 
don't do 'moving to POST', but if the bug is already in POST, move it to 
MODIFIED, etc.


Pranith.


  * when a nightly build is made, all bugs that have patches included 
and

the status of the bug is MODIFIED, the build script will change the
status to ON_QA and set a fixed in version


This I would like automated, as I am not tracking when it was released 
(of sorts). But if I miss the nightly boat, I assume the automation 
would not pick this up; as a result, automation on the MODIFIED step is 
good, as that would take care of this miss for me.




This is a simplified view, there are some other cases that we need to
take care of. These are documented in the etherpad linked below.

We value any input for this, Kaleb and Rafi already gave some, thanks!
Please let us know over email or IRC and we'll update the etherpad.


Overall, we can have all of this, but I guess I will possibly never 
use the POST automation and do that myself.
Is this a personal preference, or do you think improving something in the 
tool would persuade you to let the tool take care of moving to POST?


Pranith




Thanks,
Pranith  Niels


Etherpad with detailed step by step actions to take:

https://public.pad.fsfe.org/p/gluster-automated-bug-workflow

IRC log, where the discussion started:

https://botbot.me/freenode/gluster-dev/2015-05-29/?msg=40450336page=2

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] gluster builds are failing in rpmbuilding

2015-05-29 Thread Pranith Kumar Karampuri



On 05/30/2015 08:10 AM, Pranith Kumar Karampuri wrote:

I see that kaleb already sent a patch for this:
http://review.gluster.org/#/c/11007 - master
http://review.gluster.org/#/c/11008 - NetBSD

I meant http://review.gluster.org/#/c/11008 for release-3.7  :-)

Pranith


I am going to abandon my patch.

Pranith

On 05/30/2015 07:54 AM, Pranith Kumar Karampuri wrote:



On 05/30/2015 07:44 AM, Pranith Kumar Karampuri wrote:



On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote:
It appears to me that glusterd-errno.h was added in the patch 
http://review.gluster.org/10313, which was merged on 29th. Please 
correct me if I am wrong.
I think it is supposed to be added to Makefile as well. Let me do 
some testing.

http://review.gluster.org/11010 fixes this.

Thanks a lot Naga :-)

Pranith


Pranith


Thanks
Naga

- Original Message -
From: Nagaprasad Sathyanarayana nsath...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 7:23:21 AM
Subject: Re: [Gluster-devel] gluster builds are failing in rpmbuilding

Could it be due to the compilation errors?

http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ :
glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or 
directory

   CC glusterd_la-glusterd-mgmt-handler.lo
glusterd-locks.c: In function 'glusterd_mgmt_v3_lock':
glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use in 
this function)
glusterd-locks.c:557: error: (Each undeclared identifier is 
reported only once

glusterd-locks.c:557: error: for each function it appears in.)
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
make[4]: *** [all-recursive] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
 Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
Child return code was: 1

http://build.gluster.org/job/glusterfs-devrpms/9141/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file 
or directory

  #include "glusterd-errno.h"
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt-handler.lo
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1

http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file 
or directory

  #include "glusterd-errno.h"
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such file 
or directory

  #include "glusterd-errno.h"
 ^
compilation terminated.


Thanks
Naga

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 6:57:41 AM
Subject: [Gluster-devel] gluster builds are failing in rpmbuilding

hi,
  I don't understand rpmbuild logs that well. But the following 
seems

to be the issue:
Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm)
Config(epel-6-x86_64) 1 minutes 5 seconds

Please feel free to take a look at the following links for sample 
runs:

http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console
http://build.gluster.org/job/glusterfs-devrpms/9141/console
http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Regression fails in tests/bugs/nfs/bug-904065.t

2015-05-29 Thread Pranith Kumar Karampuri

Niels,
 As per git you are author for the test above. Could you please 
help find RC for the failure. Log: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9812/consoleFull


I am going to re-trigger the build.

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] gluster builds are failing in rpmbuilding

2015-05-29 Thread Pranith Kumar Karampuri

hi,
I don't understand rpmbuild logs that well. But the following seems 
to be the issue:

Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm) 
Config(epel-6-x86_64) 1 minutes 5 seconds


Please feel free to take a look at the following links for sample runs:
http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console
http://build.gluster.org/job/glusterfs-devrpms/9141/console
http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] gluster builds are failing in rpmbuilding

2015-05-29 Thread Pranith Kumar Karampuri



On 05/30/2015 07:44 AM, Pranith Kumar Karampuri wrote:



On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote:
It appears to me that glusterd-errno.h was added in the patch 
http://review.gluster.org/10313, which was merged on 29th. Please 
correct me if I am wrong.
I think it is supposed to be added to Makefile as well. Let me do some 
testing.

http://review.gluster.org/11010 fixes this.

Thanks a lot Naga :-)

Pranith


Pranith


Thanks
Naga

- Original Message -
From: Nagaprasad Sathyanarayana nsath...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 7:23:21 AM
Subject: Re: [Gluster-devel] gluster builds are failing in rpmbuilding

Could it be due to the compilation errors?

http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ :
glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or 
directory

   CC glusterd_la-glusterd-mgmt-handler.lo
glusterd-locks.c: In function 'glusterd_mgmt_v3_lock':
glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use in 
this function)
glusterd-locks.c:557: error: (Each undeclared identifier is reported 
only once

glusterd-locks.c:557: error: for each function it appears in.)
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
make[4]: *** [all-recursive] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
 Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
Child return code was: 1

http://build.gluster.org/job/glusterfs-devrpms/9141/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file 
or directory

  #include "glusterd-errno.h"
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt-handler.lo
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1

http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file 
or directory

  #include "glusterd-errno.h"
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such file or 
directory

  #include "glusterd-errno.h"
 ^
compilation terminated.


Thanks
Naga

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 6:57:41 AM
Subject: [Gluster-devel] gluster builds are failing in rpmbuilding

hi,
  I don't understand rpmbuild logs that well. But the following 
seems

to be the issue:
Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm)
Config(epel-6-x86_64) 1 minutes 5 seconds

Please feel free to take a look at the following links for sample runs:
http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console
http://build.gluster.org/job/glusterfs-devrpms/9141/console
http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] gluster builds are failing in rpmbuilding

2015-05-29 Thread Pranith Kumar Karampuri



On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote:

It appears to me that glusterd-errno.h was added in the patch 
http://review.gluster.org/10313, which was merged on 29th. Please correct me if 
I am wrong.
I think it is supposed to be added to Makefile as well. Let me do some 
testing.


Pranith


Thanks
Naga

- Original Message -
From: Nagaprasad Sathyanarayana nsath...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 7:23:21 AM
Subject: Re: [Gluster-devel] gluster builds are failing in rpmbuilding

Could it be due to the compilation errors?

http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ :
glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or directory
   CC glusterd_la-glusterd-mgmt-handler.lo
glusterd-locks.c: In function 'glusterd_mgmt_v3_lock':
glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use in this function)
glusterd-locks.c:557: error: (Each undeclared identifier is reported only once
glusterd-locks.c:557: error: for each function it appears in.)
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
make[4]: *** [all-recursive] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
 Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
Child return code was: 1

http://build.gluster.org/job/glusterfs-devrpms/9141/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory
  #include glusterd-errno.h
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt-handler.lo
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1

http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory
  #include glusterd-errno.h
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such file or directory
  #include glusterd-errno.h
 ^
compilation terminated.


Thanks
Naga

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 6:57:41 AM
Subject: [Gluster-devel] gluster builds are failing in rpmbuilding

hi,
  I don't understand rpmbuild logs that well. But the following seems
to be the issue:
Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm)
Config(epel-6-x86_64) 1 minutes 5 seconds

Please feel free to take a look at the following links for sample runs:
http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console
http://build.gluster.org/job/glusterfs-devrpms/9141/console
http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] gluster builds are failing in rpmbuilding

2015-05-29 Thread Pranith Kumar Karampuri



On 05/30/2015 09:20 AM, Avra Sengupta wrote:
That is because the patch that introduces glusterd-errno.h is not yet 
merged in 3.7, so glusterd-errno.h is still not present in release 
3.7. I will update the patch introducing the header file itself with 
the required change, and will abandon http://review.gluster.org/#/c/11008
Thanks, Avra. Seems like I cloned a branch from master but named it 3.7, 
haha :-D.


Pranith


Regards,
Avra

On 05/30/2015 08:29 AM, Pranith Kumar Karampuri wrote:



On 05/30/2015 08:11 AM, Pranith Kumar Karampuri wrote:



On 05/30/2015 08:10 AM, Pranith Kumar Karampuri wrote:

I see that kaleb already sent a patch for this:
http://review.gluster.org/#/c/11007 - master
http://review.gluster.org/#/c/11008 - NetBSD

I meant http://review.gluster.org/#/c/11008 for release-3.7 :-)

This fails in smoke with the following failure :-(.
make[4]: *** No rule to make target `glusterd-errno.h', needed by 
`all-am'.  Stop.

make[4]: *** Waiting for unfinished jobs

On my laptop it succeeds though :-/. Any clues?

Pranith


Pranith


I am going to abandon my patch.

Pranith

On 05/30/2015 07:54 AM, Pranith Kumar Karampuri wrote:



On 05/30/2015 07:44 AM, Pranith Kumar Karampuri wrote:



On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote:
It appears to me that glusterd-errno.h was added in the patch 
http://review.gluster.org/10313, which was merged on 29th. 
Please correct me if I am wrong.
I think it is supposed to be added to Makefile as well. Let me do 
some testing.

http://review.gluster.org/11010 fixes this.

Thanks a lot Naga :-)

Pranith


Pranith


Thanks
Naga

- Original Message -
From: Nagaprasad Sathyanarayana nsath...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 7:23:21 AM
Subject: Re: [Gluster-devel] gluster builds are failing in 
rpmbuilding


Could it be due to the compilation errors?

http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ :
glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or 
directory

   CC glusterd_la-glusterd-mgmt-handler.lo
glusterd-locks.c: In function 'glusterd_mgmt_v3_lock':
glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use 
in this function)
glusterd-locks.c:557: error: (Each undeclared identifier is 
reported only once

glusterd-locks.c:557: error: for each function it appears in.)
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
make[4]: *** [all-recursive] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
 Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build)
Child return code was: 1

http://build.gluster.org/job/glusterfs-devrpms/9141/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such 
file or directory

  #include glusterd-errno.h
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt-handler.lo
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1

http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ :
glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such 
file or directory

  #include glusterd-errno.h
 ^
compilation terminated.
   CC glusterd_la-glusterd-mgmt.lo
make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1
make[5]: *** Waiting for unfinished jobs
glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such 
file or directory

  #include glusterd-errno.h
 ^
compilation terminated.


Thanks
Naga

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Sent: Saturday, May 30, 2015 6:57:41 AM
Subject: [Gluster-devel] gluster builds are failing in rpmbuilding

hi,
  I don't understand rpmbuild logs that well. But the 
following seems

to be the issue:
Start: build phase for 
glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Start: build setup for 
glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
Finish: build setup for 
glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm

Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm
ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm)
Config(epel-6-x86_64) 1 minutes 5 seconds

Please feel free to take a look at the following links for 
sample runs:

http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console
http://build.gluster.org/job/glusterfs-devrpms/9141/console
http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http

Re: [Gluster-devel] New regression failure with EC

2015-05-27 Thread Pranith Kumar Karampuri

I am looking into it.

Pranith

On 05/28/2015 11:03 AM, Kaushal M wrote:

Got a EC test failure ( ./tests/bugs/disperse/bug-1161621.t) on
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9628/consoleFull

The change being tested was a pure GlusterD change, so this is most
likely a (new?) spurious failure.

~kaushal


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec/self-heal.t failure

2015-06-01 Thread Pranith Kumar Karampuri



On 06/02/2015 10:40 AM, Krishnan Parthasarathi wrote:

ec/self-heal.t failed regression reporting: not ok 71 Got -rw---
instead of -rw-r--r-- (regression had passed with earlier patchset).

Console output is:
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9881/consoleFull

Mind having a look?

ec/self-heal.t failed for me on release-3.7 branch.
See 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9978/consoleFull

snip
./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 4)
   Failed tests:  104, 115, 148, 159
/snip

Would it help if we added this to is_bad_test() until it's root caused?

http://review.gluster.org/11018 is the fix on master.
http://review.gluster.org/11027 on release-3.7

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-01 Thread Pranith Kumar Karampuri

hi,
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull 
has the logs. Could you please look into it.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Unable to send patches to review.gluster.org

2015-07-01 Thread Pranith Kumar Karampuri

I get the following error:
error: unpack failed: error No space left on device
fatal: Unpack error, check server log

Pranith

On 07/02/2015 09:58 AM, Atin Mukherjee wrote:

+ Infra, can any one of you just take a look at it?

On 07/02/2015 09:53 AM, Anuradha Talur wrote:

Hi,

I'm unable to send patches to r.g.o, also not able to login.
I'm getting the following errors respectively:
1)
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

2) Internal server error or forbidden access.

Is anyone else facing the same issue?



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Pranith Kumar Karampuri

Thanks Dan!.

Pranith

On 07/02/2015 06:14 PM, Dan Lambright wrote:

I'll check on this.

- Original Message -

From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org, Joseph Fernandes 
josfe...@redhat.com
Sent: Thursday, July 2, 2015 5:40:34 AM
Subject: [Gluster-devel] Failure in 
tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

hi Joseph,
 Could you take a look at
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Mount hangs because of connection delays

2015-07-02 Thread Pranith Kumar Karampuri

hi,
When the glusterfs mount process is coming up, all cluster xlators wait 
for at least one event from each of their children before propagating the 
status upwards. Sometimes the client xlator takes up to 2 minutes to 
propagate this 
event (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of 
this, Xavi implemented a timer in ec's notify where we treat a child as 
down if it doesn't come up within 10 seconds. A similar patch for afr went 
up for review @http://review.gluster.org/#/c/3. Kritika raised an 
interesting point in the review: every cluster xlator would need this 
logic for the mount not to hang, so the correct place to fix it is the 
client xlator itself, i.e. add the timer logic in the client xlator. That 
seems like a better approach. I just want to take inputs from everyone 
before we go ahead in that direction.
The idea: on PARENT_UP the client xlator starts a timer, and if no RPC 
notification is received within that timeout it treats itself as down.
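
For illustration, here is a minimal self-contained sketch of that idea, 
using plain pthreads instead of gluster's internal timer/notify machinery; 
every name in it (child_state_t, notify_child_event, wait_for_first_event) 
is made up for this example and is not the actual client xlator code:

```
/* Sketch only: start a "timer" on PARENT_UP and declare the child down
 * if no connect/disconnect notification arrives within the timeout. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            event_seen;   /* did any rpc notification arrive? */
    bool            child_up;
} child_state_t;

/* would be called by the transport on a real connect/disconnect */
void notify_child_event(child_state_t *cs, bool up)
{
    pthread_mutex_lock(&cs->lock);
    cs->event_seen = true;
    cs->child_up   = up;
    pthread_cond_signal(&cs->cond);
    pthread_mutex_unlock(&cs->lock);
}

/* called on PARENT_UP: wait at most timeout_sec for the first event */
bool wait_for_first_event(child_state_t *cs, int timeout_sec)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&cs->lock);
    while (!cs->event_seen) {
        if (pthread_cond_timedwait(&cs->cond, &cs->lock, &deadline) != 0)
            break;                          /* timed out */
    }
    bool up = cs->event_seen ? cs->child_up : false;  /* no event => down */
    pthread_mutex_unlock(&cs->lock);
    return up;
}

int main(void)
{
    child_state_t cs = { PTHREAD_MUTEX_INITIALIZER,
                         PTHREAD_COND_INITIALIZER, false, false };

    /* nothing calls notify_child_event() here, so after 2 seconds the
     * child is treated as down and the mount would not hang waiting */
    printf("child is %s\n", wait_for_first_event(&cs, 2) ? "up" : "down");
    return 0;
}
```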


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] regarding mem_0filled, iov_0filled and memdup

2015-05-21 Thread Pranith Kumar Karampuri

hi,
   These functions return 0 when the buffer is zero-filled and a non-zero 
value when it is not. This is quite unintuitive, as people expect them to 
return _gf_true when zero-filled and _gf_false when not. This comes up as 
a bug in reviews quite a few times, so I decided it may be better to 
change the API itself. What do you say?


   Along the same lines is memdup. It is a function in common-utils 
which does GF_CALLOC, so the memory needs to be freed with GF_FREE. But 
since it sounds so much like a standard API, I have seen people call free 
instead of GF_FREE. Maybe it is better to rename it to gf_memdup?
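
To make the proposal concrete, here is a small illustrative sketch in plain 
libc (not the actual gluster helpers, and without the GF_CALLOC/GF_FREE 
memory-accounting macros) of the more intuitive variants being suggested:

```
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* returns true when every byte in the buffer is zero */
bool is_zero_filled(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        if (p[i])
            return false;
    return true;
}

/* duplicate a buffer; caller frees with free() (GF_FREE in gluster terms) */
void *gf_memdup(const void *src, size_t len)
{
    void *dst = calloc(1, len);
    if (dst)
        memcpy(dst, src, len);
    return dst;
}

int main(void)
{
    char zeroes[16] = {0};
    char data[4]    = {1, 2, 3, 4};

    printf("zeroes: %d, data: %d\n",
           is_zero_filled(zeroes, sizeof(zeroes)),
           is_zero_filled(data, sizeof(data)));

    char *copy = gf_memdup(data, sizeof(data));
    printf("copy[2] = %d\n", copy[2]);
    free(copy);
    return 0;
}
```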


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] tests/bugs/fuse/bug-924726.t spurious failure

2015-05-26 Thread Pranith Kumar Karampuri

hi,
 tests/bugs/fuse/bug-924726.t failed in 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/9553/consoleFull


Could you take a look.

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2015-08-19 Thread Pranith Kumar Karampuri

+ Ravi, Anuradha

On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:

All,

Pranith and me were discussing about implementation of compound operations like create + lock, 
mkdir + lock, open + lock etc. These operations are useful in situations like:

1. To prevent locking on all subvols during directory creation as part of self 
heal in dht. Currently we are following approach of locking _all_ subvols by 
both rmdir and lookup-heal [1].
2. To lock a file in advance so that there is less performance hit during 
transactions in afr.

While thinking about implementing such compound operations, it occurred to me 
that one of the problems would be how do we handle a racing mkdir/create and a 
(named lookup - simply referred as lookup from now on - followed by lock). This 
is because,
1. creation of directory/file on backend
2. linking of the inode with the gfid corresponding to that file/directory

are not atomic. It is not guaranteed that the inode passed down during the 
mkdir/create call is the one that survives in the inode table. Since the posix-locks xlator 
maintains all the lock-state in inode, it would be a problem if a different 
inode is linked in inode table than the one passed during mkdir/create. One way 
to solve this problem is to serialize fops (like mkdir/create, lookup, rename, 
rmdir, unlink) that are happening on a particular dentry. This serialization 
would also solve other bugs like:

1. issues solved by [2][3] and possibly many such issues.
2. Stale dentries left out in bricks' inode table because of a racing lookup 
and dentry modification ops (like rmdir, unlink, rename etc).

The initial idea I have now is to maintain the in-progress fops on a dentry in the parent 
inode (maybe in the resolver code in protocol/server). Based on this we can serialize 
the operations. Since we need to serialize _only_ operations on a dentry (we 
don't serialize nameless lookups), it is guaranteed that we do have a parent 
inode always. Any comments/discussion on this would be appreciated.
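
A rough, self-contained sketch of that bookkeeping follows (plain pthreads 
and a toy fixed-size table; none of these names exist in protocol/server, 
they only illustrate the serialization pattern):

```
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_DENTRIES 64

typedef struct {
    char parent[64];   /* parent gfid/path, illustrative only */
    char name[64];     /* basename of the dentry */
    int  busy;         /* a fop is in progress on this dentry */
} dentry_slot_t;

static dentry_slot_t   table[MAX_DENTRIES];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  table_cond = PTHREAD_COND_INITIALIZER;

static dentry_slot_t *find_slot(const char *parent, const char *name)
{
    for (int i = 0; i < MAX_DENTRIES; i++)
        if (table[i].busy &&
            !strcmp(table[i].parent, parent) && !strcmp(table[i].name, name))
            return &table[i];
    return NULL;
}

/* called before mkdir/create/lookup/rename/rmdir/unlink on (parent, name) */
void dentry_op_begin(const char *parent, const char *name)
{
    pthread_mutex_lock(&table_lock);
    while (find_slot(parent, name))          /* another fop is in flight */
        pthread_cond_wait(&table_cond, &table_lock);
    for (int i = 0; i < MAX_DENTRIES; i++) { /* toy code: assumes a free slot */
        if (!table[i].busy) {
            snprintf(table[i].parent, sizeof(table[i].parent), "%s", parent);
            snprintf(table[i].name, sizeof(table[i].name), "%s", name);
            table[i].busy = 1;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
}

/* called once the fop (and its gfid/inode linking) has completed */
void dentry_op_end(const char *parent, const char *name)
{
    pthread_mutex_lock(&table_lock);
    dentry_slot_t *s = find_slot(parent, name);
    if (s)
        s->busy = 0;
    pthread_cond_broadcast(&table_cond);
    pthread_mutex_unlock(&table_lock);
}

int main(void)
{
    dentry_op_begin("<parent-gfid>", "dir1");  /* e.g. mkdir("dir1") */
    /* ... create on backend, link inode with its gfid ... */
    dentry_op_end("<parent-gfid>", "dir1");    /* racing lookup may proceed */
    printf("done\n");
    return 0;
}
```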

[1] http://review.gluster.org/11725
[2] http://review.gluster.org/9913
[3] http://review.gluster.org/5240

regards,
Raghavendra.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Justin's last day at Red Hat today ;)

2015-06-30 Thread Pranith Kumar Karampuri

All the best Justin!

Pranith

On 06/30/2015 08:11 PM, Justin Clift wrote:

Hi us,

It's my last day at Red Hat today, so I've just adjusted the
jus...@gluster.org email address to redirect things to
jus...@postgresql.org instead.  So, people can still email
me.

I do have some Gluster things I'd like to finish off, it's just
I need a bit of a break first. ;)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] reviving spurious failures tracking

2015-07-29 Thread Pranith Kumar Karampuri

hi,
I just updated 
https://public.pad.fsfe.org/p/gluster-spurious-failures with the latest 
spurious failures we saw in Linux and NetBSD regressions. Could you update 
the pad with any other spurious regressions you are observing that are not 
listed there, and help fix these issues quickly? The number of failures has 
been increasing quite a bit lately.



Tests to be fixed (Linux)
tests/bugs/distribute/bug-1066798.t 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/12908/console) 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/12907/console)
tests/bitrot/bug-1244613.t 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/12906/console)
tests/bugs/snapshot/bug-1109889.t 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/12905/console)
tests/bugs/replicate/bug-1238508-self-heal.t 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/12904/console)
tests/basic/nufa.t 
(http://build.gluster.org/job/rackspace-regression-2GB-triggered/12902/console)


On NetBSD:
tests/basic/mount-nfs-auth.t 
(http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8796/console)
tests/basic/tier/tier-attach-many.t 
(http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8789/console)
tests/basic/afr/arbiter.t 
(http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8785/console)
tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t 
(http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8784/console)
tests/basic/quota.t 
(http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8780/console)


The first step is for the respective developers to move the tests above to 
the "Tests being looked at (please put your name against the test you are 
looking into):" section.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] reviving spurious failures tracking

2015-07-29 Thread Pranith Kumar Karampuri



On 07/29/2015 06:10 PM, Emmanuel Dreyfus wrote:

On Wed, Jul 29, 2015 at 04:06:43PM +0530, Vijay Bellur wrote:

- If there are tests that cannot be fixed easily in the near term, we move
such tests to a different folder or drop such test units.

A tests/disabled directory seems the way to go. But before going there,
the test maintainer should be notified. Perhaps we should have a list
of contacts in a comment at the top of each test?

Jeff has already implemented the bad-tests infra. Can we use the same?

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] pluggability of some aspects in afr/nsr/ec

2015-10-29 Thread Pranith Kumar Karampuri

hi,
 I want to understand how you guys are planning to integrate 
NSR volumes into the existing CLIs. Here are some thoughts I had; I wanted 
to know yours:

At the heart of both the replication/ec schemes we have
1) synchronization mechanisms
   a) afr,ec does it using locks
   b) nsr does it using leader election
2) Metadata to figure out the healing/reconciliation aspects
   a) afr,ec does it using xattrs
   b) nsr does it using journals

I want to understand if there is a possibility of exposing these as 
different modules that we can mix and match using options. If users 
choose 1b and 2b it becomes NSR; 1a and 2a becomes AFR/EC. My thinking is 
that in future, if we come up with better metadata journals/stores, it 
should be easy to plug them in. The idea is that, based on the workload, 
users should be able to decide which pair of synchronization/metadata 
works best for them (or we can recommend one based on our tests). 
I wanted to seek your inputs.
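
To make the mix-and-match idea concrete, here is a toy sketch (nothing like 
this exists in the codebase today; the types and tables are invented purely 
for illustration) where the synchronization piece and the 
reconciliation-metadata piece are small vtables selected independently:

```
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    void (*begin_txn)(void);        /* take locks / confirm leadership */
    void (*end_txn)(void);
} sync_mechanism_t;

typedef struct {
    const char *name;
    void (*record_pending)(void);   /* xattr marks / journal entry */
    void (*clear_pending)(void);
} heal_metadata_t;

static void lock_begin(void)    { puts("take inodelk/entrylk"); }
static void lock_end(void)      { puts("release locks"); }
static void leader_begin(void)  { puts("only the elected leader proceeds"); }
static void leader_end(void)    { puts("leader acks followers"); }
static void xattr_mark(void)    { puts("set pending xattrs"); }
static void xattr_clear(void)   { puts("clear pending xattrs"); }
static void journal_mark(void)  { puts("append journal entry"); }
static void journal_clear(void) { puts("mark journal entry complete"); }

static sync_mechanism_t sync_table[] = {
    { "locks",  lock_begin,   lock_end   },
    { "leader", leader_begin, leader_end },
};
static heal_metadata_t meta_table[] = {
    { "xattrs",  xattr_mark,   xattr_clear   },
    { "journal", journal_mark, journal_clear },
};

static sync_mechanism_t *pick_sync(const char *opt)
{
    for (size_t i = 0; i < sizeof(sync_table)/sizeof(sync_table[0]); i++)
        if (!strcmp(sync_table[i].name, opt))
            return &sync_table[i];
    return NULL;
}

static heal_metadata_t *pick_meta(const char *opt)
{
    for (size_t i = 0; i < sizeof(meta_table)/sizeof(meta_table[0]); i++)
        if (!strcmp(meta_table[i].name, opt))
            return &meta_table[i];
    return NULL;
}

int main(void)
{
    /* "locks"+"xattrs" behaves like afr/ec, "leader"+"journal" like nsr */
    sync_mechanism_t *s = pick_sync("leader");
    heal_metadata_t  *m = pick_meta("journal");

    s->begin_txn();
    m->record_pending();
    puts("write to bricks");
    m->clear_pending();
    s->end_txn();
    return 0;
}
```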


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] pluggability of some aspects in afr/nsr/ec

2015-10-29 Thread Pranith Kumar Karampuri



On 10/29/2015 12:18 PM, Venky Shankar wrote:

On Thu, Oct 29, 2015 at 11:36 AM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:

hi,
  I want to understand how are you guys planning to integrate NSR
volumes to the existing CLIs. Here are some thoughts I had, wanted to know
your thoughts:
At the heart of both the replication/ec schemes we have
1) synchronization mechanisms
a) afr,ec does it using locks
b) nsr does it using leader election
2) Metadata to figure out the healing/reconciliation aspects
a) afr,ec does it using xattrs
b) nsr does it using journals

I want to understand if there is a possibility of exposing these as
different modules that we can mix and match, using options. If the users

Do you mean abstracting it out during volume creation? At a high level
this could be in the form of client or server
side replication. Not that AFR cannot be used on the server side
(you'd know better than me), but, if at all this level
of abstraction is used, we'd need to default to what fits best in what
use case (as you already mentioned below)
but still retaining the flexibility to override it.
Precisely. I think switching is not that difficult once we make sure 
healing is complete. Switching is a rare operation IMO, so we can 
probably ask the users to stop the volume, choose the new option values, 
and start it again. That is simpler than migrating between volumes, where 
you would probably have to copy the data.
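
For illustration, the stop/choose-new-value/start flow could look roughly 
like this; the option key cluster.metadata-scheme is purely hypothetical, 
and whatever option finally selects the scheme would go in its place:

```
gluster volume heal myvol info          # first make sure healing is complete
gluster volume stop myvol
gluster volume set myvol cluster.metadata-scheme journal
gluster volume start myvol
```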


Pranith



choose 1b, 2b it becomes nsr and 1a, 2a becomes afr/ec. In future if we come
up with better metadata journals/stores it should be easy to plug them is
what I'm thinking. The idea I have is based on the workload, users should be
able to decide which pair of synchronization/metadata works best for them
(Or we can also recommend based on our tests). Wanted to seek your inputs.

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] pluggability of some aspects in afr/nsr/ec

2015-10-30 Thread Pranith Kumar Karampuri



On 10/29/2015 06:11 PM, Jeff Darcy wrote:

I want to understand if there is a possibility of exposing these as
different modules that we can mix and match, using options.

It’s not only possible, but it’s easier than you might think.  If an
option is set (cluster.nsr IIRC) then we replace cluster/afr with
cluster/nsr-client and then add some translators to the server-side
stack.  A year ago that was just one nsr-server translator.  The
journaling part has already been split out, and I plan to do the same
with the leader-election parts (making them usable for server-side AFR
or EC) as well.  It shouldn’t be hard to control the addition and
removal of these and related translators (e.g. index) with multiple
options instead of just one.  The biggest stumbling block I’ve actually
hit when trying to do this with AFR on the server side is the *tests*,
many of which can’t handle delays on the client side while the server
side elects leaders and cross-connects peers.  That’s all solvable.  It
just would have taken more time than I had available for the experiment.


precisely. I think switching is not that difficult once we make sure
healing is complete. Switching is a rare operation IMO so we can
probably ask the users to do stop/choose-new-value/start the volume
after choosing the options. This way is simpler than to migrate
between the volumes where you have to probably copy the data.

The two sets of metadata are *entirely* disjoint, which puts us in a
good position compared e.g. to DHT/tiering which had overlaps.  As long
as the bricks are “clean” switching back and forth should be simple.  In
fact I expect to do this a lot when we get to characterizing performance
etc.

Good to hear this.



choose 1b, 2b it becomes nsr and 1a, 2a becomes afr/ec. In future
if we come up with better metadata journals/stores it should be
easy to plug them is what I'm thinking. The idea I have is based on
the workload, users should be able to decide which pair of
synchronization/metadata works best for them (Or we can also
recommend based on our tests). Wanted to seek your inputs.

Absolutely.  As I’m sure you’re tired of hearing, I believe NSR will
outperform AFR by a significant margin for most workloads and
configurations.  I wouldn’t be the project’s initiator/leader if I
didn’t believe that, but I’m OK if others disagree.  We’ll find out
eventually.  ;)  More importantly, “most” is still not “all”.  Even by
my own reckoning, there are cases in which AFR will perform better or be
preferable for other reasons.  EC’s durability and space-efficiency
advantages make an even stronger case for preserving both kinds of data
paths and metadata arrangements.  That’s precisely why I want to make
the journaling and leader-election parts more generic.

All the best for your endeavors! Lets make users happy.

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] spurious failure in ./tests/basic/ec/ec-readdir.t

2015-11-02 Thread Pranith Kumar Karampuri
Thanks Gaurav. Xavi is already looking into it. Meanwhile, a patch to 
mark it as a bad test is already posted for review: 
http://review.gluster.org/#/c/12481/


Pranith
On 11/02/2015 06:21 PM, Gaurav Garg wrote:

Hi

  The ./tests/basic/ec/ec-readdir.t test case seems to be a spurious failure in ec.

https://build.gluster.org/job/rackspace-regression-2GB-triggered/15395/consoleFull

https://build.gluster.org/job/rackspace-regression-2GB-triggered/15388/consoleFull

https://build.gluster.org/job/rackspace-regression-2GB-triggered/15386/consoleFull



ccing ec team members.


Thanx,

~Gaurav


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.7.5 released

2015-10-14 Thread Pranith Kumar Karampuri

Hi all,

I'm pleased to announce the release of GlusterFS-3.7.5. This release
includes 70 changes after 3.7.4. The list of fixed bugs is included
below.

Tarball and RPMs can be downloaded from
http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.5/

Ubuntu debs are available from
https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.7

Debian Unstable (sid) packages have been updated and should be
available from default repos.

NetBSD has updated ports at
ftp://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/filesystems/glusterfs/README.html


Upgrade notes from 3.7.2 and earlier

GlusterFS uses insecure ports by default from release v3.7.3. This
causes problems when upgrading from release 3.7.2 and below to 3.7.3
and above. Performing the following steps before upgrading helps avoid
problems.

- Enable insecure ports for all volumes.

 ```
 gluster volume set <volname> server.allow-insecure on
 gluster volume set <volname> client.bind-insecure on
 ```

- Enable insecure ports for GlusterD. Set the following line in
`/etc/glusterfs/glusterd.vol`

 ```
 option rpc-auth-allow-insecure on
 ```

 This needs to be done on all the members in the cluster.


Fixed bugs
==
1258313 - Start self-heal and display correct heal info after replace brick
1268804 - Test tests/bugs/shard/bug-1245547.t failing consistently when run 
with patch http://review.gluster.org/#/c/11938/
1261234 - Possible memory leak during rebalance with large quantity of files
1259697 - Disperse volume: Huge memory leak of glusterfsd process
1267817 - No quota API to get real hard-limit value.
1267822 - Have a way to disable readdirp on dht from glusterd volume set command
1267823 - Perf: Getting bad performance while doing ls
1267532 - Data Tiering:CLI crashes with segmentation fault when user tries "gluster 
v tier" command
1267149 - Perf: Getting bad performance while doing ls
1266822 - Add more logs in failure code paths + port existing messages to the 
msg-id framework
1262335 - Fix invalid logic in tier.t
1251821 - /usr/lib/glusterfs/ganesha/ganesha_ha.sh is distro specific
1258338 - Data Tiering: Tiering related information is not displayed in gluster 
volume info xml output
1266872 - FOP handling during file migration is broken in the release-3.7 
branch.
1266882 - RFE: posix: xattrop 'GF_XATTROP_ADD_DEF_ARRAY' implementation
1246397 - POSIX ACLs as used by a FUSE mount can not use more than 32 groups
1265633 - AFR : "gluster volume heal  dest=:1.65 reply_serial=2"
1265890 - rm command fails with "Transport end point not connected" during add 
brick
1261444 - cli : volume start will create/overwrite ganesha export file
1258347 - Data Tiering: Tiering related information is not displayed in gluster 
volume status xml output
1258340 - Data Tiering:Volume task status showing as remove brick when detach 
tier is trigger
1260919 - Quota+Rebalance : While rebalance is in progress , quota list shows 
'Used Space' more than the Hard Limit set
1264738 - 'gluster v tier/attach-tier/detach-tier help' command shows the 
usage, and then throws 'Tier command failed' error message
1262700 - DHT + rebalance :- file permission got changed (sticky bit and setgid 
is set) after file migration failure
1263191 - Error not propagated correctly if selfheal layout lock fails
1258244 - Data Tieirng:Change error message as detach-tier error message throws as 
"remove-brick"
1263746 - Data Tiering:Setting only promote frequency and no demote frequency 
causes crash
1262408 - Data Tieirng:Detach tier status shows number of failures even when 
all files are migrated successfully
1262547 - `getfattr -n replica.split-brain-status ' command hung on the 
mount
1262547 - `getfattr -n replica.split-brain-status ' command hung on the 
mount
1262344 - quota: numbers of warning messages in nfs.log a single file itself
1260858 - glusterd: volume status backward compatibility
1261742 - Tier: glusterd crash when trying to detach , when hot tier is having 
exactly one brick and cold tier is of replica type
1262197 - DHT: Few files are missing after remove-brick operation
1261008 - Do not expose internal sharding xattrs to the application.
1262341 - Database locking due to write contention between CTR sql connection 
and tier migrator sql connection
1261715 - [HC] Fuse mount crashes, when client-quorum is not met
1260511 - fuse client crashed during i/o
1261664 - Tiering status command is very cumbersome.
1259694 - Data Tiering:Regression:Commit of detach tier passes without directly 
without even issuing a detach tier start
1260859 - snapshot: from nfs-ganesha mount no content seen in 
.snaps/ directory
1260856 - xml output for volume status on tiered volume
1260593 - man or info page of gluster needs to be updated with self-heal 
commands.
1257394 - Provide more meaningful errors on peer probe and peer detach
1258769 - Porting log messages to new framework
1255110 - client is sending io to arbiter with replica 2
1259652 - quota test 'quota-nfs.t' 

Re: [Gluster-devel] Backup support for GlusterFS

2015-10-15 Thread Pranith Kumar Karampuri

Probably a good question on gluster-users (CCed)

Pranith

On 10/14/2015 03:57 AM, Brian Lahoue wrote:
Has anyone tested backing up a fairly large Gluster implementation 
with Amanda/ZManda recently?









___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [release-3.7] seeing multiple crashes with 3.7 linux regression

2015-10-07 Thread Pranith Kumar Karampuri



On 10/07/2015 05:48 PM, Pranith Kumar Karampuri wrote:

Sent the fix @http://review.gluster.org/12309

This fixes the afr issue. Will take a look at the other crash.


Pranith

On 10/07/2015 05:38 PM, Vijaikumar Mallikarjuna wrote:

*https://build.gluster.org/job/rackspace-regression-2GB-triggered/14753/consoleFull*
#gdb -ex 'set sysroot ./' -ex 'core-file 
./build/install/cores/core.7122'  /build/install/sbin/glusterfsd


Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00407eae in emancipate (ctx=0x0, ret=-1) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:1329
1329/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c: 
No such file or directory.

(gdb) bt
#0  0x00407eae in emancipate (ctx=0x0, ret=-1) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:1329
#1  0x0040f806 in mgmt_pmap_signin_cbk (req=0x7f51e806fdec, 
iov=0x7f51eece15e0, count=1, myframe=0x7f51e806ee1c) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd-mgmt.c:2174
#2  0x7f51fab1a6c7 in saved_frames_unwind 
(saved_frames=0x1535fb0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:366
#3  0x7f51fab1a766 in saved_frames_destroy (frames=0x1535fb0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:383
#4  0x7f51fab1abf8 in rpc_clnt_connection_cleanup 
(conn=0x1534b70) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:536
#5  0x7f51fab1b670 in rpc_clnt_notify (trans=0x1534fc0, 
mydata=0x1534b70, event=RPC_TRANSPORT_DISCONNECT, data=0x1534fc0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:856
#6  0x7f51fab17af3 in rpc_transport_notify (this=0x1534fc0, 
event=RPC_TRANSPORT_DISCONNECT, data=0x1534fc0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:544
#7  0x7f51f0305621 in socket_event_poll_err (this=0x1534fc0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:1151
#8  0x7f51f030a34c in socket_event_handler (fd=9, idx=1, 
data=0x1534fc0, poll_in=1, poll_out=0, poll_err=24) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2356
#9  0x7f51fadcb7c0 in event_dispatch_epoll_handler 
(event_pool=0x14fac90, event=0x7f51eece1e70) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:575
#10 0x7f51fadcbbae in event_dispatch_epoll_worker 
(data=0x1536180) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:678

#11 0x7f51fa032a51 in start_thread () from ./lib64/libpthread.so.0
#12 0x7f51f999c93d in clone () from ./lib64/libc.so.6


*https://build.gluster.org/job/rackspace-regression-2GB-triggered/14748/consoleFull*
#gdb -ex 'set sysroot ./' -ex 'core-file 
./build/install/cores/core.25320' ./build/install/sbin/glusterfs
#0  0x7fae978ccb0f in afr_local_replies_wipe (local=0x0, 
priv=0x7fae900125b0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:1241
1241/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c: 
No such file or directory.

(gdb) bt
#0  0x7fae978ccb0f in afr_local_replies_wipe (local=0x0, 
priv=0x7fae900125b0) at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:1241
#1  0x7fae978b7aaf in afr_selfheal_inodelk (frame=0x7fae8c000c0c, 
this=0x7fae9000a6d0, inode=0x7fae8c00609c, dom=0x7fae900099f0 
"patchy-replicate-0", off=8126464, size=131072, 
locked_on=0x7fae96b4f110 "")
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:879
#2  0x7fae978bbeb5 in afr_selfheal_data_block 
(frame=0x7fae8c000c0c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, 
source=0, healed_sinks=0x7fae96b4f8a0 "", offset=8126464, 
size=131072, type=1, replies=0x7fae96b4f2b0)
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:243
#3  0x7fae978bc91d in afr_selfheal_data_do (frame=0x7fae8c006c9c, 
this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, 
healed_sinks=0x7fae96b4f8a0 "", replies=0x7fae96b4f2b0)
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:365
#4  0x7fae978bdc7b in __afr_selfheal_data (frame=0x7fae8c006c9c, 
this=0x7fae9000a6d0, fd=0x7fae8c006e6c, locked_on=0x7fae96b4fa00 
"\001\001\240")
at 
/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/x

Re: [Gluster-devel] Automated bug workflow

2015-07-07 Thread Pranith Kumar Karampuri



On 07/07/2015 02:42 PM, Rafi Kavungal Chundattu Parambil wrote:


Since we have some common interest in the proposed design, IMHO let's start 
the implementation, keeping all of these valuable suggestions in mind.

If anyone is interested in volunteering for this project, please reply to this thread.
I really want to contribute to this, but I am tied up with other work 
till the end of this month. When are we trying to start this?


Pranith


Regards
Rafi KC


- Original Message -
From: Shyam srang...@redhat.com
To: Niels de Vos nde...@redhat.com, gluster-devel@gluster.org
Sent: Friday, May 29, 2015 11:23:34 PM
Subject: Re: [Gluster-devel] Automated bug workflow

On 05/29/2015 12:51 PM, Niels de Vos wrote:

Hi all,

today we had a discussion about how to get the status of reported bugs
more correct and up to date. It is something that has come up several
times already, but now we have a BIG solution as Pranith calls it.

The goal is rather simple, but is requires some thinking about rules and
components that can actually take care of the automation.

The general user-visible results would be:

   * rfc.sh will ask if this patch it the last one for the bug, or if more
 patches are expected
   * Gerrit will receive the patch with the answer, and modify the status
 of the bug to POST

I like to do this manually.


   * when the patch is merged, Gerrit will change (or not) the status of
 the bug to MODIFIED

I like to do this manually too... but automation does not hurt, esp.
when I control when the bug moves to POST.


   * when a nightly build is made, all bugs that have patches included and
 the status of the bug is MODIFIED, the build script will change the
 status to ON_QA and set a fixed in version

This I would like automated, as I am not tracking when it was released
(of sorts). But, if I miss the nightly boat, I assume the automation
would not pick this up, as a result automation on the MODIFIED step is
good, as that would take care of this miss for me.


This is a simplified view, there are some other cases that we need to
take care of. These are documented in the etherpad linked below.

We value any input for this, Kaleb and Rafi already gave some, thanks!
Please let us know over email or IRC and we'll update the etherpad.

Overall, we can have all of this, but I guess I will possibly never use
the POST automation and do that myself.


Thanks,
Pranith  Niels


Etherpad with detailed step by step actions to take:

  https://public.pad.fsfe.org/p/gluster-automated-bug-workflow

IRC log, where the discussion started:

  https://botbot.me/freenode/gluster-dev/2015-05-29/?msg=40450336page=2

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t

2015-07-09 Thread Pranith Kumar Karampuri

Sorry, seems like this is already fixed, I just need to rebase.

Pranith

On 07/09/2015 03:56 PM, Pranith Kumar Karampuri wrote:

hi,
  Could you please look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull 



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] gfapi 3.6.3 QEMU 2.3 Ubuntu 14.04 testing

2015-07-08 Thread Pranith Kumar Karampuri

CC Prasanna who will be looking into it.

On 07/06/2015 07:30 PM, Josh Boon wrote:

Hey folks,

Does anyone have test environment running Ubuntu 14.04, QEMU 2.0, and 
Gluster 3.6.3? I'm looking to have some folks test out QEMU 2.3 for 
stability and performance and see if it removes the segfault errors. 
Another group of folks are experiencing the same segfaults I still 
experience but looking over their logs my theory of it being related 
to a self-heal didn't work out. I've included the stack trace below 
from their environment which matches mine. I've already put together a 
PPA over 
at https://launchpad.net/~josh-boon/+archive/ubuntu/qemu-edge-glusterfs with 
QEMU 2.3 and deps built for trusty. If anyone has the time or the 
resources that I could get into I'd appreciate the support. I'd like 
to get this ironed out so I can give my full vote of confidence to 
Gluster again.



Thanks,
Josh

Stack
 #0 0x7f369c95248c in ?? ()
No symbol table info available.
#1 0x7f369bd2b3b1 in glfs_io_async_cbk (ret=optimized out, 
frame=optimized out, data=0x7f369ee536c0) at glfs-fops.c:598

gio = 0x7f369ee536c0
#2 0x7f369badb66a in syncopctx_setfspid (pid=0x7f369ee536c0) at 
syncop.c:191

opctx = 0x0
ret = -1
#3 0x00100011 in ?? ()
No symbol table info available.
#4 0x7f36a5ae26b0 in ?? ()
No symbol table info available.
#5 0x7f36a81e2800 in ?? ()
No symbol table info available.
#6 0x7f36a5ae26b0 in ?? ()
No symbol table info available.
#7 0x7f36a81e2800 in ?? ()
No symbol table info available.
#8 0x in ?? ()
No symbol table info available.

Full log attached.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t

2015-07-09 Thread Pranith Kumar Karampuri

hi,
  Could you please look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull 



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] context based defaults for volume options in glusterd

2015-08-31 Thread Pranith Kumar Karampuri

hi,
 Afr needs context-based defaults for quorum, where by default the 
quorum value is 'none' for 2-way replica and 'auto' for 3-way replica.
Anuradha sent http://review.gluster.org/11872 to fix this. Maybe we 
can come up with a more generic solution. The present solution remembers 
the default in volinfo->options and also writes it to the store. So the 
default will be shown in "volume info" output, and if we want to change 
the defaults in future we will need to carefully think of all the things 
that could go wrong, especially peers getting rejected because of the 
md5sum mismatch. Another way to solve the same problem is to generate 
the default value of the vme option based on the context of the volume 
when we have to write the volfile. In this particular case, we need to 
generate the default as 'none' for a 2-replica-count volume and 'auto' 
for a 3-replica-count volume. Volume-get command handling also needs to 
consider this dynamic default value. To implement this, we can add a new 
member, 'context_based_default_value_get()' (please feel free to come up 
with a better name for the function :-) ), to the vme table. It would be 
invoked to get the default value, would take at least the volinfo as a 
parameter, and the static .value would not be set, i.e. .value would 
implicitly be NULL.


This is based on earlier design detail in the comment of vme-table:
* Fourth field is . In this context they are used to specify
* a default. That is, even the volume dict doesn't have a value,
* we procced as if the default value were set for it.

We just want to enhance the existing behavior with this proposed change. 
It seems more generic than the present solution in the patch. In future, 
people can write their own implementations of context-based default 
value generation following the same procedure. Let me know your comments.
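
As an illustration of the shape I have in mind (this is not the real 
vme/volume_option table, just a toy model), the entry could carry a 
generator callback that is consulted whenever the static .value is NULL:

```
#include <stddef.h>
#include <stdio.h>

typedef struct {
    int replica_count;              /* stand-in for glusterd_volinfo_t */
} volinfo_t;

typedef struct {
    const char *key;
    const char *value;                              /* static default  */
    const char *(*context_default)(volinfo_t *);    /* dynamic default */
} vme_entry_t;

static const char *quorum_default(volinfo_t *v)
{
    return (v->replica_count >= 3) ? "auto" : "none";
}

static vme_entry_t entry = {
    .key             = "cluster.quorum-type",
    .value           = NULL,            /* no static default */
    .context_default = quorum_default,
};

/* used at volfile-generation and volume-get time */
static const char *effective_default(vme_entry_t *e, volinfo_t *v)
{
    if (e->value)
        return e->value;
    if (e->context_default)
        return e->context_default(v);
    return NULL;
}

int main(void)
{
    volinfo_t two   = { .replica_count = 2 };
    volinfo_t three = { .replica_count = 3 };
    printf("replica 2 -> %s\n", effective_default(&entry, &two));
    printf("replica 3 -> %s\n", effective_default(&entry, &three));
    return 0;
}
```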


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] context based defaults for volume options in glusterd

2015-09-01 Thread Pranith Kumar Karampuri



On 09/01/2015 11:55 AM, Krishnan Parthasarathi wrote:


- Original Message -

hi,
   Afr needs context based defaults for quorum where by default
quorum value is 'none' for 2-way replica and 'auto' for 3 way replica.
Anuradha sent http://review.gluster.org/11872 to fix the same. May be we
can come up with more generic solution. The present solution remembers
the default in volinfo->options and also written to the store. So the
default will be shown in "volume info" output and if we want to change
the defaults in future we will need to carefully think of all the things
that could go wrong especially peers getting rejected because of the
md5sum mismatch. Another way to solve the same problem is to generate
the default value of the vme-option based on the context of the volume
when we have to write to the volfile. In this particular case, we need
to generate default as 'none' for 2-way-replica-count volume. and 'auto'
for 3-replica-count volume. For volume-get command handling also we need
to consider this dynamic default value. For implementing this, we can
add a new member 'context_based_default_value_get()'(please feel free to
come up with better name for the function :-) ) to the vme-table which
can be invoked to get the default option which takes the volinfo as
parameter at least, and not set .value i.e implicitly .value will be NULL.

This is based on earlier design detail in the comment of vme-table:
* Fourth field is . In this context they are used to specify
* a default. That is, even the volume dict doesn't have a value,
* we procced as if the default value were set for it.

We just want to enhance the existing behavior with this proposed change.
It seems more generic than the present solution in the patch. In future
people can write their own implementations of context based default
value generation following same procedure. Let me know your comments.

Here are a few things that are not clear to me.

1) Does the context-based default value for an option comes into effect
only when .value in vme table is NULL?
My feeling is that if there is a context-based default, then the static 
default value should be NULL.


2) IIUC, the generated default value is applied to the volume files generated
and persisted no place else. Is this correct?

Yes.


3) What happens if the context_based_default_get() is not available in all
glusterds in the cluster? e.g, upgrade from 3.6 to 3.7.x (where this may land).
Shouldn't this behaviour also be 'versioned' to prevent different volume files
being served by different nodes of the cluster?
In the context_based_default_value_get() we can add the version checks 
and generate it the way we want.


Pranith





Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] context based defaults for volume options in glusterd

2015-09-01 Thread Pranith Kumar Karampuri



On 09/01/2015 12:05 PM, Krishnan Parthasarathi wrote:

Here are a few things that are not clear to me.

1) Does the context-based default value for an option comes into effect
only when .value in vme table is NULL?

If there is context based default then the static default value should
be NULL is my feeling.

2) IIUC, the generated default value is applied to the volume files
generated
and persisted no place else. Is this correct?

Yes.

3) What happens if the context_based_default_get() is not available in all
glusterds in the cluster? e.g, upgrade from 3.6 to 3.7.x (where this may
land).
Shouldn't this behaviour also be 'versioned' to prevent different volume
files
being served by different nodes of the cluster?

In the context_based_default_value_get() we can add the version checks
and generate it the way we want.

Hmm. We have op-version for the options in vme table against which we ensure
that all servers generate the same volume files. What versions would the
context based default value generator functions use? I'd recommend documenting
these details of the proposal and send a PR to gluster-specs repository.
This needs to be reviewed carefully with all the details available in one place.
The version that context-based defaults will use depends on the version 
the change needs to go into. Let's do one thing: we will add this 
proposal to the specs repo and, as an example, link to the patch for afr 
quorum which implements it. From an op-version point of view it will be 
very similar to the current implementation Anuradha came up with, minus 
storing the value in the glusterd store.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

