Re: [Gluster-devel] Gluster Code Metrics Weekly Report

2021-09-26 Thread Karthik Subrahmanya
Hi Saju,

Seems like the Coverity builds are failing. According to the overview page,
"Last Analyzed" points to 21st September. I have seen a few of the issues
being triaged/fixed, but the count still shows 27 issues remaining.

Regards,
Karthik

On Mon, Sep 27, 2021 at 8:03 AM  wrote:

> Gluster Code Metrics
> Metrics       Values
> Clang Scan    0
> Coverity      27
> Line Cov
> Func Cov
> Trend Graph   Check the latest run: Coverity / Clang / Code Coverage
> 
> ---
>
> Community Meeting Calendar:
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
>
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
---

Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Regression failure continues: 'tests/basic/afr/split-brain-favorite-child-policy.t`

2019-06-10 Thread Karthik Subrahmanya
Patch posted: https://review.gluster.org/#/c/glusterfs/+/22850/

-Karthik

On Mon, Jun 10, 2019 at 10:47 PM Karthik Subrahmanya 
wrote:

> Hi Amar,
>
> I found the issue, will be sending a patch in a while.
>
> Regards,
> Karthik
>
> On Mon, Jun 10, 2019 at 10:46 PM Amar Tumballi Suryanarayan <
> atumb...@redhat.com> wrote:
>
>> Fails with:
>>
>> *20:56:58* ok 132 [  8/ 82] < 194> 'gluster --mode=script --wignore volume heal patchy'
>> *20:56:58* not ok 133 [  8/  80260] < 195> '^0$ get_pending_heal_count patchy' -> 'Got "2" instead of "^0$"'
>> *20:56:58* ok 134 [ 18/  2] < 197> '0 echo 0'
>>
>>
>> Looks like when the error occurred, it took 80 seconds.
>>
>>
>> I see 2 different patches fail on this, would be good to analyze it further.
>>
>>
>> Regards,
>>
>> Amar
>>
>>
>> --
>> Amar Tumballi (amarts)
>>
>
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Regression failure continues: 'tests/basic/afr/split-brain-favorite-child-policy.t`

2019-06-10 Thread Karthik Subrahmanya
Hi Amar,

I found the issue, will be sending a patch in a while.

Regards,
Karthik

On Mon, Jun 10, 2019 at 10:46 PM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Fails with:
>
> *20:56:58* ok 132 [  8/ 82] < 194> 'gluster --mode=script --wignore volume heal patchy'
> *20:56:58* not ok 133 [  8/  80260] < 195> '^0$ get_pending_heal_count patchy' -> 'Got "2" instead of "^0$"'
> *20:56:58* ok 134 [ 18/  2] < 197> '0 echo 0'
>
>
> Looks like when the error occurred, it took 80 seconds.
>
>
> I see 2 different patches fail on this, would be good to analyze it further.
>
>
> Regards,
>
> Amar
>
>
> --
> Amar Tumballi (amarts)
>
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-11 Thread Karthik Subrahmanya
On Thu, Apr 11, 2019 at 1:40 PM Strahil Nikolov 
wrote:

> Hi Karthik,
>
> - the volume configuration you were using?
> I used oVirt 4.2.6 Gluster Wizard, so I guess - we need to involve the
> oVirt devs here.
> - why you wanted to replace your brick?
> I have deployed the arbiter on another location as I thought I can deploy
> the Thin Arbiter (still waiting the docs to be updated), but once I
> realized that GlusterD doesn't support Thin Arbiter, I had to build another
> machine for a local arbiter - thus a replacement was needed.
>
We are working on supporting Thin-arbiter with GlusterD. Once done, we will
update on the users list so that you can play with it and let us know your
experience.

> - which brick(s) you tried replacing?
> I was replacing the old arbiter with a new one
> - what problem(s) did you face?
> All oVirt VMs got paused due to I/O errors.
>
There could be many reasons for this. Without knowing the exact state of
the system at that time, I am afraid to make any comment on this.

>
> At the end, I have rebuild the whole setup and I never tried to replace
> the brick this way (used only reset-brick which didn't cause any issues).
>
> As I mentioned that was on v3.12, which is not the default for oVirt
> 4.3.x - so my guess is that it is OK now (current is v5.5).
>
I don't remember anyone complaining about this recently. This should work
in the latest releases.

>
> Just sharing my experience.
>
Highly appreciated.

Regards,
Karthik

>
> Best Regards,
> Strahil Nikolov
>
> On Thursday, 11 April 2019, 00:53:52 GMT-4, Karthik Subrahmanya <
> ksubr...@redhat.com> wrote:
>
>
> Hi Strahil,
>
> Can you give us some more insights on
> - the volume configuration you were using?
> - why you wanted to replace your brick?
> - which brick(s) you tried replacing?
> - what problem(s) did you face?
>
> Regards,
> Karthik
>
> On Thu, Apr 11, 2019 at 10:14 AM Strahil  wrote:
>
> Hi Karthnik,
> I used only once the brick replace function when I wanted to change my
> Arbiter (v3.12.15 in oVirt 4.2.7)  and it was a complete disaster.
> Most probably I should have stopped the source arbiter before doing that,
> but the docs didn't mention it.
>
> Thus I always use reset-brick, as it never let me down.
>
> Best Regards,
> Strahil Nikolov
> On Apr 11, 2019 07:34, Karthik Subrahmanya  wrote:
>
> Hi Strahil,
>
> Thank you for sharing your experience with reset-brick option.
> Since he is using the gluster version 3.7.6, we do not have the
> reset-brick [1] option implemented there. It is introduced in 3.9.0. He has
> to go with replace-brick with the force option if he wants to use the same
> path & name for the new brick.
> Yes, it is recommended to have the new brick to be of the same size as
> that of the other bricks.
>
> [1]
> https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command
>
> Regards,
> Karthik
>
> On Wed, Apr 10, 2019 at 10:31 PM Strahil  wrote:
>
> I have used reset-brick - but I have just changed the brick layout.
> You may give it a try, but I guess you need your new brick to have same
> amount of space (or more).
>
> Maybe someone more experienced should share a more sound solution.
>
> Best Regards,
> Strahil Nikolov
> On Apr 10, 2019 12:42, Martin Toth  wrote:
> >
> > Hi all,
> >
> > I am running replica 3 gluster with 3 bricks. One of my servers failed -
> all disks are showing errors and raid is in fault state.
> >
> > Type: Replicate
> > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
> > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is
> down
> > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
> >
> > So one of my bricks is totally failed (node2). It went down and all data
> are lost (failed raid on node2). Now I am running only two bricks on 2
> servers out from 3.
> > This is really critical problem for us, we can lost all data. I want to
> add new disks to node2, create new raid array on them and try to replace
> failed brick on this node.
> >
> > What is the procedure of replacing Brick2 on node2, can someone advice?
> I can’t find anything relevant in documentation.
> >
> > Thanks in advance,
> > Martin
> > ___
> > Gluster-users mailing list
> > gluster-us...@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-11 Thread Karthik Subrahmanya
On Thu, Apr 11, 2019 at 12:43 PM Martin Toth  wrote:

> Hi Karthik,
>
> more over, I would like to ask if there are some recommended
> settings/parameters for SHD in order to achieve good or fair I/O while
> volume will be healed when I will replace Brick (this should trigger
> healing process).
>
If I understand your concern correctly, you need fair I/O performance for
clients while healing takes place as part of the replace-brick operation.
For this you can turn off the "data-self-heal" and "metadata-self-heal"
options until the heal completes on the new brick.
Turning off client-side healing doesn't compromise data integrity or
consistency. During a read request from the client, the pending xattrs are
evaluated for the replica copies and the read is served only from a correct
copy. During writes, I/O will continue on both replicas, and the SHD will
take care of healing the files.
After replacing the brick, we strongly recommend that you consider upgrading
your gluster to one of the maintained versions. They have many
stability-related fixes which can handle critical issues and corner cases
that you could hit in these kinds of scenarios.
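
For reference, a rough sketch of those option toggles (illustrative only;
<volname> is a placeholder and the option names should be verified against
your installed version):

    # Disable client-side healing while the replaced brick is being healed
    gluster volume set <volname> cluster.data-self-heal off
    gluster volume set <volname> cluster.metadata-self-heal off

    # Re-enable them once "gluster volume heal <volname> info" shows no
    # pending entries
    gluster volume set <volname> cluster.data-self-heal on
    gluster volume set <volname> cluster.metadata-self-heal on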

Regards,
Karthik

> I had some problems in past when healing was triggered, VM disks became
> unresponsive because healing took most of I/O. My volume containing only
> big files with VM disks.
>
> Thanks for suggestions.
> BR,
> Martin
>
> On 10 Apr 2019, at 12:38, Martin Toth  wrote:
>
> Thanks, this looks ok to me, I will reset brick because I don't have any
> data anymore on failed node so I can use same path / brick name.
>
> Is reseting brick dangerous command? Should I be worried about some
> possible failure that will impact remaining two nodes? I am running really
> old 3.7.6 but stable version.
>
> Thanks,
> BR!
>
> Martin
>
>
> On 10 Apr 2019, at 12:20, Karthik Subrahmanya  wrote:
>
> Hi Martin,
>
> After you add the new disks and creating raid array, you can run the
> following command to replace the old brick with new one:
>
> - If you are going to use a different name for the new brick, you can run
> gluster volume replace-brick <volname> <old-brick> <new-brick> commit force
>
> - If you are planning to use the same name for the new brick as well, then
> you can use
> gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
> Here the old brick's and new brick's hostname & path should be the same.
>
> After replacing the brick, make sure the brick comes online using volume
> status.
> Heal should automatically start, you can check the heal status to see all
> the files gets replicated to the newly added brick. If it does not start
> automatically, you can manually start it by running gluster volume heal
> <volname>.
>
> HTH,
> Karthik
>
> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth  wrote:
>
>> Hi all,
>>
>> I am running replica 3 gluster with 3 bricks. One of my servers failed -
>> all disks are showing errors and raid is in fault state.
>>
>> Type: Replicate
>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is down
>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>>
>> So one of my bricks is totally failed (node2). It went down and all data
>> are lost (failed raid on node2). Now I am running only two bricks on 2
>> servers out from 3.
>> This is really critical problem for us, we can lost all data. I want to
>> add new disks to node2, create new raid array on them and try to replace
>> failed brick on this node.
>>
>> What is the procedure of replacing Brick2 on node2, can someone advice? I
>> can’t find anything relevant in documentation.
>>
>> Thanks in advance,
>> Martin
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-10 Thread Karthik Subrahmanya
On Thu, Apr 11, 2019 at 10:23 AM Karthik Subrahmanya 
wrote:

> Hi Strahil,
>
> Can you give us some more insights on
> - the volume configuration you were using?
> - why you wanted to replace your brick?
> - which brick(s) you tried replacing?
>
- if you remember the commands/steps that you followed, please give that as
well.

> - what problem(s) did you face?
>

> Regards,
> Karthik
>
> On Thu, Apr 11, 2019 at 10:14 AM Strahil  wrote:
>
>> Hi Karthnik,
>> I used only once the brick replace function when I wanted to change my
>> Arbiter (v3.12.15 in oVirt 4.2.7)  and it was a complete disaster.
>> Most probably I should have stopped the source arbiter before doing that,
>> but the docs didn't mention it.
>>
>> Thus I always use reset-brick, as it never let me down.
>>
>> Best Regards,
>> Strahil Nikolov
>> On Apr 11, 2019 07:34, Karthik Subrahmanya  wrote:
>>
>> Hi Strahil,
>>
>> Thank you for sharing your experience with reset-brick option.
>> Since he is using the gluster version 3.7.6, we do not have the
>> reset-brick [1] option implemented there. It is introduced in 3.9.0. He has
>> to go with replace-brick with the force option if he wants to use the same
>> path & name for the new brick.
>> Yes, it is recommended to have the new brick to be of the same size as
>> that of the other bricks.
>>
>> [1]
>> https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command
>>
>> Regards,
>> Karthik
>>
>> On Wed, Apr 10, 2019 at 10:31 PM Strahil  wrote:
>>
>> I have used reset-brick - but I have just changed the brick layout.
>> You may give it a try, but I guess you need your new brick to have same
>> amount of space (or more).
>>
>> Maybe someone more experienced should share a more sound solution.
>>
>> Best Regards,
>> Strahil Nikolov
>> On Apr 10, 2019 12:42, Martin Toth  wrote:
>> >
>> > Hi all,
>> >
>> > I am running replica 3 gluster with 3 bricks. One of my servers failed
>> - all disks are showing errors and raid is in fault state.
>> >
>> > Type: Replicate
>> > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
>> > Status: Started
>> > Number of Bricks: 1 x 3 = 3
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
>> > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is
>> down
>> > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>> >
>> > So one of my bricks is totally failed (node2). It went down and all
>> data are lost (failed raid on node2). Now I am running only two bricks on 2
>> servers out from 3.
>> > This is really critical problem for us, we can lost all data. I want to
>> add new disks to node2, create new raid array on them and try to replace
>> failed brick on this node.
>> >
>> > What is the procedure of replacing Brick2 on node2, can someone advice?
>> I can’t find anything relevant in documentation.
>> >
>> > Thanks in advance,
>> > Martin
>> > ___
>> > Gluster-users mailing list
>> > gluster-us...@gluster.org
>> > https://lists.gluster.org/mailman/listinfo/gluster-users
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-10 Thread Karthik Subrahmanya
Hi Strahil,

Can you give us some more insights on
- the volume configuration you were using?
- why you wanted to replace your brick?
- which brick(s) you tried replacing?
- what problem(s) did you face?

Regards,
Karthik

On Thu, Apr 11, 2019 at 10:14 AM Strahil  wrote:

> Hi Karthnik,
> I used only once the brick replace function when I wanted to change my
> Arbiter (v3.12.15 in oVirt 4.2.7)  and it was a complete disaster.
> Most probably I should have stopped the source arbiter before doing that,
> but the docs didn't mention it.
>
> Thus I always use reset-brick, as it never let me down.
>
> Best Regards,
> Strahil Nikolov
> On Apr 11, 2019 07:34, Karthik Subrahmanya  wrote:
>
> Hi Strahil,
>
> Thank you for sharing your experience with reset-brick option.
> Since he is using the gluster version 3.7.6, we do not have the
> reset-brick [1] option implemented there. It is introduced in 3.9.0. He has
> to go with replace-brick with the force option if he wants to use the same
> path & name for the new brick.
> Yes, it is recommended to have the new brick to be of the same size as
> that of the other bricks.
>
> [1]
> https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command
>
> Regards,
> Karthik
>
> On Wed, Apr 10, 2019 at 10:31 PM Strahil  wrote:
>
> I have used reset-brick - but I have just changed the brick layout.
> You may give it a try, but I guess you need your new brick to have same
> amount of space (or more).
>
> Maybe someone more experienced should share a more sound solution.
>
> Best Regards,
> Strahil Nikolov
> On Apr 10, 2019 12:42, Martin Toth  wrote:
> >
> > Hi all,
> >
> > I am running replica 3 gluster with 3 bricks. One of my servers failed -
> all disks are showing errors and raid is in fault state.
> >
> > Type: Replicate
> > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
> > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is
> down
> > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
> >
> > So one of my bricks is totally failed (node2). It went down and all data
> are lost (failed raid on node2). Now I am running only two bricks on 2
> servers out from 3.
> > This is really critical problem for us, we can lost all data. I want to
> add new disks to node2, create new raid array on them and try to replace
> failed brick on this node.
> >
> > What is the procedure of replacing Brick2 on node2, can someone advice?
> I can’t find anything relevant in documentation.
> >
> > Thanks in advance,
> > Martin
> > ___
> > Gluster-users mailing list
> > gluster-us...@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-10 Thread Karthik Subrahmanya
Hi Strahil,

Thank you for sharing your experience with the reset-brick option.
Since he is using gluster version 3.7.6, the reset-brick [1] option is not
implemented there; it was introduced in 3.9.0. He has to go with
replace-brick with the force option if he wants to use the same path & name
for the new brick.
Yes, it is recommended that the new brick be of the same size as the other
bricks.

[1]
https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command
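
For anyone on 3.9.0 or later, a hedged sketch of the reset-brick flow from
[1] (all names below are placeholders, not taken from this thread):

    # Take the brick that is being reset offline
    gluster volume reset-brick <volname> <hostname>:<brickpath> start

    # Re-create or re-mount the brick directory, then bring it back under
    # the same name
    gluster volume reset-brick <volname> <hostname>:<brickpath> \
        <hostname>:<brickpath> commit force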

Regards,
Karthik

On Wed, Apr 10, 2019 at 10:31 PM Strahil  wrote:

> I have used reset-brick - but I have just changed the brick layout.
> You may give it a try, but I guess you need your new brick to have same
> amount of space (or more).
>
> Maybe someone more experienced should share a more sound solution.
>
> Best Regards,
> Strahil Nikolov
> On Apr 10, 2019 12:42, Martin Toth  wrote:
> >
> > Hi all,
> >
> > I am running replica 3 gluster with 3 bricks. One of my servers failed -
> all disks are showing errors and raid is in fault state.
> >
> > Type: Replicate
> > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
> > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is
> down
> > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
> >
> > So one of my bricks is totally failed (node2). It went down and all data
> are lost (failed raid on node2). Now I am running only two bricks on 2
> servers out from 3.
> > This is really critical problem for us, we can lost all data. I want to
> add new disks to node2, create new raid array on them and try to replace
> failed brick on this node.
> >
> > What is the procedure of replacing Brick2 on node2, can someone advice?
> I can’t find anything relevant in documentation.
> >
> > Thanks in advance,
> > Martin
> > ___
> > Gluster-users mailing list
> > gluster-us...@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] [External] Replica 3 - how to replace failed node (peer)

2019-04-10 Thread Karthik Subrahmanya
Hi Martin,

The reset-brick command was introduced in 3.9.0 and is not present in 3.7.6.
You can try using the same replace-brick command with the force option even
if you want to use the same name for the brick being replaced.
3.7.6 was EOLed long back, and glusterfs-6 is the latest version, with lots
of improvements, bug fixes and new features. The release schedule can be
found at [1]. Upgrading to one of the maintained branches is highly
recommended.

On Wed, Apr 10, 2019 at 4:14 PM Martin Toth  wrote:

> I’ve read this documentation but step 4 is really unclear to me. I don’t
> understand related mkdir/rmdir/setfattr and so on.
>
> Step 4:
>
> *Using the gluster volume fuse mount (In this example: /mnt/r2) set up
> metadata so that data will be synced to new brick (In this case it is
> from Server1:/home/gfs/r2_1 to Server1:/home/gfs/r2_5)*
>
> Why should I change trusted.non-existent-key on this volume?
> It is even more confusing because other mentioned howtos does not contain
> this step at all.
>
Those steps were needed in the older releases to set some metadata on the
good bricks so that heal does not happen from the replaced brick to the good
bricks, which could lead to data loss. Since you are on 3.7.6, we have
automated all these steps for you in that branch. You just need to run the
replace-brick command, which will take care of all those things.
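
To make that concrete, here is a purely illustrative sketch, assuming your
volume is named gv0imagestore (the name is not visible in the volume info
you pasted) and that you reuse the same brick path on node2 as described
above; please double-check the syntax on your 3.7.6 installation first:

    # Replace the failed brick in place, keeping the same host and path
    gluster volume replace-brick gv0imagestore \
        node2.san:/tank/gluster/gv0imagestore/brick1 \
        node2.san:/tank/gluster/gv0imagestore/brick1 \
        commit force

    # Then confirm the brick is back online and watch the heal progress
    gluster volume status gv0imagestore
    gluster volume heal gv0imagestore info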

[1] https://www.gluster.org/release-schedule/

Regards,
Karthik

>
> BR,
> Martin
>
> On 10 Apr 2019, at 11:54, Davide Obbi  wrote:
>
>
> https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick
>
> On Wed, Apr 10, 2019 at 11:42 AM Martin Toth  wrote:
>
>> Hi all,
>>
>> I am running replica 3 gluster with 3 bricks. One of my servers failed -
>> all disks are showing errors and raid is in fault state.
>>
>> Type: Replicate
>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is down
>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>>
>> So one of my bricks is totally failed (node2). It went down and all data
>> are lost (failed raid on node2). Now I am running only two bricks on 2
>> servers out from 3.
>> This is really critical problem for us, we can lost all data. I want to
>> add new disks to node2, create new raid array on them and try to replace
>> failed brick on this node.
>>
>> What is the procedure of replacing Brick2 on node2, can someone advice? I
>> can’t find anything relevant in documentation.
>>
>> Thanks in advance,
>> Martin
>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> Davide Obbi
> Senior System Administrator
>
> Booking.com B.V.
> Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
> Direct +31207031558
> Empowering people to experience the world since 1996
> 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
> million reported listings
> Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)
>
>
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Replica 3 - how to replace failed node (peer)

2019-04-10 Thread Karthik Subrahmanya
Hi Martin,

After you add the new disks and create the raid array, you can run the
following command to replace the old brick with the new one:

- If you are going to use a different name for the new brick, you can run
gluster volume replace-brick <volname> <old-brick> <new-brick> commit force

- If you are planning to use the same name for the new brick as well, then
you can use
gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
Here the old brick's and new brick's hostname & path should be the same.

After replacing the brick, make sure the brick comes online using volume
status.
Heal should start automatically; you can check the heal status to see that
all the files get replicated to the newly added brick. If it does not start
automatically, you can manually start it by running gluster volume heal
<volname>.
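
As a purely illustrative example, assuming the volume is named gv0imagestore
(the name is not shown in the pasted volume info) and the rebuilt brick on
node2 is created at a new path called brick1_new:

    # Replace the failed brick with a brick at a new path
    gluster volume replace-brick gv0imagestore \
        node2.san:/tank/gluster/gv0imagestore/brick1 \
        node2.san:/tank/gluster/gv0imagestore/brick1_new \
        commit force

    # Verify the new brick is online and monitor the healing
    gluster volume status gv0imagestore
    gluster volume heal gv0imagestore info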

HTH,
Karthik

On Wed, Apr 10, 2019 at 3:13 PM Martin Toth  wrote:

> Hi all,
>
> I am running replica 3 gluster with 3 bricks. One of my servers failed -
> all disks are showing errors and raid is in fault state.
>
> Type: Replicate
> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is down
> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>
> So one of my bricks is totally failed (node2). It went down and all data
> are lost (failed raid on node2). Now I am running only two bricks on 2
> servers out from 3.
> This is really critical problem for us, we can lost all data. I want to
> add new disks to node2, create new raid array on them and try to replace
> failed brick on this node.
>
> What is the procedure of replacing Brick2 on node2, can someone advice? I
> can’t find anything relevant in documentation.
>
> Thanks in advance,
> Martin
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status

2018-08-08 Thread Karthik Subrahmanya
On Wed, Aug 8, 2018 at 2:28 PM Nigel Babu  wrote:

>
>
> On Wed, Aug 8, 2018 at 2:00 PM Ravishankar N 
> wrote:
>
>>
>> On 08/08/2018 05:07 AM, Shyam Ranganathan wrote:
>> > 5) Current test failures
>> > We still have the following tests failing and some without any RCA or
>> > attention, (If something is incorrect, write back).
>> >
>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>>  From the runs captured at https://review.gluster.org/#/c/20637/ , I saw
>> that the latest runs where this particular .t failed were at
>> https://build.gluster.org/job/line-coverage/415 and
>> https://build.gluster.org/job/line-coverage/421/.
>> In both of these runs, there are no gluster 'regression' logs available
>> at https://build.gluster.org/job/line-coverage//artifact.
>> I have raised BZ 1613721 for it.
>
>
> We've fixed this for newer runs, but we can do nothing for older runs,
> sadly.
>
Thanks Nigel! I'm also blocked on this. The failures are not reproducible
locally.
Without the logs we can not debug the issue. Will wait for the new runs to
complete.

>
>
>>
>> Also, Shyam was saying that in case of retries, the old (failure) logs
>> get overwritten by the retries which are successful. Can we disable
>> re-trying the .ts when they fail just for this lock down period alone so
>> that we do have the logs?
>
>
> Please don't apply a band-aid. Please fix run-test.sh so that the second
> run has a -retry attached to the file name or some such, please.
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Karthik Subrahmanya
On Fri, Aug 3, 2018 at 3:07 PM Karthik Subrahmanya 
wrote:

>
>
> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya 
> wrote:
>
>>
>>
>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
>> wrote:
>>
>>>
>>>
>>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, 
>>> wrote:
>>>
>>>> I just went through the nightly regression report of brick mux runs and
>>>> here's what I can summarize.
>>>>
>>>>
>>>> =
>>>> Fails only with brick-mux
>>>>
>>>> =
>>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>>>> 400 secs. Refer
>>>> https://fstat.gluster.org/failure/209?state=2_date=2018-06-30_date=2018-07-31=all,
>>>> specifically the latest report
>>>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText
>>>> . Wasn't timing out as frequently as it was till 12 July. But since 27
>>>> July, it has timed out twice. Beginning to believe commit
>>>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
>>>> secs isn't sufficient enough (Mohit?)
>>>>
>>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>>> (Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>>>> -  Test fails only in brick-mux mode, AI on Atin to look at and get back.
>>>>
>>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>>>> - Seems like failed just twice in last 30 days as per
>>>> https://fstat.gluster.org/failure/251?state=2_date=2018-06-30_date=2018-07-31=all.
>>>> Need help from AFR team.
>>>>
>>>> tests/bugs/quota/bug-1293601.t (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>>>> - Hasn't failed after 26 July and earlier it was failing regularly. Did we
>>>> fix this test through any patch (Mohit?)
>>>>
>>>> tests/bitrot/bug-1373520.t - (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>>>> - Hasn't failed after 27 July and earlier it was failing regularly. Did we
>>>> fix this test through any patch (Mohit?)
>>>>
>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
>>>> not sure if related to brick mux or not, so not sure if brick mux is
>>>> culprit here or not. Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>>>> . Seems to be a glustershd crash. Need help from AFR folks.
>>>>
>>>>
>>>> =
>>>> Fails for non-brick mux case too
>>>>
>>>> =
>>>> tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
>>>> very often, with out brick mux as well. Refer
>>>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
>>>> . There's an email in gluster-devel and a BZ 1610240 for the same.
>>>>
>>>> tests/bugs/bug-1368312.t - Seems to be recent failures (
>>>> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>>>> - seems to be a new failure, however seen this for a non-brick-mux case too
>>>> -
>>>> https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>>>> . Need some eyes from AFR folks.
>>>>
>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to
>>>> brick mux, have seen this failing at multiple default regression runs.
>>>> Refer
>>>> https://fstat.gluster.org/failure/392?state=2_date=2018-06-30_date=2018-07-31=all
>>>> . We need help from geo-rep dev to root cause this earlier than lat

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Karthik Subrahmanya
On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya 
wrote:

>
>
> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
> wrote:
>
>>
>>
>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, 
>> wrote:
>>
>>> I just went through the nightly regression report of brick mux runs and
>>> here's what I can summarize.
>>>
>>>
>>> =
>>> Fails only with brick-mux
>>>
>>> =
>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>>> 400 secs. Refer
>>> https://fstat.gluster.org/failure/209?state=2_date=2018-06-30_date=2018-07-31=all,
>>> specifically the latest report
>>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText
>>> . Wasn't timing out as frequently as it was till 12 July. But since 27
>>> July, it has timed out twice. Beginning to believe commit
>>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
>>> secs isn't sufficient enough (Mohit?)
>>>
>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>> (Ref -
>>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>>> -  Test fails only in brick-mux mode, AI on Atin to look at and get back.
>>>
>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
>>> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>>> - Seems like failed just twice in last 30 days as per
>>> https://fstat.gluster.org/failure/251?state=2_date=2018-06-30_date=2018-07-31=all.
>>> Need help from AFR team.
>>>
>>> tests/bugs/quota/bug-1293601.t (
>>> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>>> - Hasn't failed after 26 July and earlier it was failing regularly. Did we
>>> fix this test through any patch (Mohit?)
>>>
>>> tests/bitrot/bug-1373520.t - (
>>> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>>> - Hasn't failed after 27 July and earlier it was failing regularly. Did we
>>> fix this test through any patch (Mohit?)
>>>
>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
>>> not sure if related to brick mux or not, so not sure if brick mux is
>>> culprit here or not. Ref -
>>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>>> . Seems to be a glustershd crash. Need help from AFR folks.
>>>
>>>
>>> =
>>> Fails for non-brick mux case too
>>>
>>> =
>>> tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
>>> very often, with out brick mux as well. Refer
>>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
>>> . There's an email in gluster-devel and a BZ 1610240 for the same.
>>>
>>> tests/bugs/bug-1368312.t - Seems to be recent failures (
>>> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>>> - seems to be a new failure, however seen this for a non-brick-mux case too
>>> - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>>> . Need some eyes from AFR folks.
>>>
>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to brick
>>> mux, have seen this failing at multiple default regression runs. Refer
>>> https://fstat.gluster.org/failure/392?state=2_date=2018-06-30_date=2018-07-31=all
>>> . We need help from geo-rep dev to root cause this earlier than later
>>>
>>> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
>>> mux, have seen this failing at multiple default regression runs. Refer
>>> https://fstat.gluster.org/failure/393?state=2_date=2018-06-30_date=2018-07-31=all
>>> . We need help from geo-rep dev to root cause this earlier than later
>>>
>>> tests/bugs/glusterd/validating-se

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Karthik Subrahmanya
On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
wrote:

>
>
> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee,  wrote:
>
>> I just went through the nightly regression report of brick mux runs and
>> here's what I can summarize.
>>
>>
>> =
>> Fails only with brick-mux
>>
>> =
>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>> 400 secs. Refer
>> https://fstat.gluster.org/failure/209?state=2_date=2018-06-30_date=2018-07-31=all,
>> specifically the latest report
>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
>> Wasn't timing out as frequently as it was till 12 July. But since 27 July,
>> it has timed out twice. Beginning to believe commit
>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
>> secs isn't sufficient enough (Mohit?)
>>
>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> (Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>> -  Test fails only in brick-mux mode, AI on Atin to look at and get back.
>>
>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>> - Seems like failed just twice in last 30 days as per
>> https://fstat.gluster.org/failure/251?state=2_date=2018-06-30_date=2018-07-31=all.
>> Need help from AFR team.
>>
>> tests/bugs/quota/bug-1293601.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>> - Hasn't failed after 26 July and earlier it was failing regularly. Did we
>> fix this test through any patch (Mohit?)
>>
>> tests/bitrot/bug-1373520.t - (
>> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>> - Hasn't failed after 27 July and earlier it was failing regularly. Did we
>> fix this test through any patch (Mohit?)
>>
>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
>> not sure if related to brick mux or not, so not sure if brick mux is
>> culprit here or not. Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>> . Seems to be a glustershd crash. Need help from AFR folks.
>>
>>
>> =
>> Fails for non-brick mux case too
>>
>> =
>> tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
>> very often, with out brick mux as well. Refer
>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText .
>> There's an email in gluster-devel and a BZ 1610240 for the same.
>>
>> tests/bugs/bug-1368312.t - Seems to be recent failures (
>> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>> - seems to be a new failure, however seen this for a non-brick-mux case too
>> - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>> . Need some eyes from AFR folks.
>>
>> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to brick
>> mux, have seen this failing at multiple default regression runs. Refer
>> https://fstat.gluster.org/failure/392?state=2_date=2018-06-30_date=2018-07-31=all
>> . We need help from geo-rep dev to root cause this earlier than later
>>
>> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
>> mux, have seen this failing at multiple default regression runs. Refer
>> https://fstat.gluster.org/failure/393?state=2_date=2018-06-30_date=2018-07-31=all
>> . We need help from geo-rep dev to root cause this earlier than later
>>
>> tests/bugs/glusterd/validating-server-quorum.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>> - Fails for non-brick-mux cases too,
>> https://fstat.gluster.org/failure/580?state=2_date=2018-06-30_date=2018-07-31=all
>> .  Atin has a patch https://review.gluster.org/20584 which resolves it
>> but patch is failing regression for a different test which is unrel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-02 Thread Karthik Subrahmanya
On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee,  wrote:

> I just went through the nightly regression report of brick mux runs and
> here's what I can summarize.
>
>
> =
> Fails only with brick-mux
>
> =
> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after 400
> secs. Refer
> https://fstat.gluster.org/failure/209?state=2_date=2018-06-30_date=2018-07-31=all,
> specifically the latest report
> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
> Wasn't timing out as frequently as it was till 12 July. But since 27 July,
> it has timed out twice. Beginning to believe commit
> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
> secs isn't sufficient enough (Mohit?)
>
> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Ref -
> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
> -  Test fails only in brick-mux mode, AI on Atin to look at and get back.
>
> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
> - Seems like failed just twice in last 30 days as per
> https://fstat.gluster.org/failure/251?state=2_date=2018-06-30_date=2018-07-31=all.
> Need help from AFR team.
>
> tests/bugs/quota/bug-1293601.t (
> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
> - Hasn't failed after 26 July and earlier it was failing regularly. Did we
> fix this test through any patch (Mohit?)
>
> tests/bitrot/bug-1373520.t - (
> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
> - Hasn't failed after 27 July and earlier it was failing regularly. Did we
> fix this test through any patch (Mohit?)
>
> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
> not sure if related to brick mux or not, so not sure if brick mux is
> culprit here or not. Ref -
> https://build.gluster.org/job/regression-test-with-multiplex/806/console
> . Seems to be a glustershd crash. Need help from AFR folks.
>
>
> =
> Fails for non-brick mux case too
>
> =
> tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup very
> often, with out brick mux as well. Refer
> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText .
> There's an email in gluster-devel and a BZ 1610240 for the same.
>
> tests/bugs/bug-1368312.t - Seems to be recent failures (
> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
> - seems to be a new failure, however seen this for a non-brick-mux case too
> - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
> . Need some eyes from AFR folks.
>
> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to brick
> mux, have seen this failing at multiple default regression runs. Refer
> https://fstat.gluster.org/failure/392?state=2_date=2018-06-30_date=2018-07-31=all
> . We need help from geo-rep dev to root cause this earlier than later
>
> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
> mux, have seen this failing at multiple default regression runs. Refer
> https://fstat.gluster.org/failure/393?state=2_date=2018-06-30_date=2018-07-31=all
> . We need help from geo-rep dev to root cause this earlier than later
>
> tests/bugs/glusterd/validating-server-quorum.t (
> https://build.gluster.org/job/regression-test-with-multiplex/810/console)
> - Fails for non-brick-mux cases too,
> https://fstat.gluster.org/failure/580?state=2_date=2018-06-30_date=2018-07-31=all
> .  Atin has a patch https://review.gluster.org/20584 which resolves it
> but patch is failing regression for a different test which is unrelated.
>
> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> (Ref -
> https://build.gluster.org/job/regression-test-with-multiplex/809/console)
> - fails for non brick mux case too -
> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText -
> Need some eyes from AFR folks.
>
I am looking at this. It is not reproducible locally. Trying to do this on
soft serve.

Regards,
Karthik

> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-infra] bug-1432542-mpx-restart-crash.t failing

2018-07-09 Thread Karthik Subrahmanya
Thanks for the patch Xavi :)

Regards,
Karthik

On Mon, Jul 9, 2018 at 3:45 PM Xavi Hernandez  wrote:

> On Mon, Jul 9, 2018 at 11:14 AM Karthik Subrahmanya 
> wrote:
>
>> Hi Deepshikha,
>>
>> Are you looking into this failure? I can still see this happening for all
>> the regression runs.
>>
>
> I've executed the failing script on my laptop and all tests finish
> relatively fast. What seems to take time is the final cleanup. I can see
> 'semanage' taking some CPU during destruction of volumes. The test required
> 350 seconds to finish successfully.
>
> Not sure what caused the cleanup time to increase, but I've created a bug
> [1] to track this and a patch [2] to give more time to this test. This
> should allow all blocked regressions to complete successfully.
>
> Xavi
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1599250
> [2] https://review.gluster.org/20482
>
>
>> Thanks & Regards,
>> Karthik
>>
>> On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee 
>> wrote:
>>
>>>
>>> https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect
>>> has the same test failing. Is the reason of the failure different given
>>> this is on jenkins?
>>>
>>> On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal 
>>> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> The issue[1] has been resolved. Now the softserve instance will be
>>>> having 2GB RAM i.e. same as that of the Jenkins builder's sizing
>>>> configurations.
>>>>
>>>> [1] https://github.com/gluster/softserve/issues/40
>>>>
>>>> Thanks,
>>>> Deepshikha Khandelwal
>>>>
>>>> On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya <
>>>> ksubr...@redhat.com> wrote:
>>>> >
>>>> >
>>>> > On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal, <
>>>> dkhan...@redhat.com>
>>>> > wrote:
>>>> >>
>>>> >> Hi Poornima/Karthik,
>>>> >>
>>>> >> We've looked into the memory error that this softserve instance have
>>>> >> showed up. These machine instances have 1GB RAM which is not in the
>>>> >> case with the Jenkins builder. It's 2GB RAM there.
>>>> >>
>>>> >> We've created the issue [1] and will solve it sooner.
>>>> >
>>>> > Great. Thanks for the update.
>>>> >>
>>>> >>
>>>> >> Sorry for the inconvenience.
>>>> >>
>>>> >> [1] https://github.com/gluster/softserve/issues/40
>>>> >>
>>>> >> Thanks,
>>>> >> Deepshikha Khandelwal
>>>> >>
>>>> >> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya <
>>>> ksubr...@redhat.com>
>>>> >> wrote:
>>>> >> > Thanks Poornima for the analysis.
>>>> >> > Can someone work on fixing this please?
>>>> >> >
>>>> >> > ~Karthik
>>>> >> >
>>>> >> > On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah
>>>> >> > 
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> The same test case is failing for my patch as well [1]. I
>>>> requested for
>>>> >> >> a
>>>> >> >> regression system and tried to reproduce it.
>>>> >> >> From my analysis, the brick process (mutiplexed) is consuming a
>>>> lot of
>>>> >> >> memory, and is being OOM killed. The regression has 1GB ram and
>>>> the
>>>> >> >> process
>>>> >> >> is consuming more than 1GB. 1GB for 120 bricks is acceptable
>>>> >> >> considering
>>>> >> >> there is 1000 threads in that brick process.
>>>> >> >> Ways to fix:
>>>> >> >> - Increase the regression system RAM size OR
>>>> >> >> - Decrease the number of volumes in the test case.
>>>> >> >>
>>>> >> >> But what is strange is why the test passes sometimes for some
>>>> patches.
>>>> >> >> There could be some bug/? in memory consumption.

Re: [Gluster-devel] [Gluster-infra] bug-1432542-mpx-restart-crash.t failing

2018-07-09 Thread Karthik Subrahmanya
Hi Deepshikha,

Are you looking into this failure? I can still see this happening for all
the regression runs.

Thanks & Regards,
Karthik

On Sun, Jul 8, 2018 at 7:18 AM Atin Mukherjee  wrote:

>
> https://build.gluster.org/job/regression-test-with-multiplex/794/display/redirect
> has the same test failing. Is the reason of the failure different given
> this is on jenkins?
>
> On Sat, 7 Jul 2018 at 19:12, Deepshikha Khandelwal 
> wrote:
>
>> Hi folks,
>>
>> The issue[1] has been resolved. Now the softserve instance will be
>> having 2GB RAM i.e. same as that of the Jenkins builder's sizing
>> configurations.
>>
>> [1] https://github.com/gluster/softserve/issues/40
>>
>> Thanks,
>> Deepshikha Khandelwal
>>
>> On Fri, Jul 6, 2018 at 6:14 PM, Karthik Subrahmanya 
>> wrote:
>> >
>> >
>> > On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal,  wrote:
>> >>
>> >> Hi Poornima/Karthik,
>> >>
>> >> We've looked into the memory error that this softserve instance have
>> >> showed up. These machine instances have 1GB RAM which is not in the
>> >> case with the Jenkins builder. It's 2GB RAM there.
>> >>
>> >> We've created the issue [1] and will solve it sooner.
>> >
>> > Great. Thanks for the update.
>> >>
>> >>
>> >> Sorry for the inconvenience.
>> >>
>> >> [1] https://github.com/gluster/softserve/issues/40
>> >>
>> >> Thanks,
>> >> Deepshikha Khandelwal
>> >>
>> >> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya <
>> ksubr...@redhat.com>
>> >> wrote:
>> >> > Thanks Poornima for the analysis.
>> >> > Can someone work on fixing this please?
>> >> >
>> >> > ~Karthik
>> >> >
>> >> > On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah
>> >> > 
>> >> > wrote:
>> >> >>
>> >> >> The same test case is failing for my patch as well [1]. I requested
>> for
>> >> >> a
>> >> >> regression system and tried to reproduce it.
>> >> >> From my analysis, the brick process (mutiplexed) is consuming a lot
>> of
>> >> >> memory, and is being OOM killed. The regression has 1GB ram and the
>> >> >> process
>> >> >> is consuming more than 1GB. 1GB for 120 bricks is acceptable
>> >> >> considering
>> >> >> there is 1000 threads in that brick process.
>> >> >> Ways to fix:
>> >> >> - Increase the regression system RAM size OR
>> >> >> - Decrease the number of volumes in the test case.
>> >> >>
>> >> >> But what is strange is why the test passes sometimes for some
>> patches.
>> >> >> There could be some bug/? in memory consumption.
>> >> >>
>> >> >> Regards,
>> >> >> Poornima
>> >> >>
>> >> >>
>> >> >> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya
>> >> >> 
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>> $subject is failing on centos regression for most of the patches
>> with
>> >> >>> timeout error.
>> >> >>>
>> >> >>> 07:32:34
>> >> >>>
>> >> >>>
>> 
>> >> >>> 07:32:34 [07:33:05] Running tests in file
>> >> >>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> >> >>> 07:32:34 Timeout set is 300, default 200
>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed
>> out
>> >> >>> after 300 seconds
>> >> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad
>> status
>> >> >>> 124
>> >> >>> 07:37:34
>> >> >>> 07:37:34*
>> >> >>> 07:37:34*   REGRESSION FAILED   *
>> >> >>> 07:37:34* Retrying failed tests in case *
>> >> >>> 07:37:34* we got some spurious failures *
>> >> >>> 07:37:34*
>> >> >>> 07:37:34
>> >> >>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed
>> out
>> >> >>> after 300 seconds
>> >> >>> 07:42:34 End of test
>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> >> >>> 07:42:34
>> >> >>>
>> >> >>>
>> 
>> >> >>>
>> >> >>> Can anyone take a look?
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Karthik
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> ___
>> >> >>> Gluster-devel mailing list
>> >> >>> Gluster-devel@gluster.org
>> >> >>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>> >> >>
>> >> >>
>> >> >
>> >> > ___
>> >> > Gluster-infra mailing list
>> >> > gluster-in...@gluster.org
>> >> > https://lists.gluster.org/mailman/listinfo/gluster-infra
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
> --
> - Atin (atinm)
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-infra] bug-1432542-mpx-restart-crash.t failing

2018-07-06 Thread Karthik Subrahmanya
On Fri 6 Jul, 2018, 5:18 PM Deepshikha Khandelwal, 
wrote:

> Hi Poornima/Karthik,
>
> We've looked into the memory error that this softserve instance have
> showed up. These machine instances have 1GB RAM which is not in the
> case with the Jenkins builder. It's 2GB RAM there.
>
> We've created the issue [1] and will solve it sooner.
>
Great. Thanks for the update.

>
> Sorry for the inconvenience.
>
> [1] https://github.com/gluster/softserve/issues/40
>
> Thanks,
> Deepshikha Khandelwal
>
> On Fri, Jul 6, 2018 at 3:44 PM, Karthik Subrahmanya 
> wrote:
> > Thanks Poornima for the analysis.
> > Can someone work on fixing this please?
> >
> > ~Karthik
> >
> > On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah <
> pguru...@redhat.com>
> > wrote:
> >>
> >> The same test case is failing for my patch as well [1]. I requested for
> a
> >> regression system and tried to reproduce it.
> >> From my analysis, the brick process (mutiplexed) is consuming a lot of
> >> memory, and is being OOM killed. The regression has 1GB ram and the
> process
> >> is consuming more than 1GB. 1GB for 120 bricks is acceptable considering
> >> there is 1000 threads in that brick process.
> >> Ways to fix:
> >> - Increase the regression system RAM size OR
> >> - Decrease the number of volumes in the test case.
> >>
> >> But what is strange is why the test passes sometimes for some patches.
> >> There could be some bug/? in memory consumption.
> >>
> >> Regards,
> >> Poornima
> >>
> >>
> >> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya <
> ksubr...@redhat.com>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> $subject is failing on centos regression for most of the patches with
> >>> timeout error.
> >>>
> >>> 07:32:34
> >>>
> 
> >>> 07:32:34 [07:33:05] Running tests in file
> >>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >>> 07:32:34 Timeout set is 300, default 200
> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out
> >>> after 300 seconds
> >>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status
> >>> 124
> >>> 07:37:34
> >>> 07:37:34*
> >>> 07:37:34*   REGRESSION FAILED   *
> >>> 07:37:34* Retrying failed tests in case *
> >>> 07:37:34* we got some spurious failures *
> >>> 07:37:34*
> >>> 07:37:34
> >>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out
> >>> after 300 seconds
> >>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >>> 07:42:34
> >>>
> 
> >>>
> >>> Can anyone take a look?
> >>>
> >>> Thanks,
> >>> Karthik
> >>>
> >>>
> >>>
> >>> ___
> >>> Gluster-devel mailing list
> >>> Gluster-devel@gluster.org
> >>> https://lists.gluster.org/mailman/listinfo/gluster-devel
> >>
> >>
> >
> > ___
> > Gluster-infra mailing list
> > gluster-in...@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-infra
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] bug-1432542-mpx-restart-crash.t failing

2018-07-06 Thread Karthik Subrahmanya
Thanks Poornima for the analysis.
Can someone work on fixing this please?

~Karthik

On Fri, Jul 6, 2018 at 3:17 PM Poornima Gurusiddaiah 
wrote:

> The same test case is failing for my patch as well [1]. I requested for a
> regression system and tried to reproduce it.
> From my analysis, the brick process (mutiplexed) is consuming a lot of
> memory, and is being OOM killed. The regression has 1GB ram and the process
> is consuming more than 1GB. 1GB for 120 bricks is acceptable considering
> there is 1000 threads in that brick process.
> Ways to fix:
> - Increase the regression system RAM size OR
> - Decrease the number of volumes in the test case.
>
> But what is strange is why the test passes sometimes for some patches.
> There could be some bug/? in memory consumption.
>
> Regards,
> Poornima
>
>
> On Fri, Jul 6, 2018 at 2:11 PM, Karthik Subrahmanya 
> wrote:
>
>> Hi,
>>
>> $subject is failing on centos regression for most of the patches with
>> timeout error.
>>
>> 07:32:34
>> 
>> 07:32:34 [07:33:05] Running tests in file
>> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> 07:32:34 Timeout set is 300, default 200
>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out
>> after 300 seconds
>> 07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status 124
>> 07:37:34
>> 07:37:34*
>> 07:37:34*   REGRESSION FAILED   *
>> 07:37:34* Retrying failed tests in case *
>> 07:37:34* we got some spurious failures *
>> 07:37:34*
>> 07:37:34
>> 07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out
>> after 300 seconds
>> 07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> 07:42:34
>> 
>>
>> Can anyone take a look?
>>
>> Thanks,
>> Karthik
>>
>>
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] bug-1432542-mpx-restart-crash.t failing

2018-07-06 Thread Karthik Subrahmanya
Hi,

$subject is failing on centos regression for most of the patches with
timeout error.

07:32:34

07:32:34 [07:33:05] Running tests in file
./tests/bugs/core/bug-1432542-mpx-restart-crash.t
07:32:34 Timeout set is 300, default 200
07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after
300 seconds
07:37:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t: bad status 124
07:37:34
07:37:34*
07:37:34*   REGRESSION FAILED   *
07:37:34* Retrying failed tests in case *
07:37:34* we got some spurious failures *
07:37:34*
07:37:34
07:42:34 ./tests/bugs/core/bug-1432542-mpx-restart-crash.t timed out after
300 seconds
07:42:34 End of test ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
07:42:34


Can anyone take a look?

Thanks,
Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] replace-brick commit force fails in multi node cluster

2018-03-28 Thread Karthik Subrahmanya
Hey Atin,

This is happening because glusterd on the third node is brought down before
doing the replace brick.
In replace brick we do a temporary mount to mark a pending xattr on the
source bricks, saying that the brick being replaced is the sink.
But in this case, since glusterd of one of the source bricks is down, the
mount fails to get the port at which that brick is listening, which leads to
a failure in setting the "trusted.replace-brick" attribute.
For a replica 3 volume to treat any fop as successful it needs at least a
quorum number of successes, hence the replace brick fails.

On the QE setup the replace brick would have succeeded only because of some
race between glusterd going down and the replace brick happening.
Otherwise there is no chance for the replace brick to succeed.
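
In case it is useful for debugging similar situations, a rough sketch of the
checks involved (the volume name, brick path and file below are only
placeholders, not from any real setup):

# Make sure every glusterd in the cluster is reachable before the
# replace brick, since the temporary mount needs quorum:
gluster pool list
gluster volume status <volname>

# On a surviving source brick, the pending markers AFR sets show up as
# trusted.afr.<volname>-client-<N> xattrs on the entries:
getfattr -d -m . -e hex /bricks/<brick-dir>/<some-file>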

Regards,
Karthik

On Tue, Mar 27, 2018 at 7:25 PM, Atin Mukherjee  wrote:

> While writing a test for the patch fix of BZ https://bugzilla.redhat.com/
> show_bug.cgi?id=1560957 I just can't make my test case to pass where a
> replace brick commit force always fails on a multi node cluster and that's
> on the latest mainline code.
>
>
> *The fix is a one liner:*
> atin@dhcp35-96:~/codebase/upstream/glusterfs_master/glusterfs$ gd HEAD~1
> diff --git a/xlators/mgmt/glusterd/src/glusterd-utils.c
> b/xlators/mgmt/glusterd/src/glusterd-utils.c
> index af30756c9..24d813fbd 100644
> --- a/xlators/mgmt/glusterd/src/glusterd-utils.c
> +++ b/xlators/mgmt/glusterd/src/glusterd-utils.c
> @@ -5995,6 +5995,7 @@ glusterd_brick_start (glusterd_volinfo_t *volinfo,
>   * TBD: re-use RPC connection across bricks
>   */
>  if (is_brick_mx_enabled ()) {
> +brickinfo->port_registered = _gf_true;
>  ret = glusterd_get_sock_from_brick_pid
> (pid, socketpath,
>
> sizeof(socketpath));
>  if (ret) {
>
>
>
>
> *The test does the following:*
>
> #!/bin/bash
>
>
>
> . $(dirname $0)/../../include.rc
>
> . $(dirname $0)/../../cluster.rc
>
> . $(dirname $0)/../../volume.rc
>
>
>
>
>
> cleanup;
>
>
>
> TEST launch_cluster 3;
>
>
>
> TEST $CLI_1 peer probe $H2;
>
> EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
>
>
>
> TEST $CLI_1 peer probe $H3;
>
> EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count
>
>
>
> TEST $CLI_1 volume set all cluster.brick-multiplex
> on
>
>
> TEST $CLI_1 volume create $V0 replica 3 $H1:$B1/${V0}1 $H2:$B2/${V0}1
> $H3:$B3/${V0}1
>
>
> TEST $CLI_1 volume start $V0
>
> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H1
> $B1/${V0}1
> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H2
> $B2/${V0}1
> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3
> $B3/${V0}1
>
>
>
>
> #bug-1560957 - replace brick followed by an add-brick in a brick mux
> setup
> #brings down one brick instance
>
>
>
> kill_glusterd 3
>
> EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
>
> TEST $CLI_1 volume replace-brick $V0 $H1:$B1/${V0}1 $H1:$B1/${V0}1_new
> commit force
>
>
> *this is where the test always fails saying "volume replace-brick: failed:
> Commit failed on localhost. Please check log file for details.*
>
>
> TEST $glusterd_3
>
> EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count
>
>
>
> TEST $CLI_1 volume add-brick $V0 replica 3 $H1:$$B1/${V0}3 $H2:$B1/${V0}3
> $H3:$B1/${V0}3 commit force
>
>
> EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status_1 $V0 $H3
> $H3:$B1/${V0}1
> cleanup;
>
> glusterd log from 1st node
> [2018-03-27 13:11:58.630845] E [MSGID: 106053] [glusterd-utils.c:13889:
> glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended
> attribute trusted.replace-brick : Transport endpoint is not connected
> [Transport endpoint is not connected]
>
> Request some help/attention from AFR folks.
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 3.13: (STM release) Details

2017-11-01 Thread Karthik Subrahmanya
On Tue, Oct 31, 2017 at 6:33 PM, Shyam Ranganathan <srang...@redhat.com>
wrote:

> On 10/31/2017 08:11 AM, Karthik Subrahmanya wrote:
>
>> Hey Shyam,
>>
>> Can we also have the heal info summary feature [1], which is merged
>> upstream [2].
>> I haven't raised an issue for this yet, I can do that by tomorrow and I
>> need to write a doc for that.
>>
>
> Thanks for bringing this to my notice, it would have been missed out as a
> feature otherwise.
>
> I do see that the commit start goes way back into 2015, and was rekindled
> in Sep 2017 (by you), because I was initially thinking why this did not
> have a issue reference anyway to begin with.
>
> Please raise a github issue for the same with the appropriate details and
> I can take care of the remaining process there for you.
>
Raised issue #346. I don't have permission to add the milestones & other
things. Please do the needful.

Thanks & Regards,
Karthik

>
> @maintainers on the patch review, please ensure that we have a github
> reference for features, else there is a lot we will miss for the same!
>
>
>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1261463
>> [2] https://review.gluster.org/#/c/12154/
>>
>>
>> Thanks & Regards,
>> Karthik
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster voluem heal info command !

2017-09-28 Thread Karthik Subrahmanya
On Thu, Sep 28, 2017 at 12:11 PM, Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

>
>
> The version I am using is glusterfs 3.6.9
>
This is a very old version which is EOL. It would be great if you can
upgrade to one of the supported versions (3.10 or 3.12).
They have many new features, bug fixes & performance improvements. If you
can try to reproduce the issue on one of those, it would be
very helpful.

Regards,
Karthik

> Best regards,
> *Cynthia **(周琳)*
>
> MBB SM HETRAN SW3 MATRIX
>
> Storage
> Mobile: +86 (0)18657188311
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Thursday, September 28, 2017 2:37 PM
>
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>
> *Cc:* gluster-us...@gluster.org; gluster-devel@gluster.org
> *Subject:* Re: [Gluster-users] after hard reboot, split-brain happened,
> but nothing showed in gluster voluem heal info command !
>
>
>
>
>
>
>
> On Thu, Sep 28, 2017 at 11:41 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi,
>
> Thanks for reply!
>
> I’ve checked [1]. But the problem is that there is nothing shown in
> command “gluster volume heal  info”. So these split-entry
> files could only be detected when app try to visit them.
>
> I can find gfid mismatch for those in-split-brain entries from mount log,
> however, nothing show in shd log, the shd log does not know those
> split-brain entries. Because there is nothing in indices/xattrop directory.
>
> I guess it was there before, and then it got cleared by one of the heal
> process either client side or server side. I wanted to check that by
> examining the logs.
>
> Which version of gluster you are running by the way?
>
>
>
> The log is not available right now, when it reproduced, I will provide it
> to your, Thanks!
>
> Ok.
>
>
>
> Best regards,
> *Cynthia **(周琳)*
>
> MBB SM HETRAN SW3 MATRIX
>
> Storage
> Mobile: +86 (0)18657188311
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Thursday, September 28, 2017 2:02 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>
> *Cc:* gluster-us...@gluster.org; gluster-devel@gluster.org
> *Subject:* Re: [Gluster-users] after hard reboot, split-brain happened,
> but nothing showed in gluster voluem heal info command !
>
>
>
> Hi,
>
> To resolve the gfid split-brain you can follow the steps at [1].
>
> Since we don't have the pending markers set on the files, it is not
> showing in the heal info.
> To debug this issue, need some more data from you. Could you provide these
> things?
>
> 1. volume info
>
> 2. mount log
>
> 3. brick logs
>
> 4. shd log
>
>
>
> May I also know which version of gluster you are running. From the info
> you have provided it looks like an old version.
>
> If it is, then it would be great if you can upgrade to one of the latest
> supported release.
>
>
> [1] http://docs.gluster.org/en/latest/Troubleshooting/split-
> brain/#fixing-directory-entry-split-brain
>
>
>
> Thanks & Regards,
>
> Karthik
>
> On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
>
>
> HI gluster experts,
>
>
>
> I meet a tough problem about “split-brain” issue. Sometimes, after hard
> reboot, we will find some files in split-brain, however its parent
> directory or anything could be shown in command “gluster volume heal
>  info”, also, no entry in .glusterfs/indices/xattrop
> directory, can you help to shed some lights on this issue? Thanks!
>
>
>
>
>
>
>
> Following is some info from our env,
>
>
>
> *Checking from sn-0 cliet, nothing is shown in-split-brain!*
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info
>
> Brick sn-0:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> Brick sn-1:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info split-brain
>
> Gathering list of split brain entries on volume services has been
> successful
>
>
>
> Brick sn-0.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> Brick sn-1.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # ls -l /mnt/services/netserv/ethip/
>
> ls: cannot acces

Re: [Gluster-devel] [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster voluem heal info command !

2017-09-28 Thread Karthik Subrahmanya
On Thu, Sep 28, 2017 at 11:41 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi,
>
> Thanks for reply!
>
> I’ve checked [1]. But the problem is that there is nothing shown in
> command “gluster volume heal  info”. So these split-entry
> files could only be detected when app try to visit them.
>
> I can find gfid mismatch for those in-split-brain entries from mount log,
> however, nothing show in shd log, the shd log does not know those
> split-brain entries. Because there is nothing in indices/xattrop directory.
>
I guess it was there before, and then it got cleared by one of the heal
processes, either on the client side or the server side. I wanted to check
that by examining the logs.
Which version of gluster are you running, by the way?
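
If it reproduces again, apart from the logs, a quick way to check whether any
pending entries are still present on the bricks is something like the
following (paths taken from your earlier output):

# On each brick, list the pending-heal index; entries here are what
# "heal info" normally reports:
ls /mnt/bricks/services/brick/.glusterfs/indices/xattrop/

# and compare with:
gluster volume heal services info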

>
>
> The log is not available right now, when it reproduced, I will provide it
> to your, Thanks!
>
Ok.

>
>
> Best regards,
> *Cynthia **(周琳)*
>
> MBB SM HETRAN SW3 MATRIX
>
> Storage
> Mobile: +86 (0)18657188311
>
>
>
> *From:* Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> *Sent:* Thursday, September 28, 2017 2:02 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com>
> *Cc:* gluster-us...@gluster.org; gluster-devel@gluster.org
> *Subject:* Re: [Gluster-users] after hard reboot, split-brain happened,
> but nothing showed in gluster voluem heal info command !
>
>
>
> Hi,
>
> To resolve the gfid split-brain you can follow the steps at [1].
>
> Since we don't have the pending markers set on the files, it is not
> showing in the heal info.
> To debug this issue, need some more data from you. Could you provide these
> things?
>
> 1. volume info
>
> 2. mount log
>
> 3. brick logs
>
> 4. shd log
>
>
>
> May I also know which version of gluster you are running. From the info
> you have provided it looks like an old version.
>
> If it is, then it would be great if you can upgrade to one of the latest
> supported release.
>
>
> [1] http://docs.gluster.org/en/latest/Troubleshooting/split-
> brain/#fixing-directory-entry-split-brain
>
>
>
> Thanks & Regards,
>
> Karthik
>
> On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
>
>
> HI gluster experts,
>
>
>
> I meet a tough problem about “split-brain” issue. Sometimes, after hard
> reboot, we will find some files in split-brain, however its parent
> directory or anything could be shown in command “gluster volume heal
>  info”, also, no entry in .glusterfs/indices/xattrop
> directory, can you help to shed some lights on this issue? Thanks!
>
>
>
>
>
>
>
> Following is some info from our env,
>
>
>
> *Checking from sn-0 cliet, nothing is shown in-split-brain!*
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info
>
> Brick sn-0:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> Brick sn-1:/mnt/bricks/services/brick/
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # gluster v heal services info split-brain
>
> Gathering list of split brain entries on volume services has been
> successful
>
>
>
> Brick sn-0.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> Brick sn-1.local:/mnt/bricks/services/brick
>
> Number of entries: 0
>
>
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> # ls -l /mnt/services/netserv/ethip/
>
> ls: cannot access '/mnt/services/netserv/ethip/sn-2': Input/output error
>
> ls: cannot access '/mnt/services/netserv/ethip/mn-1': Input/output error
>
> total 3
>
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-0
>
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-1
>
> -rw-r--r-- 1 root root 145 Sep 26 20:35 as-2
>
> -rw-r--r-- 1 root root 237 Sep 26 20:36 mn-0
>
> -? ? ??  ?? mn-1
>
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-0
>
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-1
>
> -? ? ??  ?? sn-2
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
>
>
> *Checking from glusterfs server side, the gfid of mn-1 on sn-0 and sn-1 is
> different*
>
>
>
> *[SN-0]*
>
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
>
> # getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
>
> getfattr: Removing leading '/' from absolute path names
>
> # file: mnt/bricks/services/brick/netserv/ethip
>
> trusted.gfid=0

Re: [Gluster-devel] [Gluster-users] after hard reboot, split-brain happened, but nothing showed in gluster voluem heal info command !

2017-09-28 Thread Karthik Subrahmanya
Hi,

To resolve the gfid split-brain you can follow the steps at [1].
Since we don't have the pending markers set on the files, it is not showing
in the heal info.
To debug this issue, I need some more data from you. Could you provide these
things?
1. volume info
2. mount log
3. brick logs
4. shd log

May I also know which version of gluster you are running? From the info you
have provided it looks like an old version.
If it is, then it would be great if you can upgrade to one of the latest
supported releases.

[1]
http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
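
For a gfid split-brain of a plain file, the steps in [1] boil down to roughly
the following (names taken from your output below; you have to decide which
brick holds the "bad" copy, so please double-check before removing anything):

# 1. Note the mismatching gfids of the file on each brick:
getfattr -d -m . -e hex /mnt/bricks/services/brick/netserv/ethip/mn-1

# 2. On the brick with the bad copy, remove the file and its gfid hard
#    link under .glusterfs (the first two bytes of the gfid form the
#    directory path, e.g. 53a33f43-... -> .glusterfs/53/a3/):
rm /mnt/bricks/services/brick/netserv/ethip/mn-1
rm /mnt/bricks/services/brick/.glusterfs/53/a3/53a33f43-7464-4754-86f3-1c4e44d83afd

# 3. Trigger the heal (a stat of the file from the mount also works):
gluster volume heal services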

Thanks & Regards,
Karthik

On Wed, Sep 27, 2017 at 9:42 AM, Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

>
> HI gluster experts,
>
> I meet a tough problem about “split-brain” issue. Sometimes, after hard
> reboot, we will find some files in split-brain, however its parent
> directory or anything could be shown in command “gluster volume heal
>  info”, also, no entry in .glusterfs/indices/xattrop
> directory, can you help to shed some lights on this issue? Thanks!
>
>
>
> Following is some info from our env,
>
> *Checking from sn-0 cliet, nothing is shown in-split-brain!*
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # gluster v heal services info
> Brick sn-0:/mnt/bricks/services/brick/
> Number of entries: 0
>
> Brick sn-1:/mnt/bricks/services/brick/
> Number of entries: 0
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # gluster v heal services info split-brain
> Gathering list of split brain entries on volume services has been
> successful
>
> Brick sn-0.local:/mnt/bricks/services/brick
> Number of entries: 0
>
> Brick sn-1.local:/mnt/bricks/services/brick
> Number of entries: 0
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # ls -l /mnt/services/netserv/ethip/
> ls: cannot access '/mnt/services/netserv/ethip/sn-2': Input/output error
> ls: cannot access '/mnt/services/netserv/ethip/mn-1': Input/output error
> total 3
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-0
> -rw-r--r-- 1 root root 144 Sep 26 20:35 as-1
> -rw-r--r-- 1 root root 145 Sep 26 20:35 as-2
> -rw-r--r-- 1 root root 237 Sep 26 20:36 mn-0
> -? ? ??  ?? mn-1
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-0
> -rw-r--r-- 1 root root  73 Sep 26 20:35 sn-1
> -? ? ??  ?? sn-2
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
>
> *Checking from glusterfs server side, the gfid of mn-1 on sn-0 and sn-1 is
> different*
>
> *[SN-0]*
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
> # getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/services/brick/netserv/ethip
> trusted.gfid=0xee71d19ac0f84f60b11eb42a083644e4
> trusted.glusterfs.dht=0x0001
>
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # getfattr -m . -d -e hex mn-1
> # file: mn-1
> trusted.afr.dirty=0x
> trusted.afr.services-client-0=0x
> trusted.afr.services-client-1=0x
> trusted.gfid=0x53a33f437464475486f31c4e44d83afd
> [root@sn-0:/mnt/bricks/services/brick/netserv/ethip]
> # stat mn-1
>   File: mn-1
>   Size: 237  Blocks: 16 IO Block: 4096   regular file
> Device: fd51h/64849dInode: 2536Links: 2
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2017-09-26 20:30:25.67900 +0300
> Modify: 2017-09-26 20:30:24.60400 +0300
> Change: 2017-09-26 20:30:24.61000 +0300
> Birth: -
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/indices/xattrop]
> # ls
> xattrop-63f8bbcb-7fa6-4fc8-b721-675a05de0ab3
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/indices/xattrop]
>
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
> # ls
> 53a33f43-7464-4754-86f3-1c4e44d83afd
> [root@sn-0:/mnt/bricks/services/brick/.glusterfs/53/a3]
> # stat 53a33f43-7464-4754-86f3-1c4e44d83afd
>   File: 53a33f43-7464-4754-86f3-1c4e44d83afd
>   Size: 237  Blocks: 16 IO Block: 4096   regular file
> Device: fd51h/64849dInode: 2536Links: 2
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2017-09-26 20:30:25.67900 +0300
> Modify: 2017-09-26 20:30:24.60400 +0300
> Change: 2017-09-26 20:30:24.61000 +0300
> Birth: -
>
> #
> *[SN-1]*
>
> [root@sn-1:/mnt/bricks/services/brick/.glusterfs/f7/f1]
> #  getfattr -m . -d -e hex /mnt/bricks/services/brick/netserv/ethip
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/bricks/services/brick/netserv/ethip
> trusted.gfid=0xee71d19ac0f84f60b11eb42a083644e4
> trusted.glusterfs.dht=0x0001
>
> [root@sn-1:/mnt/bricks/services/brick/.glusterfs/f7/f1]
> *#*
> 

Re: [Gluster-devel] About WORM feature

2017-07-27 Thread Karthik Subrahmanya
+Poornima
On Thu, Jul 27, 2017 at 3:38 PM, Li, Dan <li...@cn.fujitsu.com> wrote:

> Hi, Karthik
>
> My gluster's version is 3.10.3 as follows,
> I thinks the "client" means the samba's plugin:
> samba-vfs-glusterfs-4.4.4-14.el7_3.x86_64
> How to update the samba-vfs-glusterfs's version?
>
I don't have much context on that. Adding Poornima who can help you with
that.

>
> [lidan@node glusterfs-3.10.3]$ glusterfs --version
> glusterfs 3.10.3
> Repository revision: git://git.gluster.org/glusterfs.git
> Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> It is licensed to you under your choice of the GNU Lesser
> General Public License, version 3 or any later version (LGPLv3
> or later), or the GNU General Public License, version 2 (GPLv2),
> in all cases as published by the Free Software Foundation.
>
> 
> Thank you and best regards,
> 李 丹(LI DAN)
> Dept. III of Technology and Development, Nanjing Fujitsu Nanda Software
> Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China
> T: +86-25-86630566-9488
> Mail: li...@cn.fujitsu.com
> 
>
> From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> Sent: Thursday, July 27, 2017 5:30 PM
> To: Li, Dan/李 丹 <li...@cn.fujitsu.com>
> Cc: gluster-devel@gluster.org
> Subject: Re: [Gluster-devel] About WORM feature
>
>
>
> On Thu, Jul 27, 2017 at 9:10 AM, Li, Dan <li...@cn.fujitsu.com> wrote:
> Hi, Karthik
>
> Thanks for your information, which is very useful for me.
>
> I tried to use volume level worm feature and file level worm feature,
> and I think file level worm feature is seemly matching my needs.
> But I find it cannot work with samba. (volume level worm feature is OK)
>
> I mounted worm_vol on samba and setup samba server by the following
> configuration.
> [gluster-worm_vol]
> comment = For samba share of volume worm_vol
> vfs objects = glusterfs
> glusterfs:volume = worm_vol
> glusterfs:logfile = /var/log/samba/glusterfs-worm_vol.%M.log
> glusterfs:loglevel = 7
> path = /
> read only = no
> guesn ok = yes
> kernel share modes = no
>
> The following error has been reported when I set the feature on the volume.
> [root@node mnt]# gluster  vol set  worm_vol features.worm-file-level on
> volume set: failed: One of the client 127.0.0.1:1023 is running with
> op-version 30703 and doesn't support the required op-version 30800. This
> client needs to be upgraded or disconnected before running this command
> again
> The file level worm feature is implemented in glusterfs-3.8 but on your
> machine 127.0.0.1:1023 you are running glusterfs-3.7 (which is EOL). You
> should update the machines to have glusterfs 3.8 or above to avail this
> feature. And it is recommended to have all the machines running the same
> version of gluster.
>
> Other features of worm can be setted without error.
> [root@ node mnt]# gluster  vol set  worm_vol features.default-retention-period
> 30
> volume set: success
> This was successful because we do not check for the op-version when
> setting this configurable.
> HTH,
> Karthik
>
> The glusterd log shows as following:
> [2021-07-25 06:41:09.078135] I [MSGID: 106022] 
> [glusterd-handshake.c:803:_client_supports_volume]
> 0-glusterd: Client 127.0.0.1:1017 (1 -> 30703) doesn't support required
> op-version (30800). Rejecting volfile request.
>
> The version info is as following:
> samba-vfs-glusterfs-4.4.4-14.el7_3.x86_64
> glusterfs-3.10.3.tar.gz
>
> Can you give me some suggestion? Thanks!
>
> Regards,
> Li Dan
>
>
>
> > From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> > Sent: Wednesday, July 26, 2017 8:10 PM
> > To: Li, Dan/李 丹 <li...@cn.fujitsu.com>
> > Cc: gluster-devel@gluster.org
> > Subject: Re: [Gluster-devel] About WORM feature
> >
> > Hi,
> >
> >> On Wed, Jul 26, 2017 at 3:23 PM, Li, Dan <li...@cn.fujitsu.com> wrote:
> >> Hi, all
> >
> >> I cannot find the introduction of WORM feature in the latest version of
> documents.
> >> Does it means that this feature is not recommended anymore?
> > We recommend the feature, but unfortunately the documentation on
> > volume level worm feature is moved and the link is broken I guess. Even
> I am not getting
> > the link to that. If anyone can point to the document it would be great.
> > Below is the link to the documentation on file level worm feature.
> > http://blog.gluster.org/2016/07/worm-write-once-read-
> multiple-retention-and-compliance-2/
> > Regards,
> > Karthik
> >
> >> http://gluster.readthedocs.io/en/latest/Features/worm/?highlight=WORM
> >
> >> Regards,
> >
> >> Li Dan
>
>
>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] About WORM feature

2017-07-27 Thread Karthik Subrahmanya
On Thu, Jul 27, 2017 at 9:10 AM, Li, Dan <li...@cn.fujitsu.com> wrote:

> Hi, Karthik
>
> Thanks for your information, which is very useful for me.
>
> I tried to use volume level worm feature and file level worm feature,
> and I think file level worm feature is seemly matching my needs.
> But I find it cannot work with samba. (volume level worm feature is OK)
>
> I mounted worm_vol on samba and setup samba server by the following
> configuration.
> [gluster-worm_vol]
> comment = For samba share of volume worm_vol
> vfs objects = glusterfs
> glusterfs:volume = worm_vol
> glusterfs:logfile = /var/log/samba/glusterfs-worm_vol.%M.log
> glusterfs:loglevel = 7
> path = /
> read only = no
> guesn ok = yes
> kernel share modes = no
>
> The following error has been reported when I set the feature on the volume.
> [root@node mnt]# gluster  vol set  worm_vol features.worm-file-level on
> volume set: failed: One of the client 127.0.0.1:1023 is running with
> op-version 30703 and doesn't support the required op-version 30800. This
> client needs to be upgraded or disconnected before running this command
> again
>
The file-level WORM feature is implemented in glusterfs-3.8, but on your
machine the client 127.0.0.1:1023 is running glusterfs-3.7 (which is EOL).
You should update the machines to glusterfs 3.8 or above to use this
feature. It is also recommended to have all the machines running the same
version of gluster.
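
For reference, a rough way to check the op-version the cluster is operating
at, and to raise it once every server and client (including the Samba
vfs_glusterfs plugin, which links against libgfapi) has been upgraded; the
exact commands may vary slightly between releases:

# Check the current cluster op-version (works on glusterfs >= 3.10):
gluster volume get all cluster.op-version

# Bump it only after everything is upgraded, e.g.:
gluster volume set all cluster.op-version 30800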

>
> Other features of worm can be setted without error.
> [root@ node mnt]# gluster  vol set  worm_vol features.default-retention-period
> 30
> volume set: success
>
This was successful because we do not check for the op-version when
setting this configurable.

HTH,
Karthik

>
> The glusterd log shows as following:
> [2021-07-25 06:41:09.078135] I [MSGID: 106022] 
> [glusterd-handshake.c:803:_client_supports_volume]
> 0-glusterd: Client 127.0.0.1:1017 (1 -> 30703) doesn't support required
> op-version (30800). Rejecting volfile request.
>
> The version info is as following:
> samba-vfs-glusterfs-4.4.4-14.el7_3.x86_64
> glusterfs-3.10.3.tar.gz
>
> Can you give me some suggestion? Thanks!
>
> Regards,
> Li Dan
>
>
>
> > From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> > Sent: Wednesday, July 26, 2017 8:10 PM
> > To: Li, Dan/李 丹 <li...@cn.fujitsu.com>
> > Cc: gluster-devel@gluster.org
> > Subject: Re: [Gluster-devel] About WORM feature
> >
> > Hi,
> >
> >> On Wed, Jul 26, 2017 at 3:23 PM, Li, Dan <li...@cn.fujitsu.com> wrote:
> >> Hi, all
> >
> >> I cannot find the introduction of WORM feature in the latest version of
> documents.
> >> Does it means that this feature is not recommended anymore?
> > We recommend the feature, but unfortunately the documentation on
> > volume level worm feature is moved and the link is broken I guess. Even
> I am not getting
> > the link to that. If anyone can point to the document it would be great.
> > Below is the link to the documentation on file level worm feature.
> > http://blog.gluster.org/2016/07/worm-write-once-read-
> multiple-retention-and-compliance-2/
> > Regards,
> > Karthik
> >
> >> http://gluster.readthedocs.io/en/latest/Features/worm/?highlight=WORM
> >
> >> Regards,
> >
> >> Li Dan
>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] About WORM feature

2017-07-26 Thread Karthik Subrahmanya
Hi,

On Wed, Jul 26, 2017 at 3:23 PM, Li, Dan  wrote:

> Hi, all
>
> I cannot find the introduction of WORM feature in the latest version of
> documents.
> Does it means that this feature is not recommended anymore?
>
We recommend the feature, but unfortunately the documentation on the
volume-level WORM feature has been moved and the link is broken, I guess. I
am not able to find the link to it either. If anyone can point to the
document it would be great.
Below is the link to the documentation on the file-level WORM feature.
http://blog.gluster.org/2016/07/worm-write-once-read-multiple-retention-and-compliance-2/
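
For quick reference, enabling the two flavours looks roughly like this (the
volume name is a placeholder; see the post above for the file-level options
and their exact semantics):

# Volume-level WORM: every file on the volume becomes write-once:
gluster volume set <volname> features.worm on

# File-level WORM/Retention (glusterfs >= 3.8), with a configurable
# default retention period:
gluster volume set <volname> features.worm-file-level on
gluster volume set <volname> features.default-retention-period 120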

Regards,
Karthik

>
> http://gluster.readthedocs.io/en/latest/Features/worm/?highlight=WORM
>

Regards,
>
> Li Dan
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] geo-rep regression because of node-uuid change

2017-06-21 Thread Karthik Subrahmanya
On Wed, Jun 21, 2017 at 1:56 PM, Xavier Hernandez 
wrote:

> That's ok. I'm currently unable to write a patch for this on ec.

Sunil is working on this patch.

~Karthik

> If no one can do it, I can try to do it in 6 - 7 hours...
>
> Xavi
>
>
> On Wednesday, June 21, 2017 09:48 CEST, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>
>
>
> On Wed, Jun 21, 2017 at 1:00 PM, Xavier Hernandez 
> wrote:
>>
>> I'm ok with reverting node-uuid content to the previous format and create
>> a new xattr for the new format. Currently, only rebalance will use it.
>>
>> Only thing to consider is what can happen if we have a half upgraded
>> cluster where some clients have this change and some not. Can rebalance
>> work in this situation ? if so, could there be any issue ?
>
>
> I think there shouldn't be any problem, because this is in-memory xattr so
> layers below afr/ec will only see node-uuid xattr.
> This also gives us a chance to do whatever we want to do in future with
> this xattr without any problems about backward compatibility.
>
> You can check https://review.gluster.org/#/c/17576/3/xlators/cluster/afr/
> src/afr-inode-read.c@1507 for how karthik implemented this in AFR (this
> got merged accidentally yesterday, but looks like this is what we are
> settling on)
>
>
>>
>> Xavi
>>
>>
>> On Wednesday, June 21, 2017 06:56 CEST, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>
>>
>>
>> On Wed, Jun 21, 2017 at 10:07 AM, Nithya Balachandran <
>> nbala...@redhat.com> wrote:
>>>
>>>
>>> On 20 June 2017 at 20:38, Aravinda  wrote:

 On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:

 Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed to
 go with the format Aravinda suggested for now and in future we wanted some
 more changes for dht to detect which subvolume went down came back up, at
 that time we will revisit the solution suggested by Xavi.

 Susanth is doing the dht changes
 Aravinda is doing geo-rep changes

 Done. Geo-rep patch sent for review https://review.gluster.org/17582


>>>
>>> The proposed changes to the node-uuid behaviour (while good) are going
>>> to break tiering . Tiering changes will take a little more time to be coded
>>> and tested.
>>>
>>> As this is a regression for 3.11 and a blocker for 3.11.1, I suggest we
>>> go back to the original node-uuid behaviour for now so as to unblock the
>>> release and target the proposed changes for the next 3.11 releases.
>>>
>>
>> Let me see if I understand the changes correctly. We are restoring the
>> behavior of node-uuid xattr and adding a new xattr for parallel rebalance
>> for both afr and ec, correct? Otherwise that is one more regression. If
>> yes, we will also wait for Xavi's inputs. Jeff accidentally merged the afr
>> patch yesterday which does these changes. If everyone is in agreement, we
>> will leave it as is and add similar changes in ec as well. If we are not in
>> agreement, then we will let the discussion progress :-)
>>
>>
>>>
>>>
>>> Regards,
>>> Nithya
>>>
 --
 Aravinda



 Thanks to all of you guys for the discussions!

 On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez <
 xhernan...@datalab.es> wrote:
>
> Hi Aravinda,
>
> On 20/06/17 12:42, Aravinda wrote:
>>
>> I think following format can be easily adopted by all components
>>
>> UUIDs of a subvolume are seperated by space and subvolumes are
>> separated
>> by comma
>>
>> For example, node1 and node2 are replica with U1 and U2 UUIDs
>> respectively and
>> node3 and node4 are replica with U3 and U4 UUIDs respectively
>>
>> node-uuid can return "U1 U2,U3 U4"
>
>
> While this is ok for current implementation, I think this can be
> insufficient if there are more layers of xlators that require to indicate
> some sort of grouping. Some representation that can represent hierarchy
> would be better. For example: "(U1 U2) (U3 U4)" (we can use spaces or 
> comma
> as a separator).
>
>>
>>
>> Geo-rep can split by "," and then split by space and take first UUID
>> DHT can split the value by space or comma and get unique UUIDs list
>
>
> This doesn't solve the problem I described in the previous email. Some
> more logic will need to be added to avoid more than one node from each
> replica-set to be active. If we have some explicit hierarchy information 
> in
> the node-uuid value, more decisions can be taken.
>
> An initial proposal I made was this:
>
> DHT[2](AFR[2,0](NODE(U1), NODE(U2)), AFR[2,0](NODE(U1), NODE(U2)))
>
> This is harder to parse, but gives a lot of information: DHT with 2
> subvolumes, each subvolume is an AFR with replica 2 and no arbiters. It's
> also easily extensible with any new xlator that changes the 

Re: [Gluster-devel] geo-rep regression because of node-uuid change

2017-06-20 Thread Karthik Subrahmanya
On Tue, Jun 20, 2017 at 4:12 PM, Aravinda  wrote:

> I think following format can be easily adopted by all components
>
> UUIDs of a subvolume are seperated by space and subvolumes are separated
> by comma
>
> For example, node1 and node2 are replica with U1 and U2 UUIDs respectively
> and
> node3 and node4 are replica with U3 and U4 UUIDs respectively
>
> node-uuid can return "U1 U2,U3 U4"
>
> Geo-rep can split by "," and then split by space and take first UUID
> DHT can split the value by space or comma and get unique UUIDs list
>
> Another question is about the behavior when a node is down, existing
> node-uuid xattr will not return that UUID if a node is down.

After the change [1], if a node is down we send all zeros as the uuid for
that node, in the list of node uuids.

[1] https://review.gluster.org/#/c/17084/
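
For anyone following along, the value can be checked from a client mount; a
rough illustration (queried on a FUSE mount, the UUIDs in the sample output
are placeholders):

# Query the virtual xattr on any file/directory of the mount:
getfattr -n trusted.glusterfs.node-uuid /mnt/glusterfs/somefile
# sample output:
# trusted.glusterfs.node-uuid="<uuid-of-up-node> 00000000-0000-0000-0000-000000000000"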

Regards,
Karthik

> What is the behavior with the proposed xattr?
>
> Let me know your thoughts.
>
> regards
> Aravinda VK
>
>
> On 06/20/2017 03:06 PM, Aravinda wrote:
>
>> Hi Xavi,
>>
>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>
>>> Hi Aravinda,
>>>
>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>>
 Adding more people to get a consensus about this.

 On Tue, Jun 20, 2017 at 1:49 PM, Aravinda > wrote:


 regards
 Aravinda VK


 On 06/20/2017 01:26 PM, Xavier Hernandez wrote:

 Hi Pranith,

 adding gluster-devel, Kotresh and Aravinda,

 On 20/06/17 09:45, Pranith Kumar Karampuri wrote:



 On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez
 
 >> wrote:

 On 20/06/17 09:31, Pranith Kumar Karampuri wrote:

 The way geo-replication works is:
 On each machine, it does getxattr of node-uuid and
 check if its
 own uuid
 is present in the list. If it is present then it
 will consider
 it active
 otherwise it will be considered passive. With this
 change we are
 giving
 all uuids instead of first-up subvolume. So all
 machines think
 they are
 ACTIVE which is bad apparently. So that is the
 reason. Even I
 felt bad
 that we are doing this change.


 And what about changing the content of node-uuid to
 include some
 sort of hierarchy ?

 for example:

 a single brick:

 NODE()

 AFR/EC:

 AFR[2](NODE(), NODE())
 EC[3,1](NODE(), NODE(), NODE())

 DHT:

 DHT[2](AFR[2](NODE(), NODE()),
 AFR[2](NODE(),
 NODE()))

 This gives a lot of information that can be used to
 take the
 appropriate decisions.


 I guess that is not backward compatible. Shall I CC
 gluster-devel and
 Kotresh/Aravinda?


 Is the change we did backward compatible ? if we only require
 the first field to be a GUID to support backward compatibility,
 we can use something like this:

 No. But the necessary change can be made to Geo-rep code as well if
 format is changed, Since all these are built/shipped together.

 Geo-rep uses node-id as follows,

 list = listxattr(node-uuid)
 active_node_uuids = list.split(SPACE)
 active_node_flag = True if self.node_id exists in active_node_uuids
 else False

>>>
>>> How was this case solved ?
>>>
>>> suppose we have three servers and 2 bricks in each server. A replicated
>>> volume is created using the following command:
>>>
>>> gluster volume create test replica 2 server1:/brick1 server2:/brick1
>>> server2:/brick2 server3:/brick1 server3:/brick1 server1:/brick2
>>>
>>> In this case we have three replica-sets:
>>>
>>> * server1:/brick1 server2:/brick1
>>> * server2:/brick2 server3:/brick1
>>> * server3:/brick2 server2:/brick2
>>>
>>> Old AFR implementation for node-uuid always returned the uuid of the
>>> node of the first brick, so in this case we will get the uuid of the three
>>> nodes because all of them are the first brick of a replica-set.
>>>
>>> Does this mean that with this configuration all nodes are active ? Is
>>> this a 

Re: [Gluster-devel] tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t - regression failures

2017-05-12 Thread Karthik Subrahmanya
Hey Atin,

I had a look at
"tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t".
The test case passes on my local system with the latest master. I also tried
cherry-picking some of the patches which failed regression, but it passed on
my system.
In the list https://fstat.gluster.org/weeks/1/failure/214 many of the
patches passed this test case in later runs and are already merged on
master.
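
For anyone who wants to try it locally, a single .t can be run from a
glusterfs source tree roughly like this (assuming a built/installed tree and
root privileges, as the regression framework expects):

# directly through the TAP harness:
prove -vf tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t

# or via the wrapper the regression jobs use:
./run-tests.sh tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t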

For many patches the test case failed the first time and passed on the
second attempt.
In some cases it failed with EIO while doing "ls" on the file, but the
immediate "cat" on the file passed.
It has a dependency on the CLI option to resolve gfid split-brain, which
is still in progress.
So, as discussed with Ravi, we were planning to mark it as bad for the
moment. Is that fine?

Regards,
Karthik

On Fri, May 12, 2017 at 3:33 PM, Atin Mukherjee  wrote:

> Refer https://fstat.gluster.org/weeks/1 .  tests/basic/afr/add-brick-
> self-heal.t
> 
> is the 2nd in the list.
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Thank You!

2016-07-12 Thread Karthik Subrahmanya


- Original Message -
> From: "Niels de Vos" <nde...@redhat.com>
> To: "Karthik Subrahmanya" <ksubr...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>, josephau...@gmail.com, 
> "vivek sb agarwal"
> <vivek.sb.agar...@gmail.com>, "Vijaikumar Mallikarjuna" 
> <vijaym.see...@gmail.com>
> Sent: Wednesday, July 13, 2016 2:30:35 AM
> Subject: Re: [Gluster-devel] Thank You!
> 
> On Sat, Jul 09, 2016 at 11:35:13AM -0400, Karthik Subrahmanya wrote:
> > Hi all,
> > 
> > I am a intern joined on 11th of January 2016, and worked on the
> > WORM/Retention feature for GlusterFS. It is released as an
> > experimental feature with the GlusterFS v3.8. The blog post on
> > the feature is published on "Planet Gluster" [1] and
> > "blog.gluster.org" [2].
> > 
> > Monday 11th July 2016 I am getting converted as "Associate Software
> > Engineer".
> > I would like to take this opportunity to thank all of you for all your
> > valuable
> > guidance, support and help during this period. I hope you will guide me in
> > my future works, correct me when I am wrong and help me top learn more.
> 
> Congrats! Keep up the good work on improving the feature, sending the
> awesome weekly status updates and detailed blog posts. You've set the
> bar high for yourself already ;-)
> 
> Good to know that you will stick around for a while.

Hey,

Thanks Niels :). I have been asked to work on the "Gluster Must Fixes" and
to understand more about the various components of gluster.
I'll try to improve the feature during my free time. Thanks for
all your support and feedback.

Regards,
Karthik
> 
> Cheers,
> Niels
> 
> 
> > Thank you all.
> > 
> > [1] http://planet.gluster.org/
> > [2]
> > https://blog.gluster.org/2016/07/worm-write-once-read-multiple-retention-and-compliance-2/
> > 
> > Regards,
> > Karthik
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Thank You!

2016-07-11 Thread Karthik Subrahmanya


- Original Message -
> From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> To: "Karthik Subrahmanya" <ksubr...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>, josephau...@gmail.com, 
> "vivek sb agarwal"
> <vivek.sb.agar...@gmail.com>, "Vijaikumar Mallikarjuna" 
> <vijaym.see...@gmail.com>
> Sent: Monday, July 11, 2016 11:34:41 AM
> Subject: Re: [Gluster-devel] Thank You!
> 
> 
> 
> - Original Message -
> > From: "Karthik Subrahmanya" <ksubr...@redhat.com>
> > To: "Gluster Devel" <gluster-devel@gluster.org>, josephau...@gmail.com,
> > "vivek sb agarwal"
> > <vivek.sb.agar...@gmail.com>, "Vijaikumar Mallikarjuna"
> > <vijaym.see...@gmail.com>
> > Sent: Saturday, July 9, 2016 9:05:13 PM
> > Subject: [Gluster-devel] Thank You!
> > 
> > Hi all,
> > 
> > I am a intern joined on 11th of January 2016, and worked on the
> > WORM/Retention feature for GlusterFS. It is released as an
> > experimental feature with the GlusterFS v3.8. The blog post on
> > the feature is published on "Planet Gluster" [1] and
> > "blog.gluster.org" [2].
> > 
> > Monday 11th July 2016 I am getting converted as "Associate Software
> > Engineer".
> 
> Awesome!! Where is the treat :)? Congratulations and welcome. Hope you'll
> find enough ways to contribute constructively.
Thank you :)

~Karthik
> 
> > I would like to take this opportunity to thank all of you for all your
> > valuable
> > guidance, support and help during this period. I hope you will guide me in
> > my future works, correct me when I am wrong and help me top learn more.
> > Thank you all.
> > 
> > [1] http://planet.gluster.org/
> > [2]
> > https://blog.gluster.org/2016/07/worm-write-once-read-multiple-retention-and-compliance-2/
> > 
> > Regards,
> > Karthik
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 01/07/2016

2016-07-01 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Implemented the posix_do_futimes function and sent the patch for review
-Fixed the review comments on the patch
-Working on writing a test script for the posix_do_futimes function


Plan for next week:

-Completing the test script


Current work:

Patches: http://review.gluster.org/#/c/13429/
 http://review.gluster.org/#/c/14222/
 http://review.gluster.org/#/c/14539/
 http://review.gluster.org/#/c/14619/
 http://review.gluster.org/#/c/14815/
Spec: 
https://github.com/gluster/glusterfs-specs/blob/master/under_review/worm-compliance.md
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
  https://bugzilla.redhat.com/show_bug.cgi?id=1333263
  https://bugzilla.redhat.com/show_bug.cgi?id=1339524
  https://bugzilla.redhat.com/show_bug.cgi?id=1341556
  https://bugzilla.redhat.com/show_bug.cgi?id=1342259
  https://bugzilla.redhat.com/show_bug.cgi?id=1345900
  https://bugzilla.redhat.com/show_bug.cgi?id=1350406

Thanks & Regards,
Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Review request for http://review.gluster.org/#/c/14815/

2016-06-27 Thread Karthik Subrahmanya
Hi,

I have added an implementation for the posix_do_futimes function, which is
incomplete in the current code. I based it on the existing implementation of
the posix_do_utimes function.
Please review the patch and correct me if I am wrong anywhere.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1350406
Patch: http://review.gluster.org/#/c/14815/

Thanks & Regards,
Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 24/06/2016

2016-06-24 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Updated the blog
-Working on implementation of automatic state transition with WRITE FOP
-While trying to set the utimes with write FOP, getting "posix_do_futimes not 
implemented" error
-Looking into the posix code to implement the "posix_do_futimes" function


Plan for next week:

-Implementing the "posix_do_futimes" function and sending a patch for review


Current work:

Patch: http://review.gluster.org/#/c/13429/
   http://review.gluster.org/#/c/14222/
   http://review.gluster.org/#/c/14539/
   http://review.gluster.org/#/c/14619/
Spec: 
https://github.com/gluster/glusterfs-specs/blob/master/under_review/worm-compliance.md
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
  https://bugzilla.redhat.com/show_bug.cgi?id=1333263
  https://bugzilla.redhat.com/show_bug.cgi?id=1339524
  https://bugzilla.redhat.com/show_bug.cgi?id=1341556
  https://bugzilla.redhat.com/show_bug.cgi?id=1342259
  https://bugzilla.redhat.com/show_bug.cgi?id=1345900
Blog: 
https://docs.google.com/document/d/1YSTDsbA93--AIU_myjqKkK2qTput0i0jOG2VMw_7mjc/edit?ts=5739915e#

Thanks & Regards,
Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 17/06/2016

2016-06-17 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Tested the feature
-Working on automatic state transition with write FOP


Plan for next week:

-Completing the state transition with write FOP
-Working on performance improvement part of the feature


Current work:

Patch: http://review.gluster.org/#/c/13429/
   http://review.gluster.org/#/c/14222/
   http://review.gluster.org/#/c/14539/
   http://review.gluster.org/#/c/14619/
Spec: 
https://github.com/gluster/glusterfs-specs/blob/master/under_review/worm-compliance.md
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
  https://bugzilla.redhat.com/show_bug.cgi?id=1333263
  https://bugzilla.redhat.com/show_bug.cgi?id=1339524
  https://bugzilla.redhat.com/show_bug.cgi?id=1341556
  https://bugzilla.redhat.com/show_bug.cgi?id=1342259
  https://bugzilla.redhat.com/show_bug.cgi?id=1345900
Blog: 
https://docs.google.com/document/d/1YSTDsbA93--AIU_myjqKkK2qTput0i0jOG2VMw_7mjc/edit?ts=5739915e#
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster Build System not updating the Verified tag

2016-06-10 Thread Karthik Subrahmanya


- Original Message -
> From: "Nigel Babu" <nig...@redhat.com>
> To: "Karthik Subrahmanya" <ksubr...@redhat.com>
> Cc: "Niels de Vos" <nde...@redhat.com>, "Jeff Darcy" <jda...@redhat.com>, 
> "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Saturday, June 11, 2016 7:05:24 AM
> Subject: Re: Gluster Build System not updating the Verified tag
> 
> Please send an email to gluster-infra@ with review requests where you want
> the tag removed. We'll remove it for you.
Thank you for the response. I have sent a review request to gluster-infra.

Thanks,
Karthik
> 
> On Fri, Jun 10, 2016 at 11:03 PM, Karthik Subrahmanya <ksubr...@redhat.com>
> wrote:
> 
> > Hi,
> >
> > Due to the issues with the regression machines the Gluster Build System
> > had posted
> > -1 to the verified tag of http://review.gluster.org/#/c/14619/ previously
> > this week.
> > Now all the regression tests and the build are successful, even then it is
> > not updating
> > the verified tag for the patch. It would be great if someone can help me
> > to resolve the
> > issue.
> > This is a blocker bug for Gluster 3.8 and need to be merged ASAP.
> >
> > Thanks & Regards,
> > Karthik
> >
> 
> 
> 
> --
> nigelb
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Gluster Build System not updating the Verified tag

2016-06-10 Thread Karthik Subrahmanya
Hi,

Due to the issues with the regression machines, the Gluster Build System had
posted -1 to the Verified tag of http://review.gluster.org/#/c/14619/ earlier
this week.
Now all the regression tests and the build are successful, but it is still
not updating the Verified tag for the patch. It would be great if someone can
help me resolve the issue.
This is a blocker bug for Gluster 3.8 and needs to be merged ASAP.

Thanks & Regards,
Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 03/06/2016

2016-06-03 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Fixed the review comments on http://review.gluster.org/#/c/14222/
 and patch got merged.
-Updated the 3.8 release notes for WORM-Retention feature
-Completed the blog for the feature, as requested by Amye
-Found a bug with the write FOP, and fixed it
 http://review.gluster.org/#/c/14619/


Plan for next week:

-Working on the performance improvement part


Current work:

Patch: http://review.gluster.org/#/c/13429/
   http://review.gluster.org/#/c/14222/
   http://review.gluster.org/#/c/14539/
   http://review.gluster.org/#/c/14619/
Spec: 
https://github.com/gluster/glusterfs-specs/blob/master/under_review/worm-compliance.md
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
  https://bugzilla.redhat.com/show_bug.cgi?id=1333263
  https://bugzilla.redhat.com/show_bug.cgi?id=1339524
  https://bugzilla.redhat.com/show_bug.cgi?id=1341556
  https://bugzilla.redhat.com/show_bug.cgi?id=1342259
Blog: 
https://docs.google.com/document/d/1YSTDsbA93--AIU_myjqKkK2qTput0i0jOG2VMw_7mjc/edit?ts=5739915e#


Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Smoke Test not running

2016-06-03 Thread Karthik Subrahmanya
Hi,

The smoke tests are not running. Can someone have a look?
[2] has been blocking all the other tests for a long time.

[1]https://build.gluster.org/job/netbsd6-smoke/
[2]https://build.gluster.org/job/netbsd6-smoke/14076/changes#detail0

Thanks & Regards,
Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Change in glusterfs[master]: features/worm: updating function names & unwinding FOPs with...

2016-05-31 Thread Karthik Subrahmanya
Hi Jeff,

Thank you for your time and the valuable reviews.
I have addressed the review comments. Can you please have a look?

Thanks & Regards,
Karthik


- Original Message -
> From: "Jeff Darcy (Code Review)" 
> To: "Karthik U S" 
> Cc: "Gluster Build System" , "Niels de Vos" 
> , "NetBSD Build System"
> , "Raghavendra Talur" , 
> "Vijaikumar Mallikarjuna"
> , "Joseph Fernandes" 
> Sent: Friday, May 27, 2016 2:52:28 AM
> Subject: Change in glusterfs[master]: features/worm: updating function names 
> & unwinding FOPs with...
> 
> Jeff Darcy has posted comments on this change.
> 
> Change subject: features/worm: updating function names & unwinding FOPs with
> op_errno
> ..
> 
> 
> Patch Set 4:
> 
> (1 comment)
> 
> http://review.gluster.org/#/c/14222/4/xlators/features/read-only/src/worm.c
> File xlators/features/read-only/src/worm.c:
> 
> Line 83: goto out;
> > Done.
> I think we can - and should - do better.  We don't adhere to a strict "only
> return from one place" policy, precisely because sometimes the contortions
> needed to comply make the code even less readable.  Playing hopscotch among
> several goto labels certainly qualifies, and we can see several clues that
> simplification is still possible:
> 
> (a) Whether we call STACK_WIND or STACK_UNWIND always corresponds to whether
> op_errno is zero or not.  At 60 we unwind with non-zero.  At 63, 67, and 71
> we wind with zero.  At 74 we unwind with non-zero.
> 
> (b) After we've wound or unwound, we return op_errno even though it's always
> zero by then (from 68 or 76).  In other words, we don't actually need
> op_errno by the time we return.
> 
> These "coincidences" suggest that an approach similar to that used in other
> translators will work.
> 
> int op_errno = 0;
> 
> /* Example: error or failure. */
> if (is_readonly...) {
>   op_errno = EROFS;
>   goto out;
> }
> 
> /* Example: optimization or easy case. */
> if (is_wormfile...) {
>   goto out;
> }
> 
> /* Example: result from another function. */
> op_errno = gf_worm_state_transition...;
> 
>   out:
> 
> /* Common cleanup actions could go here... */
> 
> if (op_errno) {
>   STACK_UNWIND (..., -1, op_errno, ...);
> } else {
>   STACK_WIND (...);
> }
> 
> /* ...or here. */
> 
> return 0;
> 
> Sometimes this is flipped around, with ret/op_errno/whatever initially set to
> an error value and only set to zero when we're sure of success.  Which to
> use is mostly a matter of whether success or failure paths are more common.
> In any case, this makes our state explicit in op_errno.  It's easier to
> verify/ensure that we always wind on success and unwind (with a non-zero
> op_errno) on failure, and that we return zero either way.  We've had many
> bugs in other translators that were the result of "escaping" from a fop
> function with neither a wind nor an unwind, and those tend to be hard to
> debug.  Making it hard for such mistakes to creep in when another engineer
> modifies this code a year from now is very valuable.  Also, before anyone
> else assumes otherwise, we don't have Coverity or clang or any other kind of
> rules to detect those particular things automatically.
> 
> I know it's a pain, and it's late in the game, but this seems to be a
> technical-debt-reduction patch already (as opposed to a true bug fix) so
> let's reduce as much as we can at once instead of having to review and
> regression-test the same code twice.  BTW, the same pattern recurs in
> setattr/setfattr, and there's a typo (perfix/prefix) in the commit message.
> 
> 
> --
> To view, visit http://review.gluster.org/14222
> To unsubscribe, visit http://review.gluster.org/settings
> 
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I3a2f114061aae4b422df54e91c4b3f702af5d0b0
> Gerrit-PatchSet: 4
> Gerrit-Project: glusterfs
> Gerrit-Branch: master
> Gerrit-Owner: Karthik U S 
> Gerrit-Reviewer: Gluster Build System 
> Gerrit-Reviewer: Jeff Darcy 
> Gerrit-Reviewer: Joseph Fernandes
> Gerrit-Reviewer: Joseph Fernandes 
> Gerrit-Reviewer: Karthik U S 
> Gerrit-Reviewer: NetBSD Build System 
> Gerrit-Reviewer: Niels de Vos 
> Gerrit-Reviewer: Raghavendra Talur 
> Gerrit-Reviewer: Vijaikumar Mallikarjuna 
> Gerrit-HasComments: Yes
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 27/05/2016

2016-05-27 Thread Karthik Subrahmanya
Hi all,

This week's status:

-http://review.gluster.org/#/c/14182/ got merged
-Got some review comments from Niels and Jeff Darcy on 
http://review.gluster.org/#/c/14222/.
 Thanks for the review. Fixed those review comments.
-Working on performance improvements of the translator


Plan for next week:

-Working on performance improvements of the feature
-Updating the WORM/Retention blog


Current work:

Patch: http://review.gluster.org/#/c/13429/
   http://review.gluster.org/#/c/14222/
   http://review.gluster.org/#/c/14539/
Spec: 
https://github.com/gluster/glusterfs-specs/blob/master/under_review/worm-compliance.md
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
  https://bugzilla.redhat.com/show_bug.cgi?id=1333263
  https://bugzilla.redhat.com/show_bug.cgi?id=1339524
Blog: 
https://docs.google.com/document/d/1YSTDsbA93--AIU_myjqKkK2qTput0i0jOG2VMw_7mjc/edit?ts=5739915e#


Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [smoke failure] Permission denied error while install-pygluypPYTHON

2016-05-17 Thread Karthik Subrahmanya

> 
> http://review.gluster.org/14337 has been posted but still needs some
> reviewing.
> 
> Niels
> 

Thanks!

~Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [smoke failure] Permission denied error while install-pygluypPYTHON

2016-05-17 Thread Karthik Subrahmanya
Got the same issue while backporting to the 3.8 branch. Can someone please have
a look?
https://build.gluster.org/job/smoke/27834/console

Regards,
Karthik

- Original Message -
> From: "Aravinda" 
> To: "Kaushal M" 
> Cc: "Gluster Devel" 
> Sent: Friday, May 13, 2016 11:50:34 AM
> Subject: Re: [Gluster-devel] [smoke failure] Permission denied error while 
> install-pygluypPYTHON
> 
> Refreshed the patch to fix glupy, systemd and mount.glusterfs files.
> http://review.gluster.org/14315
> regards
> Aravinda
> On 05/13/2016 10:25 AM, Kaushal M wrote:
> 
> 
> 
> On Fri, May 13, 2016 at 9:59 AM, Aravinda  wrote:
> 
> 
> 
> Sent patch to fix glupy installation issue.
> http://review.gluster.org/#/c/14315/ regards
> Aravinda
> 
> On 05/12/2016 11:28 PM, Aravinda wrote:
> 
> Sorry, a miss from my side. Updated list of files/dirs which do not honour
> --prefix
> 
> usr/lib/
> usr/lib/systemd
> usr/lib/systemd/system
> usr/lib/systemd/system/glusterd.service
> usr/lib/python2.7
> usr/lib/python2.7/site-packages
> usr/lib/python2.7/site-packages/gluster
> usr/lib/python2.7/site-packages/gluster/__init__.pyo
> usr/lib/python2.7/site-packages/gluster/__init__.pyc
> usr/lib/python2.7/site-packages/gluster/__init__.py
> usr/lib/python2.7/site-packages/gluster/glupy
> usr/lib/python2.7/site-packages/gluster/glupy/__init__.pyo
> usr/lib/python2.7/site-packages/gluster/glupy/__init__.pyc
> usr/lib/python2.7/site-packages/gluster/glupy/__init__.py
> sbin/
> sbin/mount.glusterfs
> Thanks for identifying the list of paths. We need to fix all of this.
> I've opened a bug [1] so that this can be correctly tracked and fixed.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1335717
> 
> 
> 
> Things I did to find above list.
> ./autogen.sh
> ./configure --prefix=/usr/local
> DESTDIR=/tmp/glusterfs make install
> 
> Then listed all the files which are in /tmp/glusterfs except
> /tmp/glusterfs/usr/local
> 
> regards
> Aravinda
> 
> On 05/12/2016 08:56 PM, Aravinda wrote:
> 
> 
> regards
> Aravinda
> 
> On 05/12/2016 08:23 PM, Niels de Vos wrote:
> 
> On Thu, May 12, 2016 at 04:28:40PM +0530, Aravinda wrote:
> 
> regards
> Aravinda
> 
> On 05/12/2016 04:08 PM, Kaushal M wrote:
> 
> The install path should be `$DESTDIR/$PREFIX/`.
> 
> PREFIX should be the path under which the file is going to be installed.
> 
> Yes. That is substituted during ./configure if --prefix is passed; otherwise
> the generated Makefile will have the $prefix variable. I think glupy needs to
> be installed in /usr/lib/python2.6/site-packages/ to import python packages
> globally while testing. The same rule is used to deploy systemd unit files.
> (Prefix is not used.)
> 
> I'm not convinced about this yet. If someone decides to use --prefix, I
> think we should honour that everywhere. If that is not common, we can
> introduce an additional ./configure option for the uncommon use-cases
> like the Python site-packages.
> 
> Do you have a reference where the --prefix option explains that some
> contents may not use it?
> 
> The following files/dirs are not honoring the prefix; I am not sure about the
> exact reason (for example, /var/log or /var/lib/glusterd):
> 
> sbin
> sbin/mount.glusterfs
> usr/lib/
> usr/lib/systemd
> usr/lib/systemd/system
> usr/lib/systemd/system/glustereventsd.service
> usr/lib/systemd/system/glusterd.service
> usr/lib/python2.7
> usr/lib/python2.7/site-packages
> usr/lib/python2.7/site-packages/gluster
> usr/lib/python2.7/site-packages/gluster/__init__.pyo
> usr/lib/python2.7/site-packages/gluster/__init__.pyc
> usr/lib/python2.7/site-packages/gluster/__init__.py
> usr/lib/python2.7/site-packages/gluster/glupy
> usr/lib/python2.7/site-packages/gluster/glupy/__init__.pyo
> usr/lib/python2.7/site-packages/gluster/glupy/__init__.pyc
> usr/lib/python2.7/site-packages/gluster/glupy/__init__.py
> var/
> var/lib
> var/lib/glusterd
> var/lib/glusterd/glusterfind
> var/lib/glusterd/glusterfind/.keys
> var/lib/glusterd/groups
> var/lib/glusterd/groups/virt
> var/lib/glusterd/hooks
> var/lib/glusterd/hooks/1
> var/lib/glusterd/hooks/1/delete
> var/lib/glusterd/hooks/1/delete/post
> var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post.py
> var/lib/glusterd/hooks/1/gsync-create
> var/lib/glusterd/hooks/1/gsync-create/post
> var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh
> var/lib/glusterd/hooks/1/reset
> var/lib/glusterd/hooks/1/reset/post
> var/lib/glusterd/hooks/1/reset/post/S31ganesha-reset.sh
> var/lib/glusterd/hooks/1/stop
> var/lib/glusterd/hooks/1/stop/pre
> var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh
> var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
> var/lib/glusterd/hooks/1/start
> var/lib/glusterd/hooks/1/start/post
> var/lib/glusterd/hooks/1/start/post/S31ganesha-start.sh
> var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
> var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
> var/lib/glusterd/hooks/1/set
> 

[Gluster-devel] WORM/Retention Feature - 13/05/2016

2016-05-13 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Fixed some review comments on the patch
-Working on re-factoring some parts of the feature


Plan for next week:

-Completing the re-factoring of the feature


Current work:

Patch: http://review.gluster.org/#/c/13429/
   http://review.gluster.org/#/c/14222/
Spec: 
https://github.com/gluster/glusterfs-specs/blob/master/under_review/worm-compliance.md
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
  https://bugzilla.redhat.com/show_bug.cgi?id=1333263
Blog: http://uskarthik.blogspot.in/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 06/05/2016

2016-05-06 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Feature got merged with 3.8 branch
-Fixed the bug #14182 reported by Krutika Dhananjay
-Fixed the review comments from Vijay on unwinding the FOPs and from
 Raghavendra Talur on naming the functions (bug #1333263)


Plan for next week:

-Fixing the review comments on caching the state of the file


Current work:

Patch: http://review.gluster.org/#/c/13429/
   http://review.gluster.org/#/c/14222/
Spec: 
https://github.com/gluster/glusterfs-specs/blob/master/under_review/worm-compliance.md
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
  https://bugzilla.redhat.com/show_bug.cgi?id=1333263
Blog: http://uskarthik.blogspot.in/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Requesting for review

2016-05-06 Thread Karthik Subrahmanya
Hi,

Could you please review this fix?
http://review.gluster.org/#/c/14182/

Thanks,
Karthik
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Worm translator not truly disabled by default?

2016-05-03 Thread Karthik Subrahmanya
Thanks Krutika, Atin, and Joseph for the inputs. I will send out a patch with
this issue fixed.

Regards,
Karthik

- Original Message -
> From: "Joseph Fernandes" <josfe...@redhat.com>
> To: "Karthik Subrahmanya" <ksubr...@redhat.com>, "Krutika Dhananjay" 
> <kdhan...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Atin Mukherjee" 
> <amukh...@redhat.com>
> Sent: Wednesday, May 4, 2016 6:21:23 AM
> Subject: Re: [Gluster-devel] Worm translator not truly disabled by default?
> 
> Well I completely agree with Krutika that doing a getxattr for every FOP is
> not required
> if the worm or worm-file option is off.
> 
> Karthik,
> And you need to check whether the worm or worm-file option is set, and only
> then go ahead and do the checking.
> For now the feature is experimental and the whole purpose is to provide
> the WORM/Retention semantic experience to the user.
> Later, when the feature matures, once the volume is changed to "Enterprise
> WORM/Retention" mode, there would be no going back.
> 
> Could you please send out a patch for this asap ?
> 
> Regards,
> Joe
> 
> - Original Message -
> > From: "Atin Mukherjee" <amukh...@redhat.com>
> > To: "Karthik Subrahmanya" <ksubr...@redhat.com>, "Krutika Dhananjay"
> > <kdhan...@redhat.com>
> > Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > Sent: Tuesday, May 3, 2016 6:22:55 PM
> > Subject: Re: [Gluster-devel] Worm translator not truly disabled by default?
> > 
> > 
> > 
> > On 05/03/2016 05:10 PM, Karthik Subrahmanya wrote:
> > > 
> > > 
> > > - Original Message -
> > >> From: "Krutika Dhananjay" <kdhan...@redhat.com>
> > >> To: "Joseph Fernandes" <josfe...@redhat.com>, "Karthik Subrahmanya"
> > >> <ksubr...@redhat.com>
> > >> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> > >> Sent: Tuesday, May 3, 2016 2:53:02 PM
> > >> Subject: Worm translator not truly disabled by default?
> > >>
> > >> Hi,
> > >>
> > >> I noticed while testing that worm was sending in fgetxattr() fops as
> > >> part
> > >> of a writev() request from the parent, despite being disabled by
> > >> default.
> > >>
> > > This is because of the new feature called "file level worm" which is
> > > introduced in the worm
> > > translator. This will allow to make individual files as worm/retained by
> > > setting the volume
> > > option "worm-file-level". The files which are created when this option is
> > > enabled will have
> > > an xattr called "trusted.worm_file". This is implemented because unlike
> > > read-only or volume
> > > level worm where if the option on the volume is disabled, the entire
> > > translator will get
> > > disabled and you can perform any FOP on the files in that volume. But
> > > here
> > > if a file is once
> > > marked as worm-retained, it should not revert back to the normal state
> > > where we can change
> > > its contents even if the worm-file-level option is reset/disabled. So the
> > > xattr is set on the
> > > file and every time when a write, link, unlink, rename, or truncate fop
> > > comes it checks for
> > > the xattr.
> > I am not sure with what test Krutika observed it, but if any worm
> > tunable is not set then ideally we shouldn't hit it. I believe you set
> > this xattr only when worm-file-level is turned on but that's also
> > disabled by default. Krutika, could you confirm it?
> > > Hope it helps.
> > > 
> > > Thanks & Regards,
> > > Karthik
> > >>
> > >> I've sent a patch for this at http://review.gluster.org/#/c/14182/
> > >> I must admit I do not understand the internals of this new translator.
> > >>
> > >> Request your feedback/review.
> > >>
> > >> -Krutika
> > >>
> > > ___
> > > Gluster-devel mailing list
> > > Gluster-devel@gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > 
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Worm translator not truly disabled by default?

2016-05-03 Thread Karthik Subrahmanya


- Original Message -
> From: "Krutika Dhananjay" <kdhan...@redhat.com>
> To: "Joseph Fernandes" <josfe...@redhat.com>, "Karthik Subrahmanya" 
> <ksubr...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Tuesday, May 3, 2016 2:53:02 PM
> Subject: Worm translator not truly disabled by default?
> 
> Hi,
> 
> I noticed while testing that worm was sending in fgetxattr() fops as part
> of a writev() request from the parent, despite being disabled by default.
> 
This is because of the new feature called "file level worm" introduced in the
worm translator. It allows making individual files worm/retained by setting the
volume option "worm-file-level". Files created while this option is enabled get
an xattr called "trusted.worm_file". This is needed because read-only and
volume level worm behave differently: there, if the option on the volume is
disabled, the entire translator gets disabled and any FOP can again be
performed on the files in that volume. Here, however, once a file is marked
worm-retained it should not revert to the normal state where its contents can
be changed, even if the worm-file-level option is reset/disabled. So the xattr
is set on the file, and every time a write, link, unlink, rename, or truncate
fop comes in, the translator checks for that xattr.
Hope it helps.
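
As a rough illustration of that per-fop check, here is a standalone sketch that
probes the backend file directly with the getxattr(2) syscall; the real
translator of course goes through gluster's dict/xattr fop interfaces, and the
helper name here is made up:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/xattr.h>

    /* Returns 1 if the brick file carries the file-level WORM marker.
     * Note: reading trusted.* xattrs needs privileges, which the brick
     * process (running as root) has. */
    static int
    is_worm_file (const char *path)
    {
            /* A zero-size request only asks whether the attribute exists. */
            ssize_t ret = getxattr (path, "trusted.worm_file", NULL, 0);

            if (ret >= 0)
                    return 1;          /* marker present: treat as WORM */
            if (errno == ENODATA)
                    return 0;          /* marker absent: normal file    */

            perror ("getxattr");       /* ENOENT, EPERM, ...            */
            return 0;
    }

    int
    main (int argc, char *argv[])
    {
            if (argc < 2) {
                    fprintf (stderr, "usage: %s <path-on-brick>\n", argv[0]);
                    return 1;
            }
            printf ("%s: %s\n", argv[1],
                    is_worm_file (argv[1]) ? "worm/retained" : "regular");
            return 0;
    }

This also shows why the observation above matters: an extra xattr lookup on
every write/rename/unlink has a cost, which is why skipping the check when the
option is off, and caching the state, are worth pursuing.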

Thanks & Regards,
Karthik
> 
> I've sent a patch for this at http://review.gluster.org/#/c/14182/
> I must admit I do not understand the internals of this new translator.
> 
> Request your feedback/review.
> 
> -Krutika
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster 3.8 : File level WORM/Retention

2016-05-01 Thread Karthik Subrahmanya


- Original Message -
> From: "Joseph Fernandes" <josfe...@redhat.com>
> To: "Gluster Devel" <gluster-devel@gluster.org>
> Cc: "Anoop Chirayath Manjiyil Sajan" <achir...@redhat.com>
> Sent: Monday, May 2, 2016 7:57:00 AM
> Subject: [Gluster-devel] Gluster 3.8 : File level WORM/Retention
> 
> Hi All,
> 
> I would like to congratulate Karthik for introducing the "File level
> WORM/Retention" feature (experimental in 3.8)
Thank you Joseph :)
> in Gluster v3.8rc0 (http://review.gluster.org/#/c/13429/ patch merged)
> 
> Would also like to thank Atin, Anoop CS, Vijay M, Niels, Raghavendra Talur
> and Prasanth Pai
Thank you all for your valuable time and guidance :)
> for helping Karthik in doing so (reviews and guidance) :)
> 
> There are a few action items still remaining for 3.8 that should be
> done before 3.8 is released.
> 
> Action Items before 3.8 release:
> 
> Address review comments from Atin, Vijay and Raghavendra Talur,
> 1. Testing of effects of WORM Xlator positioning in the brick stack on other
> components
>like barrier(snapshots), Quotas. If there are any immediate bugs.
>Though in the later versions there will be a client side WORM-Cache
>Xlator,
>which will cache worm/retention states of file inodes and return back the
>appropriate errors.
> 
> 2.  Correction on the error path as Vijay has suggested.
> In file worm.c, you are doing unwind in all FOPs with errno as -1, which is
> wrong.
> 
> Change the code something like below
>  if (label == 0)
> goto wind;
>  else if (label == 1)
> op_errno = EROFS;
>  else if (label == 2)
> op_errno = ENOMEM;
> 
>   Unwind here...
>   goto out;
> 
> wind:
>   ret = 0;
>   Wind here...
> 
> out: return ret;
> 
> 3. Talur's comment : Most of the functions in worm-helper need to have
> gf_worm prefix.
> 
> 4. Caching the retention state in the xlator inode context (stretch goal for
> 3.8)
> 
> Please feel free to add/update the list if I have missed something.
I will be addressing these in the subsequent patches.

Thanks & Regards,
Karthik Subrahmanya
> 
> Regards,
> Joe
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ./tests/basic/afr/arbiter-mount.t failing

2016-04-30 Thread Karthik Subrahmanya


- Original Message -
> From: "Karthik Subrahmanya" <ksubr...@redhat.com>
> To: "Atin Mukherjee" <amukh...@redhat.com>
> Cc: "gluster-devel" <gluster-devel@gluster.org>
> Sent: Saturday, April 30, 2016 4:36:11 PM
> Subject: Re: [Gluster-devel] ./tests/basic/afr/arbiter-mount.t failing
> 
> 
> 
> - Original Message -
> > From: "Atin Mukherjee" <amukh...@redhat.com>
> > To: "Karthik Subrahmanya" <ksubr...@redhat.com>, "gluster-devel"
> > <gluster-devel@gluster.org>
> > Sent: Saturday, April 30, 2016 4:22:31 PM
> > Subject: Re: [Gluster-devel] ./tests/basic/afr/arbiter-mount.t failing
> > 
> > 
> > 
> > On 04/30/2016 03:51 PM, Karthik Subrahmanya wrote:
> > > Hi,
> > > 
> > > I am running the CentOS regression for the WORM/Retention feature.
> > > The ./tests/basic/afr/arbiter-mount.t [1] test is failing again and again
> > > even if
> > > the WORM/Retention feature is not enabled.
> > Is it that 20 seconds timeout (NFS_EXPORT_TIMEOUT) is not sufficient?
> > Did you check the glusterd and nfs log whether nfs process has come up?
> > 
> I am not able to download the log files. It is showing "Unable to connect"
> error
> when I try to download.

I re-triggered the test. Now it passed.

> > > Can someone have a look into it?
> > > 
> > > [1]
> > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/20216/console
> > > 
> > > Thanks & Regards,
> > > Karthik Subrahmanya
> > > ___
> > > Gluster-devel mailing list
> > > Gluster-devel@gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > 
> > 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ./tests/basic/afr/arbiter-mount.t failing

2016-04-30 Thread Karthik Subrahmanya


- Original Message -
> From: "Atin Mukherjee" <amukh...@redhat.com>
> To: "Karthik Subrahmanya" <ksubr...@redhat.com>, "gluster-devel" 
> <gluster-devel@gluster.org>
> Sent: Saturday, April 30, 2016 4:22:31 PM
> Subject: Re: [Gluster-devel] ./tests/basic/afr/arbiter-mount.t failing
> 
> 
> 
> On 04/30/2016 03:51 PM, Karthik Subrahmanya wrote:
> > Hi,
> > 
> > I am running the CentOS regression for the WORM/Retention feature.
> > The ./tests/basic/afr/arbiter-mount.t [1] test is failing again and again
> > even if
> > the WORM/Retention feature is not enabled.
> Is it that 20 seconds timeout (NFS_EXPORT_TIMEOUT) is not sufficient?
> Did you check the glusterd and nfs log whether nfs process has come up?
> 
I am not able to download the log files. It shows an "Unable to connect" error
when I try to download.
> > Can someone have a look into it?
> > 
> > [1]
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/20216/console
> > 
> > Thanks & Regards,
> > Karthik Subrahmanya
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Requesting for NetBSD setup

2016-04-29 Thread Karthik Subrahmanya


- Original Message -
> From: "Emmanuel Dreyfus" <m...@netbsd.org>
> To: "Karthik Subrahmanya" <ksubr...@redhat.com>
> Cc: "gluster-devel" <gluster-devel@gluster.org>, gluster-in...@gluster.org
> Sent: Friday, April 29, 2016 12:35:24 PM
> Subject: Re: [Gluster-devel] Requesting for NetBSD setup
> 
> On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote:
> > I would like to ask for a NetBSD setup
> 
> nbslave7[4gh] are disabled in Jenkins right now. They are labeled
> "Disconnected by kaushal", but I don't kno why. Once it is confirmed
> that they are not alread used for testing, you could pick one.
> 
> I still do not know who is the password guardian at Red Hat, though.
> 
Thanks for the advice Emmanuel, but I think that is going to take some time.
Can you point me to some alternative way to test it on my system?
I have actually been stuck with this for some time now and I really can't
understand why it's failing.

Thanks,
Karthik Subrahmanya

> --
> Emmanuel Dreyfus
> m...@netbsd.org
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Requesting for NetBSD setup

2016-04-28 Thread Karthik Subrahmanya
Hi,

I would like to ask for a NetBSD setup, as my test cases and patches are
failing the regression tests [1] during code review on Gerrit, although the
aforementioned tests pass the smoke tests. I would need the NetBSD setup to
understand why the test is failing.
[1]http://review.gluster.org/#/c/13429/

Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 22/04/2016

2016-04-22 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Tested the program with different modes of retention
 and by setting different values for the retention profile
-Uploaded the test case (worm.t) with the patch
-Updated the WORM design-specs
-Wrote blogs about the semantics of WORM/Retention and WORM on Gluster


Plan for next week:

-Exploring distaf and writing tests
-Writing blogs on the implementation of the WORM/Retention feature
 on Gluster and how to use the feature


Current work:

POC: http://review.gluster.org/#/c/13429/
Spec: http://review.gluster.org/13538
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308
Blog: http://uskarthik.blogspot.in/

Your valuable suggestions, reviews, and wish lists are most welcome

Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 15/04/2016

2016-04-15 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Added option to switch between the existing volume level
 WORM and the file level WORM
-Fixed the issue with the rename fop with the distributed
 volume
-Wrote some test cases for the current work
-Updated the design-specs


Plan for next week:

-Handling the other fops which are not yet handled
-Correcting the design-specs to focus only on the WORM/Retention part


Current work:

POC: http://review.gluster.org/#/c/13429/
Spec: http://review.gluster.org/13538
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1326308

Your valuable suggestions, reviews, and wish lists are most welcome

Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] WORM/Retention Feature: 07-04-2016

2016-04-12 Thread Karthik Subrahmanya


- Original Message -
> From: "Niels de Vos" <nde...@redhat.com>
> To: "Karthik Subrahmanya" <ksubr...@redhat.com>
> Cc: "gluster-devel" <gluster-devel@gluster.org>
> Sent: Tuesday, April 12, 2016 2:00:57 PM
> Subject: Re: [Gluster-devel] WORM/Retention Feature: 07-04-2016
> 
> On Thu, Apr 07, 2016 at 08:18:43AM -0400, Karthik Subrahmanya wrote:
> > Hi all,
> > 
> > This week's status:
> 
> Many thanks for sending these regular updates!
> 
> Could you create a bug and have it block the 'glusterfs-3.8.0' one? Just
> put that string in the "blocks" field in bugzilla.
> 
> The next time you update the patch in Gerrit, ./rfc.sh will ask you for
> the bug number, and it will add it as a tag in the commit message. This
> causes Gerrit to mention the patch updates in the bug report. It helps
> me with tracking the progress of the feature.
> 
> Cheers,
> Niels
> 

Hi Niels,

I have filed a bug in RHBZ #1326308

Thanks & Regards,
Karthik
> 
> > 
> > -Explored on Gluster Test Framework
> > -Writing the test file for the WORM feature
> > -Added the volume set options for the WORM
> >  to set some options like enabling/disabling
> >  the feature, setting the retention period,
> >  auto commit period etc.
> > 
> > 
> > Plan for next week:
> > 
> > -Handling the smoke test failure
> > -Updating the code to take the values from the
> >  options set using the volume set command
> > -Completing the test file
> > 
> > 
> > Current work:
> > 
> > POC: http://review.gluster.org/#/c/13429/
> > Spec: http://review.gluster.org/13538
> > Feature page:
> > http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
> > 
> > Your valuable suggestions, reviews, and wish lists are most welcome
> > 
> > Thanks & Regards,
> > Karthik Subrahmanya
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature: 01-04-2016

2016-04-01 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Tested the program with different volume configurations
-Exploring the gluster test framework
-Writing regression test for the feature


Plan for next week:

-Completing the regression test
-Handling the issue with the smoke regression


Current work:

POC: http://review.gluster.org/#/c/13429/
Spec: http://review.gluster.org/13538
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive

Your valuable suggestions, reviews, and wish lists are most welcome

Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Requesting for help with gluster test framework

2016-04-01 Thread Karthik Subrahmanya


- Original Message -
From: "Prasanna Kalever" <pkale...@redhat.com>
To: "Karthik Subrahmanya" <ksubr...@redhat.com>
Cc: "Joseph Fernandes" <josfe...@redhat.com>, "Raghavendra Talur" 
<rta...@redhat.com>, "Vijaikumar Mallikarjuna" <vmall...@redhat.com>, 
"gluster-devel" <gluster-devel@gluster.org>
Sent: Friday, April 1, 2016 5:52:47 PM
Subject: Re: [Gluster-devel] Requesting for help with gluster test framework

On Fri, Apr 1, 2016 at 5:37 PM, Karthik Subrahmanya <ksubr...@redhat.com> wrote:
>
> Hi all,
>
> As I am trying to write a test for the WORM translator
> which I am working on right now, I am facing some issues
> while executing the test framework.
> I followed the steps in
> https://github.com/gluster/glusterfs/blob/master/tests/README.md
>
>
> [Issue #1]
> While running the run-tests.sh
>
> ... GlusterFS Test Framework ...
>
>
> ==
> Running tests in file ./tests/basic/0symbol-check.t
> [11:48:09] ./tests/basic/0symbol-check.t .. Dubious, test returned 1 (wstat 
> 256, 0x100)
> No subtests run
> [11:48:09]
>
> Test Summary Report
> ---
> ./tests/basic/0symbol-check.t (Wstat: 256 Tests: 0 Failed: 0)
>   Non-zero exit status: 1
>   Parse errors: No plan found in TAP output
> Files=1, Tests=0,  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)
> Result: FAIL
> End of test ./tests/basic/0symbol-check.t
> ==
>
>
> Run complete
> 1 test(s) failed
> ./tests/basic/0symbol-check.t
> 0 test(s) generated core
>
> Slowest 10 tests:
> ./tests/basic/0symbol-check.t  -  1
> Result is 1
>
>
>
> [Issue #2]
> While running a single .t file using "prove -vf"
>
> tests/features/worm.t ..
> Aborting.
> Aborting.
>
> env.rc not found
> env.rc not found
>
> Please correct the problem and try again.
> Please correct the problem and try again.
>
> Dubious, test returned 1 (wstat 256, 0x100)
> No subtests run
>
> Test Summary Report
> ---
> tests/features/worm.t (Wstat: 256 Tests: 0 Failed: 0)
>   Non-zero exit status: 1
>   Parse errors: No plan found in TAP output
> Files=1, Tests=0,  0 wallclock secs ( 0.02 usr +  0.01 sys =  0.03 CPU)
> Result: FAIL
>

This is due to lack of configuration;
run ./autogen.sh && ./configure and then try to run the tests

Thank you for the prompt reply and suggestion. It's working now.

--
Prasanna

>
> It would be awesome if someone can guide me with this.
>
> Thanks & Regards,
> Karthik Subrahmanya
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Requesting for help with gluster test framework

2016-04-01 Thread Karthik Subrahmanya
Hi all,

As I am trying to write a test for the WORM translator
which I am working on right now, I am facing some issues
while executing the test framework.
I followed the steps in
https://github.com/gluster/glusterfs/blob/master/tests/README.md


[Issue #1]
While running the run-tests.sh

... GlusterFS Test Framework ...


==
Running tests in file ./tests/basic/0symbol-check.t
[11:48:09] ./tests/basic/0symbol-check.t .. Dubious, test returned 1 (wstat 
256, 0x100)
No subtests run 
[11:48:09]

Test Summary Report
---
./tests/basic/0symbol-check.t (Wstat: 256 Tests: 0 Failed: 0)
  Non-zero exit status: 1
  Parse errors: No plan found in TAP output
Files=1, Tests=0,  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)
Result: FAIL
End of test ./tests/basic/0symbol-check.t
==


Run complete
1 test(s) failed 
./tests/basic/0symbol-check.t
0 test(s) generated core 

Slowest 10 tests: 
./tests/basic/0symbol-check.t  -  1
Result is 1



[Issue #2]
While running a single .t file using "prove -vf"

tests/features/worm.t .. 
Aborting.
Aborting.

env.rc not found
env.rc not found

Please correct the problem and try again.
Please correct the problem and try again.

Dubious, test returned 1 (wstat 256, 0x100)
No subtests run 

Test Summary Report
---
tests/features/worm.t (Wstat: 256 Tests: 0 Failed: 0)
  Non-zero exit status: 1
  Parse errors: No plan found in TAP output
Files=1, Tests=0,  0 wallclock secs ( 0.02 usr +  0.01 sys =  0.03 CPU)
Result: FAIL


It would be awesome if someone can guide me with this.

Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 24/03/2016

2016-03-24 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Implemented the lazy-auto-commit feature (a rough sketch of the transition
 logic follows this status list):
 The auto-commit period is the time interval at/after which
 the namespace scan should take place to check for dormant
 files and do the state transition.

 In lazy auto-commit each file stores a start time as an
 xattr, which holds the create time of the file. If a file
 already exists in a volume before the worm feature is
 enabled, then the start time will point to the time when
 the next fop comes for that file.

 Currently the transition takes place only with the rename,
 unlink, and truncate fops. If a file is not wormed yet when
 the next fop arrives and its auto-commit period has expired,
 it is converted to the worm-retained state, which blocks the
 unlink, rename, truncate, and write fops.

 If a file is in the worm-retained state and its auto-commit
 period has expired when the next fop comes, it is converted
 to the worm state. Rename, write, and truncate remain
 blocked, but unlink will succeed.

-Updated the WORM-Retention design-specs document
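
A rough sketch of the lazy auto-commit transition described above, with
made-up names and a plain time_t standing in for the start-time xattr
(illustration only, not the translator code):

    #include <stdio.h>
    #include <time.h>

    enum worm_state { WORM_NORMAL, WORM_RETAINED, WORM_PLAIN };

    struct worm_file {
            enum worm_state state;
            time_t          start;       /* create time / first-fop time   */
            time_t          autocommit;  /* auto-commit period, in seconds */
    };

    /* Called lazily, on the next rename/unlink/truncate fop for the file. */
    static void
    lazy_autocommit (struct worm_file *f, time_t now)
    {
            if (now - f->start < f->autocommit)
                    return;                    /* period has not expired yet */

            if (f->state == WORM_NORMAL)
                    f->state = WORM_RETAINED;  /* blocks write, rename,
                                                  truncate and unlink        */
            else if (f->state == WORM_RETAINED)
                    f->state = WORM_PLAIN;     /* write/rename/truncate stay
                                                  blocked, unlink succeeds   */
    }

    int
    main (void)
    {
            struct worm_file f = { WORM_NORMAL, time (NULL) - 120, 60 };

            lazy_autocommit (&f, time (NULL));
            printf ("state after fop: %d\n", f.state);  /* 1 = WORM_RETAINED */
            return 0;
    }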


Plan for next week:

-Testing the program
-Handling the other fops to do the state transition

Current work:

POC: http://review.gluster.org/#/c/13429/
Spec: http://review.gluster.org/13538
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive

Your valuable suggestions, reviews, and wish lists are most welcome

Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 04/03/2016

2016-03-04 Thread Karthik Subrahmanya
Hi all,

This week's status:

-Implemented a buffered way of storing the WORM-Retention profile:
This leaves more room for storing additional profile entries for a file,
for which requirements may arise in the future.

-Tested the program with distributed and replicated types of volumes.

-Working on handling the utime changes for WORM-Retained files:
Based on the "Mode of Retention" (Relax/Enterprise) I'm trying to handle
the utime changes as follows (a small sketch follows this list):
  -If the mode is set to "Relax", it allows both increasing
   and decreasing the retention time of a WORM-Retained file.
  -If it is "Enterprise", it allows only increasing the retention time
   of a WORM-Retained file.
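
A small sketch of that mode check, assuming the requested retention time
arrives as the new atime in a setattr (names are placeholders; the real check
lives in the translator's setattr path):

    #include <errno.h>
    #include <stdio.h>
    #include <time.h>

    enum retention_mode { MODE_RELAX, MODE_ENTERPRISE };

    /* Returns 0 if the retention-time change is allowed, EROFS otherwise. */
    static int
    validate_retention_change (enum retention_mode mode,
                               time_t current_retain_until,
                               time_t requested_retain_until)
    {
            if (mode == MODE_ENTERPRISE &&
                requested_retain_until < current_retain_until)
                    return EROFS;   /* Enterprise: only extending is allowed */

            return 0;               /* Relax: shrinking is allowed as well   */
    }

    int
    main (void)
    {
            time_t now = time (NULL);

            /* Shrinking the retention period: rejected in Enterprise mode,
             * accepted in Relax mode. */
            printf ("enterprise: %d\n",
                    validate_retention_change (MODE_ENTERPRISE, now + 3600, now));
            printf ("relax:      %d\n",
                    validate_retention_change (MODE_RELAX, now + 3600, now));
            return 0;
    }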

Plan for next week:
-Complete the utime handling task
-Implementing the "auto-commit" feature
-Writing a blog on WORM/Retention Semantics
-Addressing the review comments by Prashanth Pai <p...@redhat.com> on the
 WORM/Retention Semantics specs document

Current work:
POC: http://review.gluster.org/#/c/13429/
Spec: http://review.gluster.org/13538
Feature page: 
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive

Your valuable reviews and suggestions are most welcome

Thanks & Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] WORM/Retention Feature - 26/02/2016

2016-02-26 Thread Karthik Subrahmanya
Hi all,

The current status of the project is:

-It works as a file level WORM
-Handles the setattr call if all the write bits are removed (see the sketch
 after this list)
-Sets an xattr storing the WORM/Retention state along with the retention period
-The atime of the file will point to the time till which the file is retained
-When a write/unlink/rename/truncate request comes for a WORM/Retained file,
 it returns an EROFS error
-Whenever a fop request comes for a file, it will do a lookup
-Lookup will do the state transition if the retention period has expired
-It will reset the state from WORM/Retained to WORM
-The atime of the file will also revert back to the actual atime
-The file will still be read-only and will block the write, truncate, and
 rename requests
-The unlink call will succeed for a WORM file
-You can transition back to the WORM/Retained state by doing setattr again
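
As a tiny illustration of the "all write bits removed" trigger mentioned in
the list above (standalone sketch; the translator does this inside its
setattr fop):

    #include <stdio.h>
    #include <sys/stat.h>

    /* A setattr whose requested mode clears every write bit is what marks
     * the file as WORM/Retained. */
    static int
    triggers_worm_retention (mode_t requested_mode)
    {
            return (requested_mode & (S_IWUSR | S_IWGRP | S_IWOTH)) == 0;
    }

    int
    main (void)
    {
            printf ("mode 0444 -> %d\n", triggers_worm_retention (0444)); /* 1 */
            printf ("mode 0644 -> %d\n", triggers_worm_retention (0644)); /* 0 */
            return 0;
    }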


Plans for next week:

-As per Niels' <nde...@redhat.com> suggestion, preparing a specs document
-Fixing the bugs in the program
-Working on handling the ctime change

You can find the feature page at:
http://www.gluster.org/community/documentation/index.php/Features/gluster_compliance_archive
Patch: http://review.gluster.org/#/c/13429/

Your valuable suggestions, wish lists, and reviews are most welcome.

Regards,
Karthik Subrahmanya
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel