Re: [Gluster-devel] [Gluster-users] Rebalance improvement.

2020-08-03 Thread Susant Palai
On Mon, Aug 3, 2020 at 2:53 PM sankarshan 
wrote:

> On Mon, 3 Aug 2020 at 12:47, Susant Palai  wrote:
> >
> > CentOS users can add the following repo and install the build from the
> > master branch to try out the feature. [Testing purposes only; not ready for
> > consumption in a production environment.]
> >
> > [gluster-nightly-master]
> > baseurl=http://artifacts.ci.centos.org/gluster/nightly/master/7/x86_64/
> > gpgcheck=0
> > keepalive=1
> > enabled=1
> > repo_gpgcheck=0
> > name=Gluster Nightly builds (master branch)
> >
> > A summary of perf numbers from our test lab :
> >
>
> Are these numbers impacted by sizing of the machine instance/hardware?
> What is the configuration on which these numbers were recorded?
>

There were 4 bricks (one brick per node) to begin with. After the directories
were created, 2 more bricks (from two more nodes) were added.

Disks - HDD
Network - 10Gbps Ethernet link
Cores - 24

Numbers will definitely vary with the disk and network configuration.
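
For context, a minimal sketch of the kind of test flow described above (volume
name, host names and brick paths are placeholders, not the exact lab setup):

    # Pure-distribute volume with 4 bricks, one per node
    gluster volume create testvol node1:/bricks/b1 node2:/bricks/b2 \
        node3:/bricks/b3 node4:/bricks/b4
    gluster volume start testvol
    # ... create the directory data set on a client mount ...
    gluster volume add-brick testvol node5:/bricks/b5 node6:/bricks/b6
    gluster volume rebalance testvol start
    gluster volume rebalance testvol status    # check progress/completion time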


>
> > DirSize - 1Million    Old   New   %diff
> > Depth - 100 (Run 1)   353   74    +377%
> > Depth - 100 (Run 2)   348   72    +377~%
> > Depth - 50            246   122   +100%
> > Depth - 3             174   114   +52%
> >
> > Susant
>
> --
> sankarshan mukhopadhyay
> <https://about.me/sankarshan.mukhopadhyay>
> ___
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
>
>
>
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] [Gluster-users] Rebalance improvement.

2020-08-03 Thread Susant Palai


> On 03-Aug-2020, at 13:58, Aravinda VK  wrote:
> 
> Interesting numbers. Thanks for the effort.
> 
> What is the unit of old/new numbers? seconds? 

Minutes. 

> 
>> On 03-Aug-2020, at 12:47 PM, Susant Palai <spa...@redhat.com> wrote:
>> 
>> CentOS users can add the following repo and install the build from the
>> master branch to try out the feature. [Testing purposes only; not ready for
>> consumption in a production environment.]
>> 
>> [gluster-nightly-master]
>> baseurl=http://artifacts.ci.centos.org/gluster/nightly/master/7/x86_64/
>> gpgcheck=0
>> keepalive=1
>> enabled=1
>> repo_gpgcheck = 0
>> name=Gluster Nightly builds (master branch)
>> 
>> A summary of perf numbers from our test lab :
>> 
>> DirSize - 1Million    Old   New   %diff
>> Depth - 100 (Run 1)   353   74    +377%
>> Depth - 100 (Run 2)   348   72    +377~%
>> Depth - 50            246   122   +100%
>> Depth - 3             174   114   +52%
>> 
>> Susant
>> 
>> 
>> On Mon, Aug 3, 2020 at 11:16 AM Susant Palai <spa...@redhat.com> wrote:
>> Hi,
>> Recently, we have pushed some performance improvements for the Rebalance
>> Crawl, which used to consume a significant amount of time out of the entire
>> rebalance process.
>> 
>> 
>> The patch [1] was recently merged upstream and may land as an experimental
>> feature in the upcoming upstream release.
>> 
>> The improvement currently works only for pure-distribute volumes (which can
>> be expanded).
>> 
>> 
>> Things to look forward to in the future:
>>  - Parallel Crawl in Rebalance
>>  - Global Layout
>> 
>> Once these improvements are in place, we would be able to reduce the overall
>> rebalance time by a significant margin.
>> 
>> Would request our community to try out the feature and give us feedback.
>> 
>> More information regarding the same will follow.
>> 
>> 
>> Thanks & Regards,
>> Susant Palai
>> 
>> 
>> [1] https://review.gluster.org/#/c/glusterfs/+/24443/
>> 
>> 
>> 
>> Community Meeting Calendar:
>> 
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>> 
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
> 
> Aravinda Vishwanathapura
> https://kadalu.io
> 
> 
> 

___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Rebalance improvement.

2020-08-03 Thread Susant Palai
CentOS users can add the following repo and install the build from the
master branch to try out the feature. [Testing purposes only; not ready for
consumption in a production environment.]

[gluster-nightly-master]
baseurl=http://artifacts.ci.centos.org/gluster/nightly/master/7/x86_64/
gpgcheck=0
keepalive=1
enabled=1
repo_gpgcheck=0
name=Gluster Nightly builds (master branch)
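
A minimal sketch of how one might consume this on a CentOS 7 test box (the
repo file name and package list below are illustrative, not prescriptive):

    # Save the repo definition above as
    # /etc/yum.repos.d/gluster-nightly-master.repo, then:
    yum install -y glusterfs-server glusterfs-fuse
    systemctl start glusterd
    glusterfs --version    # confirm the nightly build is installed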

A summary of perf numbers from our test lab :

DirSize - 1Million    Old   New   %diff
Depth - 100 (Run 1)   353   74    +377%
Depth - 100 (Run 2)   348   72    +377~%
Depth - 50            246   122   +100%
Depth - 3             174   114   +52%
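
For anyone wanting to reproduce a comparable data set, a rough sketch of
generating roughly 1 million directories at depth 100 is below (the mount
point and counts are placeholders; this is not the exact script used in the
lab):

    mnt=/mnt/glusterfs/testvol
    for top in $(seq 1 10000); do
        path="$mnt/dir$top"
        for d in $(seq 1 100); do
            path="$path/d$d"
        done
        mkdir -p "$path"      # creates a 100-level chain per iteration
    done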

Susant


On Mon, Aug 3, 2020 at 11:16 AM Susant Palai  wrote:

> Hi,
> Recently, we have pushed some performance improvements for the Rebalance
> Crawl, which used to consume a significant amount of time out of the entire
> rebalance process.
>
>
> The patch [1] was recently merged upstream and may land as an
> experimental feature in the upcoming upstream release.
>
> The improvement currently works only for pure-distribute volumes (which
> can be expanded).
>
>
> Things to look forward to in the future:
>  - Parallel Crawl in Rebalance
>  - Global Layout
>
> Once these improvements are in place, we would be able to reduce the
> overall rebalance time by a significant margin.
>
> Would request our community to try out the feature and give us feedback.
>
> More information regarding the same will follow.
>
>
> Thanks & Regards,
> Susant Palai
>
>
> [1] https://review.gluster.org/#/c/glusterfs/+/24443/
>
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] Rebalance improvement.

2020-08-02 Thread Susant Palai
Hi,
Recently, we have pushed some performance improvements for the Rebalance Crawl,
which used to consume a significant amount of time out of the entire rebalance
process.


The patch [1] was recently merged upstream and may land as an experimental
feature in the upcoming upstream release.

The improvement currently works only for pure-distribute volumes (which can be
expanded).


Things to look forward to in the future:
 - Parallel Crawl in Rebalance
 - Global Layout

Once these improvements are in place, we would be able to reduce the overall
rebalance time by a significant margin.

Would request our community to try out the feature and give us feedback.

More information regarding the same will follow.


Thanks & Regards,
Susant Palai


[1] https://review.gluster.org/#/c/glusterfs/+/24443/
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Detect is quota enabled or disabled - in DHT code?

2020-01-08 Thread Susant Palai
On Wed, Jan 8, 2020 at 4:29 PM Yaniv Kaul  wrote:

> I'd like to add to the DHT conf something like conf->quota_enabled, so if
> it's not, I can skip quite a bit of work done today in DHT. I'm just unsure
> where, in the init and reconfigure of DHT, I can detect and introduce this.
>
In dht_init and dht_reconfigure you will find examples of getting
information from dictionaries and updating conf. But I am wondering whether
glusterd communicates (I assume not) quota settings to the client, since
quota is a server-side graph change.
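
For reference, a quick way to see the server-side nature of the setting from
the CLI (volume name is a placeholder; whether the resulting option ever
reaches the client graph is exactly the open question above):

    gluster volume quota testvol enable
    gluster volume info testvol | grep -i quota   # typically shows features.quota: on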


>
> Any ideas?
> TIA,
> Y.
> ___
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968


NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Potential impact of Cloudsync on posix performance

2019-12-18 Thread Susant Palai
On Tue, Dec 17, 2019 at 7:00 PM Yaniv Kaul  wrote:

> I'm looking at the code, and I'm seeing calls everywhere to
> posix_cs_maintenance().
> perhaps we should add to the volume configuration some boolean if
> cloudsync feature is even enabled for that volume?
> https://review.gluster.org/#/c/glusterfs/+/23576/ is a very modest effort
> to reduce the impact, but the real one should be to not call these functions
> at all if cloudsync is not enabled.
>
> Thoughts?
>

Agreed. This was discussed before as well. The problem is that currently
there is no easy way to communicate client graph changes to the server side
(let me know if I am wrong; I guess we faced a similar problem in RIO as well).
The performance penalty without such a medium is that we have a key check in
the dictionary (did I miss something else?). I am of the opinion that it is
really not costly.
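
As a rough illustration of the per-volume toggle being discussed (assuming the
volume-set key is features.cloudsync, which should be double-checked; the
volume name is a placeholder):

    gluster volume set archvol features.cloudsync enable
    gluster volume get archvol features.cloudsync    # confirm the current value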

Susant


>
> Y.
> ___
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968


NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Adding ALUA support for Gluster-Block

2018-10-29 Thread Susant Palai
On Mon, Oct 29, 2018 at 2:49 PM Niels de Vos  wrote:

> On Mon, Oct 29, 2018 at 12:06:53PM +0530, Susant Palai wrote:
> > On Fri, Oct 26, 2018 at 6:22 PM Niels de Vos  wrote:
> >
> > > On Fri, Oct 26, 2018 at 05:54:28PM +0530, Susant Palai wrote:
> > > > Hi,
> > > >For ALUA in Gluster-Block, we need fencing support from GlusterFS.
> > > This
> > > > is targeted mainly to avoid corruption issues during fail-over from
> > > > INITIATOR.
> > > >
> > > > You can find the problem statement, design document at [1] and the
> GitHub
> > > > discussions at [2].
> > > >
> > > > Requesting your feedback on the same.
> > >
> > > From a quick glance, this looks very much like leases/delegations that
> > > have been added for Samba and NFS-Ganesha. Can you explain why using
> > > that is not sufficient?
> > >
> > Niels, are you suggesting that leases/delegations already solve the whole
> > problem we are trying to solve, as described in the design document, or
> > just the mandatory-lock part?
>
> I would be interested to know if you can use leases/delegations to solve
> the issue. If you can not, can leases/delegations be extended instead of
> proposing an new API?
>

From what I understand, Block-D keeps all the files open before the beginning
of the session (exporting files as block devices). Which I guess won't work
with leases, since a lease (please correct me if I am wrong) breaks the
existing lease on an open request. Certainly, with the self-heal daemon the
lease will be released. Hence, a mandatory lock fits here IMO.

@Kalever, Prasanna  Please give your feedback here.


> From theory, the high-available NFS-Ganesha and Samba services should
> have solved similar problems already.
>

From what I understand, the multipath layer does not have any control over
restarting tcmu-runner on the Gluster side (if that is how NFS-Ganesha and
Samba provide blacklisting for their clients).
targetcli does certain tasks only on a failover switch, which would be
like taking a mandatory lock and opening a session as mentioned in the design
doc. Hence, there is no control over data cached at the Gluster-client layer
being replayed in the event of a disconnection.

Again, @Kalever, Prasanna and @Xiubo Li will be able to clarify more here.


> IIRC Anoop CS And Soumya have been working on this mostly. If you have
> specific questions about the implementation in Samba or NFS-Ganesha, ask
> on this list and include them on CC.
>
> Also, we do have the (low-volume) integrat...@gluster.org list for
> discussions around integrating gfapi with other projects. There might be
> others that are interested in these kind of details.
>
> Thanks,
> Niels
>
>
> >
> > >
> > > Thanks,
> > > Niels
> > >
> > >
> > > >
> > > > Thanks,
> > > > Susant/Amar/Shyam/Prasanna/Xiubo
> > > >
> > > >
> > > >
> > > > [1]
> > > >
> > >
> https://docs.google.com/document/d/1up5egL9SxmVKFpZMUEuuYML6xS2mNmBGzyZbMaw1fl0/edit?usp=sharing
> > > > [2]
> > > >
> > >
> https://github.com/gluster/gluster-block/issues/53#issuecomment-432924044
> > >
> > > > ___
> > > > Gluster-devel mailing list
> > > > Gluster-devel@gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-devel
> > >
> > >
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Adding ALUA support for Gluster-Block

2018-10-29 Thread Susant Palai
On Fri, Oct 26, 2018 at 6:22 PM Niels de Vos  wrote:

> On Fri, Oct 26, 2018 at 05:54:28PM +0530, Susant Palai wrote:
> > Hi,
> >For ALUA in Gluster-Block, we need fencing support from GlusterFS.
> This
> > is targeted mainly to avoid corruption issues during fail-over from
> > INITIATOR.
> >
> > You can find the problem statement, design document at [1] and the GitHub
> > discussions at [2].
> >
> > Requesting your feedback on the same.
>
> From a quick glance, this looks very much like leases/delegations that
> have been added for Samba and NFS-Ganesha. Can you explain why using
> that is not sufficient?
>
Niels, are you suggesting that leases/delegations already solve the whole
problem we are trying to solve, as described in the design document, or just
the mandatory-lock part?

>
> Thanks,
> Niels
>
>
> >
> > Thanks,
> > Susant/Amar/Shyam/Prasanna/Xiubo
> >
> >
> >
> > [1]
> >
> https://docs.google.com/document/d/1up5egL9SxmVKFpZMUEuuYML6xS2mNmBGzyZbMaw1fl0/edit?usp=sharing
> > [2]
> >
> https://github.com/gluster/gluster-block/issues/53#issuecomment-432924044
>
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Adding ALUA support for Gluster-Block

2018-10-26 Thread Susant Palai
Hi,
   For ALUA in Gluster-Block, we need fencing support from GlusterFS. This
is targeted mainly to avoid corruption issues during fail-over from
INITIATOR.

You can find the problem statement, design document at [1] and the GitHub
discussions at [2].

Requesting your feedback on the same.

Thanks,
Susant/Amar/Shyam/Prasanna/Xiubo



[1]
https://docs.google.com/document/d/1up5egL9SxmVKFpZMUEuuYML6xS2mNmBGzyZbMaw1fl0/edit?usp=sharing
[2]
https://github.com/gluster/gluster-block/issues/53#issuecomment-432924044
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 5: Release calendar and status updates

2018-08-22 Thread Susant Palai
Comments inline
On Wed, Aug 22, 2018 at 11:34 PM Shyam Ranganathan 
wrote:

> On 08/14/2018 02:28 PM, Shyam Ranganathan wrote:
> > 2) Branching date: (Monday) Aug-20-2018 (~40 days before GA tagging)
>
> We are postponing branching to 2nd week of September (10th), as the
> entire effort in this release has been around stability and fixing
> issues across the board.
>
> Thus, we are expecting no net new features from hereon till branching,
> and features that are already a part of the code base and its details
> are as below.
>
> >
> > 3) Late feature back port closure: (Friday) Aug-24-2018 (1 week from
> > branching)
>
> As stated above, there is no late feature back port.
>
> The features that are part of master since 4.1 release are as follows,
> with some questions for the authors,
>
> 1) Changes to options tables in xlators (#302)
>
> @Kaushal/GD2 team, can we call this complete? There maybe no real
> release notes for the same, as these are internal in nature, but
> checking nevertheless.
>
> 2) CloudArchival (#387)
>
> @susant, what is the status of this feature? Is it complete?
>
The feature is complete from a functional point of view. But we would still
like to retain "experimental" status for a few releases.

> I am missing user documentation, and code coverage from the tests is
>
User documentation is here:
https://review.gluster.org/#/c/glusterfs/+/20064/
Or should there be some other doc that I missed?

> very low (see:
> https://build.gluster.org/job/line-coverage/485/Line_20Coverage_20Report/
> )
>
This is expected: without any plugin, most of the code is untouched. It's
just a bypass in the build setup.

>
> 3) Quota fsck (#390)
>
> @Sanoj I do have documentation in the github issue, but would prefer if
> the user facing documentation moves to glusterdocs instead.
>
> Further I see no real test coverage for the tool provided here, any
> thoughts around the same?
>
> The script is not part of the tarball and hence the distribution RPMs as
> well, what is the thought around distributing the same?
>
> 4) Ensure python3 compatibility across code base (#411)
>
> @Kaleb/others, last patch to call this issue done (sans real testing at
> the moment) is https://review.gluster.org/c/glusterfs/+/20868 request
> review and votes here, to get this merged before branching.
>
> 5) Turn on Dentry fop serializer by default in brick stack (#421)
>
> @du, the release note for this can be short, as other details are
> captured in 4.0 release notes.
>
> However, in 4.0 release we noted a limitation with this feature as follows,
>
> "Limitations: This feature is released as a technical preview, as
> performance implications are not known completely." (see section
> https://docs.gluster.org/en/latest/release-notes/4.0.0/#standalone )
>
> Do we now have better data regarding the same that we can use when
> announcing the release?
>
> Thanks,
> Shyam
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-01 Thread Susant Palai
Will have a look at it and update.

Susant

On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay,  wrote:

> Same here - https://build.gluster.org/job/centos7-regression/2024/console
>
> -Krutika
>
> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee 
> wrote:
>
>> tests/bugs/distribute/bug-1122443.t fails my set up (3 out of 5 times)
>> running with master branch. As per my knowledge I've not seen this test
>> failing earlier. Looks like some recent changes has caused it. One of such
>> instance is https://build.gluster.org/job/centos7-regression/1955/ .
>>
>> Request the component owners to take a look at it.
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./tests/basic/md-cache/bug-1418249.t failing

2018-03-25 Thread Susant Palai
Sent a patch here - https://review.gluster.org/#/c/19768/ .

On Mon, Mar 26, 2018 at 7:50 AM, Susant Palai <spa...@redhat.com> wrote:

> Hi Poornima,
> As https://review.gluster.org/#/c/19744/ got merged, the $subject
> test case is failing on master.
> The "network.inode-lru-limit" is expecting the old "5" vs the current
> "20".
>
> Requesting to fix it.
>
>
> Susant
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] ./tests/basic/md-cache/bug-1418249.t failing

2018-03-25 Thread Susant Palai
Hi Poornima,
As https://review.gluster.org/#/c/19744/ got merged, the $subject test
case is failing on master.
The "network.inode-lru-limit" is expecting the old "5" vs the current
"20".

Requesting to fix it.


Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Cloudsync Xlator for Archival Use case

2018-03-16 Thread Susant Palai
Hi,
  We are proposing the cloudsync xlator for the archival use case for 4.1.
For more details, refer to the following documents.

- Github-issue: https://github.com/gluster/glusterfs/issues/387
- Spec-file :  https://review.gluster.org/#/c/18854/
- More design details: [1]

Please give any input/suggestion you have.

Note:
In version 1, we are targeting one cloud store per volume. A sample script
will be provided for crawling and uploading. I will be updating more
details in this thread as they come (a rough illustration follows).
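
A minimal, hypothetical sketch of what such a crawl-and-upload helper could
look like (this is NOT the sample script mentioned above; the mount point, age
threshold and upload command are all placeholders):

    #!/bin/bash
    # Find files untouched for 90+ days on the mount and hand them to a
    # placeholder upload command, one cloud store per volume.
    MOUNT=/mnt/glusterfs/archvol
    upload_to_cloud() {                 # stand-in for the real per-store tool
        echo "would upload: $1"
    }
    find "$MOUNT" -type f -atime +90 -print0 |
    while IFS= read -r -d '' f; do
        upload_to_cloud "$f"
    done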


Thanks,
Susant/Aravinda


[1] https://docs.google.com/document/d/1jw1Z5ez6gCjcpOjtH6sdCraKUjNWQVfORvbDT7BlAYI/edit?usp=sharing
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 4.1: LTM release targeted for end of May

2018-03-15 Thread Susant Palai
Hi,
We would like to propose the Cloudsync xlator (for the archival use case) for
4.1 (GitHub issue #387).
Initial patch (under review) is posted here:
https://review.gluster.org/#/c/18532/.
Spec file: https://review.gluster.org/#/c/18854/

Thanks,
Susant


On Thu, Mar 15, 2018 at 4:05 PM, Ravishankar N 
wrote:

>
>
> On 03/13/2018 07:07 AM, Shyam Ranganathan wrote:
>
>> Hi,
>>
>> As we wind down on 4.0 activities (waiting on docs to hit the site, and
>> packages to be available in CentOS repositories before announcing the
>> release), it is time to start preparing for the 4.1 release.
>>
>> 4.1 is where we have GD2 fully functional and shipping with migration
>> tools to aid Glusterd to GlusterD2 migrations.
>>
>> Other than the above, this is a call out for features that are in the
>> works for 4.1. Please *post* the github issues to the *devel lists* that
>> you would like as a part of 4.1, and also mention the current state of
>> development.
>>
> Hi,
>
> We are targeting the 'thin-arbiter' feature for 4.1 :
> https://github.com/gluster/glusterfs/issues/352
> Status: High level design is there in the github issue.
> Thin arbiter xlator patch https://review.gluster.org/#/c/19545/ is
> undergoing reviews.
> Implementation details on AFR and glusterd(2) related changes are being
> discussed.  Will make sure all patches are posted against issue 352.
>
> Thanks,
> Ravi
>
>
>
>> Further, as we hit end of March, we would make it mandatory for features
>> to have required spec and doc labels, before the code is merged, so
>> factor in efforts for the same if not already done.
>>
>> Current 4.1 project release lane is empty! I cleaned it up, because I
>> want to hear from all as to what content to add, than add things marked
>> with the 4.1 milestone by default.
>>
>> Thanks,
>> Shyam
>> P.S: Also any volunteers to shadow/participate/run 4.1 as a release owner?
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] core in regression run

2018-03-08 Thread Susant Palai
Hi,
There is a core for test case: ./tests/bugs/rpc/bug-847624.t.
*link*: https://build.gluster.org/job/centos7-regression/232/console
Requesting concerned team to look into this.

Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] cluster/dht: restrict migration of opened files

2018-01-18 Thread Susant Palai
This does not restrict tiered migrations.

Susant

On 18 Jan 2018 8:18 pm, "Milind Changire"  wrote:

On Tue, Jan 16, 2018 at 2:52 PM, Raghavendra Gowdappa 
wrote:

> All,
>
> Patch [1] prevents migration of opened files during rebalance operation.
> If patch [1] affects you, please voice out your concerns. [1] is a stop-gap
> fix for the problem discussed in issues [2][3]
>
> [1] https://review.gluster.org/#/c/19202/
> [2] https://github.com/gluster/glusterfs/issues/308
> [3] https://github.com/gluster/glusterfs/issues/347
>
> regards,
> Raghavendra
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>


Would this patch affect tiering as well ?
Do we need to worry about tiering anymore ?

--
Milind


___
Gluster-users mailing list
gluster-us...@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 4.0: Making it happen!

2018-01-17 Thread Susant Palai
Hi,
   I would request some extension. Targeting the 27th weekend.

Thanks,
Susant

On Tue, Jan 16, 2018 at 8:57 PM, Shyam Ranganathan 
wrote:

> On 01/10/2018 01:14 PM, Shyam Ranganathan wrote:
> > Hi,
> >
> > 4.0 branching date is slated on the 16th of Jan 2018 and release is
> > slated for the end of Feb (28th), 2018.
>
> This is today! So read on...
>
> Short update: I am going to wait a couple more days before branching, to
> settle release content and exceptions. Branching is hence on Jan, 18th
> (Thursday).
>
> >
> > We are at the phase when we need to ensure our release scope is correct
> > and *must* release features are landing. Towards this we need the
> > following information for all contributors.
> >
> > 1) Features that are making it to the release by branching date
> >
> > - There are currently 35 open github issues marked as 4.0 milestone [1]
> > - Need contributors to look at this list and let us know which will meet
> > the branching date
>
> Other than the protocol changes (from Amar), I did not receive any
> requests for features that are making it to the release. I have compiled
> a list of features based on patches in gerrit that are open, to check
> what features are viable to make it to 4.0. This can be found here [3].
>
> NOTE: All features, other than the ones in [3] are being moved out of
> the 4.0 milestone.
>
> > - Need contributors to let us know which may slip and hence needs a
> > backport exception to 4.0 branch (post branching).
> > - Need milestone corrections on features that are not making it to the
> > 4.0 release
>
> I need the following contributors to respond and state if the feature in
> [3] should still be tracked against 4.0 and how much time is possibly
> needed to make it happen.
>
> - Poornima, Amar, Jiffin, Du, Susant, Sanoj, Vijay
>
> >
> > NOTE: Slips are accepted if they fall 1-1.5 weeks post branching, not
> > post that, and called out before branching!
> >
> > 2) Reviews needing priority
> >
> > - There could be features that are up for review, and considering we
> > have about 6-7 days before branching, we need a list of these commits,
> > that you want review attention on.
> > - This will be added to this [2] dashboard, easing contributor access to
> > top priority reviews before branching
>
> As of now, I am adding a few from the list in [3] for further review
> attention as I see things evolving, more will be added as the point
> above is answered by the respective contributors.
>
> >
> > 3) Review help!
> >
> > - This link [2] contains reviews that need attention, as they are
> > targeted for 4.0. Request maintainers and contributors to pay close
> > attention to this list on a daily basis and help out with reviews.
> >
> > Thanks,
> > Shyam
> >
> > [1] github issues marked for 4.0:
> > https://github.com/gluster/glusterfs/milestone/3
> >
> > [2] Review focus for features planned to land in 4.0:
> > https://review.gluster.org/#/q/owner:srangana%2540redhat.com+is:starred
>
> [3] Release 4.0 features with pending code reviews: http://bit.ly/2rbjcl8
>
> >
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-devel
> >
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 3.11: Pending features and reviews (2 days to branching)

2017-04-27 Thread Susant Palai
Hi Shyam,
 Here is the patch list for 3.11 from my side.
 1- https://review.gluster.org/#/c/16427/
 2- https://review.gluster.org/#/c/16980/

Thanks,
Susant

- Original Message -
> From: "Shyam" <srang...@redhat.com>
> To: "Susant Palai" <spa...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Kaushal Madappa" 
> <kmada...@redhat.com>
> Sent: Wednesday, 26 April, 2017 8:20:09 PM
> Subject: Re: [Gluster-devel] Release 3.11: Pending features and reviews (2 
> days to branching)
> 
> On 04/26/2017 03:37 AM, Susant Palai wrote:
> >> 3.2) Rebalance performance improvement #155
> >> @susant any updates? Should this be marked in the 3.11 scope?
> > All the patches are up in master. I guess a few patches need some brushing
> > up. We would like to get this into 3.11.
> > I will try my best to finish by Apr 27 EOD EST.
> 
> Can I get the patch list, so that I can star the same for review focus.
> 
> Thanks,
> Shyam
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Release 3.11: Pending features and reviews (2 days to branching)

2017-04-26 Thread Susant Palai


- Original Message -
> From: "Shyam" 
> To: "Gluster Devel" 
> Cc: "Kaushal Madappa" 
> Sent: Wednesday, 26 April, 2017 5:11:29 AM
> Subject: [Gluster-devel] Release 3.11: Pending features and reviews (2 days   
> to branching)
> 
> Hi,
> 
> This mail should have been out 3-4 days earlier than now as branching is
> 2 days away, but hopefully it is not too late.
> 
> The current release scope can be seen at [1].
> 
> If you do *not* recognize your nick in the following list then you can
> help us out by looking at pending reviews [2] and moving them along and
> optionally skip the rest of the mail.
> 
> If your nick is in the list below, please read along to update status on
> the action called out against you.
> 
> nick list: @amarts, @csabahenk, @ndevos, @pranith, @kaushal, @jiffin,
> @rabhat, @kaleb, @samikshan, @poornimag, @kotresh, @susant
> 
> If any reviews for the features listed below are still open and not
> appearing in [2] drop me a mail, and I will star it, so that it appears
> in the list as needed.
> 
> Status request of features targeted for 3.11:
> 
> 1) Starting with features that slipped 3.10 and were marked for 3.11
> 
> 1.1) In gfapi fix memory leak during graph switch #61
> @ndevos I know a series of fixes are up for review, will this be
> completed for this release, or would it be an ongoing effort across
> releases? If the latter, we possibly continue tracking this for the next
> release as well.
> 
> 1.2) SELinux support for Gluster Volumes #55
> Latest reviews indicate this may be ready by branching, @jiffin or
> @ndevos will this make it by branching date?
> 
> 1.3) Introduce force option for Snapshot Restore #62
> There seems to be no owner for this now, @rabhat any updates or anything
> more than what we know about this at this time?
> 
> 1.4) switch to storhaug for HA for ganesha and samba #59
> @kaleb, are there any open reviews for this? Is it already done?
> 
> 2) New in 3.11 and tracked in the release scope [1]
> 
> 2.1) get-state CLI needs to provide client and brick capacity related
> information as well #158
> Code is in. Documentation changes are pending (heads up, @samikshan). No
> updates needed at present.
> 
> 2.2) Serve negative lookups from cache #82
> Code is in. Documentation changes are pending, which can come in later
> (heads up, @poornimag)
> 
> 2.3) New xlator to help developers detecting resource leaks #176
> Code and developer documentation is in, issue is auto-closed post merge
> of the commit. (thanks @ndevos)
> 
> 2.4) Make the feature metadata-caching/small file performance production
> ready #167
> Just a release-note update, hence issue will be updated post branching
> when the release notes are updated (heads up, @poornimag)
> 
> 2.5) Make the feature "Parallel Readdir" production ready in 3.11 #166
> Just a release-note update, hence issue will be updated post branching
> when the release notes are updated (heads up, @poornimag)
> 
> 2.6) bitrot: [RFE] Enable object versioning only if bitrot is enabled. #188
> Code is merged, needs release notes updates once branching is done,
> possibly no documentation changes from what I can see, hence will get
> closed once release notes are updated (heads up, @kotresh).
> 
> 3) New in 3.11 and not tracked in release scope [1] as there are no
> visible mail requests to consider these for 3.11 in the gluster devel lists
> 
> 3.1) Use standard refcounting functions #156
> @ndevos any updates? Should this be marked in the 3.11 scope?
> 
> 3.2) Rebalance performance improvement #155
> @susant any updates? Should this be marked in the 3.11 scope?
All the patches are up in master. I guess a few patches need some brushing up.
We would like to get this into 3.11.
I will try my best to finish by Apr 27 EOD EST.

Thanks,
Susant
> 
> 3.3) rpc-clnt reconnect timer #152
> @amarts any updates? Should this be marked in the 3.11 scope?
> 
> 3.4) [RFE] libfuse rebase to latest? #153
> @amarts, @csabahenk any updates? Should this be marked in the 3.11 scope?
> 
> 4) Pending issue still to be opened at github (and possibly making into
> the relase)
> 
> 4.1) IPv6 support enhancements from FB
> heads up, @kaushal. Mail discussions are already done, possibly if we
> make it by the cut a github issue would be needed.
> 
> 4.2) Halo replication enhancements from FB
> Heads up, @pranith. As this may make it a week post branching and we
> will take in the backport, a github issue would be needed to track this.
> 
> Thanks,
> Shyam
> 
> [1] Release scope: https://github.com/gluster/glusterfs/projects/1
> 
> [2] Reviews needing attention:
> https://review.gluster.org/#/q/status:open+starredby:srangana%2540redhat.com
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list

[Gluster-devel] ./tests/bitrot/bug-1373520.t failure on master

2017-02-28 Thread Susant Palai
Hi,
   Test case ./tests/bitrot/bug-1373520.t is seen to be failing on different
regression runs on master.
Requesting someone to look into it.

Few instances:
https://build.gluster.org/job/netbsd7-regression/3118/consoleFull
https://build.gluster.org/job/centos6-regression/3451/
https://build.gluster.org/job/netbsd7-regression/3122/consoleFull

Thanks,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Feature: Rebalance completion time estimation

2016-11-11 Thread Susant Palai
Hello All,
   We have been receiving many requests from users to provide a "rebalance
completion time estimation". This email is to gather ideas and feedback from
the community on the same. We have one proposal, but nothing is concrete.
Please feel free to give your input on this problem.

A brief overview of the rebalance operation:
- The rebalance process is used to rebalance data across the cluster, most
likely in the event of add-brick and remove-brick. Rebalance is spawned on each
node. The job of the process is to read directories and fix their layout to
include the newly added brick, then read the children files of each directory
(only those residing on local bricks) and migrate them if the new layout
requires it.


Here is one solution, pitched by Manoj Pillai.

Assumptions for this idea:
 - files are of similar size.
 - Max 40% of the total files will be migrated

1- Do a statfs on the local bricks. Say the total size is St.
2- Based on the size of the first file, say Sf, estimate the number of files
on the local bricks to be Nt (roughly St / Sf).
3- So the time estimate would be: (Nt * migration time for one file) * 40%.
4- Rebalance will keep updating this estimate as more files are crawled and
will try to give a fair estimate (a rough sketch of the arithmetic follows).
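
A rough, purely illustrative sketch of that arithmetic (every number below is
invented; St would really come from statfs and the per-file time from
measurement):

    st_bytes=$((2 * 1024**4))            # St: ~2 TiB of data on local bricks
    sf_bytes=$((8 * 1024**2))            # Sf: first file sampled is ~8 MiB
    nt=$(( st_bytes / sf_bytes ))        # Nt: crude estimate of local file count
    per_file_secs=2                      # measured migration time for one file
    est_secs=$(( nt * per_file_secs * 40 / 100 ))
    echo "estimated rebalance time: ~$(( est_secs / 3600 )) hours"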

Problem with this approach: this method assumes that the file sizes will be
roughly similar. For clusters with variable file sizes this estimate will go
wrong.

So this is one initial idea. Please give your suggestions/ideas/feedback on 
this.


Thanks,
Susant






 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Review request for lock migration patches

2016-09-14 Thread Susant Palai
+Poornima, Talur

- Original Message -
> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> To: "Susant Palai" <spa...@redhat.com>
> Cc: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "gluster-devel" 
> <gluster-devel@gluster.org>
> Sent: Wednesday, 14 September, 2016 8:13:25 PM
> Subject: Re: [Gluster-devel] Review request for lock migration patches
> 
> Could you get the reviews from one of Poornima/Raghavendra Talur once?
> 
> On Wed, Sep 14, 2016 at 6:12 PM, Susant Palai <spa...@redhat.com> wrote:
> 
> > Hi,
> >   It would be nice to get the patches in 3.9. The reviews are pending for
> > a long time. Requesting reviews.
> >
> > Thanks,
> > Susant
> >
> >
> > - Original Message -
> > > From: "Susant Palai" <spa...@redhat.com>
> > > To: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Pranith Kumar
> > Karampuri" <pkara...@redhat.com>
> > > Cc: "gluster-devel" <gluster-devel@gluster.org>
> > > Sent: Wednesday, 7 September, 2016 9:54:04 AM
> > > Subject: Re: [Gluster-devel] Review request for lock migration patches
> > >
> > > Gentle reminder for reviews.
> > >
> > > Thanks,
> > > Susant
> > >
> > > - Original Message -
> > > > From: "Susant Palai" <spa...@redhat.com>
> > > > To: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Pranith Kumar
> > Karampuri"
> > > > <pkara...@redhat.com>
> > > > Cc: "gluster-devel" <gluster-devel@gluster.org>
> > > > Sent: Tuesday, 30 August, 2016 3:19:13 PM
> > > > Subject: [Gluster-devel] Review request for lock migration patches
> > > >
> > > > Hi,
> > > >
> > > > There are few patches targeted for lock migration. Requesting for
> > review.
> > > > 1. http://review.gluster.org/#/c/13901/
> > > > 2. http://review.gluster.org/#/c/14286/
> > > > 3. http://review.gluster.org/#/c/14492/
> > > > 4. http://review.gluster.org/#/c/15076/
> > > >
> > > >
> > > > Thanks,
> > > > Susant~
> > > >
> > > > ___
> > > > Gluster-devel mailing list
> > > > Gluster-devel@gluster.org
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >
> > > ___
> > > Gluster-devel mailing list
> > > Gluster-devel@gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> >
> 
> 
> 
> --
> Pranith
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Review request for lock migration patches

2016-09-14 Thread Susant Palai
Hi,
  It would be nice to get the patches in 3.9. The reviews are pending for a 
long time. Requesting reviews.

Thanks,
Susant
  

- Original Message -
> From: "Susant Palai" <spa...@redhat.com>
> To: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Pranith Kumar Karampuri" 
> <pkara...@redhat.com>
> Cc: "gluster-devel" <gluster-devel@gluster.org>
> Sent: Wednesday, 7 September, 2016 9:54:04 AM
> Subject: Re: [Gluster-devel] Review request for lock migration patches
> 
> Gentle reminder for reviews.
> 
> Thanks,
> Susant
> 
> - Original Message -
> > From: "Susant Palai" <spa...@redhat.com>
> > To: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Pranith Kumar Karampuri"
> > <pkara...@redhat.com>
> > Cc: "gluster-devel" <gluster-devel@gluster.org>
> > Sent: Tuesday, 30 August, 2016 3:19:13 PM
> > Subject: [Gluster-devel] Review request for lock migration patches
> > 
> > Hi,
> > 
> > There are few patches targeted for lock migration. Requesting for review.
> > 1. http://review.gluster.org/#/c/13901/
> > 2. http://review.gluster.org/#/c/14286/
> > 3. http://review.gluster.org/#/c/14492/
> > 4. http://review.gluster.org/#/c/15076/
> > 
> > 
> > Thanks,
> > Susant~
> > 
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Review request for lock migration patches

2016-09-06 Thread Susant Palai
Gentle reminder for reviews.

Thanks,
Susant

- Original Message -
> From: "Susant Palai" <spa...@redhat.com>
> To: "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Pranith Kumar Karampuri" 
> <pkara...@redhat.com>
> Cc: "gluster-devel" <gluster-devel@gluster.org>
> Sent: Tuesday, 30 August, 2016 3:19:13 PM
> Subject: [Gluster-devel] Review request for lock migration patches
> 
> Hi,
> 
> There are few patches targeted for lock migration. Requesting for review.
> 1. http://review.gluster.org/#/c/13901/
> 2. http://review.gluster.org/#/c/14286/
> 3. http://review.gluster.org/#/c/14492/
> 4. http://review.gluster.org/#/c/15076/
> 
> 
> Thanks,
> Susant~
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures for ./tests/basic/afr/root-squash-self-heal.t

2016-09-01 Thread Susant Palai
From glusterd log:
[2016-08-31 07:54:24.817811] E [run.c:191:runner_log] 
(-->/build/install/lib/glusterfs/3.9dev/xlator/mgmt/glusterd.so(+0xe1c30) 
[0x7f1a34ebac30] 
-->/build/install/lib/glusterfs/3.9dev/xlator/mgmt/glusterd.so(+0xe1794) 
[0x7f1a34eba794] -->/build/install/lib/libglusterfs.so.0(runner_log+0x1ae) 
[0x7f1a3fa15cea] ) 0-management: Failed to execute script: 
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=patchy 
--first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2016-08-31 07:54:24.819166]:++ 
G_LOG:./tests/basic/afr/root-squash-self-heal.t: TEST: 20 1 afr_child_up_status 
patchy 0 ++

The above is spawned from a "volume start force". I checked the brick logs and 
the killed brick had started successfully.

Links to failures:
 https://build.gluster.org/job/centos6-regression/429/console
 https://build.gluster.org/job/netbsd7-regression/358/consoleFull


Thanks,
Susant

- Original Message -
> From: "Susant Palai" <spa...@redhat.com>
> To: "gluster-devel" <gluster-devel@gluster.org>
> Sent: Thursday, 1 September, 2016 12:13:01 PM
> Subject: [Gluster-devel] spurious failures for
> ./tests/basic/afr/root-squash-self-heal.t
> 
> Hi,
>  $subject is failing spuriously for one of my patch.
> One of the test case is: EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1"
> afr_child_up_status $V0 0
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures for ./tests/basic/afr/root-squash-self-heal.t

2016-09-01 Thread Susant Palai
Hi,
 $subject is failing spuriously for one of my patches.
One of the test cases is: EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1"
afr_child_up_status $V0 0
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Review request for lock migration patches

2016-08-30 Thread Susant Palai
Hi,

There are a few patches targeted for lock migration. Requesting reviews.
1. http://review.gluster.org/#/c/13901/
2. http://review.gluster.org/#/c/14286/
3. http://review.gluster.org/#/c/14492/
4. http://review.gluster.org/#/c/15076/


Thanks,
Susant~

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] CFP for Gluster Developer Summit

2016-08-24 Thread Susant Palai
Topic: Lock migration and how it overcomes data inconsistency in GlusterFS -
design, status and roadmap
Theme: Stability
Abstract: The rebalance process in GlusterFS currently does not retain POSIX
  locks when it migrates files from the source server to the destination
  server. This can lead to data inconsistency in the files, owing to more
  than one client incorrectly being able to write to a file successfully.
  In this talk, I will present the design of lock migration, its status and
  how it solves the problem of data inconsistency.
Thanks,
Susant Palai

- Original Message -
> From: "Shyam" <srang...@redhat.com>
> To: "Vijay Bellur" <vbel...@redhat.com>, "Gluster Devel" 
> <gluster-devel@gluster.org>, "gluster-users Discussion List"
> <gluster-us...@gluster.org>
> Cc: "Amye Scavarda" <ascav...@redhat.com>
> Sent: Tuesday, 23 August, 2016 11:29:01 PM
> Subject: Re: [Gluster-users] [Gluster-devel] CFP for Gluster Developer Summit
> 
> Theme: Gluster.Next
> 
> Topic: "DHT2 - O Brother, Where Art Thou?"
> 
> Description:
> An update on DHT2 design and it's progress, with the intention of
> enabling discussions around the love or lack of the same, for the
> proposed model.
> 
> Shyam
> 
> On 08/12/2016 03:48 PM, Vijay Bellur wrote:
> > Hey All,
> >
> > Gluster Developer Summit 2016 is fast approaching [1] on us. We are
> > looking to have talks and discussions related to the following themes in
> > the summit:
> >
> > 1. Gluster.Next - focusing on features shaping the future of Gluster
> >
> > 2. Experience - Description of real world experience and feedback from:
> >a> Devops and Users deploying Gluster in production
> >b> Developers integrating Gluster with other ecosystems
> >
> > 3. Use cases  - focusing on key use cases that drive Gluster.today and
> > Gluster.Next
> >
> > 4. Stability & Performance - focusing on current improvements to reduce
> > our technical debt backlog
> >
> > 5. Process & infrastructure  - focusing on improving current workflow,
> > infrastructure to make life easier for all of us!
> >
> > If you have a talk/discussion proposal that can be part of these themes,
> > please send out your proposal(s) by replying to this thread. Please
> > clearly mention the theme for which your proposal is relevant when you
> > do so. We will be ending the CFP by 12 midnight PDT on August 31st, 2016.
> >
> > If you have other topics that do not fit in the themes listed, please
> > feel free to propose and we might be able to accommodate some of them as
> > lightening talks or something similar.
> >
> > Please do reach out to me or Amye if you have any questions.
> >
> > Thanks!
> > Vijay
> >
> > [1] https://www.gluster.org/events/summit2016/
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Requesting review for patch: http://review.gluster.org/#/c/13901/

2016-06-20 Thread Susant Palai
Hi,
  The patch in $subject is needed for locks (post lock migration) and leases
to work. Requesting your reviews ASAP so that this can be targeted for 3.8.1.

Thanks,
 Susant~
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] review request

2016-05-30 Thread Susant Palai
Hi,
  Requesting reviews for http://review.gluster.org/#/c/14251 and 
http://review.gluster.org/#/c/14252.

Thanks,
Susant~
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] ./tests/basic/tier/new-tier-cmds.t failing on master branch

2016-04-30 Thread Susant Palai
Hi,
  ./tests/basic/tier/new-tier-cmds.t failed on 
https://build.gluster.org/job/rackspace-regression-2GB-triggered/20191/console.

I ran it on the master branch on my local setup and it fails continuously.

volume detach-tier commit: failed: Detach is in progress. Please retry after
completion
(the same "Detach is in progress" error repeated 18 times in total)
./tests/basic/tier/new-tier-cmds.t .. 20/21 RESULT 20: 1
=
TEST 21 (line 86): Tier command failed gluster --mode=script --wignore 
--glusterd-sock=/d/backends/1/glusterd/gd.sock 
--log-file=/var/log/glusterfs/new-tier-cmds.t_cli1.log volume tier patchy 
detach status
RESULT 21: 1
tar: Removing leading `/' from member names
./tests/basic/tier/new-tier-cmds.t .. Failed 2/21 subtests 

Test Summary Report
---
./tests/basic/tier/new-tier-cmds.t (Wstat: 0 Tests: 21 Failed: 2)
  Failed tests:  20-21



Can someone from tiering team look into it?

Thanks,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] netbsd smoke failure

2016-04-29 Thread Susant Palai
Hi All,
  On many of my patches the following error is seen from the NetBSD smoke test.

Triggered by Gerrit: http://review.gluster.org/13993
Building remotely on netbsd0.cloud.gluster.org (netbsd_build) in workspace 
/home/jenkins/root/workspace/netbsd6-smoke
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://review.gluster.org/glusterfs.git # 
 > timeout=10
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
git://review.gluster.org/glusterfs.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:810)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1066)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1097)
at hudson.scm.SCM.checkout(SCM.java:485)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1738)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
Caused by: hudson.plugins.git.GitException: Command "git config 
remote.origin.url git://review.gluster.org/glusterfs.git" returned status code 
255:
stdout: 
stderr: error: could not lock config file .git/config: File exists



Please let me know how this can be resolved.

Here are few links of netbsd logs:
https://build.gluster.org/job/netbsd6-smoke/13136/console
https://build.gluster.org/job/netbsd6-smoke/13137/console

Thanks,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Requesting lock-migration reviews

2016-04-29 Thread Susant Palai
Hi All,
  The following patches need reviews for the lock-migration feature. They are
targeted for 3.8.
Requesting reviews.

1- http://review.gluster.org/#/c/13970/
2- http://review.gluster.org/#/c/13993/
3- http://review.gluster.org/#/c/13994/
4- http://review.gluster.org/#/c/13995/
5- http://review.gluster.org/#/c/14011/
6- http://review.gluster.org/#/c/14012/
7- http://review.gluster.org/#/c/14013/
8- http://review.gluster.org/#/c/14014/
9- http://review.gluster.org/#/c/14024/
10- http://review.gluster.org/#/c/13493/
11-http://review.gluster.org/#/c/14074/

Thanks,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Posix lock migration design

2016-04-12 Thread Susant Palai
Hi all,
  The feature page for lock migration can be found here: 
https://github.com/gluster/glusterfs-specs/blob/master/accepted/Lock-Migration.md.
Please give your feedback on this mail thread itself or on Gerrit
(http://review.gluster.org/#/c/13924/).

Thanks,
Susant

- Original Message -
From: "Susant Palai" <spa...@redhat.com>
To: "Gluster Devel" <gluster-devel@gluster.org>
Sent: Thursday, 10 March, 2016 12:30:15 PM
Subject: Re: [Gluster-devel] Posix lock migration design

I forgot to detail the problems/races. Details inline!

- Original Message -
> From: "Susant Palai" <spa...@redhat.com>
> To: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Thursday, 10 March, 2016 11:35:53 AM
> Subject: Re: [Gluster-devel] Posix lock migration design
> 
> Update:
> 
> Here is an initial solution on how to solve races between (fd-migration with
> lock migration) and (client-disconnect with lock migration).
> 
> Please give your suggestion and comments.
> 
> Fuse fd migration:
> ---


> part1:  Fuse fd migration with out fd association
> - How it is working currently:
> - Fuse initiate fd migration task from graph1 to graph2
> - As part of this a new fd is opened on graph2
> - Locks are associated with fd and client (connection id) currently.
> With fd association out of the picture there will be just new client
> update to the locks.
> 
> part2: fd migration interaction with lock migration

Problem: The locks on the destination need to carry the new client-id
information from the new graph.
Race:  1: getlkinfo, as part of lock migration, reads the locks, which still
carry the old client-id.
   2: After this, fd migration updates the lock with the new client-id, but
on the source.

   This can leave the locks on the destination with the old client-id,
which causes two problems.
   - 1. When the old graph disconnects, the migrated locks on the destination
will be flushed (the client-id/connection-id is the same per graph across
protocol clients).
   - 2. After fd migration, these locks on the destination should be flushed
when the new client disconnects. But this won't happen, as the locks are still
associated with the old client-id, which will lead to stale locks and hangs.

   So the solution is to wait till lock migration happens. This
synchronization will help place the new client-id properly.

>- As part of "fuse-fd-lock-migration" we do two operations.
>a. getxattr (lockinfo): Get the old fd number on old graph
>b. setxattr (lockinfo): set (new fd number + new client info) on
>the new graph through the new fd
>- So the meta-lock acts as a hint of lock migration for any lock-related
>operations (fd-migration, server_connection_cleanup, new lk requests,
>flush etc.)
>- Now getxattr need not worry about metalk presence at all. Once it
>reads the necessary information the bulk of the job is left to
>setxattr.
>- Setxattr:
>- case 1: whether meta lock is present
>- if YES, wait till meta-unlock is executed on the lock.
>Unwind the call with EREMOTE. Now it's dht translator's
>responsibility to lookup the file to figure out the file
>location and redirect the setxattr. So destination will have
>the new graph client-id.
>- if NO,  set new client information. Which will be migrated
>by rebalance.
>- case 2: What if setxattr has missed (meta lock + unlock)
>- Meta-unlock upon successful lock migration will set a REPLAY
>flag. Which indicates the data as well as locks have been
>migrated.
>- So unwind with EREMOTE, and leave it to dht for the
>redirection part.
> 
> 
> client talking to source disconnects  during lock migration:
> -
> - There are many phases of data+lock_migraiton.  The following describes
> disconnect around all the phases.

Problem: Post lock migration, the locks will become stale if the application's 
fd_close does not go on to the destination. 

 - Rebalance might have transferred the locks before an fd_close 
reaches the source; those locks will then be stale on the destination and can lead to a hang.
   Hence, synchronization is essential.

> phase-1: disconnect before data migration
> - server cleanup will flush the locks. Hence, there are no locks left for
> migration.
> 
> phase-2: disconnect before metalk reaches server

Re: [Gluster-devel] Posix lock migration design

2016-03-09 Thread Susant Palai
Forgot to detail the problems/races in my earlier mail. Details inline!

- Original Message -
> From: "Susant Palai" <spa...@redhat.com>
> To: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Thursday, 10 March, 2016 11:35:53 AM
> Subject: Re: [Gluster-devel] Posix lock migration design
> 
> Update:
> 
> Here is an initial solution on how to solve races between (fd-migration with
> lock migration) and (client-disconnect with lock migration).
> 
> Please give your suggestion and comments.
> 
> Fuse fd migration:
> ---


> part1:  Fuse fd migration without fd association
> - How it works currently:
> - Fuse initiates the fd migration task from graph1 to graph2
> - As part of this a new fd is opened on graph2
> - Locks are associated with fd and client (connection id) currently.
> With fd association out of the picture there will be just new client
> update to the locks.
> 
> part2: fd migration interaction with lock migration

Problem: The locks on the destination need to carry the new 
client-id information from the new graph. 
Race:  1: getlkinfo, as part of lock migration, reads the locks, which still have the old 
client-id.
   2: After that, the fd-migration updates the lock with the new 
client-id, but on the source.

   This can lead to a situation where the locks on the destination carry the old 
client-id. That creates two problems.
   - 1. When the old graph disconnects, the migrated locks on the destination 
will be flushed (the client-id/connection-id is the same per graph across protocol 
clients).
   - 2. The locks on the destination should be flushed when the new client 
disconnects post fd migration. But this won't happen, as the locks are still 
associated with the old client-id, which will lead to stale locks and hangs. 


   So the solution is to wait till lock migration completes. This 
synchronization will help place the new client-id properly.

>- As part of "fuse-fd-lock-migration" we do two operations.
>a. getxattr (lockinfo): Get the old fd number on old graph
>b. setxattr (lockinfo): set (new fd number + new client info) on
>the new graph through the new fd
>- So the meta-lock acts as a hint of lock migration for any lock-related
>operations (fd-migration, server_connection_cleanup, new lk requests,
>flush etc.)
>- Now getxattr need not worry about metalk presence at all. Once it
>reads the necessary information the bulk of the job is left to
>setxattr.
>- Setxattr:
>- case 1: whether meta lock is present
>- if YES, wait till meta-unlock is executed on the lock.
>Unwind the call with EREMOTE. Now it's dht translator's
>responsibility to lookup the file to figure out the file
>location and redirect the setxattr. So destination will have
>the new graph client-id.
>- if NO,  set new client information. Which will be migrated
>by rebalance.
>- case 2: What if setxattr has missed (meta lock + unlock)
>- Meta-unlock upon successful lock migration will set a REPLAY
>flag. Which indicates the data as well as locks have been
>migrated.
>- So unwind with EREMOTE, and leave it to dht for the
>redirection part.
> 
> 
> client talking to source disconnects  during lock migration:
> -
> - There are many phases of data+lock_migraiton.  The following describes
> disconnect around all the phases.

Problem: Post lock migration, the locks will become stale if the application's 
fd_close does not go on to the destination. 

 - Rebalance might have transferred the locks before an fd_close 
reaches the source; those locks will then be stale on the destination and can lead to a hang.
   Hence, synchronization is essential.

> phase-1: disconnect before data migration
> - server cleanup will flush the locks. Hence, there are no locks left for
> migraiton.
> 
> phase-2: disconnect before meatlk reaches server
> - same case as phase-1
> 
> phase-3: disconnect just after metalk
> - server_cleanup on seeing metalk waits till meta-unlock.
> - flush the locks on source.
> - incoming ops (write/lk) will fail with ENOTCONN.
> - fd_close on ENOTCONN will refresh its inode to check whether the file has
> migrated elsewhere and flush the locks
> 
> 
> 
> Thanks,
> Susant
> 
> 
> 
> - Original Message -
> > From: "Susant Palai" <spa...@redhat.co

Re: [Gluster-devel] ./tests/bugs/fuse/bug-924726.t failing regression

2015-12-03 Thread Susant Palai
./tests/bugs/fuse/bug-924726.t is failing consistently.

here are the links:
https://build.gluster.org/job/rackspace-regression-2GB-triggered/16405/consoleFull
https://build.gluster.org/job/rackspace-regression-2GB-triggered/16423/consoleFull

Regards,
Susant

- Original Message -
From: "Dan Lambright" 
To: "Gluster Devel" 
Sent: Wednesday, 2 December, 2015 4:10:44 AM
Subject: [Gluster-devel] ./tests/bugs/fuse/bug-924726.t failing regression


This test

./tests/bugs/fuse/bug-924726.t

has failed regression multiple times. I sent an email the other week about it. 
I'd like to put it to sleep for a bit... if no objections (or fixes) I'll do 
that tomorrow.

Dan



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failure on ./tests/bugs/replicate/bug-1221481-allow-fops-on-dir-split-brain.t

2015-10-01 Thread Susant Palai
Hi,
 ./tests/bugs/replicate/bug-1221481-allow-fops-on-dir-split-brain.t has failed 
twice on patch: http://review.gluster.org/#/c/12235/

Requesting the AFR team to look into it.
links to failures:
https://build.gluster.org/job/rackspace-regression-2GB-triggered/14597/consoleFull
https://build.gluster.org/job/rackspace-regression-2GB-triggered/14613/consoleFull

Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.7.5 release

2015-10-01 Thread Susant Palai
I would like to get http://review.gluster.org/#/c/12235/ in. Once it is merged in 
master, I will backport it to 3.7.5.

Susant

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" 
Sent: Thursday, 1 October, 2015 9:22:15 AM
Subject: [Gluster-devel] 3.7.5 release

I am waiting for backport merges of:
http://review.gluster.com/11938 and http://review.gluster.com/12250

Let me know if you guys need anymore patches merged. I will wait till 
Monday(IST) in the worst case and if the patches can't get merged by 
then, I will go ahead with the tagging.

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.7.5 release

2015-10-01 Thread Susant Palai
Yes, a spurious failure. The test case has nothing to do with rebalance.

- Original Message -
From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
To: "Susant Palai" <spa...@redhat.com>
Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Gluster Devel" 
<gluster-devel@gluster.org>
Sent: Thursday, 1 October, 2015 11:54:39 AM
Subject: Re: [Gluster-devel] 3.7.5 release



- Original Message -
> From: "Susant Palai" <spa...@redhat.com>
> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Thursday, October 1, 2015 11:50:29 AM
> Subject: Re: [Gluster-devel] 3.7.5 release
> 
> Would like to get in http://review.gluster.org/#/c/12235/. Once merged in
> master will backport to 3.7.5.

Seems like there is a regression failure. Is it spurious? If yes, will merge 
the patch.

> 
> Susant
> 
> - Original Message -
> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> To: "Gluster Devel" <gluster-devel@gluster.org>
> Sent: Thursday, 1 October, 2015 9:22:15 AM
> Subject: [Gluster-devel] 3.7.5 release
> 
> I am waiting for backport merges of:
> http://review.gluster.com/11938 and http://review.gluster.com/12250
> 
> Let me know if you guys need anymore patches merged. I will wait till
> Monday(IST) in the worst case and if the patches can't get merged by
> then, I will go ahead with the tagging.
> 
> Pranith
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] tier.t failing consistently

2015-09-11 Thread Susant Palai
Hi,
 Can somebody take a look into ./tests/basic/tier/tier.t?

 Here are the links:

 
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/10217/consoleFull
 
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/10207/consoleFull
 
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/10226/consoleFull
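(To iterate on this outside Jenkins, the single test can be run from a built
glusterfs source tree, assuming the usual test dependencies and root access; a
rough sketch:)

    # from the root of a glusterfs checkout, after building/installing
    prove -vf ./tests/basic/tier/tier.t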

Regards,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Lock migration as a part of rebalance

2015-09-04 Thread Susant Palai


- Original Message -
From: "Raghavendra G" 
To: "Shyam" 
Cc: "Gluster Devel" 
Sent: Thursday, 2 July, 2015 10:21:38 AM
Subject: Re: [Gluster-devel] Lock migration as a part of rebalance












One solution I can think of is to have the responsibility of lock migration 
process spread between both client and rebalance process. A rough algo is 
outlined below: 


1. We should've a static identifier for client process (something like 
process-uuid of mount process - lets call it client-uuid) in the lock 
structure. This identifier won't change across reconnects. 
2. rebalance just copies the entire lock-state verbatim to dst-node (fd-number 
and client-uuid values would be same as the values on src-node). 
3. rebalance process marks these half-migrated locks as "migration-in-progress" 
on dst-node. Any lock request which overlaps with "migration-in-progress" locks 
is considered as conflicting and dealt with appropriately (if SETLK unwind with 
EAGAIN and if SETLKW block till these locks are released). Same approach is 
followed for mandatory locking too. 

4. whenever an fd based operation (like writev, release, lk, flush etc) happens 
on the fd, the client (through which lock was acquired), "migrates" the lock. 
Migration is basically, 
* does a fgetxattr (fd, LOCKINFO_KEY, src-subvol). This will fetch the fd 
number on src subvol - lockinfo. 
* opens new fd on dst-subvol. Then does fsetxattr (new-fd, LOCKINFO_KEY, 
lockinfo, dst-subvol). The brick on receiving setxattr on virtual xattr 
LOCKINFO_KEY looks for all the locks with ((fd == lockinfo) && (client-uuid == 
uuid-of-client-on-which-this-setxattr-came)) and then fills in appropriate 
values for client_t and fd (basically sets lock->fd = 
fd-num-of-the-fd-on-which-setxattr-came). 

Some issues and solutions: 

1. What if client never connects to dst brick? 

We'll have a time-out for "migration-in-progress" locks to be converted into 
"complete" locks. If DHT doesn't migrate within this timeout, server will 
cleanup these locks. This is similar to current protocol/client implementation 
of lock-heal (This functionality is disabled on client as of now. But, upcall 
needs this feature too and we can get this functionality working). If a dht 
tries to migrate the locks after this timeout, it will have to re-acquire the 
lock on the destination (this has to be a non-blocking lock request, irrespective 
of the mode of the original lock). We get information about the currently held locks 
through the fd opened on src. If lock acquisition fails for some reason, dht marks the 
fd bad, so that the application will be notified about lost locks. One problem 
unsolved with this solution is another client (say c2) acquiring and releasing 
the lock during the period between the timeout and client c1 initiating lock 
migration. However, that problem is present even with the existing lock 
implementation and is not really something new introduced by lock migration. 

2. What if client connects but disconnects before it could've attempted to 
migrate "migration-in-progress" locks? 

The server can identify locks belonging to this client using client-uuid and 
cleans them up. Dht trying to migrate locks after the first disconnect will try to 
re-acquire the locks as outlined in 1. 

3. What if client disconnects with src subvol and cannot get lock information 
from src for handling issues 1 and 2? 

We'll mark the fd bad. We can optimize this to mark fd bad only if locks have 
been acquired. To do this client has to store some history in the fd on 
successful lock acquisition. 

regards, 



On Wed, Dec 17, 2014 at 12:45 PM, Raghavendra G < raghaven...@gluster.com > 
wrote: 









On Wed, Dec 17, 2014 at 1:25 AM, Shyam < srang...@redhat.com > wrote: 

This mail intends to present the lock migration across subvolumes problem and 
seek solutions/thoughts around the same, so any feedback/corrections are 
appreciated. 

# Current state of file locks post file migration during rebalance 
Currently when a file is migrated during rebalance, its lock information is not 
transferred over from the old subvol to the new subvol, that the file now 
resides on. 

As further lock requests, post migration of the file, would now be sent to the 
new subvol, any potential lock conflicts would not be detected, until the locks 
are migrated over. 

The term locks above can refer to the POSIX locks acquired using the FOP lk by 
consumers of the volume, or to the gluster internal(?) inode/dentry locks. For 
now we limit the discussion to the POSIX locks supported by the FOP lk. 

# Other areas in gluster that migrate locks 
Current scheme of migrating locks in gluster on graph switches, trigger an fd 
migration process that migrates the lock information from the old fd to the new 
fd. This is driven by the gluster client stack, protocol layer (FUSE, gfapi). 

This is done using the (set/get)xattr call with the attr name, 

Re: [Gluster-devel] [Gluster-users] cluster.min-free-disk is not working in distributed disperse volume

2015-08-25 Thread Susant Palai
Mohamed,
   Will investigate the weighted-rebalance behavior.
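
In the meantime you can check what the options are currently set to on your
volume and toggle them while we dig in; a quick sketch (volume name taken from
your earlier mail, adjust if different):

    # shows the options only if they have been reconfigured from the defaults
    gluster volume info glustertest | grep -E 'weighted-rebalance|min-free-disk'
    # to turn size-weighted rebalance off for comparison, if you want to experiment:
    gluster volume set glustertest cluster.weighted-rebalance off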

Susant 

- Original Message -
From: Mohamed Pakkeer mdfakk...@gmail.com
To: Susant Palai spa...@redhat.com
Cc: Mathieu Chateau mathieu.chat...@lotp.fr, gluster-users 
gluster-us...@gluster.org, Gluster Devel gluster-devel@gluster.org
Sent: Tuesday, 25 August, 2015 9:40:01 AM
Subject: Re: [Gluster-users] cluster.min-free-disk is not working in 
distributed disperse volume


Hi Susant, 
We have created the disperse volume across nodes. We stopped all the upload 
operations and started the rebalance last night. After the overnight rebalance, 
some hard disks are occupied 100% while other disks still have about 13% free space. 


disk1 belongs to disperse-set-0 . disk36 belongs to disperse-set-35 


df -h result of one data node 



/dev/sdb1 3.7T 3.7T 545M 100% /media/disk1 
/dev/sdc1 3.7T 3.2T 496G 87% /media/disk2 
/dev/sdd1 3.7T 3.7T 30G 100% /media/disk3 
/dev/sde1 3.7T 3.5T 173G 96% /media/disk4 
/dev/sdf1 3.7T 3.2T 458G 88% /media/disk5 
/dev/sdg1 3.7T 3.5T 143G 97% /media/disk6 
/dev/sdh1 3.7T 3.5T 220G 95% /media/disk7 
/dev/sdi1 3.7T 3.3T 415G 89% /media/disk8 
/dev/sdj1 3.7T 3.6T 72G 99% /media/disk9 
/dev/sdk1 3.7T 3.5T 186G 96% /media/disk10 
/dev/sdl1 3.7T 3.6T 65G 99% /media/disk11 
/dev/sdm1 3.7T 3.5T 195G 95% /media/disk12 
/dev/sdn1 3.7T 3.5T 199G 95% /media/disk13 
/dev/sdo1 3.7T 3.6T 78G 98% /media/disk14 
/dev/sdp1 3.7T 3.5T 200G 95% /media/disk15 
/dev/sdq1 3.7T 3.6T 119G 97% /media/disk16 
/dev/sdr1 3.7T 3.5T 206G 95% /media/disk17 
/dev/sds1 3.7T 3.5T 193G 95% /media/disk18 
/dev/sdt1 3.7T 3.6T 131G 97% /media/disk19 
/dev/sdu1 3.7T 3.5T 141G 97% /media/disk20 
/dev/sdv1 3.7T 3.5T 243G 94% /media/disk21 
/dev/sdw1 3.7T 3.4T 299G 92% /media/disk22 
/dev/sdx1 3.7T 3.5T 163G 96% /media/disk23 
/dev/sdy1 3.7T 3.5T 168G 96% /media/disk24 
/dev/sdz1 3.7T 3.5T 219G 95% /media/disk25 
/dev/sdaa1 3.7T 3.7T 37G 100% /media/disk26 
/dev/sdab1 3.7T 3.5T 172G 96% /media/disk27 
/dev/sdac1 3.7T 3.4T 276G 93% /media/disk28 
/dev/sdad1 3.7T 3.6T 108G 98% /media/disk29 
/dev/sdae1 3.7T 3.3T 399G 90% /media/disk30 
/dev/sdaf1 3.7T 3.5T 240G 94% /media/disk31 
/dev/sdag1 3.7T 3.6T 122G 97% /media/disk32 
/dev/sdah1 3.7T 3.5T 147G 97% /media/disk33 
/dev/sdai1 3.7T 3.4T 342G 91% /media/disk34 
/dev/sdaj1 3.7T 3.4T 288G 93% /media/disk35 
/dev/sdak1 3.7T 3.4T 342G 91% /media/disk36 


disk1 belongs to disperse-set-0. The rebalance logs show that the rebalancer is 
still trying to fill disperse-set-0 even after it has been filled to 100%. 




[2015-08-24 19:52:53.036622] E [MSGID: 109023] 
[dht-rebalance.c:672:__dht_check_free_space] 0-glustertest-dht: data movement 
attempted from node (glustertest-disperse-7) to node (glustertest-disperse-0) 
which does not have required free space for 
(/Packages/Features/MPEG/A/AMEO-N-CHALLANGE_FTR_S_BEN-XX_IN-UA_51_HD_RIC_OV/AMEO-N-CHALLANGE_FTR_S_BEN-XX_IN-UA_51_HD_20110521_RIC_OV/AMI-NEBO-C_R3_AUDIO_190511.mxf)
 


[2015-08-24 19:52:53.042026] I [dht-rebalance.c:1002:dht_migrate_file] 
0-glustertest-dht: 
/Packages/Features/MPEG/A/AMEO-N-CHALLANGE_FTR_S_BEN-XX_IN-UA_51_HD_RIC_OV/AMEO-N-CHALLANGE_FTR_S_BEN-XX_IN-UA_51_HD_20110521_RIC_OV/AMINEBO-CHALLANGE_BEN_R1-2-3-4-5-6_MPEG_200511-reel-5-mpeg2.mxf:
 attempting to move from glustertest-disperse-13 to glustertest-disperse-0 


I think cluster.weighted-rebalance and cluster.min-free-disk have bugs in 
rebalancing the data based on weight and free disk space. 


Thanks 
Backer 




On Mon, Aug 24, 2015 at 4:28 PM, Mohamed Pakkeer  mdfakk...@gmail.com  wrote: 



Hi Susant, 


Thanks for your quick reply. We are not updating any files. Actually we are 
archiving video files on this cluster. I think there is a bug in 
cluster.min-free-disk. 


Also I would like to know about rebalancing the cluster. Currently we have 20 
nodes and the hard disks of 10 nodes are almost full, so we need to rebalance the 
data. If I run the rebalancer, it starts on the first node (node1) and starts the 
migration process. The CPU usage on the first node is always high during rebalance 
compared with the rest of the cluster nodes. To reduce the CPU usage on the 
rebalancer data node (node1), I peered a new node (without disks) for rebalance and 
started the rebalancer. It again started the rebalancer on the same node1. How can we 
run the rebalancer on a dedicated node? 


Also we are facing memory leaks in fixlayout and heal full operations. 


Regards 
Backer 




On Mon, Aug 24, 2015 at 2:57 PM, Susant Palai  spa...@redhat.com  wrote: 


Hi, 
Cluster.min-free-disk controls new file creation on the bricks. If you happen 
to write to the existing files on the brick and that is leading to brick 
getting full, then most probably you should run a rebalance. 

Regards, 
Susant 



- Original Message - 
From: Mathieu Chateau  mathieu.chat...@lotp.fr  
To: Mohamed Pakkeer  mdfakk...@gmail.com  
Cc: gluster-users  gluster-us...@gluster.org , Gluster Devel  
gluster-devel@gluster.org  
Sent: Monday, 24 August, 2015 2:47:00 PM 
Subject: Re

[Gluster-devel] Netbsd failures on ./tests/basic/afr/arbiter-statfs.t

2015-08-24 Thread Susant Palai
Ravi,
 The test case ./tests/basic/afr/arbiter-statfs.t is failing frequently on the netbsd 
machines. Requesting you to take a look.

Thanks,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd failures on ./tests/basic/afr/arbiter-statfs.t

2015-08-24 Thread Susant Palai
Cool. Need to rebase then.

- Original Message -
From: Atin Mukherjee amukh...@redhat.com
To: Susant Palai spa...@redhat.com, Ravishankar N ravishan...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Monday, 24 August, 2015 4:33:26 PM
Subject: Re: [Gluster-devel] Netbsd failures on 
./tests/basic/afr/arbiter-statfs.t

http://review.gluster.org/#/c/12005/ has already added it to the bad
tests :)

On 08/24/2015 04:31 PM, Susant Palai wrote:
 Ravi,
  The test case ./tests/basic/afr/arbiter-statfs.t failing frequently on 
 netbsd machine. Requesting to take a look.
 
 Thanks,
 Susant
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
 

-- 
~Atin
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd failures on ./tests/basic/afr/arbiter-statfs.t

2015-08-24 Thread Susant Palai
links: 
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/9627/consoleFull
   
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/9628/consoleFull

- Original Message -
From: Susant Palai spa...@redhat.com
To: Ravishankar N ravishan...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Monday, 24 August, 2015 4:31:42 PM
Subject: [Gluster-devel] Netbsd failures on 
./tests/basic/afr/arbiter-statfs.t

Ravi,
 The test case ./tests/basic/afr/arbiter-statfs.t failing frequently on netbsd 
machine. Requesting to take a look.

Thanks,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] cluster.min-free-disk is not working in distributed disperse volume

2015-08-24 Thread Susant Palai
Hi, 
  Cluster.min-free-disk controls new file creation on the bricks. If you happen 
to write to existing files on a brick and that is leading to the brick 
getting full, then most probably you should run a rebalance.
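For example, on your volume that would be roughly (the 10% threshold is only an
example, pick what suits your capacity):

    gluster volume set glustertest cluster.min-free-disk 10%
    gluster volume rebalance glustertest start
    gluster volume rebalance glustertest status    # watch progress per node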

Regards,
Susant

- Original Message -
From: Mathieu Chateau mathieu.chat...@lotp.fr
To: Mohamed Pakkeer mdfakk...@gmail.com
Cc: gluster-users gluster-us...@gluster.org, Gluster Devel 
gluster-devel@gluster.org
Sent: Monday, 24 August, 2015 2:47:00 PM
Subject: Re: [Gluster-users] cluster.min-free-disk is not working in 
distributed disperse volume




720 bricks! Respect! 
On 24 Aug 2015 at 09:48, Mohamed Pakkeer  mdfakk...@gmail.com  wrote: 



Hi, 


I have a cluster of 720 bricks; all bricks are 4TB in size. I have changed the 
cluster.min-free-disk default value from 10% to 3%, so all the disks should have a 
minimum of 3% disk space free. But some cluster disks are getting full now. Is there 
any additional configuration for keeping some percentage of disk space 
free? 





Volume Name: glustertest 
Type: Distributed-Disperse 
Volume ID: 2b575b5c-df2e-449c-abb9-c56cec27e609 
Status: Started 
Number of Bricks: 72 x (8 + 2) = 720 
Transport-type: tcp 





Options Reconfigured: 
features.default-soft-limit: 95% 
cluster.min-free-disk: 3% 
performance.readdir-ahead: on 


df -h of one node 



/dev/sdb1 3.7T 3.6T 132G 97% /media/disk1 
/dev/sdc1 3.7T 3.2T 479G 88% /media/disk2 
/dev/sdd1 3.7T 3.6T 109G 98% /media/disk3 


Any help will be greatly appreciated. 




Regards 
Backer 







___ 
Gluster-users mailing list 
gluster-us...@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Skipped files during rebalance

2015-08-21 Thread Susant Palai
Hi,
 Most of the rebalance failures here are due to network problems.

Here is the log:

[2015-08-16 20:31:36.301467] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003002002.flex 
lookup failed
[2015-08-16 20:31:36.921405] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003004005.flex 
lookup failed
[2015-08-16 20:31:36.921591] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/006004004.flex 
lookup failed
[2015-08-16 20:31:36.921770] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/005004007.flex 
lookup failed
[2015-08-16 20:31:37.577758] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/007004005.flex 
lookup failed
[2015-08-16 20:34:12.387425] E [socket.c:2332:socket_connect_finish] 
0-live-client-4: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.392820] E [socket.c:2332:socket_connect_finish] 
0-live-client-5: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.398023] E [socket.c:2332:socket_connect_finish] 
0-live-client-0: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.402904] E [socket.c:2332:socket_connect_finish] 
0-live-client-2: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.407464] E [socket.c:2332:socket_connect_finish] 
0-live-client-3: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.412249] E [socket.c:2332:socket_connect_finish] 
0-live-client-1: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.416621] E [socket.c:2332:socket_connect_finish] 
0-live-client-6: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:12.420906] E [socket.c:2332:socket_connect_finish] 
0-live-client-8: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:12.425066] E [socket.c:2332:socket_connect_finish] 
0-live-client-7: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:17.479925] E [socket.c:2332:socket_connect_finish] 
0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-08-16 20:36:23.788206] E [MSGID: 101075] 
[common-utils.c:314:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or 
service not known)
[2015-08-16 20:36:23.788286] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-4: DNS resolution failed on host stor106
[2015-08-16 20:36:23.788387] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-5: DNS resolution failed on host stor106
[2015-08-16 20:36:23.788918] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-0: DNS resolution failed on host stor104
[2015-08-16 20:36:23.789233] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-2: DNS resolution failed on host stor104
[2015-08-16 20:36:23.789295] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-3: DNS resolution failed on host stor106


For the high mem-usage part I will try to run rebalance and analyze. In the 
meantime it will be helpful if you can take a statedump of the rebalance 
process when it is using a lot of RAM.

Here are the steps to take the statedump.

1. Find your statedump destination: run gluster --print-statedumpdir. The 
statedump will be stored in this location.

2. When you see the rebalance process on any of the servers using high 
memory, issue the following command:
   kill -USR1 <pid-of-rebalance-process>  --- ps aux | grep rebalance 
should give you the rebalance process pid.

The state dump should give some hint about the high mem-usage.
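
Concretely, the sequence on the affected server would be roughly (the pid is
whatever ps reports for the rebalance process on that node):

    gluster --print-statedumpdir              # directory where the dump will be written
    ps aux | grep rebalance                   # note the pid of the rebalance glusterfs process
    kill -USR1 <pid-of-rebalance-process>     # writes the statedump into the directory above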

Thanks,
Susant

- Original Message -
From: Susant Palai spa...@redhat.com
To: Christophe TREFOIS christophe.tref...@uni.lu
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Friday, 21 August, 2015 3:52:07 PM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Thanks Christophe for the details. Will get back to you with the analysis.

Regards,
Susant

- Original Message -
From: Christophe TREFOIS christophe.tref...@uni.lu
To: Susant Palai spa...@redhat.com
Cc: Raghavendra Gowdappa rgowd...@redhat.com, Nithya Balachandran 
nbala...@redhat.com, Shyamsundar Ranganathan srang...@redhat.com, 
Mohammed Rafi K C rkavu...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Friday, 21 August, 2015 12

Re: [Gluster-devel] Skipped files during rebalance

2015-08-21 Thread Susant Palai
Thanks Christophe for the details. Will get back to you with the analysis.

Regards,
Susant

- Original Message -
From: Christophe TREFOIS christophe.tref...@uni.lu
To: Susant Palai spa...@redhat.com
Cc: Raghavendra Gowdappa rgowd...@redhat.com, Nithya Balachandran 
nbala...@redhat.com, Shyamsundar Ranganathan srang...@redhat.com, 
Mohammed Rafi K C rkavu...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Friday, 21 August, 2015 12:39:05 AM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Dear Susant,

The rebalance failed again and also had (in my opinion) excessive RAM usage.

Please find a very detailled list below.

All logs:

http://wikisend.com/download/651948/allstores.tar.gz

Thank you for letting me know how I could successfully complete the rebalance 
process.
The fedora pastes are the output of top of each node at that time (more or 
less).

Please let me know if you need more information,

Best,

—— Start of mem info

# After reboot, before starting glusterd

[root@highlander ~]# pdsh -g live 'free -m'
stor106:   totalusedfree  shared  buff/cache   
available
stor106: Mem: 1932492208  190825   9 215
  190772
stor106: Swap: 0   0   0
stor105:   totalusedfree  shared  buff/cache   
available
stor105: Mem: 1932482275  190738   9 234
  190681
stor105: Swap: 0   0   0
stor104:   totalusedfree  shared  buff/cache   
available
stor104: Mem: 1932492221  190811   9 216
  190757
stor104: Swap: 0   0   0
[root@highlander ~]#

# Gluster Info

[root@stor106 glusterfs]# gluster volume info

Volume Name: live
Type: Distribute
Volume ID: 1328637d-7730-4627-8945-bbe43626d527
Status: Started
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: stor104:/zfs/brick0/brick
Brick2: stor104:/zfs/brick1/brick
Brick3: stor104:/zfs/brick2/brick
Brick4: stor106:/zfs/brick0/brick
Brick5: stor106:/zfs/brick1/brick
Brick6: stor106:/zfs/brick2/brick
Brick7: stor105:/zfs/brick0/brick
Brick8: stor105:/zfs/brick1/brick
Brick9: stor105:/zfs/brick2/brick
Options Reconfigured:
nfs.disable: true
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32
performance.client-io-threads: on
performance.cache-size: 1GB
performance.cache-refresh-timeout: 60
performance.cache-max-file-size: 4MB
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.min-free-disk: 1%
server.allow-insecure: on

# Starting gluserd

[root@highlander ~]# pdsh -g live 'systemctl start glusterd'
[root@highlander ~]# pdsh -g live 'free -m'
stor106:   totalusedfree  shared  buff/cache   
available
stor106: Mem: 1932492290  190569   9 389
  190587
stor106: Swap: 0   0   0
stor104:   totalusedfree  shared  buff/cache   
available
stor104: Mem: 1932492297  190557   9 394
  190571
stor104: Swap: 0   0   0
stor105:   totalusedfree  shared  buff/cache   
available
stor105: Mem: 1932482286  190554   9 407
  190595
stor105: Swap: 0   0   0

[root@highlander ~]# systemctl start glusterd
[root@highlander ~]# gluster volume start live
volume start: live: success
[root@highlander ~]# gluster volume status
Status of volume: live
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick stor104:/zfs/brick0/brick 49164 0  Y   5945
Brick stor104:/zfs/brick1/brick 49165 0  Y   5963
Brick stor104:/zfs/brick2/brick 49166 0  Y   5981
Brick stor106:/zfs/brick0/brick 49158 0  Y   5256
Brick stor106:/zfs/brick1/brick 49159 0  Y   5274
Brick stor106:/zfs/brick2/brick 49160 0  Y   5292
Brick stor105:/zfs/brick0/brick 49155 0  Y   5284
Brick stor105:/zfs/brick1/brick 49156 0  Y   5302
Brick stor105:/zfs/brick2/brick 49157 0  Y   5320
NFS Server on localhost N/A   N/AN   N/A
NFS Server on 192.168.123.106   N/A   N/AN   N/A
NFS Server on stor105   N/A   N/AN   N/A
NFS Server on 192.168.123.104   N/A   N/AN   N/A

Task Status of Volume live

[Gluster-devel] Netbsd build failure

2015-08-20 Thread Susant Palai
Hi,
  I tried running netbsd regression twice on a patch. And twice it failed at 
the same point. Here is the error:

snip
Build GlusterFS
***

+ '/opt/qa/build.sh'
  File /usr/pkg/lib/python2.7/site.py, line 601
[2015-08-19 05:45:06.N]:++ G_LOG:./tests/basic/quota-anon-fd-nfs.t: 
TEST: 85 ! fd_write 3 content ++
   ^
SyntaxError: invalid token
+ RET=1
+ '[' 1 '!=' 0 ']'
+ exit 1
Build step 'Exécuter un script shell' marked build as failure
Finished: FAILURE
/snip

 Requesting you to take a look into it.

Thanks,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd build failure

2015-08-20 Thread Susant Palai
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/9480/console

- Original Message -
 From: Susant Palai spa...@redhat.com
 To: Emmanuel Dreyfus m...@netbsd.org
 Sent: Thursday, 20 August, 2015 2:14:11 PM
 Subject: Re: Netbsd build failure
 
 And there is a new failure now:
 https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/9494/console
 
 - Original Message -
 From: Emmanuel Dreyfus m...@netbsd.org
 To: Susant Palai spa...@redhat.com
 Sent: Thursday, 20 August, 2015 1:13:59 PM
 Subject: Re: Netbsd build failure
 
 On Thu, Aug 20, 2015 at 03:05:56AM -0400, Susant Palai wrote:
   Requesting you to take look into it.
 
 Is it on a rackspace VM? Which one?
 
 --
 Emmanuel Dreyfus
 m...@netbsd.org
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Skipped files during rebalance

2015-08-19 Thread Susant Palai
Comments inline.

- Original Message -
 From: Christophe TREFOIS christophe.tref...@uni.lu
 To: Susant Palai spa...@redhat.com
 Cc: Raghavendra Gowdappa rgowd...@redhat.com, Nithya Balachandran 
 nbala...@redhat.com, Shyamsundar
 Ranganathan srang...@redhat.com, Mohammed Rafi K C 
 rkavu...@redhat.com, Gluster Devel
 gluster-devel@gluster.org
 Sent: Tuesday, August 18, 2015 8:08:41 PM
 Subject: Re: [Gluster-devel] Skipped files during rebalance
 
 Hi Susan,
 
 Thank you for the response.
 
  On 18 Aug 2015, at 10:45, Susant Palai spa...@redhat.com wrote:
  
  Hi Christophe,
  
Need some info regarding the high mem-usage.
  
  1. Top output: To see whether any other process eating up memory.

I would be interested to know the memory usage of all the gluster processes 
related to the high mem-usage. These processes include glusterfsd, glusterd, 
gluster, any mount process (glusterfs), and rebalance (glusterfs).


  2. Gluster volume info
 
 root@highlander ~]# gluster volume info
 
 Volume Name: live
 Type: Distribute
 Volume ID: 1328637d-7730-4627-8945-bbe43626d527
 Status: Started
 Number of Bricks: 9
 Transport-type: tcp
 Bricks:
 Brick1: stor104:/zfs/brick0/brick
 Brick2: stor104:/zfs/brick1/brick
 Brick3: stor104:/zfs/brick2/brick
 Brick4: stor106:/zfs/brick0/brick
 Brick5: stor106:/zfs/brick1/brick
 Brick6: stor106:/zfs/brick2/brick
 Brick7: stor105:/zfs/brick0/brick
 Brick8: stor105:/zfs/brick1/brick
 Brick9: stor105:/zfs/brick2/brick
 Options Reconfigured:
 diagnostics.count-fop-hits: on
 diagnostics.latency-measurement: on
 server.allow-insecure: on
 cluster.min-free-disk: 1%
 diagnostics.brick-log-level: ERROR
 diagnostics.client-log-level: ERROR
 cluster.data-self-heal-algorithm: full
 performance.cache-max-file-size: 4MB
 performance.cache-refresh-timeout: 60
 performance.cache-size: 1GB
 performance.client-io-threads: on
 performance.io-thread-count: 32
 performance.write-behind-window-size: 4MB
 
  3. Is rebalance process still running? If yes can you point to specific mem
  usage by rebalance process? The high mem-usage was seen during rebalance
  or even post rebalance?
 
 I would like to restart the rebalance process since it failed… But I can’t as
 the volume cannot be stopped (I wanted to reboot the servers to have a clean
 testing grounds).
 
 Here are the logs from the three nodes:
 http://paste.fedoraproject.org/256183/43989079
 
 Maybe you could help me figure out how to stop the volume?
 
 This is what happens
 
 [root@highlander ~]# gluster volume rebalance live stop
 volume rebalance: live: failed: Rebalance not started.

Requesting glusterd team to give input. 
 
 [root@highlander ~]# ssh stor105 gluster volume rebalance live stop
 volume rebalance: live: failed: Rebalance not started.
 
 [root@highlander ~]# ssh stor104 gluster volume rebalance live stop
 volume rebalance: live: failed: Rebalance not started.
 
 [root@highlander ~]# ssh stor106 gluster volume rebalance live stop
 volume rebalance: live: failed: Rebalance not started.
 
 [root@highlander ~]# gluster volume rebalance live stop
 volume rebalance: live: failed: Rebalance not started.
 
 [root@highlander ~]# gluster volume stop live
 Stopping volume will make its data inaccessible. Do you want to continue?
 (y/n) y
 volume stop: live: failed: Staging failed on stor106. Error: rebalance
 session is in progress for the volume 'live'
 Staging failed on stor104. Error: rebalance session is in progress for the
 volume ‘live'
Can you run [ps aux |  grep rebalance] on all the servers and post here? Just 
want to check whether rebalance is really running or not. Again requesting 
glusterd team to give inputs.

 
 
  4. Gluster version
 
 [root@highlander ~]# pdsh -g live 'rpm -qa | grep gluster'
 stor104: glusterfs-api-3.7.3-1.el7.x86_64
 stor104: glusterfs-server-3.7.3-1.el7.x86_64
 stor104: glusterfs-libs-3.7.3-1.el7.x86_64
 stor104: glusterfs-3.7.3-1.el7.x86_64
 stor104: glusterfs-fuse-3.7.3-1.el7.x86_64
 stor104: glusterfs-client-xlators-3.7.3-1.el7.x86_64
 stor104: glusterfs-cli-3.7.3-1.el7.x86_64
 
 stor105: glusterfs-3.7.3-1.el7.x86_64
 stor105: glusterfs-client-xlators-3.7.3-1.el7.x86_64
 stor105: glusterfs-api-3.7.3-1.el7.x86_64
 stor105: glusterfs-cli-3.7.3-1.el7.x86_64
 stor105: glusterfs-server-3.7.3-1.el7.x86_64
 stor105: glusterfs-libs-3.7.3-1.el7.x86_64
 stor105: glusterfs-fuse-3.7.3-1.el7.x86_64
 
 stor106: glusterfs-libs-3.7.3-1.el7.x86_64
 stor106: glusterfs-fuse-3.7.3-1.el7.x86_64
 stor106: glusterfs-client-xlators-3.7.3-1.el7.x86_64
 stor106: glusterfs-api-3.7.3-1.el7.x86_64
 stor106: glusterfs-cli-3.7.3-1.el7.x86_64
 stor106: glusterfs-server-3.7.3-1.el7.x86_64
 stor106: glusterfs-3.7.3-1.el7.x86_64
 
  
  Will ask for more information in case needed.
  
  Regards,
  Susant
  
  
  - Original Message -
  From: Christophe TREFOIS christophe.tref...@uni.lu
  To: Raghavendra Gowdappa rgowd...@redhat.com, Nithya Balachandran
  nbala...@redhat.com, Susant Palai
  spa...@redhat.com, Shyamsundar

Re: [Gluster-devel] Skipped files during rebalance

2015-08-19 Thread Susant Palai
Hi Christophe,
   Forgot to ask you to post the rebalance and glusterd logs.

Regards,
Susant
   

- Original Message -
 From: Susant Palai spa...@redhat.com
 To: Christophe TREFOIS christophe.tref...@uni.lu
 Cc: Gluster Devel gluster-devel@gluster.org
 Sent: Wednesday, August 19, 2015 11:44:35 AM
 Subject: Re: [Gluster-devel] Skipped files during rebalance
 
 Comments inline.
 
 - Original Message -
  From: Christophe TREFOIS christophe.tref...@uni.lu
  To: Susant Palai spa...@redhat.com
  Cc: Raghavendra Gowdappa rgowd...@redhat.com, Nithya Balachandran
  nbala...@redhat.com, Shyamsundar
  Ranganathan srang...@redhat.com, Mohammed Rafi K C
  rkavu...@redhat.com, Gluster Devel
  gluster-devel@gluster.org
  Sent: Tuesday, August 18, 2015 8:08:41 PM
  Subject: Re: [Gluster-devel] Skipped files during rebalance
  
  Hi Susan,
  
  Thank you for the response.
  
   On 18 Aug 2015, at 10:45, Susant Palai spa...@redhat.com wrote:
   
   Hi Christophe,
   
 Need some info regarding the high mem-usage.
   
   1. Top output: To see whether any other process eating up memory.
 
 I will be interested to know the memory usage of all the gluster process
 referring to the high mem-usage. These process includes glusterfsd,
 glusterd, gluster, any mount process (glusterfs), and rebalance(glusterfs).
 
 
   2. Gluster volume info
  
  root@highlander ~]# gluster volume info
  
  Volume Name: live
  Type: Distribute
  Volume ID: 1328637d-7730-4627-8945-bbe43626d527
  Status: Started
  Number of Bricks: 9
  Transport-type: tcp
  Bricks:
  Brick1: stor104:/zfs/brick0/brick
  Brick2: stor104:/zfs/brick1/brick
  Brick3: stor104:/zfs/brick2/brick
  Brick4: stor106:/zfs/brick0/brick
  Brick5: stor106:/zfs/brick1/brick
  Brick6: stor106:/zfs/brick2/brick
  Brick7: stor105:/zfs/brick0/brick
  Brick8: stor105:/zfs/brick1/brick
  Brick9: stor105:/zfs/brick2/brick
  Options Reconfigured:
  diagnostics.count-fop-hits: on
  diagnostics.latency-measurement: on
  server.allow-insecure: on
  cluster.min-free-disk: 1%
  diagnostics.brick-log-level: ERROR
  diagnostics.client-log-level: ERROR
  cluster.data-self-heal-algorithm: full
  performance.cache-max-file-size: 4MB
  performance.cache-refresh-timeout: 60
  performance.cache-size: 1GB
  performance.client-io-threads: on
  performance.io-thread-count: 32
  performance.write-behind-window-size: 4MB
  
   3. Is rebalance process still running? If yes can you point to specific
   mem
   usage by rebalance process? The high mem-usage was seen during rebalance
   or even post rebalance?
  
  I would like to restart the rebalance process since it failed… But I can’t
  as
  the volume cannot be stopped (I wanted to reboot the servers to have a
  clean
  testing grounds).
  
  Here are the logs from the three nodes:
  http://paste.fedoraproject.org/256183/43989079
  
  Maybe you could help me figure out how to stop the volume?
  
  This is what happens
  
  [root@highlander ~]# gluster volume rebalance live stop
  volume rebalance: live: failed: Rebalance not started.
 
 Requesting glusterd team to give input.
  
  [root@highlander ~]# ssh stor105 gluster volume rebalance live stop
  volume rebalance: live: failed: Rebalance not started.
  
  [root@highlander ~]# ssh stor104 gluster volume rebalance live stop
  volume rebalance: live: failed: Rebalance not started.
  
  [root@highlander ~]# ssh stor106 gluster volume rebalance live stop
  volume rebalance: live: failed: Rebalance not started.
  
  [root@highlander ~]# gluster volume rebalance live stop
  volume rebalance: live: failed: Rebalance not started.
  
  [root@highlander ~]# gluster volume stop live
  Stopping volume will make its data inaccessible. Do you want to continue?
  (y/n) y
  volume stop: live: failed: Staging failed on stor106. Error: rebalance
  session is in progress for the volume 'live'
  Staging failed on stor104. Error: rebalance session is in progress for the
  volume ‘live'
 Can you run [ps aux |  grep rebalance] on all the servers and post here?
 Just want to check whether rebalance is really running or not. Again
 requesting glusterd team to give inputs.
 
  
  
   4. Gluster version
  
  [root@highlander ~]# pdsh -g live 'rpm -qa | grep gluster'
  stor104: glusterfs-api-3.7.3-1.el7.x86_64
  stor104: glusterfs-server-3.7.3-1.el7.x86_64
  stor104: glusterfs-libs-3.7.3-1.el7.x86_64
  stor104: glusterfs-3.7.3-1.el7.x86_64
  stor104: glusterfs-fuse-3.7.3-1.el7.x86_64
  stor104: glusterfs-client-xlators-3.7.3-1.el7.x86_64
  stor104: glusterfs-cli-3.7.3-1.el7.x86_64
  
  stor105: glusterfs-3.7.3-1.el7.x86_64
  stor105: glusterfs-client-xlators-3.7.3-1.el7.x86_64
  stor105: glusterfs-api-3.7.3-1.el7.x86_64
  stor105: glusterfs-cli-3.7.3-1.el7.x86_64
  stor105: glusterfs-server-3.7.3-1.el7.x86_64
  stor105: glusterfs-libs-3.7.3-1.el7.x86_64
  stor105: glusterfs-fuse-3.7.3-1.el7.x86_64
  
  stor106: glusterfs-libs-3.7.3-1.el7.x86_64
  stor106: glusterfs-fuse-3.7.3-1.el7.x86_64
  stor106: glusterfs

Re: [Gluster-devel] Skipped files during rebalance

2015-08-18 Thread Susant Palai
Hi Christophe,
  
   Need some info regarding the high mem-usage.

1. Top output: to see whether any other process is eating up memory.
2. Gluster volume info
3. Is the rebalance process still running? If yes, can you point to the specific memory 
usage of the rebalance process? Was the high mem-usage seen during the rebalance or 
even post rebalance?
4. Gluster version

Will ask for more information in case needed.
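
For points 1 and 3 above, something like this on each node (while the memory
usage is high) would capture what I need -- a rough sketch:

    top -b -n 1 | head -40
    ps -o pid,rss,vsz,etime,cmd -C glusterd,glusterfs,glusterfsd   # per-process memory of the gluster daemons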

Regards,
Susant


- Original Message -
 From: Christophe TREFOIS christophe.tref...@uni.lu
 To: Raghavendra Gowdappa rgowd...@redhat.com, Nithya Balachandran 
 nbala...@redhat.com, Susant Palai
 spa...@redhat.com, Shyamsundar Ranganathan srang...@redhat.com
 Cc: Mohammed Rafi K C rkavu...@redhat.com
 Sent: Monday, 17 August, 2015 7:03:20 PM
 Subject: Fwd: [Gluster-devel] Skipped files during rebalance
 
 Hi DHT team,
 
 This email somehow didn’t get forwarded to you.
 
 In addition to my problem described below, here is one example of free memory
 after everything failed
 
 [root@highlander ~]# pdsh -g live 'free -m'
 stor106:   totalusedfree  shared  buff/cache
 available
 stor106: Mem: 193249  1247841347   9   67118
 12769
 stor106: Swap: 0   0   0
 stor104:   totalusedfree  shared  buff/cache
 available
 stor104: Mem: 193249  107617   31323   9   54308
 42752
 stor104: Swap: 0   0   0
 stor105:   totalusedfree  shared  buff/cache
 available
 stor105: Mem: 193248  1418046736   9   44707
 9713
 stor105: Swap: 0   0   0
 
 So after the failed operation, there’s almost no memory free, and it is also
 not freed up.
 
 Thank you for pointing me to any directions,
 
 Kind regards,
 
 —
 Christophe
 
 
 Begin forwarded message:
 
 From: Christophe TREFOIS
 christophe.tref...@uni.lu
 Subject: Re: [Gluster-devel] Skipped files during rebalance
 Date: 17 Aug 2015 11:54:32 CEST
 To: Mohammed Rafi K C rkavu...@redhat.com
 Cc: gluster-devel@gluster.org
 
 Dear Rafi,
 
 Thanks for submitting a patch.
 
 @DHT, I have two additional questions / problems.
 
 1. When doing a rebalance (with data) RAM consumption on the nodes goes
 dramatically high, eg out of 196 GB available per node, RAM usage would fill
 up to 195.6 GB. This seems quite excessive and strange to me.
 
 2. As you can see, the rebalance (with data) failed as one endpoint becomes
 unconnected (even though it still is connected). I’m thinking this could be
 due to the high RAM usage?
 
 Thank you for your help,
 
 —
 Christophe
 
 Dr Christophe Trefois, Dipl.-Ing.
 Technical Specialist / Post-Doc
 
 UNIVERSITÉ DU LUXEMBOURG
 
 LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
 Campus Belval | House of Biomedicine
 6, avenue du Swing
 L-4367 Belvaux
 T: +352 46 66 44 6124
 F: +352 46 66 44 6949
 http://www.uni.lu/lcsb
 
 
 
 
 This message is confidential and may contain privileged information.
 It is intended for the named recipient only.
 If you receive it in error please notify me and permanently delete the
 original message and any copies.
 
 
 
 
 On 17 Aug 2015, at 11:27, Mohammed Rafi K C
 rkavu...@redhat.com wrote:
 
 
 
 On 08/17/2015 01:58 AM, Christophe TREFOIS wrote:
 Dear all,
 
 I have successfully added a new node to our setup, and finally managed to get
 a successful fix-layout run as well with no errors.
 
 Now, as per the documentation, I started a gluster volume rebalance live
 start task and I see many skipped files.
 The error log then contains entries as follows for each skipped file.
 
 [2015-08-16 20:23:30.591161] E [MSGID: 109023]
 [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file
 failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Mea
 s_05(2013-10-11_17-12-02)/004010008.flex lookup failed
 [2015-08-16 20:23:30.768391] E [MSGID: 109023]
 [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file
 failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Mea
 s_05(2013-10-11_17-12-02)/007005003.flex lookup failed
 [2015-08-16 20:23:30.804811] E [MSGID: 109023]
 [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file
 failed:/hcs/hcs/OperaArchiveCol/SK 20131011_Oligo_Rot_lowConc_P1/Mea
 s_05(2013-10-11_17-12-02)/006005009.flex lookup failed
 [2015-08-16 20:23:30.805201] E [MSGID: 109023]
 [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file
 failed:/hcs/hcs/OperaArchiveCol/SK

[Gluster-devel] testcase ./tests/geo-rep/georep-basic-dr-rsync.t failure

2015-08-12 Thread Susant Palai
Hi,
   ./tests/geo-rep/georep-basic-dr-rsync.t fails on the regression machines as well 
as on my local machine. Requesting the geo-rep team to look into it.

link: 
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/9158/consoleFull

Regards,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Iusses with Random read/write

2015-07-23 Thread Susant Palai
++ CCing gluster-devel to get more eyes on this problem.

Susant

- Original Message -
 From: Subrata Ghosh subrata.gh...@ericsson.com
 To: Susant Palai spa...@redhat.com (spa...@redhat.com) 
 spa...@redhat.com, Vijay Bellur vbel...@redhat.com
 (vbel...@redhat.com) vbel...@redhat.com
 Cc: Subrata Ghosh subrata.gh...@ericsson.com
 Sent: Sunday, 19 July, 2015 7:57:28 PM
 Subject: Iusses with Random read/write
 
 Hi Vijay/Prashant,
 
  How are you :).
  
  We need your immediate help / suggestions to meet our random I/O
  performance metrics.
  Currently we have performance issues with random read/write - our basic
  requirement is 20 MB/sec for random I/O.
  
  We tried both iozone and fio and received almost the same (random I/O)
  performance, which does not meet our fundamental I/O requirements.
  
  Our use case is as below.
  
  Applications running on different cards write/read (randomly) continuous
  files to a volume composed of storage from different cards in the
  distributed system; replicas are placed across cards and the
  applications use non-local storage.
  We have verified and identified the bottleneck to be mostly on the Gluster
  client side inside the application; the gluster server-to-server I/O speed
  looks good enough. Performance tuning on the gluster server side would not
  be expected to help.
  
  We also cross-verified using the NFS client and got far better
  performance, but we cannot use the NFS client / libgfapi because of use-case
  limitations (brick failure cases etc.).
  
  Please share some thoughts on improving the gluster client to achieve
  20 MB/sec.
 
 Observations:
 
 Fio:
 
 
  Please find the test results of random write/read in the 2-APP scenario.
  
  Scenario       APP_1       APP_2       File size   No of AMC's
  Random-Write   3.06 MB/s   3.02 MB/s   100 MB      4
  Random-Read    8.1 MB/s    8.4 MB/s    100 MB      4
 
 
 
 Iozone:
 
 ./iozone -R -l 1 -u 1 -r 4k -s 2G -F /home/cdr/f1 | tee -a
 /tmp/iozone_results.txt 
 
 
  (both runs: file size 2GB, record size 4 Kbytes, output in Kbytes/sec)
  
  Test              APP 1                  APP 2
  Initial write     41061.78               41167.36
  Rewrite           40395.64               40810.41
  Read              262685.69              269644.62
  Re-read           263751.66              270760.62
  Reverse Read      27715.72               28604.22
  Stride read       83776.44               84347.88
  Random read       16239.74 (15.8 MB/s)   15815.94 (15.4 MB/s)
  Mixed workload    16260.95               15787.55
  Random write      3356.57 (3.3 MB/s)     3365.17 (3.3 MB/s)
  Pwrite            40914.55               40692.34
  Pread             260613.83              269850.59
  Fwrite            40412.40               40369.78
  Fread             261506.61              267142.41
 
 
 
 Some of the info on performance testing is at
 http://www.gluster.org/community/documentation/index.php/Performance_Testing
 Also pls check iozone limitations listed there.
 
 WARNING: random I/O testing in iozone is very restricted by iozone
 constraint that it must randomly read then randomly write the entire file!
 This is not what we want - instead it should randomly read/write for some
 fraction of file size or time duration, allowing us to spread out more on
 the disk while not waiting too long for test to finish. This is why fio
 (below) is the preferred test tool for random I/O workloads.
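  
  As a starting point, a hedged fio invocation for this kind of random-I/O test
  could look like the following; the mount point, file size, mix and runtime are
  placeholders to adjust to your setup:
  
      fio --name=randrw --directory=/mnt/glustervol --rw=randrw --rwmixread=70 \
          --bs=4k --size=1G --numjobs=4 --ioengine=libaio \
          --time_based --runtime=120 --group_reporting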
 
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [FAILED] regression tests: tests/bugs/distribute/bug-1066798.t

2015-07-21 Thread Susant Palai
Comments inline.

- Original Message -
 From: Niels de Vos nde...@redhat.com
 To: gluster-devel@gluster.org
 Cc: spa...@redhat.com, aseng...@redhat.com
 Sent: Tuesday, 21 July, 2015 2:13:37 PM
 Subject: Re: [Gluster-devel] [FAILED] regression tests: 
 tests/bugs/distribute/bug-1066798.t
 
 On Mon, Jul 20, 2015 at 03:00:27PM +0530, Raghavendra Talur wrote:
  Adding Susant and Avra for dht and snapshot test cases respectively.
 
 Same test-case, but a different failure, no core dump:
 
 [08:20:10] ./tests/bugs/distribute/bug-1066798.t ..
 not ok 213
 Failed 1/214 subtests
 [08:20:10]
 
 Test Summary Report
 ---
 ./tests/bugs/distribute/bug-1066798.t (Wstat: 0 Tests: 214 Failed: 1)
   Failed test:  213
 Files=1, Tests=214, 18 wallclock secs ( 0.07 usr  0.00 sys +  4.22 cusr
 1.47 csys =  5.76 CPU)
 Result: FAIL
 ./tests/bugs/distribute/bug-1066798.t: bad status 1
 
 https://build.gluster.org/job/rackspace-regression-2GB-triggered/12661/consoleFull
I am getting a connection-refused error while trying to access the logs.
 
 Let me know if I should file a bug for this, so that we can all track
 the progress of the fix.
 
 Thanks,
 Niels
 
  
  
  On Mon, Jul 20, 2015 at 11:45 AM, Milind Changire
  milindchang...@gmail.com
  wrote:
  
  
   http://build.gluster.org/job/rackspace-regression-2GB-triggered/12541/consoleFull
  
  
   http://build.gluster.org/job/rackspace-regression-2GB-triggered/12499/consoleFull

I checked the log file for this. The test case in question tries to migrate 50 hard
links, but from the log I could see only 16 migration messages. Why readdirp
missed the other entries is the question to be figured out here. I am trying to
reproduce the issue on my system; so far it works properly. Will update if I am
able to reproduce the issue.

  
  
   Please advise.
  
   --
   Milind
  
  
   ___
   Gluster-devel mailing list
   Gluster-devel@gluster.org
   http://www.gluster.org/mailman/listinfo/gluster-devel
  
  
  
  
  --
  *Raghavendra Talur *
 
  ___
  Gluster-devel mailing list
  Gluster-devel@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-devel
 
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t

2015-07-02 Thread Susant Palai
Comments inline.

- Original Message -
 From: Sachin Pandit span...@redhat.com
 To: Kotresh Hiremath Ravishankar khire...@redhat.com
 Cc: Gluster Devel gluster-devel@gluster.org
 Sent: Thursday, July 2, 2015 12:21:44 PM
 Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
 
 - Original Message -
  From: Vijaikumar M vmall...@redhat.com
  To: Kotresh Hiremath Ravishankar khire...@redhat.com, Gluster Devel
  gluster-devel@gluster.org
  Cc: Sachin Pandit span...@redhat.com
  Sent: Thursday, July 2, 2015 12:01:03 PM
  Subject: Re: Regression Failure: ./tests/basic/quota.t
  
  We look into this issue
  
  Thanks,
  Vijay
  
  On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote:
   Hi,
  
   I see quota.t regression failure for the following. The changes are
   related
   to
   example programs in libgfchangelog.
  
   http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull
  
   Could someone from quota team, take a look at it.
 
 Hi,
 
 I had a quick look at this. It looks like the following test case failed
 
 TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4}
 EXPECT_WITHIN $REBALANCE_TIMEOUT 0 rebalance_completed
 
 
 I looked at the logs too, and found out the following errors
 
 patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026]
 [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout on /
 failed
 build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 09:27:23.040998] E
 [MSGID: 106224]
 [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle] 0-management:
 Failed to update status
 StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.557887]
 E [rpc-clnt.c:362:saved_frames_unwind] (--
 /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a]
 (--
 /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086]
 (--
 /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183]
 (--
 /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615]
 (-- /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f]
 ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding frame
 type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 14:34:47.554862
 (xid=0xc)
 StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561191]
 E [MSGID: 114031] [client-rpc-fops.c:1623:client3_3_inodelk_cbk]
 0-StartMigrationDuringRebalanceTest-client-0: remote operation failed:
 Transport endpoint is not connected [Transport endpoint is not connected]
 StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561417]
 E [socket.c:2332:socket_connect_finish]
 0-StartMigrationDuringRebalanceTest-client-0: connection to
 23.253.62.104:24007 failed (Connection refused)
 StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561707]
 E [dht-common.c:2643:dht_find_local_subvol_cbk]
 0-StartMigrationDuringRebalanceTest-dht: getxattr err (Transport endpoint is
 not connected) for dir
 
Seems like a network partition. Rebalance fails if it receives ENOTCONN
on its child.

 
 Any help regarding this or more information on this would be much
 appreciated.
 
 Thanks,
 Sachin Pandit.
 
 
  
   Thanks and Regards,
   Kotresh H R
  
  
  
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Three Issues Confused me recently

2015-06-26 Thread Susant Palai
Comment inline.

- Original Message -
 From: christ1...@sina.com
 To: gluster-devel gluster-devel@gluster.org
 Sent: Thursday, 25 June, 2015 7:56:45 PM
 Subject: [Gluster-devel] Three Issues Confused me recently
 
 
 
 Hi, everyone!
 
 
 
 
 There are three issues that have been confusing me recently while using
 glusterfs to store huge amounts of data:
 1) Is there any reason for reserving 10% free space on each brick in the
 volume? And can I avoid reserving that 10% free space? I will use glusterfs
 to store huge surveillance videos, so each brick will be given a large disk.
 If 10% of each brick is reserved, it leads to low disk usage and wastes a
 lot of disk space.
 

10% is the default and it can be modified by the cluster.min-free-disk option.

e.g. gluster v set _VOL_NAME_ cluster.min-free-disk 8GB


On the question of what this cluster.min-free-disk value should be:

Cluster.min-free-disk: The min-free-disk setting establishes a data threshold
for each brick in a volume. The primary intention of this is to ensure that
there is adequate space to perform self-heal and rebalance operations, both of
which require disk overhead. The min-free-disk value is only taken into account
once it is already exceeded before a file is written. When that is the case,
the DHT algorithm will choose to write the file to another brick where
min-free-disk is not exceeded instead, and will write a 0-byte link-to file on
the brick where min-free-disk is exceeded and where the file was originally
hashed. This link-to file contains metadata that points the client to the brick
where the data was actually written. Because min-free-disk is only considered
after it has been exceeded, and because the DHT algorithm makes no other
consideration of available space on a brick, it is possible to write a large
file that exceeds the space on the brick it is hashed to even while another
brick has enough space to hold the file. This would result in an I/O error to
the client.

So if you know you routinely write files of up to n GB in size, min-free-disk
can be set to a value a little larger than n. For example, if 5GB is at the
high end of the file sizes you will be writing, then you might consider setting
min-free-disk to 8GB. Doing this will ensure that the file will go to a brick
with enough available space (assuming one exists).
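
To make the decision above concrete, here is a toy model in Python (illustration
only; this is not the DHT code, and the brick names and sizes are made up):

    # Toy model of the placement described above: if the hashed brick has
    # already crossed min-free-disk, the data goes to another brick with room
    # and the hashed brick keeps a 0-byte link-to file.
    def place_file(hashed_brick, free_space, min_free):
        # free_space: {"brick1": bytes_free, ...}
        if free_space[hashed_brick] > min_free:
            return hashed_brick, None              # normal placement
        for brick, free in free_space.items():
            if brick != hashed_brick and free > min_free:
                return brick, hashed_brick         # (data brick, link-to brick)
        return hashed_brick, None                  # nowhere better; a large file
                                                   # may still end in an I/O error

    GB = 1024 ** 3
    print(place_file("brick1", {"brick1": 4 * GB, "brick2": 60 * GB}, 8 * GB))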


 2) Will there be any exceptions when the underlying filesystem (xfs, ext4) has
 been written completely full?
 
As I already mentioned above, new file creation will be redirected to a
different brick with adequate space once min-free-disk is exceeded on the hashed brick.

 3) Is it natural that a very high cpu usage when the directory quota is
 enabled ? (glusterfs 3.6.2)
 
CCing quota team for this.

 And is there any solution to avoid it ?
 
 
 I would really appreciate your help; thanks very much.
 
 
 
 
 
 
 
 Best regards.
 
 
 
 
 
 
 
 Louis
 
 2015/6/25
 
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regarding the issues gluster DHT and Layouts of bricks

2015-05-21 Thread Susant Palai
Comments inline.

- Original Message -
 From: Subrata Ghosh subrata.gh...@ericsson.com
 To: gluster-devel@gluster.org, gluster-us...@gluster.org
 Cc: Nobin Mathew nobin.mat...@ericsson.com, Susant Palai 
 spa...@redhat.com, Vijay Bellur
 vbel...@redhat.com
 Sent: Thursday, 21 May, 2015 4:26:05 PM
 Subject: Regarding the issues gluster DHT and  Layouts of bricks
 
 
 Hi  All,
 
 Could you please guide us  to solve the following DHT and brick layout
 problem we are  dealing with ? Questions are marked bold.
 
 Problem statement :
 
 
 1.  We have a requirement to achieve maximum write and read performance
 and we have to meet some committed performance metrics.
 
 Our goal is to place each file on a different brick to get optimal
 performance and also to observe the nature of the throughput. Hence we
 need a mechanism to generate different hashes using glusterfs.gf_dm_hashfn
 (assuming the number of files is N and the number of bricks is N) so that
 the files land on separate bricks.
 
 
 - How do we make sure each file has a different hash and falls on a
 different brick?
 
 
 
 - To put it another way: if I know the range of the brick layout, or more
 precisely the hex value of the desired hash (so that the file will be placed
 on the desired brick) that we need to generate from the Davies-Meyer
 algorithm used in gluster, can we create a file name accordingly? That would
 also solve our problem to some extent.
 
 
 2.  We tried an experiment to see how gluster decides which brick a file is
 placed on, following glusterfs.gf_dm_hashfn, and took some ideas from
 articles like http://gluster.readthedocs.org/en/latest/Features/dht/ and
 https://joejulian.name/blog/dht-misses-are-expensive/, which describe the
 layout for a brick and how to calculate the hash for a file.
 
 
 To minimize collisions, we want to generate different hashes in such a way
 that each file is placed on a different brick (file 1 = brick A, file 2 =
 brick B, file 3 = brick C, file 4 = brick D).
 
 We use a similar script to get the hash value for a file:
 
 import ctypes
 import sys
 
 glusterfs = ctypes.CDLL("libglusterfs.so.0")
 
 def gf_dm_hashfn(filename):
     return ctypes.c_uint32(glusterfs.gf_dm_hashfn(
         filename,
         len(filename)))
 
 if __name__ == "__main__":
     print hex(gf_dm_hashfn(sys.argv[1]).value)
 
 We can then calculate the hash for a filename:
 # python gf_dm_hash.py file1
 0x99d1b6fL
 
 
 The extended attribute is fetched to check the range and to match the hash
 value generated above.
 
 getfattr -n trusted.glusterfs.dht -e hex file1
 
 
   However, we are not able to follow exactly how, at this point, the hash
   value is matched to one of the layout assignments to yield what we call
   the hashed location.
 
 
 -My question is if I  know the range of brick lay out ( say
 0xc000 to  0x, is range  select a hash 0xc007 ) where to be
 placed the next file can we generate the name ( kind of reverse of  gluster
 glusterfs.gf_dm_hashfn) ?

I am not aware of any such mechanism. You will have to generate file names
manually and run them through your script to check whether they fall in the
brick range.
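
For what it's worth, here is a small sketch of that manual search, building on
the script quoted above. It assumes libglusterfs.so.0 is loadable and exports
gf_dm_hashfn(const char *, int); the file-name prefix and the layout range
(normally read from trusted.glusterfs.dht on the target brick) are only examples:

    import ctypes

    glusterfs = ctypes.CDLL("libglusterfs.so.0")
    glusterfs.gf_dm_hashfn.restype = ctypes.c_uint32

    def gf_dm_hash(name):
        data = name.encode()
        return glusterfs.gf_dm_hashfn(data, len(data))

    def find_name_in_range(prefix, range_start, range_end, max_tries=1000000):
        # Walk prefix-0, prefix-1, ... until the DHT hash falls inside the
        # target brick's layout range (read with getfattr as shown above).
        for i in range(max_tries):
            candidate = "%s-%d" % (prefix, i)
            if range_start <= gf_dm_hash(candidate) <= range_end:
                return candidate
        return None

    # Example range only -- substitute the real values from your brick.
    print(find_name_in_range("file", 0xc0000000, 0xffffffff))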

 
 PS: Susant, can you throw some light on this or suggest a method for the
 problem we are trying to solve?
 
 Thanks for your time.
 
 
 Best Regards,
 Subrata Ghosh
 
 
 
 
 
 
 
Regards,
Susant
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] gluster crashes in dht_getxattr_cbk() due to null pointer dereference.

2015-05-08 Thread Susant Palai
Fixed a similar crash in dht_getxattr_cbk here: 
http://review.gluster.org/#/c/10467/

Susant

- Original Message -
From: Paul Guo bigpaul...@foxmail.com
To: gluster-devel@gluster.org
Sent: Friday, 8 May, 2015 3:25:01 PM
Subject: [Gluster-devel] gluster crashes in dht_getxattr_cbk() due to null  
pointer dereference.

Hi,

gdb debugging shows the root cause seems to be quite straightforward. The 
gluster version is 3.4.5 and the stack:

#0  0x7eff735fe354 in dht_getxattr_cbk (frame=0x7eff775b6360, 
cookie=value optimized out, this=value optimized out, op_ret=value 
optimized out, op_errno=0,
 xattr=value optimized out, xdata=0x0) at dht-common.c:2043
2043DHT_STACK_UNWIND (getxattr, frame, local-op_ret, 
op_errno,
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.80.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 
krb5-libs-1.9-33.el6.x86_64 libcom_err-1.41.12-12.el6.x86_64 
libgcc-4.4.6-4.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 
openssl-1.0.1e-16.el6_5.14.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x7eff735fe354 in dht_getxattr_cbk (frame=0x7eff775b6360, 
cookie=value optimized out, this=value optimized out, op_ret=value 
optimized out, op_errno=0,
 xattr=value optimized out, xdata=0x0) at dht-common.c:2043
#1  0x7eff7383c168 in afr_getxattr_cbk (frame=0x7eff7756ab58, 
cookie=value optimized out, this=value optimized out, op_ret=0, 
op_errno=0, dict=0x7eff76f21dc8, xdata=0x0)
 at afr-inode-read.c:618
#2  0x7eff73d8 in client3_3_getxattr_cbk (req=value optimized 
out, iov=value optimized out, count=value optimized out, 
myframe=0x7eff77554d4c) at client-rpc-fops.c:1115
#3  0x003de700d6f5 in rpc_clnt_handle_reply (clnt=0xc36ad0, 
pollin=0x14b21560) at rpc-clnt.c:771
#4  0x003de700ec6f in rpc_clnt_notify (trans=value optimized out, 
mydata=0xc36b00, event=value optimized out, data=value optimized 
out) at rpc-clnt.c:891
#5  0x003de700a4e8 in rpc_transport_notify (this=value optimized 
out, event=value optimized out, data=value optimized out) at 
rpc-transport.c:497
#6  0x7eff74af6216 in socket_event_poll_in (this=0xc46530) at 
socket.c:2118
#7  0x7eff74af7c3d in socket_event_handler (fd=value optimized 
out, idx=value optimized out, data=0xc46530, poll_in=1, poll_out=0, 
poll_err=0) at socket.c:2230
#8  0x003de785e907 in event_dispatch_epoll_handler 
(event_pool=0xb70e90) at event-epoll.c:384
#9  event_dispatch_epoll (event_pool=0xb70e90) at event-epoll.c:445
#10 0x00406818 in main (argc=4, argv=0x7fff24878238) at 
glusterfsd.c:1934

See dht_getxattr_cbk() (below). When frame->local is equal to 0, gluster
jumps to the label out where, when it accesses local->xattr (i.e.
0->xattr), it crashes. Note that in
DHT_STACK_UNWIND()->STACK_UNWIND_STRICT(), fn looks fine.

(gdb) p __local
$11 = (dht_local_t *) 0x0
(gdb) p frame-local
$12 = (void *) 0x0
(gdb) p fn
$1 = (fop_getxattr_cbk_t) 0x7eff7298c940 mdc_readv_cbk

I did not read the dht code much, so I have no idea whether a zero
frame->local is normal or not, but from the code's perspective this is
an obvious bug and it still exists in the latest glusterfs workspace.

The following code change is a simple fix, but maybe there's a better one.
-if (is_last_call (this_call_cnt)) {
+if (is_last_call (this_call_cnt) && local != NULL) {

Similar issues exist in other functions also, e.g. stripe_getxattr_cbk() 
(I did not check all code).

int
dht_getxattr_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                  int op_ret, int op_errno, dict_t *xattr, dict_t *xdata)
{
        int this_call_cnt = 0;
        dht_local_t *local = NULL;

        VALIDATE_OR_GOTO (frame, out);
        VALIDATE_OR_GOTO (frame->local, out);

        ..

out:
        if (is_last_call (this_call_cnt)) {
                DHT_STACK_UNWIND (getxattr, frame, local->op_ret, op_errno,
                                  local->xattr, NULL);
        }
        return 0;
}


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Rebalance improvement design

2015-05-05 Thread Susant Palai
Comments inline

- Original Message -
 From: Benjamin Turner bennytu...@gmail.com
 To: Susant Palai spa...@redhat.com
 Cc: Vijay Bellur vbel...@redhat.com, Gluster Devel 
 gluster-devel@gluster.org
 Sent: Monday, May 4, 2015 8:58:13 PM
 Subject: Re: [Gluster-devel] Rebalance improvement design
 
 I see:
 
 #define GF_DECIDE_DEFRAG_THROTTLE_COUNT(throttle_count, conf) {          \
                                                                          \
         throttle_count = MAX ((get_nprocs() - 4), 4);                    \
                                                                          \
         if (!strcmp (conf->dthrottle, "lazy"))                           \
                 conf->defrag->rthcount = 1;                              \
                                                                          \
         if (!strcmp (conf->dthrottle, "normal"))                         \
                 conf->defrag->rthcount = (throttle_count / 2);           \
                                                                          \
         if (!strcmp (conf->dthrottle, "aggressive"))                     \
                 conf->defrag->rthcount = throttle_count;                 \
 
 So aggressive will give us the default of (20 + 16), normal is that divided
The 16 you mentioned here are sync threads that scale with the workload,
independent of migration. The number 20 is the number of dedicated threads for
carrying out migration. We are planning to make the maximum number of threads
allowed be the number of processing units available, or 4 [MAX (get_nprocs(), 4)].
 by 2, and lazy is 1, is that correct?  If so, that is what I was looking to
 see.  The only other thing I can think of here is making the tunable a
 number like event threads, but I like this.  I don't know if I saw it
 documented, but if it's not we should note this in the help text.
Sure will be documented.
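
For reference, the mapping in the macro quoted above works out as in this small
sketch (illustration only; the real logic is the C macro in the rebalance code):

    import multiprocessing

    def rebalance_thread_count(mode, nprocs=None):
        # Mirrors the quoted macro: throttle_count = MAX(nprocs - 4, 4).
        if nprocs is None:
            nprocs = multiprocessing.cpu_count()
        throttle = max(nprocs - 4, 4)
        return {"lazy": 1,
                "normal": throttle // 2,
                "aggressive": throttle}[mode]

    # e.g. with 24 processing units: lazy -> 1, normal -> 10, aggressive -> 20
    for mode in ("lazy", "normal", "aggressive"):
        print(mode, rebalance_thread_count(mode, nprocs=24))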
 
 Also to note, the old time was 98500.00 the new one is 55088.00, that is a
 44% improvement!
 
 -b
 
 
 On Mon, May 4, 2015 at 9:06 AM, Susant Palai spa...@redhat.com wrote:
 
  Ben,
  On no. of threads:
   Sent a throttle patch here: http://review.gluster.org/#/c/10526/ to
   limit thread numbers [not merged yet]. The rebalance process in the current
   model spawns 20 threads, and in addition to that there will be at most 16
   syncop threads.
 
  Crash:
   The crash should be fixed by this:
  http://review.gluster.org/#/c/10459/.
 
   The rebalance time taken is a function of the number of files and their size.
   The higher the frequency at which files are added to the global queue [on
   which the migrator threads act], the faster the rebalance. I guess here we
   are mostly seeing the effect of the local crawl, as only 81GB out of 500GB
   has been migrated.
 
  Thanks,
  Susant
 
  - Original Message -
   From: Benjamin Turner bennytu...@gmail.com
   To: Vijay Bellur vbel...@redhat.com
   Cc: Gluster Devel gluster-devel@gluster.org
   Sent: Monday, May 4, 2015 5:18:13 PM
   Subject: Re: [Gluster-devel] Rebalance improvement design
  
   Thanks Vijay! I forgot to upgrade the kernel(thinp 6.6 perf bug gah)
  before I
   created this data set, so its a bit smaller:
  
   total threads = 16
   total files = 7,060,700 (64 kb files, 100 files per dir)
   total data = 430.951 GB
   88.26% of requested files processed, minimum is 70.00
   10101.355737 sec elapsed time
   698.985382 files/sec
   698.985382 IOPS
   43.686586 MB/sec
  
   I updated everything and ran the rebalanace on
   glusterfs-3.8dev-0.107.git275f724.el6.x86_64.:
  
   [root@gqas001 ~]# gluster v rebalance testvol status
   Node Rebalanced-files size scanned failures skipped status run time in
  secs
   - --- --- --- --- ---
    --
   localhost 1327346 81.0GB 3999140 0 0 completed 55088.00
   gqas013.sbu.lab.eng.bos.redhat.com 0 0Bytes 1 0 0 completed 26070.00
   gqas011.sbu.lab.eng.bos.redhat.com 0 0Bytes 0 0 0 failed 0.00
   gqas014.sbu.lab.eng.bos.redhat.com 0 0Bytes 0 0 0 failed 0.00
   gqas016.sbu.lab.eng.bos.redhat.com 1325857 80.9GB 4000865 0 0 completed
   55088.00
   gqas015.sbu.lab.eng.bos.redhat.com 0 0Bytes 0 0 0 failed 0.00
   volume rebalance: testvol: success:
  
  
   A couple observations:
  
   I am seeing lots of threads / processes running:
  
   [root@gqas001 ~]# ps -eLf | grep glu | wc -l
   96 - 96 gluster threads
   [root@gqas001 ~]# ps -eLf | grep rebal | wc -l
   36 - 36 rebal threads.
  
   Is this tunible? Is there a use case where we would need to limit this?
  Just
   curious, how did we arrive at 36 rebal threads?
  
   # cat /var/log/glusterfs/testvol-rebalance.log | wc -l
   4,577,583
   [root@gqas001 ~]# ll /var/log/glusterfs/testvol-rebalance.log -h
   -rw--- 1 root root 1.6G May 3 12:29
   /var/log/glusterfs/testvol-rebalance.log
  
   :) How big is this going to get when I do the 10-20 TB? I'll keep tabs on
   this, my default test setup only has:
  
   [root@gqas001 ~]# df -h
   Filesystem Size Used Avail Use

Re: [Gluster-devel] Rebalance improvement design

2015-04-29 Thread Susant Palai
Hi Ben
   I checked the glusterfs process by attaching gdb and I could not find the
newer code. Can you confirm whether you took the new patch? The patch is:
http://review.gluster.org/#/c/9657/

Thanks,
Susant


- Original Message -
 From: Susant Palai spa...@redhat.com
 To: Benjamin Turner bennytu...@gmail.com, Nithya Balachandran 
 nbala...@redhat.com
 Cc: Shyamsundar Ranganathan srang...@redhat.com
 Sent: Wednesday, April 29, 2015 1:22:02 PM
 Subject: Re: [Gluster-devel] Rebalance improvement design
 
 This is how it looks for 2000 files, each 1MB. Rebalance was done on 2*2 + 2.
 
 OLDER:
 [root@gprfs030 ~]# gluster v rebalance test1 status
          Node   Rebalanced-files      size   scanned   failures   skipped      status   run time in secs
 -------------   ----------------   -------   -------   --------   -------   ---------   ----------------
 localhost                   2000     1.9GB      3325          0         0   completed              63.00
 gprfs032-10ge                  0    0Bytes      2158          0         0   completed               6.00
 volume rebalance: test1: success:
 [root@gprfs030 ~]#
 
 
 NEW:
 [root@gprfs030 upstream_rebalance]# gluster v rebalance test1 status
          Node   Rebalanced-files      size   scanned   failures   skipped      status   run time in secs
 -------------   ----------------   -------   -------   --------   -------   ---------   ----------------
 localhost                   2000     1.9GB      2011          0         0   completed              12.00
 gprfs032-10ge                  0    0Bytes         0          0         0      failed               0.00
 volume rebalance: test1: success:
 
 [The gprfs032-10ge node failed because of a crash which I will address in the next patch.]
 
 
 Just trying out replica behaviour for rebalance.
 
 Here is the volume info.
 [root@gprfs030 ~]# gluster v i
  
 Volume Name: test1
 Type: Distributed-Replicate
 Volume ID: e12ef289-86f2-454a-beaa-72ea763dbada
 Status: Started
 Number of Bricks: 3 x 2 = 6
 Transport-type: tcp
 Bricks:
 Brick1: gprfs030-10ge:/bricks/gprfs030/brick1
 Brick2: gprfs032-10ge:/bricks/gprfs032/brick1
 Brick3: gprfs030-10ge:/bricks/gprfs030/brick2
 Brick4: gprfs032-10ge:/bricks/gprfs032/brick2
 Brick5: gprfs030-10ge:/bricks/gprfs030/brick3
 Brick6: gprfs032-10ge:/bricks/gprfs032/brick3
 
 
 
 - Original Message -
  From: Susant Palai spa...@redhat.com
  To: Benjamin Turner bennytu...@gmail.com
  Cc: Gluster Devel gluster-devel@gluster.org
  Sent: Wednesday, April 29, 2015 1:13:04 PM
  Subject: Re: [Gluster-devel] Rebalance improvement design
  
  Ben, will you be able to give rebal stat for the same configuration and
  data
  set with older rebalance infra ?
  
  Thanks,
  Susant
  
  - Original Message -
   From: Susant Palai spa...@redhat.com
   To: Benjamin Turner bennytu...@gmail.com
   Cc: Gluster Devel gluster-devel@gluster.org
   Sent: Wednesday, April 29, 2015 12:08:38 PM
   Subject: Re: [Gluster-devel] Rebalance improvement design
   
   Hi Ben,
 Yes we were using pure dist volume. Will check in to your systems for
 more
 info.
   
   Can you please update which patch set you used ? In the mean time I will
   do
   one set of test with the same configuration on a small data set.
   
   Thanks,
   Susant
   
   
   - Original Message -
From: Benjamin Turner bennytu...@gmail.com
To: Nithya Balachandran nbala...@redhat.com
Cc: Susant Palai spa...@redhat.com, Gluster Devel
gluster-devel@gluster.org
Sent: Wednesday, April 29, 2015 2:13:05 AM
Subject: Re: [Gluster-devel] Rebalance improvement design

I am not seeing the performance you were.  I am running on 500GB of
data:

[root@gqas001 ~]# gluster v rebalance testvol status
  Node Rebalanced-files
 size   scanned  failures   skipped   status
 run
time in secs

Re: [Gluster-devel] Rebalance improvement design

2015-04-29 Thread Susant Palai
Hi Ben,
  Yes, we were using a pure dist volume. Will check into your systems for more
info.

Can you please update which patch set you used? In the meantime I will do one
set of tests with the same configuration on a small data set.

Thanks,
Susant


- Original Message -
 From: Benjamin Turner bennytu...@gmail.com
 To: Nithya Balachandran nbala...@redhat.com
 Cc: Susant Palai spa...@redhat.com, Gluster Devel 
 gluster-devel@gluster.org
 Sent: Wednesday, April 29, 2015 2:13:05 AM
 Subject: Re: [Gluster-devel] Rebalance improvement design
 
 I am not seeing the performance you were.  I am running on 500GB of data:
 
 [root@gqas001 ~]# gluster v rebalance testvol status
                                Node   Rebalanced-files      size   scanned   failures   skipped        status   run time in secs
 -----------------------------------   ----------------   -------   -------   --------   -------   -----------   ----------------
 localhost                                        129021     7.9GB    912104          0         0   in progress           10100.00
 gqas012.sbu.lab.eng.bos.redhat.com                    0    0Bytes   1930312          0         0   in progress           10100.00
 gqas003.sbu.lab.eng.bos.redhat.com                    0    0Bytes   1930312          0         0   in progress           10100.00
 gqas004.sbu.lab.eng.bos.redhat.com               128903     7.9GB    946730          0         0   in progress           10100.00
 gqas013.sbu.lab.eng.bos.redhat.com                    0    0Bytes   1930312          0         0   in progress           10100.00
 gqas014.sbu.lab.eng.bos.redhat.com                    0    0Bytes   1930312          0         0   in progress           10100.00
 
 Based on what I am seeing I expect this to take 2 days.  Was your rebal run
 on a pure dist volume?  I am trying on 2x2 + 2 new bricks.  Any idea why
 mine is taking so long?
 
 -b
 
 
 
 On Wed, Apr 22, 2015 at 1:10 AM, Nithya Balachandran nbala...@redhat.com
 wrote:
 
  That sounds great. Thanks.
 
  Regards,
  Nithya
 
  - Original Message -
  From: Benjamin Turner bennytu...@gmail.com
  To: Nithya Balachandran nbala...@redhat.com
  Cc: Susant Palai spa...@redhat.com, Gluster Devel 
  gluster-devel@gluster.org
  Sent: Wednesday, 22 April, 2015 12:14:14 AM
  Subject: Re: [Gluster-devel] Rebalance improvement design
 
  I am setting up a test env now, I'll have some feedback for you this week.
 
  -b
 
  On Tue, Apr 21, 2015 at 11:36 AM, Nithya Balachandran nbala...@redhat.com
  
  wrote:
 
   Hi Ben,
  
   Did you get a chance to try this out?
  
   Regards,
   Nithya
  
   - Original Message -
   From: Susant Palai spa...@redhat.com
   To: Benjamin Turner bennytu...@gmail.com
   Cc: Gluster Devel gluster-devel@gluster.org
   Sent: Monday, April 13, 2015 9:55:07 AM
   Subject: Re: [Gluster-devel] Rebalance improvement design
  
   Hi Ben,
 Uploaded a new patch here: http://review.gluster.org/#/c/9657/. We can
   start perf test on it. :)
  
   Susant
  
   - Original Message -
   From: Susant Palai spa...@redhat.com
   To: Benjamin Turner bennytu...@gmail.com
   Cc: Gluster Devel gluster-devel@gluster.org
   Sent: Thursday, 9 April, 2015 3:40:09 PM
   Subject: Re: [Gluster-devel] Rebalance improvement design
  
   Thanks Ben. RPM is not available and I am planning to refresh the patch
  in
   two days with some more regression fixes. I think we can run the tests
  post
   that. Any larger data-set will be good(say 3 to 5 TB).
  
   Thanks,
   Susant
  
   - Original Message -
   From: Benjamin Turner bennytu...@gmail.com
   To: Vijay Bellur vbel...@redhat.com
   Cc: Susant Palai spa...@redhat.com, Gluster Devel 
   gluster-devel@gluster.org
   Sent: Thursday, 9 April, 2015 2:10:30 AM
   Subject: Re: [Gluster-devel] Rebalance improvement design
  
  
   I have some rebalance perf regression stuff I have been working on, is
   there an RPM with these patches anywhere so that I can try it on my
   systems? If not I'll just build from:
  
  
   git fetch git://review.gluster.org/glusterfs refs/changes/57/9657/8 && git cherry-pick FETCH_HEAD
  
  
  
   I will have _at_least_ 10TB of storage, how many TBs of data should I run
   with?
  
  
   -b
  
  
   On Tue, Apr 7, 2015 at 9:07 AM, Vijay Bellur  vbel...@redhat.com 
  wrote:
  
  
  
  
   On 04/07/2015 03:08 PM, Susant Palai wrote:
  
  
   Here is one test performed on a 300GB data set and around 100%(1/2 the
   time) improvement was seen.
  
   [root@gprfs031 ~]# gluster v i
  
   Volume Name: rbperf
   Type: Distribute
   Volume ID: 35562662-337e-4923-b862- d0bbb0748003
   Status: Started
   Number of Bricks: 4
   Transport-type: tcp
   Bricks:
   Brick1: gprfs029-10ge:/bricks/ gprfs029/brick1
   Brick2: gprfs030-10ge:/bricks/ gprfs030/brick1
   Brick3: gprfs031

Re: [Gluster-devel] Gerrit review UI documentation

2015-04-24 Thread Susant Palai
Thanks Ravi. Very helpful. 

- Original Message -
 From: Ravishankar N ravishan...@redhat.com
 To: Gluster Devel gluster-devel@gluster.org
 Sent: Friday, April 24, 2015 2:00:52 PM
 Subject: [Gluster-devel] Gerrit review UI documentation
 
 Thought this might be helpful if you want to explore the features of the
 new gerrit review interface:
 https://review.gluster.org/Documentation/user-review-ui.html
 
 -Ravi
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Core by test case : georep-tarssh-hybrid.t

2015-04-24 Thread Susant Palai
Appended comments inline.

- Original Message -
 From: Susant Palai spa...@redhat.com
 To: Kotresh Hiremath Ravishankar khire...@redhat.com
 Cc: gluster-devel@gluster.org
 Sent: Friday, April 24, 2015 7:17:33 PM
 Subject: Re: [Gluster-devel] Core by test case : georep-tarssh-hybrid.t
 
 Hi,
   Here is a speculation :
 
   With the introduction of multi-threaded epoll we are processing multiple
   responses at the same time. The crash happened in _gf_free which
   originated from dht_getxattr_cbk (as seen in the backtrace). In current
   state we don't have a frame lock inside dht_getxattr_cbk. Hence, this path
   is prone to races.
 
 Here is a code-snippet from dht_getxattr_cbk.
 ===
         this_call_cnt = dht_frame_return (frame);
Need to move the above line after the out section; otherwise we will
end up in a deadlock.
  ..
  ..
 
 
         if (!local->xattr) {
                 local->xattr = dict_copy_with_ref (xattr, NULL);
         } else {
                 dht_aggregate_xattr (local->xattr, xattr);
         }
 out:
         if (is_last_call (this_call_cnt)) {
                 DHT_STACK_UNWIND (getxattr, frame, local->op_ret, op_errno,
                                   local->xattr, NULL);
         }
         return 0;
 
 ===
 Here I am depicting the responses from two cbks on a two-subvol cluster:
 
  Thread 1 (CBK1)                          Thread 2 (CBK2)
  =====================================================================
  time 1: this_call_cnt = 1 (2 - 1)
                                           time 2: this_call_cnt = 0 (1 - 1)
  time 3: enters dict_copy_with_ref
                                           time 4: dht_aggregate_xattr
                                           time 5: DHT_STACK_UNWIND
                                                   [leading to dict_unref
                                                    and destroy]
  time 6: still busy in dict_copy_with_ref
          and tries to unref the dict,
          leading to a free of memory that
          was already freed in the other
          thread. Hence, a double free.
 
 
 Will compose a patch which encompasses the critical section under frame->lock.
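
To sketch the idea of that fix (illustration only; the real patch is C and uses
frame->lock, while Local, getxattr_cbk and unwind below are stand-ins): the
copy/aggregate and the call-count decrement sit in one critical section, so the
unwind can only run once every callback has finished touching local.

    import threading

    class Local(object):
        def __init__(self, call_cnt):
            self.lock = threading.Lock()     # stands in for frame->lock
            self.call_cnt = call_cnt
            self.xattr = None

    def getxattr_cbk(local, xattr, unwind):
        with local.lock:
            if local.xattr is None:
                local.xattr = dict(xattr)    # dict_copy_with_ref
            else:
                local.xattr.update(xattr)    # dht_aggregate_xattr
            local.call_cnt -= 1              # dht_frame_return, now inside
            is_last = (local.call_cnt == 0)  # the same critical section
        if is_last:
            unwind(local.xattr)              # DHT_STACK_UNWIND runs only once,
                                             # after all copies are finished

    # e.g. two subvolumes answering:
    loc = Local(call_cnt=2)
    getxattr_cbk(loc, {"a": 1}, unwind=print)
    getxattr_cbk(loc, {"b": 2}, unwind=print)   # prints the aggregated dict once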
 
 
 Regards,
 Susant
 
 - Original Message -
  From: Kotresh Hiremath Ravishankar khire...@redhat.com
  To: Venky Shankar vshan...@redhat.com, Pranith Kumar Karampuri
  pkara...@redhat.com
  Cc: gluster-devel@gluster.org
  Sent: Friday, April 24, 2015 11:04:09 AM
  Subject: Re: [Gluster-devel] Core by test case : georep-tarssh-hybrid.t
  
I apologize, I thought it was the same issue that we assumed. I just
looked into the stack trace, and it is a different issue. This crash
happened during the stime getxattr.

Pranith,
You were working on min stime for ec, do you know about this?
  
  The trace looks like this.
  
  1.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64
  openssl-1.0.1e-30.el6.8.x86_64 zlib-1.2.3-29.el6.x86_64
  (gdb) bt
  #0  0x7f4d89c41380 in pthread_spin_lock () from /lib64/libpthread.so.0
  #1  0x7f4d8a714438 in __gf_free (free_ptr=0x7f4d70023550) at
  /home/jenkins/root/workspace/smoke/libglusterfs/src/mem-pool.c:303
  #2  0x7f4d8a6ca1fb in data_destroy (data=0x7f4d87f27488) at
  /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:148
  #3  0x7f4d8a6caf46 in data_unref (this=0x7f4d87f27488) at
  /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:549
  #4  0x7f4d8a6cde55 in dict_get_bin (this=0x7f4d88108be8,
  key=0x7f4d78131230
  
  trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime,
  bin=0x7f4d7de276d8)
  at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:2231
  #5  0x7f4d7cfa0d19 in gf_get_min_stime (this=0x7f4d7800d690,
  dst=0x7f4d88108be8,
  key=0x7f4d78131230
  
  trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime,
  value=0x7f4d87f271b0)
  at
  
  /home/jenkins/root/workspace/smoke/xlators/cluster/afr/src/../../../../xlators/lib/src/libxlator.c:330
  #6  0x7f4d7cd16419 in dht_aggregate (this=0x7f4d88108d8c,
  key=0x7f4d78131230
  
  trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime,
  value=0x7f4d87f271b0, data=0x7f4d88108be8)
  at
  
  /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:116
  #7  0x7f4d8a6cc3b1 in dict_foreach_match (dict=0x7f4d88108d8c,
  match=0x7f4d8a6cc244 dict_match_everything, match_data=0x0,
  action=0x7f4d7cd16330 dht_aggregate, action_data=0x7f4d88108be8) at
  /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1182
  #8  0x7f4d8a6cc2a4 in dict_foreach (dict=0x7f4d88108d8c,
  fn=0x7f4d7cd16330 dht_aggregate, data=0x7f4d88108be8)
  at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1141
  #9  0x7f4d7cd165ae in dht_aggregate_xattr (dst=0x7f4d88108be8,
  src=0x7f4d88108d8c) at
  /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src

Re: [Gluster-devel] Regression: Spurious Failures

2015-04-23 Thread Susant Palai
Aware of uss.t spurious failures.

- Original Message -
From: Kotresh Hiremath Ravishankar khire...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, 23 April, 2015 2:26:50 PM
Subject: [Gluster-devel] Regression: Spurious Failures

Hi all,

I am seeing the following tests failing on my patch. The changes are
unrelated to the test cases. Is anybody else seeing the same?

1.
http://build.gluster.org/job/rackspace-regression-2GB-triggered/7267/consoleFull
[09:38:50] ./tests/bugs/snapshot/bug-1112559.t .. 
not ok 9 Got 0 instead of 1

2. 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/7365/consoleFull
[07:28:58] ./tests/basic/uss.t .. 
not ok 153 

Thanks and Regards,
Kotresh H R

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Rebalance improvement design

2015-04-09 Thread Susant Palai
Thanks Ben. An RPM is not available, and I am planning to refresh the patch in two
days with some more regression fixes. I think we can run the tests after that.
Any larger data set will be good (say 3 to 5 TB).

Thanks,
Susant

- Original Message -
From: Benjamin Turner bennytu...@gmail.com
To: Vijay Bellur vbel...@redhat.com
Cc: Susant Palai spa...@redhat.com, Gluster Devel 
gluster-devel@gluster.org
Sent: Thursday, 9 April, 2015 2:10:30 AM
Subject: Re: [Gluster-devel] Rebalance improvement design


I have some rebalance perf regression stuff I have been working on, is there an 
RPM with these patches anywhere so that I can try it on my systems? If not I'll 
just build from: 


git fetch git://review.gluster.org/glusterfs refs/changes/57/9657/8 && git cherry-pick FETCH_HEAD 



I will have _at_least_ 10TB of storage, how many TBs of data should I run with? 


-b 


On Tue, Apr 7, 2015 at 9:07 AM, Vijay Bellur  vbel...@redhat.com  wrote: 




On 04/07/2015 03:08 PM, Susant Palai wrote: 


Here is one test performed on a 300GB data set; around a 100% improvement
(half the time) was seen. 

[root@gprfs031 ~]# gluster v i 

Volume Name: rbperf 
Type: Distribute 
Volume ID: 35562662-337e-4923-b862-d0bbb0748003 
Status: Started 
Number of Bricks: 4 
Transport-type: tcp 
Bricks: 
Brick1: gprfs029-10ge:/bricks/gprfs029/brick1 
Brick2: gprfs030-10ge:/bricks/gprfs030/brick1 
Brick3: gprfs031-10ge:/bricks/gprfs031/brick1 
Brick4: gprfs032-10ge:/bricks/gprfs032/brick1 


Added server 32 and started rebalance force. 

Rebalance stat for new changes: 
[root@gprfs031 ~]# gluster v rebalance rbperf status 
Node Rebalanced-files size scanned failures skipped status run time in secs 
- --- --- --- --- --- 
 -- 
localhost 74639 36.1GB 297319 0 0 completed 1743.00 
172.17.40.30 67512 33.5GB 269187 0 0 completed 1395.00 
gprfs029-10ge 79095 38.8GB 284105 0 0 completed 1559.00 
gprfs032-10ge 0 0Bytes 0 0 0 completed 402.00 
volume rebalance: rbperf: success: 

Rebalance stat for old model: 
[root@gprfs031 ~]# gluster v rebalance rbperf status 
Node Rebalanced-files size scanned failures skipped status run time in secs 
- --- --- --- --- --- 
 -- 
localhost 86493 42.0GB 634302 0 0 completed 3329.00 
gprfs029-10ge 94115 46.2GB 687852 0 0 completed 3328.00 
gprfs030-10ge 74314 35.9GB 651943 0 0 completed 3072.00 
gprfs032-10ge 0 0Bytes 594166 0 0 completed 1943.00 
volume rebalance: rbperf: success: 


This is interesting. Thanks for sharing, and well done! Maybe we should attempt a 
much larger data set and see how we fare there :). 

Regards, 


Vijay 


___
Gluster-devel mailing list 
Gluster-devel@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Rebalance improvement design

2015-03-31 Thread Susant Palai
Hi,
   Posted patch for rebalance improvement here: 
http://review.gluster.org/#/c/9657/ .
You can find the feature page here: 
http://www.gluster.org/community/documentation/index.php/Features/improve_rebalance_performance

The current patch addresses two parts of the proposed design.
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node

Brief design explanation for the above two points.

1. Rebalance multiple files in parallel:
   -------------------------------------

   The existing rebalance engine is single-threaded. Hence, we introduced
   multiple threads which run in parallel to the crawler. The current
   rebalance migration is converted to a producer-consumer framework,
   where the producer is the crawler and the consumers are the migrator
   threads.

   Crawler: The crawler is the main thread. Its job is now limited to the
   fix-layout of each directory and to adding the files which are eligible
   for migration to a global queue. Hence, the crawler is not blocked by
   the migration process.

   Consumer: The migrator threads monitor the global queue. Whenever a
   file is added to this queue, one of them dequeues that entry and
   migrates the file. Currently 15 migration threads are spawned at the
   beginning of the rebalance process, so multiple file migrations happen
   in parallel.

2. Crawl only bricks that belong to the current node:
   ---------------------------------------------------

   As the rebalance process is spawned per node, it migrates only the files
   that belong to its own node for the sake of load balancing. But it also
   reads entries from the whole cluster, which is not necessary, as readdir
   hits other nodes.

   New design:
   As part of the new design the rebalancer decides which subvols are local
   to the rebalancer node by checking the node-uuid of the root directory
   before the crawler starts. Hence, readdir won't hit the whole cluster, as
   the process already has the context of the local subvols, and the
   node-uuid request for each file can also be avoided. This makes the
   rebalance process more scalable.
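
   And a tiny sketch of the local-subvol selection (again only an
   illustration; the mapping and the UUIDs are made up):

       LOCAL_NODE_UUID = "00000000-0000-0000-0000-000000000001"   # hypothetical

       def local_subvols(subvol_node_uuid):
           # subvol_node_uuid: subvolume name -> node-uuid reported for "/"
           return [sv for sv, uuid in subvol_node_uuid.items()
                   if uuid == LOCAL_NODE_UUID]

       print(local_subvols({
           "testvol-client-0": "00000000-0000-0000-0000-000000000001",
           "testvol-client-1": "00000000-0000-0000-0000-000000000002",
       }))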


Requesting reviews asap.

Regards,
Susant

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel