Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-26 Thread Sincock, John [FLCPTY]
Hi Kotresh,

NP, thanks for clarifying that, Kotresh. We will be sure to delete or archive 
only changelogs that are old enough that we're sure we won't need to query them.
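
Something like the following is roughly what I have in mind for each brick - 
just a sketch at this stage (the brick path, the CHANGELOG.* file naming under 
.glusterfs/changelog, and the 30-day cutoff are all assumptions on our side, 
untested):

  # move consumed changelogs older than 30 days out of the brick
  CHLOG=/mnt/glusterfs/bricks/1/.glusterfs/changelog
  mkdir -p /archive/changelogs/brick1
  find "$CHLOG" -maxdepth 1 -name 'CHANGELOG.*' -mtime +30 \
    -exec mv -t /archive/changelogs/brick1/ {} +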

Cheers and thanks again :-)


-Original Message-
From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com] 
Sent: Friday, 23 October 2015 8:30 PM
To: Sincock, John [FLCPTY]
Cc: Vijaikumar Mallikarjuna; gluster-devel@gluster.org
Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind

Hi John,

You are welcome and happy to help you!

You can delete the consumed changelogs safely if there is only one glusterfind 
session for your gluster volume. But consider the case where you have multiple 
glusterfind sessions started with a gap of, let's say, two days:

1st-day  : session1   - For Purpose 1
2nd-day  : session1
3rd-day  : session1
   session2 (started)  - For Purpose 2

In the above case, if you had deleted the changelogs of Day 1 and Day 2 when 
session2 is started, it needs to crawl the entire filesystem, which defeats the 
purpose of glusterfind and is slower.
That's the reason I said deleting changelogs is not recommended. If you don't 
have use cases of the above kind, you can delete changelogs.
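
For reference, the sessions above would be created and queried along these 
lines (session and volume names are placeholders):

  glusterfind create session1 <volname>
  glusterfind pre session1 <volname> /tmp/session1-changes.txt  # list changes
  glusterfind post session1 <volname>   # advance the session once processed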


Thanks and Regards,
Kotresh H R

- Original Message -
> From: "John Sincock [FLCPTY]" <j.sinc...@fugro.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Vijaikumar Mallikarjuna" <vmall...@redhat.com>, 
> gluster-devel@gluster.org
> Sent: Friday, October 23, 2015 2:54:14 PM
> Subject: RE: [Gluster-devel] Need advice re some major issues with 
> glusterfind
> 
> Aaah I see, thanks Kotresh :-)
> This explains why there are so many files and why I sometimes didn't 
> see some changed files during my testing where I was changing files 
> and then immediately running a glusterfind.
> 
> When you say deleting the changelogs is not recommended because it 
> will affect new glusterfind sessions - I assume it will be OK to 
> delete changelogs that are further back into the past than the time 
> period we're interested in? Please let me know if this is the case, or 
> if you meant that removing old changelogs is likely to trigger bugs 
> and cause all our glusterfinds to start failing outright...
> 
> We can leave the old changelogs there if we have to, but if we don’t 
> increase the rollover time, the number will become astronomical as 
> time goes on, so I hope we can delete or archive old changelogs for 
> time periods we're no longer interested in.
> 
> For our purposes I think it should also be OK to try increasing the 
> rollover time significantly, eg if we have it set to rollover every 10 
> minutes, then all we have to do is subtract 10 mins from the start 
> time of each glusterfind/backup so it overlaps the end of the previous 
> glusterfind period. In this way, any files changed just before a 
> glusterfind/backup runs might be missed by the first backup, but they 
> will be caught by the next backup that runs later on. And it won't 
> matter if some changed files get backed up twice - as long as we get at 
> least one backup of every file that does change.
> 
> I note that by default there is no easy way to make glusterfind report 
> on changes further back in time than the time you run glusterfind 
> create to start a session - but I've already had some success at 
> getting glusterfind to give results back to earlier times before the 
> session was created (as long as the changelogs exist). I did this by 
> using a script to manually set the time we're interested in in the 
> status file(s) - ie in the main status file on the node running the 
> "pre" command", and for every one of the extra status files stored on 
> every node for each of their bricks :-)
> 
> I think my only remaining concern is how cpu-intensive the process is. 
> I've had glusterfinds return very quickly if only reporting on changes 
> for the last hour, or the last 10 hours or so. But if I go back a bit 
> further, the time taken to do the glusterfind seems to really blow out 
> and it sits there pegging all our CPUs at 100% for hours.
> 
> But you and Vijay have definitely given me a few tweaks I can look 
> into - I think I will bump-up the changelog rollover a bit, and will 
> follow Vijay's tip to get all our files labelled with pgfid's, and 
> then perhaps the glusterfinds will be less cpu-intensive.
> 
> Thanks for the tips (Kotresh & Vijay), and I'll let you know how it goes.
> 
> If the glusterfinds are still very cpu-intensive after all the pgfid 
> labelling is done, I'll be happy to do some further testing if it can 
> be of any help to you. Or if you're already trying to find time to 
> work on increasing the efficiency of processing the changelogs, and you 
> know where the improvements need to be made, I'll just leave you to it 
> and hope it all goes smoothly for you.

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-23 Thread Kotresh Hiremath Ravishankar
Hi John,

You are welcome and happy to help you!

You can delete the consumed changelogs safely if there is only one glusterfind 
session for your gluster volume. But consider the case where you have multiple 
glusterfind sessions started with a gap of, let's say, two days:

1st-day  : session1   - For Purpose 1
2nd-day  : session1
3rd-day  : session1
   session2 (started)  - For Purpose 2

In the above case, if you had deleted the changelogs of Day 1 and Day 2 when 
session2 is started, it needs to crawl the entire filesystem, which defeats the 
purpose of glusterfind and is slower.
That's the reason I said deleting changelogs is not recommended. If you don't 
have use cases of the above kind, you can delete changelogs.


Thanks and Regards,
Kotresh H R

- Original Message -
> From: "John Sincock [FLCPTY]" <j.sinc...@fugro.com>
> To: "Kotresh Hiremath Ravishankar" <khire...@redhat.com>
> Cc: "Vijaikumar Mallikarjuna" <vmall...@redhat.com>, gluster-devel@gluster.org
> Sent: Friday, October 23, 2015 2:54:14 PM
> Subject: RE: [Gluster-devel] Need advice re some major issues with glusterfind
> 
> Aaah I see, thanks Kotresh :-)
> This explains why there are so many files and why I sometimes didn't see some
> changed files during my testing where I was changing files and then
> immediately running a glusterfind.
> 
> When you say deleting the changelogs is not recommended because it will
> affect new glusterfind sessions - I assume it will be OK to delete
> changelogs that are further back into the past than the time period we're
> interested in? Please let me know if this is the case, or if you meant that
> removing old changelogs is likely to trigger bugs and cause all our
> glusterfinds to start failing outright...
> 
> We can leave the old changelogs there if we have to, but if we don’t increase
> the rollover time, the number will become astronomical as time goes on, so I
> hope we can delete or archive old changelogs for time periods we're no
> longer interested in.
> 
> For our purposes I think it should also be OK to try increasing the rollover
> time significantly, eg if we have it set to rollover every 10 minutes, then
> all we have to do is subtract 10 mins from the start time of each
> glusterfind/backup so it overlaps the end of the previous glusterfind
> period. In this way, any files changed just before a glusterfind/backup
> runs might be missed by the first backup, but they will be caught by the
> next backup that runs later on. And it won't matter if some changed files get
> backed up twice - as long as we get at least one backup of every file that
> does change.
> 
> I note that by default there is no easy way to make glusterfind report on
> changes further back in time than the time you run glusterfind create to
> start a session - but I've already had some success at getting glusterfind
> to give results back to earlier times before the session was created (as
> long as the changelogs exist). I did this by using a script to manually set
> the time we're interested in in the status file(s) - ie in the main status
> file on the node running the "pre" command, and for every one of the extra
> status files stored on every node for each of their bricks :-)
> 
> I think my only remaining concern is how cpu-intensive the process is. I've
> had glusterfinds return very quickly if only reporting on changes for the
> last hour, or the last 10 hours or so. But if I go back a bit further, the
> time taken to do the glusterfind seems to really blow out and it sits there
> pegging all our CPUs at 100% for hours.
> 
> But you and Vijay have definitely given me a few tweaks I can look into - I
> think I will bump-up the changelog rollover a bit, and will follow Vijay's
> tip to get all our files labelled with pgfid's, and then perhaps the
> glusterfinds will be less cpu-intensive.
> 
> Thanks for the tips (Kotresh & Vijay), and I'll let you know how it goes.
> 
> If the glusterfinds are still very cpu-intensive after all the pgfid
> labelling is done, I'll be happy to do some further testing if it can be of
> any help to you. Or if you're already trying to find time to work on
> increasing the efficiency of processing the changelogs, and you know where
> the improvements need to be made, I'll just leave you to it and hope it all
> goes smoothly for you.
> 
> Thanks again, and cheerios :-)
> John
> 
> -Original Message-
> From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com]
> Sent: Friday, 23 October 2015 5:24 PM
> To: Sincock, John [FLCPTY]
> Cc: Vijaikumar Mallikarjuna; gluster-devel@gluster.org
> Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-23 Thread Vijay Bellur

On Friday 23 October 2015 03:30 PM, Kotresh Hiremath Ravishankar wrote:

Hi John,

You are welcome and happy to help you!

You can delete the consumed changelogs safely if there is only one glusterfind 
session for your gluster volume. But consider the case where you have multiple 
glusterfind sessions started with a gap of, let's say, two days:

1st-day  : session1   - For Purpose 1
2nd-day  : session1
3rd-day  : session1
session2 (started)  - For Purpose 2

In the above case, if you had deleted the changelogs of Day 1 and Day 2 when 
session2 is started, it needs to crawl the entire filesystem, which defeats the 
purpose of glusterfind and is slower.
That's the reason I said deleting changelogs is not recommended. If you don't 
have use cases of the above kind, you can delete changelogs.




Or maybe archive the old changelogs so that they can be re-used later in case 
of need.


Regards,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-23 Thread Sincock, John [FLCPTY]
Aaah I see, thanks Kotresh :-)
This explains why there are so many files and why I sometimes didn't see some 
changed files during my testing where I was changing files and then immediately 
running a glusterfind.

When you say deleting the changelogs is not recommended because it will affect 
new glusterfind sessions - I assume it will be OK to delete changelogs that are 
further back into the past than the time period we're interested in? Please let 
me know if this is the case, or if you meant that removing old changelogs is 
likely to trigger bugs and cause all our glusterfinds to start failing 
outright...

We can leave the old changelogs there if we have to, but if we don’t increase 
the rollover time, the number will become astronomical as time goes on, so I 
hope we can delete or archive old changelogs for time periods we're no longer 
interested in.

For our purposes I think it should also be OK to try increasing the rollover 
time significantly, eg if we have it set to rollover every 10 minutes, then all 
we have to do is subtract 10 mins from the start time of each 
glusterfind/backup so it overlaps the end of the previous glusterfind period. 
In this way, any files changed just before a glusterfind/backup runs might be 
missed by the first backup, but they will be caught by the next backup that 
runs later on. And it won't matter if some changed files get backed up twice - 
as long as we get at least one backup of every file that does change.

I note that by default there is no easy way to make glusterfind report on 
changes further back in time than the time you run glusterfind create to start 
a session - but I've already had some success at getting glusterfind to give 
results back to earlier times before the session was created (as long as the 
changelogs exist). I did this by using a script to manually set the time we're 
interested in in the status file(s) - ie in the main status file on the node 
running the "pre" command", and for every one of the extra status files stored 
on every node for each of their bricks :-)
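
Roughly, the script does something along these lines (the status-file 
locations are from my own poking around under /var/lib/glusterd and may well 
differ between versions, so treat it as a sketch; "mysession" is a 
placeholder):

  # push the session's last-run time back to an earlier date
  EPOCH=$(date -d '2015-10-01 00:00' +%s)
  echo -n "$EPOCH" > /var/lib/glusterd/glusterfind/mysession/vol00/status
  # ...same again for each per-brick status file on every node, then:
  glusterfind pre mysession vol00 /tmp/changes.txt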

I think my only remaining concern is how cpu-intensive the process is. I've had 
glusterfinds return very quickly if only reporting on changes for the last 
hour, or the last 10 hours or so. But if I go back a bit further, the time 
taken to do the glusterfind seems to really blow out and it sits there pegging 
all our CPUs at 100% for hours.

But you and Vijay have definitely given me a few tweaks I can look into - I 
think I will bump-up the changelog rollover a bit, and will follow Vijay's tip 
to get all our files labelled with pgfid's, and then perhaps the glusterfinds 
will be less cpu-intensive.

Thanks for the tips (Kotresh & Vijay), and I'll let you know how it goes. 

If the glusterfinds are still very cpu-intensive after all the pgfid labelling 
is done, I'll be happy to do some further testing if it can be of any help to 
you. Or if you're already trying to find time to work on increasing the 
efficiency of processing the changelogs, and you know where the improvements 
need to be made, I'll just leave you to it and hope it all goes smoothly for you.

Thanks again, and cheerios :-)
John





 




-Original Message-
From: Kotresh Hiremath Ravishankar [mailto:khire...@redhat.com] 
Sent: Friday, 23 October 2015 5:24 PM
To: Sincock, John [FLCPTY]
Cc: Vijaikumar Mallikarjuna; gluster-devel@gluster.org
Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind

Hi John,

The changelog files are generated every 15 secs, recording the changes that 
happened to the filesystem within that span. So every 15 sec, once the new 
changelog file is generated, it is ready to be consumed by glusterfind or any 
other consumers. The 15 sec time period is tunable.
e.g.,
 gluster vol set <volname> changelog.rollover-time 300

The above will generate a new changelog file every 300 sec instead of 15 sec, 
hence reducing the number of changelogs. But glusterfind will come to know 
about the changes in the filesystem only after 300 secs!

Deleting these changelogs at .glusterfs/changelog/... is not recommended. It 
will affect any new glusterfind session that is going to be established. 


Thanks and Regards,
Kotresh H R

- Original Message -
> From: "John Sincock [FLCPTY]" <j.sinc...@fugro.com>
> To: "Vijaikumar Mallikarjuna" <vmall...@redhat.com>
> Cc: gluster-devel@gluster.org
> Sent: Friday, October 23, 2015 9:54:25 AM
> Subject: Re: [Gluster-devel] Need advice re some major issues with 
> glusterfind
> 
> 
> Hi Vijay, pls see below again (I'm wondering if top-posting would be 
> easier, that's usually what I do, though I know some ppl don’t like 
> it)
> 
>  
> On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] 
> <j.sinc...@fugro.com>
> wrote:
> Hi Everybody,
> 
> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been 
> trying to

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-23 Thread Kotresh Hiremath Ravishankar
Hi John,

The changelog files are generated every 15 secs, recording the changes that 
happened to the filesystem within that span. So every 15 sec, once the new 
changelog file is generated, it is ready to be consumed by glusterfind or any 
other consumers. The 15 sec time period is tunable.
e.g.,
 gluster vol set <volname> changelog.rollover-time 300

The above will generate a new changelog file every 300 sec instead of 15 sec, 
hence reducing the number of changelogs. But glusterfind will come to know 
about the changes in the filesystem only after 300 secs!
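
Once set, you can confirm the value in effect from volume info, for example 
(vol00 is just an example volume name):

 gluster volume set vol00 changelog.rollover-time 300
 gluster volume info vol00 | grep rollover-time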

Deleting these changelogs at .glusterfs/changelog/... is not recommended. It 
will affect any new glusterfind session that is going to be established. 


Thanks and Regards,
Kotresh H R

- Original Message -
> From: "John Sincock [FLCPTY]" <j.sinc...@fugro.com>
> To: "Vijaikumar Mallikarjuna" <vmall...@redhat.com>
> Cc: gluster-devel@gluster.org
> Sent: Friday, October 23, 2015 9:54:25 AM
> Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind
> 
> 
> Hi Vijay, pls see below again (I'm wondering if top-posting would be easier,
> that's usually what I do, though I know some ppl don’t like it)
> 
>  
> On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <j.sinc...@fugro.com>
> wrote:
> Hi Everybody,
> 
> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying
> to use the new glusterfind feature but have been having some serious
> problems with it. Overall the glusterfind looks very promising, so I don't
> want to offend anyone by raising these issues.
> 
> If these issues can be resolved or worked around, glusterfind will be a great
> feature.  So I would really appreciate any information or advice:
> 
> 1) What can be done about the vast number of tiny changelogs? We are often 
> seeing 5+ small 89-byte changelog files per minute on EACH brick - larger 
> files if busier. We've been generating these changelogs for a few weeks and 
> have in excess of 10,000 or 12,000 on most bricks. This makes glusterfinds
> very, very slow, especially on a node which has a lot of bricks, and looks
> unsustainable in the long run. Why are these files so small, and why are
> there so many of them, and how are they supposed to be managed in the long
> run? The sheer number of these files looks sure to impact performance in the
> long run.
> 
> 2) The pgfid xattr is wreaking havoc with our backup scheme - when gluster 
> adds this extended attribute to files, it changes the ctime, which we were 
> using to determine which files need to be archived. There should be a
> warning added to release notes & upgrade notes, so people can make a plan to
> manage this if required.
> 
> Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the
> rebalance took 5 days or so to complete, which looks like a major speed
> improvement over the more serial rebalance algorithm, so that's good. But I
> was hoping that the rebalance would also have had the side-effect of
> triggering all files to be labelled with the pgfid attribute by the time the
> rebalance completed, or failing that, after creation of an mlocate database
> across our entire gluster (which would have accessed every file, unless it
> is getting the info it needs only from directory inodes). Now it looks like
> ctimes are still being modified, and I think this can only be caused by
> files still being labelled with pgfids.
> 
> How can we force gluster to get this pgfid labelling over and done with, for
> all files that are already on the volume? We can't have gluster continuing
> to add pgfids in bursts here and there, eg when files are read for the first
> time since the upgrade. We need to get it over and done with. We have just
> had to turn off pgfid creation on the volume until we can force gluster to
> get it over and done with in one go.
>  
>  
> Hi John,
>  
> Was quota turned on/off before/after performing re-balance? If the pgfid is 
> missing, this can be healed by performing 'find <mountpoint> | xargs 
> stat'; all the files will get looked-up once and the pgfid healing will 
> happen.
> Also could you please provide all the volume files under 
> '/var/lib/glusterd/vols/<volname>/*.vol'?
>  
> Thanks,
> Vijay
>  
>  
> Hi Vijay
>  
> Quota has never been turned on in our gluster, so it can’t be any
> quota-related xattrs which are resetting our ctimes, so I’m pretty sure it
> must be due to pgfids still being added.
>  
> Thanks for the tip re using stat, if that should trigger the pgfid build on
> each file, then I will run that when I have a chance. We’ll have to get our
> archiving of data back up to date, re-enable pgfid build option, and then
> run the stat over a weekend or something, as it will take a while.

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-22 Thread Sincock, John [FLCPTY]

Hi Vijay, pls see below again (I'm wondering if top-posting would be easier, 
that's usually what I do, though I know some ppl don’t like it)

 
On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY]  
wrote:
Hi Everybody,

We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying to 
use the new glusterfind feature but have been having some serious problems with 
it. Overall the glusterfind looks very promising, so I don't want to offend 
anyone by raising these issues.

If these issues can be resolved or worked around, glusterfind will be a great 
feature.  So I would really appreciate any information or advice:

1) What can be done about the vast number of tiny changelogs? We are often 
seeing 5+ small 89-byte changelog files per minute on EACH brick - larger files 
if busier. We've been generating these changelogs for a few weeks and have in 
excess of 10,000 or 12,000 on most bricks. This makes glusterfinds very, very 
slow, especially on a node which has a lot of bricks, and looks unsustainable 
in the long run. Why are these files so small, and why are there so many of 
them, and how are they supposed to be managed in the long run? The sheer number 
of these files looks sure to impact performance in the long run.

2) The pgfid xattr is wreaking havoc with our backup scheme - when gluster 
adds this extended attribute to files, it changes the ctime, which we were using 
to determine which files need to be archived. There should be a warning added 
to release notes & upgrade notes, so people can make a plan to manage this if 
required.

Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the rebalance 
took 5 days or so to complete, which looks like a major speed improvement over 
the more serial rebalance algorithm, so that's good. But I was hoping that the 
rebalance would also have had the side-effect of triggering all files to be 
labelled with the pgfid attribute by the time the rebalance completed, or 
failing that, after creation of an mlocate database across our entire gluster 
(which would have accessed every file, unless it is getting the info it needs 
only from directory inodes). Now it looks like ctimes are still being modified, 
and I think this can only be caused by files still being labelled with pgfids.

How can we force gluster to get this pgfid labelling over and done with, for 
all files that are already on the volume? We can't have gluster continuing to 
add pgfids in bursts here and there, eg when files are read for the first time 
since the upgrade. We need to get it over and done with. We have just had to 
turn off pgfid creation on the volume until we can force gluster to get it over 
and done with in one go.
 
 
Hi John,
 
Was quota turned on/off before/after performing re-balance? If the pgfid is 
missing, this can be healed by performing 'find <mountpoint> | xargs stat'; 
all the files will get looked-up once and the pgfid healing will happen.
Also could you please provide all the volume files under 
'/var/lib/glusterd/vols/<volname>/*.vol'?
 
Thanks,
Vijay
 
 
Hi Vijay
 
Quota has never been turned on in our gluster, so it can’t be any quota-related 
xattrs which are resetting our ctimes, so I’m pretty sure it must be due to 
pgfids still being added.
 
Thanks for the tip re using stat, if that should trigger the pgfid build on 
each file, then I will run that when I have a chance. We’ll have to get our 
archiving of data back up to date, re-enable pgfid build option, and then run 
the stat over a weekend or something, as it will take a while.
 
I’m still quite concerned about the number of changelogs being generated. Do 
you know if there are any plans to change the way changelogs are generated so there 
aren’t so many of them, and to process them more efficiently? I think this will 
be vital to improving performance of glusterfind in future, as there are 
currently an enormous number of these small changelogs being generated on each 
of our gluster bricks.
  
Below is the volfile for one brick, the others are all equivalent. We haven’t 
tweaked the volume options much, besides increasing the io thread count to 32, 
and client/event threads to 6, since we have a lot of small files on our 
gluster (30 million files, a lot of which are small, and some of which are 
large to very large):
 

Hi John,

PGFID xattrs are updated only when update-link-count-parent is enabled in the 
brick volume file. This option is enabled when quota is enabled on a volume.
The volume file you provided below has update-link-count-parent disabled, so I 
am wondering why PGFID xattrs are being updated.
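
You can check this directly in the brick volfiles, for example (with vol00 as 
the volume name):

 grep update-link-count-parent /var/lib/glusterd/vols/vol00/*.vol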

Thanks,
Vijay
 

Hi Vijay,
somewhere in the 3.7.5 upgrade instructions or the glusterfind documentation, 
there was a mention that we should enable a server option called 
storage.build-pgfid, which we did, as it speeds up glusterfinds. You cannot see 
this in the volfile, but you can see it when you do gluster volume info <volname>. 
So for our volume we currently have: 


Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-22 Thread Vijaikumar Mallikarjuna
On Thu, Oct 22, 2015 at 8:41 AM, Sincock, John [FLCPTY] <j.sinc...@fugro.com>
wrote:

> Pls see below
>
> From: Vijaikumar Mallikarjuna [mailto:vmall...@redhat.com]
> Sent: Wednesday, 21 October 2015 6:37 PM
> To: Sincock, John [FLCPTY]
> Cc: gluster-devel@gluster.org
> Subject: Re: [Gluster-devel] Need advice re some major issues with
> glusterfind
>
> On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <
> j.sinc...@fugro.com> wrote:
>
> Hi Everybody,
>
> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been
> trying to use the new glusterfind feature but have been having some serious
> problems with it. Overall the glusterfind looks very promising, so I don't
> want to offend anyone by raising these issues.
>
> If these issues can be resolved or worked around, glusterfind will be a
> great feature.  So I would really appreciate any information or advice:
>
> 1) What can be done about the vast number of tiny changelogs? We are often
> seeing 5+ small 89-byte changelog files per minute on EACH brick - larger
> files if busier. We've been generating these changelogs for a few
> weeks and have in excess of 10,000 or 12,000 on most bricks. This makes
> glusterfinds very, very slow, especially on a node which has a lot of
> bricks, and looks unsustainable in the long run. Why are these files so
> small, and why are there so many of them, and how are they supposed to be
> managed in the long run? The sheer number of these files looks sure to
> impact performance in the long run.
>
> 2) The pgfid xattr is wreaking havoc with our backup scheme - when
> gluster adds this extended attribute to files, it changes the ctime, which
> we were using to determine which files need to be archived. There should be
> a warning added to release notes & upgrade notes, so people can make a plan
> to manage this if required.
>
> Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the
> rebalance took 5 days or so to complete, which looks like a major speed
> improvement over the more serial rebalance algorithm, so that's good. But I
> was hoping that the rebalance would also have had the side-effect of
> triggering all files to be labelled with the pgfid attribute by the time
> the rebalance completed, or failing that, after creation of an mlocate
> database across our entire gluster (which would have accessed every file,
> unless it is getting the info it needs only from directory inodes). Now it
> looks like ctimes are still being modified, and I think this can only be
> caused by files still being labelled with pgfids.
>
> How can we force gluster to get this pgfid labelling over and done with,
> for all files that are already on the volume? We can't have gluster
> continuing to add pgfids in bursts here and there, eg when files are read
> for the first time since the upgrade. We need to get it over and done with.
> We have just had to turn off pgfid creation on the volume until we can
> force gluster to get it over and done with in one go.
>
> Hi John,
>
> Was quota turned on/off before/after performing re-balance? If the pgfid
> is missing, this can be healed by performing 'find <mountpoint> | xargs
> stat'; all the files will get looked-up once and the pgfid healing will
> happen.
> Also could you please provide all the volume files under
> '/var/lib/glusterd/vols/<volname>/*.vol'?
>
> Thanks,
> Vijay
>
> Hi Vijay
>
> Quota has never been turned on in our gluster, so it can’t be any
> quota-related xattrs which are resetting our ctimes, so I’m pretty sure it
> must be due to pgfids still being added.
>
> Thanks for the tip re using stat, if that should trigger the pgfid build
> on each file, then I will run that when I have a chance. We’ll have to get
> our archiving of data back up to date, re-enable pgfid build option, and
> then run the stat over a weekend or something, as it will take a while.
>
> I’m still quite concerned about the number of changelogs being generated.
> Do you know if there are any plans to change the way changelogs are
> generated so there aren’t so many of them, and to process them more
> efficiently? I think this will be vital to improving performance of
> glusterfind in future, as there are currently an enormous number of these
> small changelogs being generated on each of our gluster bricks.
>
> Below is the volfile for one brick, the others are all equivalent. We
> haven’t tweaked the volume options much, besides increasing the io thread
> count to 32, and client/event threads to 6, since we have a lot of small
> files on our gluster (30 million files, a lot of which are small, and some
> of which are large to very large):

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-22 Thread Vijaikumar Mallikarjuna
Hi Kotresh/Venky,

Could you please provide your inputs on the change-log issues mentioned
below?

Thanks,
Vijay


On Fri, Oct 23, 2015 at 9:54 AM, Sincock, John [FLCPTY]  wrote:

>
> Hi Vijay, pls see below again (I'm wondering if top-posting would be
> easier, that's usually what I do, though I know some ppl don’t like it)
>
>
> On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] <
> j.sinc...@fugro.com> wrote:
> Hi Everybody,
>
> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been
> trying to use the new glusterfind feature but have been having some serious
> problems with it. Overall the glusterfind looks very promising, so I don't
> want to offend anyone by raising these issues.
>
> If these issues can be resolved or worked around, glusterfind will be a
> great feature.  So I would really appreciate any information or advice:
>
> 1) What can be done about the vast number of tiny changelogs? We are often
> seeing 5+ small 89-byte changelog files per minute on EACH brick - larger
> files if busier. We've been generating these changelogs for a few
> weeks and have in excess of 10,000 or 12,000 on most bricks. This makes
> glusterfinds very, very slow, especially on a node which has a lot of
> bricks, and looks unsustainable in the long run. Why are these files so
> small, and why are there so many of them, and how are they supposed to be
> managed in the long run? The sheer number of these files looks sure to
> impact performance in the long run.
>
> 2) The pgfid xattr is wreaking havoc with our backup scheme - when
> gluster adds this extended attribute to files, it changes the ctime, which
> we were using to determine which files need to be archived. There should be
> a warning added to release notes & upgrade notes, so people can make a plan
> to manage this if required.
>
> Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the
> rebalance took 5 days or so to complete, which looks like a major speed
> improvement over the more serial rebalance algorithm, so that's good. But I
> was hoping that the rebalance would also have had the side-effect of
> triggering all files to be labelled with the pgfid attribute by the time
> the rebalance completed, or failing that, after creation of an mlocate
> database across our entire gluster (which would have accessed every file,
> unless it is getting the info it needs only from directory inodes). Now it
> looks like ctimes are still being modified, and I think this can only be
> caused by files still being labelled with pgfids.
>
> How can we force gluster to get this pgfid labelling over and done with,
> for all files that are already on the volume? We can't have gluster
> continuing to add pgfids in bursts here and there, eg when files are read
> for the first time since the upgrade. We need to get it over and done with.
> We have just had to turn off pgfid creation on the volume until we can
> force gluster to get it over and done with in one go.
>
>
> Hi John,
>
> Was quota turned on/off before/after performing re-balance? If the pgfid
> is missing, this can be healed by performing 'find <mountpoint> | xargs
> stat'; all the files will get looked-up once and the pgfid healing will
> happen.
> Also could you please provide all the volume files under
> '/var/lib/glusterd/vols/<volname>/*.vol'?
>
> Thanks,
> Vijay
>
>
> Hi Vijay
>
> Quota has never been turned on in our gluster, so it can’t be any
> quota-related xattrs which are resetting our ctimes, so I’m pretty sure it
> must be due to pgfids still being added.
>
> Thanks for the tip re using stat, if that should trigger the pgfid build
> on each file, then I will run that when I have a chance. We’ll have to get
> our archiving of data back up to date, re-enable pgfid build option, and
> then run the stat over a weekend or something, as it will take a while.
>
> I’m still quite concerned about the number of changelogs being generated.
> Do you know if there are any plans to change the way changelogs are generated
> so there aren’t so many of them, and to process them more efficiently? I
> think this will be vital to improving performance of glusterfind in future,
> as there are currently an enormous number of these small changelogs being
> generated on each of our gluster bricks.
>
> Below is the volfile for one brick, the others are all equivalent. We
> haven’t tweaked the volume options much, besides increasing the io thread
> count to 32, and client/event threads to 6, since we have a lot of small
> files on our gluster (30 million files, a lot of which are small, and some
> of which are large to very large):
>
>
> Hi John,
>
> PGFID xattrs are updated only when update-link-count-parent is enabled in
> the brick volume file. This option is enabled when quota is enabled on a
> volume.
> The volume file you provided below has update-link-count-parent disabled,
> so I am wondering why PGFID xattrs are being updated.
>
> Thanks,
> Vijay
>
>
> Hi Vijay,
> somewhere in the 3.7.5 upgrade instructions or the glusterfind
> documentation, there was a mention that we should enable a server option
> called storage.build-pgfid, which we did, as it speeds up glusterfinds.

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-21 Thread Sincock, John [FLCPTY]
Pls see below

From: Vijaikumar Mallikarjuna [mailto:vmall...@redhat.com]
Sent: Wednesday, 21 October 2015 6:37 PM
To: Sincock, John [FLCPTY]
Cc: gluster-devel@gluster.org
Subject: Re: [Gluster-devel] Need advice re some major issues with glusterfind



On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY] 
<j.sinc...@fugro.com<mailto:j.sinc...@fugro.com>> wrote:
Hi Everybody,

We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying to 
use the new glusterfind feature but have been having some serious problems with 
it. Overall the glusterfind looks very promising, so I don't want to offend 
anyone by raising these issues.

If these issues can be resolved or worked around, glusterfind will be a great 
feature.  So I would really appreciate any information or advice:

1) What can be done about the vast number of tiny changelogs? We are often 
seeing 5+ small 89-byte changelog files per minute on EACH brick - larger files 
if busier. We've been generating these changelogs for a few weeks and have in 
excess of 10,000 or 12,000 on most bricks. This makes glusterfinds very, very 
slow, especially on a node which has a lot of bricks, and looks unsustainable 
in the long run. Why are these files so small, and why are there so many of 
them, and how are they supposed to be managed in the long run? The sheer number 
of these files looks sure to impact performance in the long run.
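
For a sense of scale, this is roughly how I've been counting them on one brick 
(our brick path; the changelogs live under the brick's .glusterfs/changelog 
directory as far as I can tell):

  find /mnt/glusterfs/bricks/1/.glusterfs/changelog -maxdepth 1 \
    -name 'CHANGELOG.*' | wc -l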

2) The pgfid xattr is wreaking havoc with our backup scheme - when gluster 
adds this extended attribute to files, it changes the ctime, which we were using 
to determine which files need to be archived. There should be a warning added 
to release notes & upgrade notes, so people can make a plan to manage this if 
required.
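
The effect is easy to reproduce in isolation on any local filesystem - writing 
an xattr bumps the ctime (a throwaway illustration using a scratch file):

  touch /tmp/ctime-demo
  stat -c 'ctime: %z' /tmp/ctime-demo
  setfattr -n user.demo -v 1 /tmp/ctime-demo
  stat -c 'ctime: %z' /tmp/ctime-demo    # ctime has moved forward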

Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the rebalance 
took 5 days or so to complete, which looks like a major speed improvement over 
the more serial rebalance algorithm, so that's good. But I was hoping that the 
rebalance would also have had the side-effect of triggering all files to be 
labelled with the pgfid attribute by the time the rebalance completed, or 
failing that, after creation of an mlocate database across our entire gluster 
(which would have accessed every file, unless it is getting the info it needs 
only from directory inodes). Now it looks like ctimes are still being modified, 
and I think this can only be caused by files still being labelled with pgfids.

How can we force gluster to get this pgfid labelling over and done with, for 
all files that are already on the volume? We can't have gluster continuing to 
add pgfids in bursts here and there, eg when files are read for the first time 
since the upgrade. We need to get it over and done with. We have just had to 
turn off pgfid creation on the volume until we can force gluster to get it over 
and done with in one go.
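
(For anyone following along, the option we turned off is, I believe, the 
storage.build-pgfid volume option, along the lines of:

  gluster volume set vol00 storage.build-pgfid off

and we'll switch it back on once we're ready to let the labelling finish.)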


Hi John,

Was quota turned on/off before/after performing re-balance? If the pgfid is 
missing, this can be healed by performing 'find <mountpoint> | xargs stat'; 
all the files will get looked-up once and the pgfid healing will happen.
Also could you please provide all the volume files under 
'/var/lib/glusterd/vols/<volname>/*.vol'?

Thanks,
Vijay


Hi Vijay

Quota has never been turned on in our gluster, so it can’t be any quota-related 
xattrs which are resetting our ctimes, so I’m pretty sure it must be due to 
pgfids still being added.

Thanks for the tip re using stat, if that should trigger the pgfid build on 
each file, then I will run that when I have a chance. We’ll have to get our 
archiving of data back up to date, re-enable pgfid build option, and then run 
the stat over a weekend or something, as it will take a while.

I’m still quite concerned about the number of changelogs being generated. Do 
you know if there are any plans to change the way changelogs are generated so there 
aren’t so many of them, and to process them more efficiently? I think this will 
be vital to improving performance of glusterfind in future, as there are 
currently an enormous number of these small changelogs being generated on each 
of our gluster bricks.

Below is the volfile for one brick, the others are all equivalent. We haven’t 
tweaked the volume options much, besides increasing the io thread count to 32, 
and client/event threads to 6, since we have a lot of small files on our 
gluster (30 million files, a lot of which are small, and some of which are 
large to very large):

[root@g-unit-1 sbin]# cat 
/var/lib/glusterd/vols/vol00/vol00.g-unit-1.mnt-glusterfs-bricks-1.vol
volume vol00-posix
type storage/posix
option update-link-count-parent off
option volume-id 292b8701-d394-48ee-a224-b5a20ca7ce0f
option directory /mnt/glusterfs/bricks/1
end-volume

volume vol00-trash
type features/trash
option trash-internal-op off
option brick-path /mnt/glusterfs/bricks/1
option trash-dir .trashcan
subvolumes vol00-posix
end-volume

volume vol00-changetimerecorder
 

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-21 Thread Vijaikumar Mallikarjuna
On Wed, Oct 21, 2015 at 5:53 AM, Sincock, John [FLCPTY]  wrote:

> Hi Everybody,
>
> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been
> trying to use the new glusterfind feature but have been having some serious
> problems with it. Overall the glusterfind looks very promising, so I don't
> want to offend anyone by raising these issues.
>
> If these issues can be resolved or worked around, glusterfind will be a
> great feature.  So I would really appreciate any information or advice:
>
> 1) What can be done about the vast number of tiny changelogs? We are often
> seeing 5+ small 89-byte changelog files per minute on EACH brick - larger
> files if busier. We've been generating these changelogs for a few
> weeks and have in excess of 10,000 or 12,000 on most bricks. This makes
> glusterfinds very, very slow, especially on a node which has a lot of
> bricks, and looks unsustainable in the long run. Why are these files so
> small, and why are there so many of them, and how are they supposed to be
> managed in the long run? The sheer number of these files looks sure to
> impact performance in the long run.
>
> 2) The pgfid xattr is wreaking havoc with our backup scheme - when
> gluster adds this extended attribute to files, it changes the ctime, which
> we were using to determine which files need to be archived. There should be
> a warning added to release notes & upgrade notes, so people can make a plan
> to manage this if required.
>
> Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the
> rebalance took 5 days or so to complete, which looks like a major speed
> improvement over the more serial rebalance algorithm, so that's good. But I
> was hoping that the rebalance would also have had the side-effect of
> triggering all files to be labelled with the pgfid attribute by the time
> the rebalance completed, or failing that, after creation of an mlocate
> database across our entire gluster (which would have accessed every file,
> unless it is getting the info it needs only from directory inodes). Now it
> looks like ctimes are still being modified, and I think this can only be
> caused by files still being labelled with pgfids.
>
> How can we force gluster to get this pgfid labelling over and done with,
> for all files that are already on the volume? We can't have gluster
> continuing to add pgfids in bursts here and there, eg when files are read
> for the first time since the upgrade. We need to get it over and done with.
> We have just had to turn off pgfid creation on the volume until we can
> force gluster to get it over and done with in one go.
>

Hi John,

Was quota turned on/off before/after performing re-balance? If the pgfid is
missing, this can be healed by performing 'find <mountpoint> | xargs
stat'; all the files will get looked-up once and the pgfid healing will
happen.
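
For example, something along these lines from a client mount of the volume 
(the mount point is illustrative):

  # a full crawl; the stat output itself is not needed
  find /mnt/vol00 -print0 | xargs -0 stat > /dev/null
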
Also could you please provide all the volume files under
'/var/lib/glusterd/vols/<volname>/*.vol'?

Thanks,
Vijay






> 3) Files modified just before a glusterfind pre are often not included in
> the changed files list, unless pre command is run again a bit later - I
> think changelogs are missing very recent changes and need to be flushed or
> something before the pre command uses them?
>
> 4) BUG: Glusterfind follows symlinks off bricks and onto NFS mounted
> directories (and will cause these shares to be mounted if you have autofs
> enabled). Glusterfind should definitely not follow symlinks, but it does.
> For now, we are getting around this by turning off autofs when we run
> glusterfinds, but this should not be necessary. Glusterfind must be fixed
> so it never follows symlinks and never leaves the brick it is currently
> searching.
>
> 5) We have one of our nodes with 16 bricks, and on this machine, the
> glusterfind pre command seems to get stuck pegging all 8 cores to 100%; an
> strace of an offending process gives an endless stream of these lseeks
> and reads and very little else. What is going on here? It doesn't look
> right... :
>
> lseek(13, 17188864, SEEK_SET)   = 17188864
> read(13,
> "\r\0\0\0\4\0J\0\3\25\2\"\0013\0J\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(13, 17189888, SEEK_SET)   = 17189888
> read(13,
> "\r\0\0\0\4\0\"\0\3\31\0020\1#\0\"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(13, 17190912, SEEK_SET)   = 17190912
> read(13,
> "\r\0\0\0\3\0\365\0\3\1\1\372\0\365\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(13, 17191936, SEEK_SET)   = 17191936
> read(13,
> "\r\0\0\0\4\0F\0\3\17\2\"\0017\0F\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(13, 17192960, SEEK_SET)   = 17192960
> read(13,
> "\r\0\0\0\4\0006\0\2\371\2\4\1\31\0006\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(13, 17193984, SEEK_SET)   = 17193984
> read(13,
> "\r\0\0\0\4\0L\0\3\31\2\36\1/\0L\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024)
> = 1024
>
> I saved one of these straces for 20 or 

Re: [Gluster-devel] Need advice re some major issues with glusterfind

2015-10-20 Thread Raghavendra Gowdappa
Hi John,

- Original Message -
> From: "John Sincock [FLCPTY]" <j.sinc...@fugro.com>
> To: gluster-devel@gluster.org
> Sent: Wednesday, October 21, 2015 5:53:23 AM
> Subject: [Gluster-devel] Need advice re some major issues with glusterfind
> 
> Hi Everybody,
> 
> We have recently upgraded our 220 TB gluster to 3.7.4, and we've been trying
> to use the new glusterfind feature but have been having some serious
> problems with it. Overall the glusterfind looks very promising, so I don't
> want to offend anyone by raising these issues.
> 
> If these issues can be resolved or worked around, glusterfind will be a great
> feature.  So I would really appreciate any information or advice:
> 
> 1) What can be done about the vast number of tiny changelogs? We are often
> seeing 5+ small 89-byte changelog files per minute on EACH brick - larger
> files if busier. We've been generating these changelogs for a few weeks and
> have in excess of 10,000 or 12,000 on most bricks. This makes glusterfinds
> very, very slow, especially on a node which has a lot of bricks, and looks
> unsustainable in the long run. Why are these files so small, and why are
> there so many of them, and how are they supposed to be managed in the long
> run? The sheer number of these files looks sure to impact performance in the
> long run.
> 
> 2) The pgfid xattr is wreaking havoc with our backup scheme - when gluster
> adds this extended attribute to files, it changes the ctime, which we were
> using to determine which files need to be archived. There should be a
> warning added to release notes & upgrade notes, so people can make a plan to
> manage this if required.
> 
> Also, we ran a rebalance immediately after the 3.7.4 upgrade, and the
> rebalance took 5 days or so to complete, which looks like a major speed
> improvement over the more serial rebalance algorithm, so that's good. But I
> was hoping that the rebalance would also have had the side-effect of
> triggering all files to be labelled with the pgfid attribute by the time the
> rebalance completed, or failing that, after creation of an mlocate database
> across our entire gluster (which would have accessed every file, unless it
> is getting the info it needs only from directory inodes). Now it looks like
> ctimes are still being modified, and I think this can only be caused by
> files still being labelled with pgfids.
> 
> How can we force gluster to get this pgfid labelling over and done with, for
> all files that are already on the volume? We can't have gluster continuing
> to add pgfids in bursts here and there, eg when files are read for the first
> time since the upgrade. We need to get it over and done with. We have just
> had to turn off pgfid creation on the volume until we can force gluster to
> get it over and done with in one go.

We are looking into the pgfid xattr issue. It's a long weekend here in India, 
so kindly expect a delay in updates on this issue.

> 
> 3) Files modified just before a glusterfind pre are often not included in the
> changed files list, unless pre command is run again a bit later - I think
> changelogs are missing very recent changes and need to be flushed or
> something before the pre command uses them?
> 
> 4) BUG: Glusterfind follows symlinks off bricks and onto NFS mounted
> directories (and will cause these shares to be mounted if you have autofs
> enabled). Glusterfind should definitely not follow symlinks, but it does.
> For now, we are getting around this by turning off autofs when we run
> glusterfinds, but this should not be necessary. Glusterfind must be fixed so
> it never follows symlinks and never leaves the brick it is currently
> searching.
> 
> 5) We have one of our nodes with 16 bricks, and on this machine, the glusterfind
> pre command seems to get stuck pegging all 8 cores to 100%; an strace of an
> offending process gives an endless stream of these lseeks and reads and
> very little else. What is going on here? It doesn't look right... :
> 
> lseek(13, 17188864, SEEK_SET)   = 17188864
> read(13,
> "\r\0\0\0\4\0J\0\3\25\2\"\0013\0J\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024)
> = 1024
> lseek(13, 17189888, SEEK_SET)   = 17189888
> read(13,
> "\r\0\0\0\4\0\"\0\3\31\0020\1#\0\"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(13, 17190912, SEEK_SET)   = 17190912
> read(13,
> "\r\0\0\0\3\0\365\0\3\1\1\372\0\365\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> lseek(13, 17191936, SEEK_SET)   = 17191936
> read(13,
> "\r\0\0\0\4\0F\0\3\17\2\"\0017\0F\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024)
> = 1024
> lseek(13, 17192960, SEEK_SET)   = 17192960
>