[Gluster-devel] EHT / DHT

2014-11-27 Thread Jan Holtzhausen
Hi
I was wondering: is there a setting to change, or a parameter to pass to 
Gluster's DHT, that makes the distribution algorithm take into account only 
the filename and not the preceding filesystem path?
I.e., when a file is at /mount/gluster/directory/filename.ext,
to hash only on “filename.ext”?

Best Regards
Jan Holtzhausen
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
I could tell you… 
But Symantec wouldn’t like it…..

From:  Poornima Gurusiddaiah 
Date:  Wednesday 26 November 2014 at 7:16 AM
To:  Jan H Holtzhausen 
Cc:  
Subject:  Re: [Gluster-devel] EHT / DHT

Out of curiosity, what back end and deduplication solution are you using?

Regards,
Poornima

From: "Jan H Holtzhausen" 
To: "Anand Avati" , "Shyam" , 
gluster-devel@gluster.org
Sent: Wednesday, November 26, 2014 3:43:36 AM
Subject: Re: [Gluster-devel] EHT / DHT

Yes we have deduplication at the filesystem layer

BR
Jan

From:  Anand Avati 
Date:  Wednesday 26 November 2014 at 12:11 AM
To:  Jan H Holtzhausen , Shyam , 

Subject:  Re: [Gluster-devel] EHT / DHT

Unless there is some sort of de-duplication under the covers happening in 
the brick, or the files are hardlinks to each other, there is no cache 
benefit whatsoever by having identical files placed on the same server.

Thanks,
Avati

On Tue Nov 25 2014 at 12:59:25 PM Jan H Holtzhausen  
wrote:
As to the why.
Filesystem cache hits.
Files with the same name tend to be the same files.

Regards
Jan




On 2014/11/25, 8:42 PM, "Jan H Holtzhausen"  wrote:

>So in a distributed cluster, the GFID tells all bricks what a files
>preceding directory structure looks like?
>Where the physical file is saved is a function of the filename ONLY.
>Therefore My requirement should be met by default, or am I being dense?
>
>BR
>Jan
>
>
>
>On 2014/11/25, 8:15 PM, "Shyam"  wrote:
>
>>On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote:
>>> STILL doesn’t work … exact same file ends up on 2 different bricks …
>>> I must be missing something.
>>> All I need is for:
>>> /directory1/subdirectory2/foo
>>> And
>>> /directory2/subdirectoryaaa999/foo
>>>
>>>
>>> To end up on the same brick….
>>
>>This is not possible is what I was attempting to state in the previous
>>mail. The regex filter is not for this purpose.
>>
>>The hash is always based on the name of the file, but the location is
>>based on the distribution/layout of the directory, which is different
>>for each directory based on its GFID.
>>
>>So there are no options in the code to enable what you seek at present.
>>
>>Why is this needed?
>>
>>Shyam
>


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Poornima Gurusiddaiah
Out of curiosity, what back end and deduplication solution are you using? 

Regards, 
Poornima 

- Original Message -

From: "Jan H Holtzhausen"  
To: "Anand Avati" , "Shyam" , 
gluster-devel@gluster.org 
Sent: Wednesday, November 26, 2014 3:43:36 AM 
Subject: Re: [Gluster-devel] EHT / DHT 

Yes we have deduplication at the filesystem layer 

BR 
Jan 

From: Anand Avati < av...@gluster.org > 
Date: Wednesday 26 November 2014 at 12:11 AM 
To: Jan H Holtzhausen < j...@holtztech.info >, Shyam < srang...@redhat.com >, < 
gluster-devel@gluster.org > 
Subject: Re: [Gluster-devel] EHT / DHT 

Unless there is some sort of de-duplication under the covers happening in the 
brick, or the files are hardlinks to each other, there is no cache benefit 
whatsoever by having identical files placed on the same server. 

Thanks, 
Avati 

On Tue Nov 25 2014 at 12:59:25 PM Jan H Holtzhausen < j...@holtztech.info > 
wrote: 


As to the why. 
Filesystem cache hits. 
Files with the same name tend to be the same files. 

Regards 
Jan 




On 2014/11/25, 8:42 PM, "Jan H Holtzhausen" < j...@holtztech.info > wrote: 

>So in a distributed cluster, the GFID tells all bricks what a files 
>preceding directory structure looks like? 
>Where the physical file is saved is a function of the filename ONLY. 
>Therefore My requirement should be met by default, or am I being dense? 
> 
>BR 
>Jan 
> 
> 
> 
>On 2014/11/25, 8:15 PM, "Shyam" < srang...@redhat.com > wrote: 
> 
>>On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote: 
>>> STILL doesn’t work … exact same file ends up on 2 different bricks … 
>>> I must be missing something. 
>>> All I need is for: 
>>> /directory1/subdirectory2/foo 
>>> And 
>>> /directory2/subdirectoryaaa999/foo
>>> 
>>> 
>>> To end up on the same brick…. 
>> 
>>This is not possible is what I was attempting to state in the previous 
>>mail. The regex filter is not for this purpose. 
>> 
>>The hash is always based on the name of the file, but the location is 
>>based on the distribution/layout of the directory, which is different 
>>for each directory based on its GFID. 
>> 
>>So there are no options in the code to enable what you seek at present. 
>> 
>>Why is this needed? 
>> 
>>Shyam 
> 


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Anand Avati
On Tue Nov 25 2014 at 2:13:43 PM Jan H Holtzhausen 
wrote:

> Yes we have deduplication at the filesystem layer
>

As things stand, it is not possible to achieve what you are looking for
with DHT. By using regexes, you can influence the placement of files
relative to other filenames *only within the same directory* (thereby
making temp file followed by rename inexpensive, etc.)

What you are looking for requires a major change in DHT - using a single
hash layout for all directories and not having a directory-specific
component in the hash calculation.
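
A minimal sketch of that alternative, for illustration only: one layout
shared by every directory, with the brick chosen from the file name alone.
The brick names, the equal-width ranges, and the use of CRC32 as the hash
are all assumptions made for this example; none of this is how DHT is
implemented today.

```python
import posixpath
import zlib

# Hypothetical brick names; a real volume would list its own subvolumes.
BRICKS = ["brick-0", "brick-1", "brick-2", "brick-3"]
HASH_SPACE = 2 ** 32  # assumed 32-bit hash space


def brick_for(path: str) -> str:
    """Pick a brick from the last path component only, using one global layout."""
    name = posixpath.basename(path)
    # CRC32 is a stand-in here, not the hash function DHT really uses.
    h = zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF
    width = HASH_SPACE // len(BRICKS)
    # Clamp so that any remainder of the hash space falls on the last brick.
    return BRICKS[min(h // width, len(BRICKS) - 1)]


print(brick_for("/directory1/subdirectory2/foo"))
print(brick_for("/directory2/subdirectoryaaa999/foo"))
```

With no directory-specific component, both paths print the same brick,
which is the behaviour being asked for in this thread.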

Thanks,
Avati


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
Yes we have deduplication at the filesystem layer

BR
Jan

From:  Anand Avati 
Date:  Wednesday 26 November 2014 at 12:11 AM
To:  Jan H Holtzhausen , Shyam , 

Subject:  Re: [Gluster-devel] EHT / DHT

Unless there is some sort of de-duplication under the covers happening in 
the brick, or the files are hardlinks to each other, there is no cache 
benefit whatsoever by having identical files placed on the same server.

Thanks,
Avati

On Tue Nov 25 2014 at 12:59:25 PM Jan H Holtzhausen  
wrote:
As to the why.
Filesystem cache hits.
Files with the same name tend to be the same files.

Regards
Jan




On 2014/11/25, 8:42 PM, "Jan H Holtzhausen"  wrote:

>So in a distributed cluster, the GFID tells all bricks what a files
>preceding directory structure looks like?
>Where the physical file is saved is a function of the filename ONLY.
>Therefore My requirement should be met by default, or am I being dense?
>
>BR
>Jan
>
>
>
>On 2014/11/25, 8:15 PM, "Shyam"  wrote:
>
>>On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote:
>>> STILL doesn’t work … exact same file ends up on 2 different bricks …
>>> I must be missing something.
>>> All I need is for:
>>> /directory1/subdirectory2/foo
>>> And
>>> /directory2/subdirectoryaaa999/foo
>>>
>>>
>>> To end up on the same brick….
>>
>>This is not possible is what I was attempting to state in the previous
>>mail. The regex filter is not for this purpose.
>>
>>The hash is always based on the name of the file, but the location is
>>based on the distribution/layout of the directory, which is different
>>for each directory based on its GFID.
>>
>>So there are no options in the code to enable what you seek at present.
>>
>>Why is this needed?
>>
>>Shyam
>


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Anand Avati
Unless there is some sort of de-duplication under the covers happening in
the brick, or the files are hardlinks to each other, there is no cache
benefit whatsoever by having identical files placed on the same server.

Thanks,
Avati

On Tue Nov 25 2014 at 12:59:25 PM Jan H Holtzhausen 
wrote:

> As to the why.
> Filesystem cache hits.
> Files with the same name tend to be the same files.
>
> Regards
> Jan
>
>
>
>
> On 2014/11/25, 8:42 PM, "Jan H Holtzhausen"  wrote:
>
> >So in a distributed cluster, the GFID tells all bricks what a files
> >preceding directory structure looks like?
> >Where the physical file is saved is a function of the filename ONLY.
> >Therefore My requirement should be met by default, or am I being dense?
> >
> >BR
> >Jan
> >
> >
> >
> >On 2014/11/25, 8:15 PM, "Shyam"  wrote:
> >
> >>On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote:
> >>> STILL doesn’t work … exact same file ends up on 2 different bricks …
> >>> I must be missing something.
> >>> All I need is for:
> >>> /directory1/subdirectory2/foo
> >>> And
> >>> /directory2/subdirectoryaaa999/foo
> >>>
> >>>
> >>> To end up on the same brick….
> >>
> >>This is not possible is what I was attempting to state in the previous
> >>mail. The regex filter is not for this purpose.
> >>
> >>The hash is always based on the name of the file, but the location is
> >>based on the distribution/layout of the directory, which is different
> >>for each directory based on its GFID.
> >>
> >>So there are no options in the code to enable what you seek at present.
> >>
> >>Why is this needed?
> >>
> >>Shyam
> >


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
As to the why.
Filesystem cache hits.
Files with the same name tend to be the same files.

Regards
Jan




On 2014/11/25, 8:42 PM, "Jan H Holtzhausen"  wrote:

>So in a distributed cluster, the GFID tells all bricks what a files 
>preceding directory structure looks like?
>Where the physical file is saved is a function of the filename ONLY.
>Therefore My requirement should be met by default, or am I being dense?
>
>BR
>Jan
>
>
>
>On 2014/11/25, 8:15 PM, "Shyam"  wrote:
>
>>On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote:
>>> STILL doesn’t work … exact same file ends up on 2 different bricks …
>>> I must be missing something.
>>> All I need is for:
>>> /directory1/subdirectory2/foo
>>> And
>>> /directory2/subdirectoryaaa999/foo
>>>
>>>
>>> To end up on the same brick….
>>
>>This is not possible is what I was attempting to state in the previous 
>>mail. The regex filter is not for this purpose.
>>
>>The hash is always based on the name of the file, but the location is 
>>based on the distribution/layout of the directory, which is different 
>>for each directory based on its GFID.
>>
>>So there are no options in the code to enable what you seek at present.
>>
>>Why is this needed?
>>
>>Shyam
>


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
So in a distributed cluster, the GFID tells all bricks what a file's 
preceding directory structure looks like?
Where the physical file is saved is a function of the filename ONLY.
Therefore my requirement should be met by default, or am I being dense?

BR
Jan



On 2014/11/25, 8:15 PM, "Shyam"  wrote:

>On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote:
>> STILL doesn’t work … exact same file ends up on 2 different bricks …
>> I must be missing something.
>> All I need is for:
>> /directory1/subdirectory2/foo
>> And
>> /directory2/subdirectoryaaa999/foo
>>
>>
>> To end up on the same brick….
>
>This is not possible is what I was attempting to state in the previous 
>mail. The regex filter is not for this purpose.
>
>The hash is always based on the name of the file, but the location is 
>based on the distribution/layout of the directory, which is different 
>for each directory based on its GFID.
>
>So there are no options in the code to enable what you seek at present.
>
>Why is this needed?
>
>Shyam



Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Shyam

On 11/25/2014 03:11 PM, Jan H Holtzhausen wrote:
> STILL doesn’t work … exact same file ends up on 2 different bricks …
> I must be missing something.
> All I need is for:
> /directory1/subdirectory2/foo
> And
> /directory2/subdirectoryaaa999/foo
>
> To end up on the same brick….


As I was attempting to state in the previous mail, this is not possible. 
The regex filter is not meant for this purpose.


The hash is always based on the name of the file, but the location is 
based on the distribution/layout of the directory, which is different 
for each directory based on its GFID.


So there are no options in the code to enable what you seek at present.

Why is this needed?

Shyam


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
STILL doesn’t work … exact same file ends up on 2 different bricks …
I must be missing something.
All I need is for:
/directory1/subdirectory2/foo
And 
/directory2/subdirectoryaaa999/foo


To end up on the same brick….

Jan




On 2014/11/25, 8:00 PM, "Jan H Holtzhausen"  wrote:

>Hmm
>Then something is wrong, 
>If I upload 2 identical files, with different paths they only end up on 
>the same server 1/4 of the time (I have 4 bricks).
>I’ll test the regex quickly.
>
>BR
>Jan
>
>
>
>
>On 2014/11/25, 7:55 PM, "Shyam"  wrote:
>
>>On 11/25/2014 02:28 PM, Jan H Holtzhausen wrote:
>>> I think I have it.
>>> Unless I’m totally confused, I can hash ONLY on the filename with:
>>>
>>> glusterfs --volfile-server=a_server --volfile-id=a_volume \
>>> --xlator-option a_volume-dht.extra_hash_regex='.*[/]' \
>>> /a/mountpoint
>>>
>>> Correct?
>>
>>The hash of a file does not include the full path, it is on the file 
>>name _only_. So any regex will not work when the filename remains 
>>constant like "myfile".
>>
>>As Jeff explains the option is really to prevent using temporary parts 
>>of the name in the hash computation (for rename optimization). In this 
>>case, you do not seem to have any tmp parts to the name, like "myfile" 
>>and "myfile~" should evaluate to the same hash, so remove all trailing 
>>'~' from the name.
>>
>>So I am not sure the above is the option you are looking for.
>>
>>>
>>> Jan
>>>
>>> From: Jan H Holtzhausen <j...@holtztech.info>
>>> Date: Tuesday 25 November 2014 at 9:06 PM
>>> To: gluster-devel@gluster.org
>>> Subject: Re: [Gluster-devel] EHT / DHT
>>>
>>>> Are you referring to something else in your request? Meaning, you want
>>>> /myfile, /dir1/myfile and /dir2/dir3/myfile to fall onto the same
>>>> bricks/subvolumes and that perchance is what you are looking for?
>>>
>>> That is EXACTLY what I am looking for.
>>>
>>> What are my chances?
>>
>>As far as I know not much out of the box. As Jeff explained, the 
>>directory distribution/layout considers the GFID of the directory, hence 
>>each of the directories in the above example would/could get different 
>>ranges.
>>
>>The file on the other hand remains constant "myfile" so its hash value 
>>remains the same, but due to the distribution range change as above for 
>>the directories, it will land on different bricks and not the same one.
>>
>>Out of curiosity, why is this functionality needed?
>>
>>Shyam


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
Hmm
Then something is wrong.
If I upload 2 identical files with different paths, they only end up on 
the same server 1/4 of the time (I have 4 bricks).
I’ll test the regex quickly.

BR
Jan




On 2014/11/25, 7:55 PM, "Shyam"  wrote:

>On 11/25/2014 02:28 PM, Jan H Holtzhausen wrote:
>> I think I have it.
>> Unless I’m totally confused, I can hash ONLY on the filename with:
>>
>> glusterfs --volfile-server=a_server --volfile-id=a_volume \
>> --xlator-option a_volume-dht.extra_hash_regex='.*[/]' \
>> /a/mountpoint
>>
>> Correct?
>
>The hash of a file does not include the full path, it is on the file 
>name _only_. So any regex will not work when the filename remains 
>constant like "myfile".
>
>As Jeff explains the option is really to prevent using temporary parts 
>of the name in the hash computation (for rename optimization). In this 
>case, you do not seem to have any tmp parts to the name, like "myfile" 
>and "myfile~" should evaluate to the same hash, so remove all trailing 
>'~' from the name.
>
>So I am not sure the above is the option you are looking for.
>
>>
>> Jan
>>
>> From: Jan H Holtzhausen <j...@holtztech.info>
>> Date: Tuesday 25 November 2014 at 9:06 PM
>> To: gluster-devel@gluster.org
>> Subject: Re: [Gluster-devel] EHT / DHT
>>
>>> Are you referring to something else in your request? Meaning, you want
>>> /myfile, /dir1/myfile and /dir2/dir3/myfile to fall onto the same
>>> bricks/subvolumes and that perchance is what you are looking for?
>>
>> That is EXACTLY what I am looking for.
>>
>> What are my chances?
>
>As far as I know not much out of the box. As Jeff explained, the 
>directory distribution/layout considers the GFID of the directory, hence 
>each of the directories in the above example would/could get different 
>ranges.
>
>The file on the other hand remains constant "myfile" so its hash value 
>remains the same, but due to the distribution range change as above for 
>the directories, it will land on different bricks and not the same one.
>
>Out of curiosity, why is this functionality needed?
>
>Shyam


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Shyam

On 11/25/2014 02:28 PM, Jan H Holtzhausen wrote:
> I think I have it.
> Unless I’m totally confused, I can hash ONLY on the filename with:
>
> glusterfs --volfile-server=a_server --volfile-id=a_volume \
> --xlator-option a_volume-dht.extra_hash_regex='.*[/]' \
> /a/mountpoint
>
> Correct?


The hash of a file does not include the full path, it is on the file 
name _only_. So any regex will not work when the filename remains 
constant like "myfile".


As Jeff explains, the option is really meant to keep temporary parts of 
the name out of the hash computation (for rename optimization). For 
example, if "myfile" and "myfile~" should evaluate to the same hash, the 
regex would remove all trailing '~' from the name. In this case, you do 
not seem to have any tmp parts in the name.


So I am not sure the above is the option you are looking for.



> Jan
>
> From: Jan H Holtzhausen <j...@holtztech.info>
> Date: Tuesday 25 November 2014 at 9:06 PM
> To: gluster-devel@gluster.org
> Subject: Re: [Gluster-devel] EHT / DHT
>
>> Are you referring to something else in your request? Meaning, you want
>> /myfile, /dir1/myfile and /dir2/dir3/myfile to fall onto the same
>> bricks/subvolumes and that perchance is what you are looking for?
>
> That is EXACTLY what I am looking for.
>
> What are my chances?


As far as I know not much out of the box. As Jeff explained, the 
directory distribution/layout considers the GFID of the directory, hence 
each of the directories in the above example would/could get different 
ranges.


The file on the other hand remains constant "myfile" so its hash value 
remains the same, but due to the distribution range change as above for 
the directories, it will land on different bricks and not the same one.


Out of curiosity, why is this functionality needed?

Shyam


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
I think I have it.
Unless I’m totally confused, I can hash ONLY on the filename with:
glusterfs --volfile-server=a_server --volfile-id=a_volume \
   --xlator-option a_volume-dht.extra_hash_regex='.*[/]' \
   /a/mountpoint
Correct?

Jan

From:  Jan H Holtzhausen 
Date:  Tuesday 25 November 2014 at 9:06 PM
To:  
Subject:  Re: [Gluster-devel] EHT / DHT

>Are you referring to something else in your request? Meaning, you want 
>/myfile, /dir1/myfile and /dir2/dir3/myfile to fall onto the same 
> bricks/subvolumes and that perchance is what you are looking for?

That is EXACTLY what I am looking for.
What are my chances?

BR
Jan


Re: [Gluster-devel] EHT / DHT

2014-11-25 Thread Jan H Holtzhausen
>Are you referring to something else in your request? Meaning, you want 
>/myfile, /dir1/myfile and /dir2/dir3/myfile to fall onto the same 
> bricks/subvolumes and that perchance is what you are looking for?

That is EXACTLY what I am looking for.
What are my chances?

BR
Jan



Re: [Gluster-devel] EHT / DHT

2014-11-11 Thread Anand Avati
On Tue, Nov 11, 2014 at 1:56 PM, Jeff Darcy  wrote:

>  (Personally I would have
> done this by "mixing in" the parent GFID to the hash calculation, but
> that alternative was ignored.)
>

Actually when DHT was implemented, the concept of GFID did not (yet) exist.
Due to backward compatibility it has just remained this way even later.
Including the GFID into the hash has benefits.


Re: [Gluster-devel] EHT / DHT

2014-11-11 Thread Jeff Darcy
> > I was wondering, is there a way to change / parameter to pass to
> > clusters DHT to change the distribution algorithm to only take into
> > account filename and not the preceding filesystem path?
> > i.e when a file is at: /mount/gluster/directory/filename.ext
> > To only hash on “filename.ext” ?
> 
> Currently DHT hashes the file name and not the entire path.
> 
> See, callers of dht_hash_compute in source (pretty much
> dht_layout_search) to which loc->name is passed, which is the file name
> and not the entire path.

While that is true, there are a couple of caveats.  First, the hash is
based on the file name (last path component) but the *distribution* for
each directory (what we call a layout) is modified based on the
directory GFID.  This prevents the same file name in different
directories always hashing to the same brick.  (Personally I would have
done this by "mixing in" the parent GFID to the hash calculation, but
that alternative was ignored.)
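
To make the first caveat concrete, here is a rough sketch, not the actual
DHT code. It assumes a 32-bit hash space split into equal ranges, uses CRC32
as a stand-in for the real hash function, and models the per-directory
layout as a simple rotation of the brick order by an offset taken from the
directory GFID. The brick names and helper names are invented for the
example.

```python
import uuid
import zlib
from typing import List, Tuple

HASH_SPACE = 2 ** 32  # assumed 32-bit hash space

Layout = List[Tuple[int, int, str]]


def name_hash(name: str) -> int:
    """Hash the file name only (CRC32 is a stand-in for DHT's real hash)."""
    return zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF


def directory_layout(dir_gfid: uuid.UUID, bricks: List[str]) -> Layout:
    """Split the hash space into equal ranges and rotate the brick order
    by an offset derived from the directory GFID (a simplified assumption)."""
    count = len(bricks)
    offset = dir_gfid.int % count
    width = HASH_SPACE // count
    layout = []
    for i in range(count):
        start = i * width
        # The last range absorbs any remainder of the hash space.
        end = HASH_SPACE - 1 if i == count - 1 else start + width - 1
        layout.append((start, end, bricks[(i + offset) % count]))
    return layout


def locate(name: str, layout: Layout) -> str:
    """Return the brick whose range in this directory's layout covers the name hash."""
    h = name_hash(name)
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise ValueError("hash not covered by layout")


bricks = ["brick-0", "brick-1", "brick-2", "brick-3"]  # hypothetical names
layout_a = directory_layout(uuid.UUID(int=0), bricks)  # stands in for one directory
layout_b = directory_layout(uuid.UUID(int=1), bricks)  # stands in for another
print(locate("foo", layout_a), locate("foo", layout_b))
```

The hash of "foo" is identical in both calls; only the range-to-brick
assignment differs, so the two directories place the same name on
different bricks.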

Secondly, there is a way to modify the hashing.  If you set the
"cluster.extra-hash-regex" option on a volume, that regular expression
will be used to "pick apart" the file name into a part that's used for
hashing and a part that's ignored.  Consider the case of rsync, which
for a file XXX will create a temporary file .XXX.123456 and rely on the
semantics of rename(2) to move it into place only after it's fully
written.  The "rsync-hash-regex" is already set up to remove the leading
"." and trailing ".123456" so that "XXX" is again the effective name for
hashing/distribution purposes.  This allows the later rename to be done
on one brick every time, which improves performance significantly.  With
"extra-hash-regex" you can do the same thing for a second app, without
affecting the rsync behavior.
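
As an illustration of the idea, here is a small sketch of stripping
temporary parts from a name before hashing. The patterns below are
assumptions written for this example (the first is modelled on the rsync
case described above, the second is a hypothetical "extra" pattern for
editor backup files), and Python's regex engine is only a stand-in for
whatever regex library the translator uses.

```python
import re

# Assumed pattern for rsync-style temporary names such as ".XXX.123456":
# drop the leading "." and the trailing ".suffix", keep the middle part.
RSYNC_STYLE = re.compile(r"^\.(.+)\.[^.]+$")

# Hypothetical extra pattern for a second application: drop trailing "~".
TRAILING_TILDE = re.compile(r"^(.+?)~+$")


def effective_name(name: str, patterns=(RSYNC_STYLE, TRAILING_TILDE)) -> str:
    """Return the part of the name that would be fed to the hash.

    The first pattern that matches wins; its first capture group is the
    effective name. Names matching no pattern are hashed unchanged.
    """
    for pattern in patterns:
        match = pattern.match(name)
        if match:
            return match.group(1)
    return name


for candidate in (".XXX.123456", "XXX", "myfile~"):
    print(candidate, "->", effective_name(candidate))
```

Because ".XXX.123456" and "XXX" reduce to the same effective name, they
hash identically within a directory, so the final rename stays on one
brick.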


Re: [Gluster-devel] EHT / DHT

2014-11-11 Thread Shyam

On 11/11/2014 03:51 AM, Jan Holtzhausen wrote:
> Hi
> I was wondering, is there a way to change / parameter to pass to
> clusters DHT to change the distribution algorithm to only take into
> account filename and not the preceding filesystem path?
> i.e when a file is at: /mount/gluster/directory/filename.ext
> To only hash on “filename.ext” ?


Currently DHT hashes the file name and not the entire path.

See the callers of dht_hash_compute in the source (pretty much 
dht_layout_search): they pass loc->name, which is the file name 
and not the entire path.


Are you referring to something else in your request? Meaning, you want 
/myfile, /dir1/myfile and /dir2/dir3/myfile to fall onto the same 
bricks/subvolumes and that perchance is what you are looking for?


Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] EHT / DHT

2014-11-11 Thread Jan Holtzhausen
Hi
I was wondering: is there a setting to change, or a parameter to pass to 
Gluster's DHT, that makes the distribution algorithm take into account only 
the filename and not the preceding filesystem path?
I.e., when a file is at /mount/gluster/directory/filename.ext,
to hash only on “filename.ext”?

Best Regards
Jan Holtzhausen