Re: Status FUSE-Support of HDFS

2008-11-06 Thread Pete Wyckoff

Have not used it with rsync, but do note that fuse-dfs will return EIO for 
non-sequential writes.  No unit test for this yet, but there probably should be 
:)
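The constraint described above can be sketched as a toy model (illustrative Python, not the actual fuse-dfs implementation; the class name and exact behavior are assumptions based on the description):

```python
import errno

class SequentialOnlyFile:
    """Toy model of the fuse-dfs write constraint: HDFS files are
    write-once/append-only, so a write at any offset other than the
    current end of file fails with EIO."""

    def __init__(self):
        self.data = bytearray()

    def write(self, buf: bytes, offset: int) -> int:
        if offset != len(self.data):
            # writing anywhere but EOF would need random-access writes
            raise OSError(errno.EIO, "non-sequential write")
        self.data.extend(buf)
        return len(buf)

f = SequentialOnlyFile()
f.write(b"abc", 0)    # sequential append: ok
f.write(b"def", 3)    # continues at EOF: ok
try:
    f.write(b"x", 1)  # rewrites earlier bytes: rejected
except OSError as e:
    assert e.errno == errno.EIO
```

In principle, tools that write strictly sequentially (as rsync's default temp-file mode does) should avoid this error, while anything that seeks while writing would hit it.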


On 11/4/08 9:07 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:



Thanks! This is good news. So it's fast enough for our purposes if it
turns out to be the same order of magnitude on our systems.

Have you used this with rsync? If so, any known issues with that
(reading or writing)?

Thanks in advance,

Robert


Pete Wyckoff wrote:
> Reads are 20-30% slower
> Writes are 33% slower before 
> https://issues.apache.org/jira/browse/HADOOP-3805 - You need a kernel > 
> 2.6.26-rc* to test 3805, which I don't have :(
>
> These #s are with hadoop 0.17 and the 0.18.2 version of fuse-dfs.
>
> -- pete
>
>
> On 11/2/08 6:23 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>
>
>
> Hi Pete,
>
> thanks for the info. That helps a lot. We will probably test it for our
> use cases then. Did you benchmark throughput when reading/writing files
> through fuse-dfs and compare it to command-line tool or API access? Is
> there a notable difference?
>
> Thanks again,
>
> Robert
>
>
>
> Pete Wyckoff wrote:
>> It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted 
>> via fuse and uses that for some operations.
>>
>> There have recently been some problems with fuse-dfs when used in a 
>> multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
>> not use 0.18 or 0.18.1)
>>
>> The current (known) issues are:
>>   1. Wrong semantics when copying over an existing file - namely it does a 
>> delete and then re-creates the file, so ownership/permissions may end up 
>> wrong. There is a patch for this.
>>   2. When directories have 10s of thousands of files, performance can be 
>> very poor.
>>   3. Posix truncate is supported only for truncating it to 0 size since hdfs 
>> doesn't support truncate.
>>   4. Appends are not supported - this is a libhdfs problem and there is a 
>> patch for it.
>>
>> It is still a pre-1.0 product for sure, but it has been pretty stable for us.
>>
>>
>> -- pete
>>
>>
>> On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> Hi,
>>
>> could anyone tell me what the current status of FUSE support for HDFS
>> is? Is this something that can be expected to be usable in a few
>> weeks/months in a production environment? We have been really
>> happy/successful with HDFS in our production system. However, some
>> software we use in our application simply requires an OS-level file
>> system, which currently requires us to do a lot of copying between HDFS
>> and a regular file system for processes which require that software.
>> FUSE support would really eliminate that one disadvantage we have with
>> HDFS. We wouldn't even require the performance to be outstanding:
>> just by eliminating the copy step, we would greatly increase the
>> throughput of those processes.
>>
>> Thanks for sharing any thoughts on this.
>>
>> Regards,
>>
>> Robert
>>
>>
>>
>
>
>
>





Re: Status FUSE-Support of HDFS

2008-11-04 Thread Brian Bockelman

Hey Robert,

I would chime in saying that our usage of FUSE results in a network  
transfer rate of about 30MB/s, and it does not seem to be a limiting  
factor (right now, we're CPU bound).


In our (limited) tests, we've achieved 80Gbps of reads in our cluster  
overall.  This did not appear to push the limits of FUSE or Hadoop.


Since we've applied the patches (which are in 0.18.2 by default), we
haven't had any corruption issues.  Our application has rather
heavy-handed internal file checksums, and the jobs would crash
immediately if they were reading in garbage.
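The kind of defensive check Brian describes - verify a stored digest on every read and fail fast on a mismatch - can be sketched generically (this is not their application's actual mechanism; the names and the choice of SHA-256 are illustrative):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_verified(data: bytes, expected_digest: str) -> bytes:
    """Fail fast instead of silently processing garbage."""
    actual = sha256_of(data)
    if actual != expected_digest:
        raise IOError(f"checksum mismatch: {actual} != {expected_digest}")
    return data

payload = b"record-1\nrecord-2\n"
digest = sha256_of(payload)       # stored alongside the file at write time
assert read_verified(payload, digest) == payload
```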


Brian

On Nov 4, 2008, at 10:07 AM, Robert Krüger wrote:



Thanks! This is good news. So it's fast enough for our purposes if it
turns out to be the same order of magnitude on our systems.

Have you used this with rsync? If so, any known issues with that
(reading or writing)?

Thanks in advance,

Robert


Pete Wyckoff wrote:

Reads are 20-30% slower
Writes are 33% slower before https://issues.apache.org/jira/browse/HADOOP-3805 
 - You need a kernel > 2.6.26-rc* to test 3805, which I don't have :(


These #s are with hadoop 0.17 and the 0.18.2 version of fuse-dfs.

-- pete


On 11/2/08 6:23 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:



Hi Pete,

thanks for the info. That helps a lot. We will probably test it for our
use cases then. Did you benchmark throughput when reading/writing files
through fuse-dfs and compare it to command-line tool or API access? Is
there a notable difference?

Thanks again,

Robert



Pete Wyckoff wrote:
It has come a long way since 0.18 and facebook keeps our (0.17) dfs
mounted via fuse and uses that for some operations.

There have recently been some problems with fuse-dfs when used in a
multithreaded environment, but those have been fixed in 0.18.2 and 0.19.
(do not use 0.18 or 0.18.1)

The current (known) issues are:
 1. Wrong semantics when copying over an existing file - namely it does
a delete and then re-creates the file, so ownership/permissions may end
up wrong. There is a patch for this.
 2. When directories have 10s of thousands of files, performance can be
very poor.
 3. Posix truncate is supported only for truncating it to 0 size since
hdfs doesn't support truncate.
 4. Appends are not supported - this is a libhdfs problem and there is a
patch for it.

It is still a pre-1.0 product for sure, but it has been pretty stable
for us.



-- pete


On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:



Hi,

could anyone tell me what the current status of FUSE support for HDFS
is? Is this something that can be expected to be usable in a few
weeks/months in a production environment? We have been really
happy/successful with HDFS in our production system. However, some
software we use in our application simply requires an OS-level file
system, which currently requires us to do a lot of copying between HDFS
and a regular file system for processes which require that software.
FUSE support would really eliminate that one disadvantage we have with
HDFS. We wouldn't even require the performance to be outstanding:
just by eliminating the copy step, we would greatly increase the
throughput of those processes.

Thanks for sharing any thoughts on this.

Regards,

Robert












Re: Status FUSE-Support of HDFS

2008-11-04 Thread Robert Krüger

Thanks! This is good news. So it's fast enough for our purposes if it
turns out to be the same order of magnitude on our systems.

Have you used this with rsync? If so, any known issues with that
(reading or writing)?

Thanks in advance,

Robert


Pete Wyckoff wrote:
> Reads are 20-30% slower
> Writes are 33% slower before 
> https://issues.apache.org/jira/browse/HADOOP-3805 - You need a kernel > 
> 2.6.26-rc* to test 3805, which I don't have :(
> 
> These #s are with hadoop 0.17 and the 0.18.2 version of fuse-dfs.
> 
> -- pete
> 
> 
> On 11/2/08 6:23 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
> 
> 
> 
> Hi Pete,
> 
> thanks for the info. That helps a lot. We will probably test it for our
> use cases then. Did you benchmark throughput when reading/writing files
> through fuse-dfs and compare it to command-line tool or API access? Is
> there a notable difference?
> 
> Thanks again,
> 
> Robert
> 
> 
> 
> Pete Wyckoff wrote:
>> It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted 
>> via fuse and uses that for some operations.
>>
>> There have recently been some problems with fuse-dfs when used in a 
>> multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
>> not use 0.18 or 0.18.1)
>>
>> The current (known) issues are:
>>   1. Wrong semantics when copying over an existing file - namely it does a 
>> delete and then re-creates the file, so ownership/permissions may end up 
>> wrong. There is a patch for this.
>>   2. When directories have 10s of thousands of files, performance can be 
>> very poor.
>>   3. Posix truncate is supported only for truncating it to 0 size since hdfs 
>> doesn't support truncate.
>>   4. Appends are not supported - this is a libhdfs problem and there is a 
>> patch for it.
>>
>> It is still a pre-1.0 product for sure, but it has been pretty stable for us.
>>
>>
>> -- pete
>>
>>
>> On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> Hi,
>>
>> could anyone tell me what the current Status of FUSE support for HDFS
>> is? Is this something that can be expected to be usable in a few
>> weeks/months in a production environment? We have been really
>> happy/successful with HDFS in our production system. However, some
>> software we use in our application simply requires an OS-Level file
>> system which currently requires us to do a lot of copying between HDFS
>> and a regular file system for processes which require that software and
>> FUSE support would really eliminate that one disadvantage we have with
>> HDFS. We wouldn't even require the performance of that to be outstanding
>> because just by eliminatimng the copy step, we would greatly increase
>> the thruput of those processes.
>>
>> Thanks for sharing any thoughts on this.
>>
>> Regards,
>>
>> Robert
>>
>>
>>
> 
> 
> 
> 



Re: Status FUSE-Support of HDFS

2008-11-03 Thread Pete Wyckoff

+1. Although hadoop itself deals well with such directories, fuse-dfs will 
basically lock up on them - this is because ls --color=blah causes a stat on 
every file in the directory.  There is a JIRA open for this, but it is a pretty 
rare case, although it has happened to me at facebook.

-- pete


> It's good for a portable application to
> keep the #of files/directory low by having two levels of directory for
> storing files - just use a hash operation to determine which dir to store
> a specific file in.


On 11/3/08 4:00 AM, "Steve Loughran" <[EMAIL PROTECTED]> wrote:

Pete Wyckoff wrote:
> It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted 
> via fuse and uses that for some operations.
>
> There have recently been some problems with fuse-dfs when used in a 
> multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
> not use 0.18 or 0.18.1)
>
> The current (known) issues are:

>   2. When directories have 10s of thousands of files, performance can be very 
> poor.

I've known other filesystems to top out at 64k-1 files per directory,
even if they don't slow down. It's good for a portable application to
keep the #of files/directory low by having two levels of directory for
storing files -just use a hash operation to determine which dir to store
a specific file in.




Re: Status FUSE-Support of HDFS

2008-11-03 Thread Pete Wyckoff

Reads are 20-30% slower
Writes are 33% slower before https://issues.apache.org/jira/browse/HADOOP-3805 
- You need a kernel > 2.6.26-rc* to test 3805, which I don't have :(

These #s are with hadoop 0.17 and the 0.18.2 version of fuse-dfs.

-- pete


On 11/2/08 6:23 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:



Hi Pete,

thanks for the info. That helps a lot. We will probably test it for our
use cases then. Did you benchmark throughput when reading/writing files
through fuse-dfs and compare it to command-line tool or API access? Is
there a notable difference?

Thanks again,

Robert



Pete Wyckoff wrote:
> It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted 
> via fuse and uses that for some operations.
>
> There have recently been some problems with fuse-dfs when used in a 
> multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
> not use 0.18 or 0.18.1)
>
> The current (known) issues are:
>   1. Wrong semantics when copying over an existing file - namely it does a 
> delete and then re-creates the file, so ownership/permissions may end up 
> wrong. There is a patch for this.
>   2. When directories have 10s of thousands of files, performance can be very 
> poor.
>   3. Posix truncate is supported only for truncating it to 0 size since hdfs 
> doesn't support truncate.
>   4. Appends are not supported - this is a libhdfs problem and there is a 
> patch for it.
>
> It is still a pre-1.0 product for sure, but it has been pretty stable for us.
>
>
> -- pete
>
>
> On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>
>
>
> Hi,
>
> could anyone tell me what the current Status of FUSE support for HDFS
> is? Is this something that can be expected to be usable in a few
> weeks/months in a production environment? We have been really
> happy/successful with HDFS in our production system. However, some
> software we use in our application simply requires an OS-Level file
> system which currently requires us to do a lot of copying between HDFS
> and a regular file system for processes which require that software and
> FUSE support would really eliminate that one disadvantage we have with
> HDFS. We wouldn't even require the performance of that to be outstanding
> because just by eliminatimng the copy step, we would greatly increase
> the thruput of those processes.
>
> Thanks for sharing any thoughts on this.
>
> Regards,
>
> Robert
>
>
>





Re: Status FUSE-Support of HDFS

2008-11-03 Thread Steve Loughran

Pete Wyckoff wrote:

It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted via 
fuse and uses that for some operations.

There have recently been some problems with fuse-dfs when used in a 
multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
not use 0.18 or 0.18.1)

The current (known) issues are:



  2. When directories have 10s of thousands of files, performance can be very 
poor.


I've known other filesystems to top out at 64k-1 files per directory, 
even if they don't slow down. It's good for a portable application to 
keep the #of files/directory low by having two levels of directory for 
storing files -just use a hash operation to determine which dir to store 
a specific file in.
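Steve's two-level layout can be sketched as follows (a minimal illustration; the hash choice, the 256x256 fanout, and the mount path are arbitrary assumptions, not anything HDFS or fuse-dfs prescribes):

```python
import hashlib
from pathlib import PurePosixPath

def bucketed_path(root: str, filename: str) -> PurePosixPath:
    """Spread files across two levels of 256 directories each, so no
    single directory accumulates tens of thousands of entries."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:2] / digest[2:4] / filename

# the mount point below is a made-up example path
p = bucketed_path("/mnt/hdfs/incoming", "report-2008-11.csv")
assert p.name == "report-2008-11.csv"
assert len(p.parent.name) == 2          # second-level bucket
assert len(p.parent.parent.name) == 2   # first-level bucket
```

Because the bucket is a pure function of the filename, readers can recompute the path without any index, and the per-directory file count stays roughly total/65536.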


Re: Status FUSE-Support of HDFS

2008-11-02 Thread Robert Krüger

Hi Pete,

thanks for the info. That helps a lot. We will probably test it for our
use cases then. Did you benchmark throughput when reading/writing files
through fuse-dfs and compare it to command-line tool or API access? Is
there a notable difference?

Thanks again,

Robert



Pete Wyckoff wrote:
> It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted 
> via fuse and uses that for some operations.
> 
> There have recently been some problems with fuse-dfs when used in a 
> multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
> not use 0.18 or 0.18.1)
> 
> The current (known) issues are:
>   1. Wrong semantics when copying over an existing file - namely it does a 
> delete and then re-creates the file, so ownership/permissions may end up 
> wrong. There is a patch for this.
>   2. When directories have 10s of thousands of files, performance can be very 
> poor.
>   3. Posix truncate is supported only for truncating it to 0 size since hdfs 
> doesn't support truncate.
>   4. Appends are not supported - this is a libhdfs problem and there is a 
> patch for it.
> 
> It is still a pre-1.0 product for sure, but it has been pretty stable for us.
> 
> 
> -- pete
> 
> 
> On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
> 
> 
> 
> Hi,
> 
> could anyone tell me what the current Status of FUSE support for HDFS
> is? Is this something that can be expected to be usable in a few
> weeks/months in a production environment? We have been really
> happy/successful with HDFS in our production system. However, some
> software we use in our application simply requires an OS-Level file
> system which currently requires us to do a lot of copying between HDFS
> and a regular file system for processes which require that software and
> FUSE support would really eliminate that one disadvantage we have with
> HDFS. We wouldn't even require the performance of that to be outstanding
> because just by eliminatimng the copy step, we would greatly increase
> the thruput of those processes.
> 
> Thanks for sharing any thoughts on this.
> 
> Regards,
> 
> Robert
> 
> 
> 



Re: Status FUSE-Support of HDFS

2008-10-31 Thread Pete Wyckoff

It has come a long way since 0.18 and facebook keeps our (0.17) dfs mounted via 
fuse and uses that for some operations.

There have recently been some problems with fuse-dfs when used in a 
multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (do 
not use 0.18 or 0.18.1)

The current (known) issues are:
  1. Wrong semantics when copying over an existing file - namely it does a 
delete and then re-creates the file, so ownership/permissions may end up wrong. 
There is a patch for this.
  2. When directories have 10s of thousands of files, performance can be very 
poor.
  3. Posix truncate is supported only for truncating it to 0 size since hdfs 
doesn't support truncate.
  4. Appends are not supported - this is a libhdfs problem and there is a patch 
for it.
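Item 3 in the list above can be modeled as a toy function (illustrative only, not the real fuse-dfs code; the choice of errno is my assumption):

```python
import errno

def hdfs_truncate(data: bytearray, size: int) -> bytearray:
    """Toy model of fuse-dfs truncate: HDFS has no truncate, so only
    truncating to size 0 - i.e. replacing the file with an empty one -
    can be honored.  The errno here is an illustrative choice."""
    if size != 0:
        raise OSError(errno.EINVAL, "HDFS does not support partial truncate")
    return bytearray()

buf = hdfs_truncate(bytearray(b"hello world"), 0)   # truncate to 0: allowed
try:
    hdfs_truncate(bytearray(b"hello world"), 5)     # partial: rejected
except OSError:
    pass
```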

It is still a pre-1.0 product for sure, but it has been pretty stable for us.


-- pete


On 10/31/08 9:08 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:



Hi,

could anyone tell me what the current Status of FUSE support for HDFS
is? Is this something that can be expected to be usable in a few
weeks/months in a production environment? We have been really
happy/successful with HDFS in our production system. However, some
software we use in our application simply requires an OS-Level file
system which currently requires us to do a lot of copying between HDFS
and a regular file system for processes which require that software and
FUSE support would really eliminate that one disadvantage we have with
HDFS. We wouldn't even require the performance of that to be outstanding
because just by eliminatimng the copy step, we would greatly increase
the thruput of those processes.

Thanks for sharing any thoughts on this.

Regards,

Robert




Status FUSE-Support of HDFS

2008-10-31 Thread Robert Krüger

Hi,

could anyone tell me what the current Status of FUSE support for HDFS
is? Is this something that can be expected to be usable in a few
weeks/months in a production environment? We have been really
happy/successful with HDFS in our production system. However, some
software we use in our application simply requires an OS-Level file
system which currently requires us to do a lot of copying between HDFS
and a regular file system for processes which require that software and
FUSE support would really eliminate that one disadvantage we have with
HDFS. We wouldn't even require the performance of that to be outstanding
because just by eliminatimng the copy step, we would greatly increase
the thruput of those processes.

Thanks for sharing any thoughts on this.

Regards,

Robert