Re: About fuse-dfs and NFS

2009-06-26 Thread Brian Bockelman

Hey Chris,

FUSE in general does not support NFS mounts well because it has a  
tendency to renumber inodes upon NFS restart, which causes clients to  
choke.


FUSE-DFS supports a limited range of write operations; it's possible  
that your application is trying to use write functionality that is not  
supported.
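For context, HDFS's write model in this era is essentially
create-and-stream-sequentially: there is no way to seek backwards on an
output stream and rewrite earlier bytes, which is exactly the kind of
operation an NFS client is free to issue. Below is a minimal sketch of the
supported pattern against the Java API (the namenode URI and path are
hypothetical, for illustration only):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteModelSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical namenode URI and path, for illustration only.
    FileSystem fs = FileSystem.get(new URI("hdfs://namenode:9000/"),
                                   new Configuration());
    Path path = new Path("/tmp/write_test");

    // HDFS writes are strictly sequential: create the file, stream bytes
    // to the end, then close.
    FSDataOutputStream out = fs.create(path, true /* overwrite */);
    out.write("hello".getBytes());
    out.close();

    // There is no seek() on FSDataOutputStream, so an in-place rewrite of
    // earlier bytes -- something an NFS client may attempt -- has no
    // direct HDFS equivalent.
    fs.close();
  }
}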


Brian

On Jun 26, 2009, at 2:57 AM, XuChris wrote:




Hi,

I mounted HDFS onto a local directory with fuse-dfs, and then exported
that directory over NFS.

When I access the directory via NFS I can read data from it, but I
cannot write data to it. Why?
Does fuse-dfs support write operations over NFS or not?

Who can help me? Thank you very much.

My system configuration:
OS: Fedora release 8 (kernel 2.6.23.1)
FUSE: 2.7.4 (the fuse kernel module was updated to 2.7.4 for NFS)
Hadoop: 0.19.1

Best regards.

Chris
2009-6-26




About fuse-dfs and NFS

2009-06-26 Thread XuChris


Hi,
 
I mounted HDFS onto a local directory with fuse-dfs, and then exported that
directory over NFS.
When I access the directory via NFS I can read data from it, but I cannot
write data to it. Why?
Does fuse-dfs support write operations over NFS or not?
Who can help me? Thank you very much. 
 
My system configuration:
OS: Fedora release 8 (kernel 2.6.23.1)
FUSE: 2.7.4 (the fuse kernel module was updated to 2.7.4 for NFS)
Hadoop: 0.19.1
 
Best regards.
 
Chris 
2009-6-26

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-16 Thread Brian Bockelman

Hey Tom,

Yup, that's one of the things I've been looking at - however, it doesn't
appear to be the likely culprit for why the access times look fairly random.
The time an operation takes does not seem to be a function of the number of
bytes read, at least in the smaller size range.


Brian

On Apr 16, 2009, at 5:17 AM, Tom White wrote:


Not sure if it will affect your findings, but when you read from an
FSDataInputStream you should check how many bytes were actually read by
inspecting the return value, and re-read if it was fewer than you wanted.
See Hadoop's IOUtils readFully() method.

Tom

On Mon, Apr 13, 2009 at 4:22 PM, Brian Bockelman wrote:


Hey Todd,

Been playing more this morning after thinking about it for the night -- I
think the culprit is not the network, but actually the cache.  Here's the
output of your script adjusted to do the same calls as I was doing (you had
left out the random I/O part).

[br...@red tmp]$ java hdfs_tester
Mean value for reads of size 0: 0.0447
Mean value for reads of size 16384: 10.4872
Mean value for reads of size 32768: 10.82925
Mean value for reads of size 49152: 6.2417
Mean value for reads of size 65536: 7.0511003
Mean value for reads of size 81920: 9.411599
Mean value for reads of size 98304: 9.378799
Mean value for reads of size 114688: 8.99065
Mean value for reads of size 131072: 5.1378503
Mean value for reads of size 147456: 6.1324
Mean value for reads of size 163840: 17.1187
Mean value for reads of size 180224: 6.5492
Mean value for reads of size 196608: 8.45695
Mean value for reads of size 212992: 7.4292
Mean value for reads of size 229376: 10.7843
Mean value for reads of size 245760: 9.29095
Mean value for reads of size 262144: 6.57865

Copy of the script below.

So, without the FUSE layer, we don't see much (if any) pattern here.  The
overhead of randomly skipping through the file is higher than the overhead
of reading out the data.

Upon further inspection, the biggest factor affecting the FUSE layer is
actually the Linux VFS caching -- if you notice, the bandwidth in the given
graph for larger read sizes is *higher* than 1Gbps, which is the limit of
the network on that particular node.  If I go in the opposite direction -
starting with the largest reads first, then going down to the smallest
reads - the graph entirely smooths out for the small values: everything is
read from the filesystem cache in the client RAM.  Graph attached.

So, on the upside, mounting through FUSE gives us the opportunity to speed
up reads for very complex, non-sequential patterns - for free, thanks to the
hardworking Linux kernel.  On the downside, it's incredibly difficult to
come up with simple cases to demonstrate performance for an application --
the cache performance and size depend on how much activity there is on the
client, the previous file system activity that the application did, and the
amount of concurrent activity on the server.  I can give you results for
performance, but it's not going to be the performance you see in real life.
 (Gee, if only file systems were easy...)

Ok, sorry for the list noise -- it seems I'm going to have to think more
about this problem before I can come up with something coherent.

Brian





import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.net.URI;
import java.util.Random;

public class hdfs_tester {
  public static void main(String[] args) throws Exception {
    URI uri = new URI("hdfs://hadoop-name:9000/");
    FileSystem fs = FileSystem.get(uri, new Configuration());
    Path path = new Path("/user/uscms01/pnfs/unl.edu/data4/cms/store/phedex_monarctest/Nebraska/LoadTest07_Nebraska_33");
    FSDataInputStream dis = fs.open(path);
    Random rand = new Random();
    FileStatus status = fs.getFileStatus(path);
    long file_len = status.getLen();
    int iters = 20;
    // For each read size (0 to 1 MB in 16 KB steps), time 'iters' positioned
    // reads at random 8-byte-aligned offsets and print the mean time in ms.
    for (int size = 0; size < 1024*1024; size += 4*4096) {
      long csum = 0;
      for (int i = 0; i < iters; i++) {
        int pos = rand.nextInt((int)((file_len-size-1)/8))*8;
        byte buf[] = new byte[size];
        if (pos < 0)
          pos = 0;
        long st = System.nanoTime();
        dis.read(pos, buf, 0, size);
        long et = System.nanoTime();
        csum += et-st;
        //System.out.println(String.valueOf(size) + "\t" + String.valueOf(pos) + "\t" + String.valueOf(et - st));
      }
      float csum2 = csum; csum2 /= iters;
      System.out.println("Mean value for reads of size " + size + ": " + (csum2/1000/1000));
    }
    fs.close();
  }
}


On Apr 13, 2009, at 3:14 AM, Todd Lipcon wrote:

On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon wrote:



Hey Brian,

This is really interesting stuff. I'm curious - have you tried  
these same

experiments using the Java API? I'm wondering whether this is
FUSE-specific
or inherent to all HDFS reads. I'll try to reprodu

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-16 Thread Tom White
Not sure if it will affect your findings, but when you read from an
FSDataInputStream you should check how many bytes were actually read by
inspecting the return value, and re-read if it was fewer than you wanted.
See Hadoop's IOUtils readFully() method.
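For reference, here is a minimal sketch of what that looks like for the
positioned read() used in the test program quoted below; it keeps reading
until the requested count is satisfied, much like IOUtils readFully() (the
class and method names here are just for illustration):

import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

public class ReadFullySketch {
  /** Keep issuing positioned reads until len bytes arrive or the file ends. */
  static void readFully(FSDataInputStream in, long pos, byte[] buf,
                        int off, int len) throws IOException {
    while (len > 0) {
      // A positioned read may return fewer bytes than requested.
      int n = in.read(pos, buf, off, len);
      if (n < 0) {
        throw new EOFException("end of file before " + len + " bytes were read");
      }
      pos += n;
      off += n;
      len -= n;
    }
  }
}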

Tom

On Mon, Apr 13, 2009 at 4:22 PM, Brian Bockelman  wrote:
>
> Hey Todd,
>
> Been playing more this morning after thinking about it for the night -- I
> think the culprit is not the network, but actually the cache.  Here's the
> output of your script adjusted to do the same calls as I was doing (you had
> left out the random I/O part).
>
> [br...@red tmp]$ java hdfs_tester
> Mean value for reads of size 0: 0.0447
> Mean value for reads of size 16384: 10.4872
> Mean value for reads of size 32768: 10.82925
> Mean value for reads of size 49152: 6.2417
> Mean value for reads of size 65536: 7.0511003
> Mean value for reads of size 81920: 9.411599
> Mean value for reads of size 98304: 9.378799
> Mean value for reads of size 114688: 8.99065
> Mean value for reads of size 131072: 5.1378503
> Mean value for reads of size 147456: 6.1324
> Mean value for reads of size 163840: 17.1187
> Mean value for reads of size 180224: 6.5492
> Mean value for reads of size 196608: 8.45695
> Mean value for reads of size 212992: 7.4292
> Mean value for reads of size 229376: 10.7843
> Mean value for reads of size 245760: 9.29095
> Mean value for reads of size 262144: 6.57865
>
> Copy of the script below.
>
> So, without the FUSE layer, we don't see much (if any) patterns here.  The
> overhead of randomly skipping through the file is higher than the overhead
> of reading out the data.
>
> Upon further inspection, the biggest factor affecting the FUSE layer is
> actually the Linux VFS caching -- if you notice, the bandwidth in the given
> graph for larger read sizes is *higher* than 1Gbps, which is the limit of
> the network on that particular node.  If I go in the opposite direction -
> starting with the largest reads first, then going down to the smallest
> reads, the graph entirely smooths out for the small values - everything is
> read from the filesystem cache in the client RAM.  Graph attached.
>
> So, on the upside, mounting through FUSE gives us the opportunity to speed
> up reads for very complex, non-sequential patterns - for free, thanks to the
> hardworking Linux kernel.  On the downside, it's incredibly difficult to
> come up with simple cases to demonstrate performance for an application --
> the cache performance and size depends on how much activity there's on the
> client, the previous file system activity that the application did, and the
> amount of concurrent activity on the server.  I can give you results for
> performance, but it's not going to be the performance you see in real life.
>  (Gee, if only file systems were easy...)
>
> Ok, sorry for the list noise -- it seems I'm going to have to think more
> about this problem before I can come up with something coherent.
>
> Brian
>
>
>
>
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.conf.Configuration;
> import java.io.IOException;
> import java.net.URI;
> import java.util.Random;
>
> public class hdfs_tester {
>  public static void main(String[] args) throws Exception {
>   URI uri = new URI("hdfs://hadoop-name:9000/");
>   FileSystem fs = FileSystem.get(uri, new Configuration());
>   Path path = new
> Path("/user/uscms01/pnfs/unl.edu/data4/cms/store/phedex_monarctest/Nebraska/LoadTest07_Nebraska_33");
>   FSDataInputStream dis = fs.open(path);
>   Random rand = new Random();
>   FileStatus status = fs.getFileStatus(path);
>   long file_len = status.getLen();
>   int iters = 20;
>   for (int size=0; size < 1024*1024; size += 4*4096) {
>     long csum = 0;
>     for (int i = 0; i < iters; i++) {
>       int pos = rand.nextInt((int)((file_len-size-1)/8))*8;
>       byte buf[] = new byte[size];
>       if (pos < 0)
>         pos = 0;
>       long st = System.nanoTime();
>       dis.read(pos, buf, 0, size);
>       long et = System.nanoTime();
>       csum += et-st;
>       //System.out.println(String.valueOf(size) + "\t" + String.valueOf(pos)
> + "\t" + String.valueOf(et - st));
>     }
>     float csum2 = csum; csum2 /= iters;
>     System.out.println("Mean value for reads of size " + size + ": " +
> (csum2/1000/1000));
>   }
>   fs.close();
>  }
> }
>
>
> On Apr 13, 2009, at 3:14 AM, Todd Lipcon wrote:
>
>> On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon  wrote:
>>
>>> Hey Brian,
>>>
>>> This is really interesting stuff. I'm curious - have you tried these same
>>> experiments using the Java API? I'm wondering whether this is
>>> FUSE-specific
>>> or inherent to all HDFS reads. I'll try to reproduce this over here as
>>> well.
>>>
>>> This smells sort of nagle-related to me... if you get a chance, you may
>>> want to edit DFSClient.java and change TCP_WINDOW_SIZE to 256 * 10

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-14 Thread jason hadoop
Oh, I agree -- caching is wonderful when you plan to re-use the data in the
near term.

Solaris has an interesting feature: if the application writes enough
contiguous data in a short time window (tunable in later Nevada builds),
Solaris bypasses the buffer cache for the writes.

For reasons I have never had time to look into, there is a significant
impact on overall system responsiveness when there is significant cache
store activity going on, and there are patterns that work in the general
case but fail in others. Take the tar example from earlier: my theory is
that the blocks written to the tar file take priority over the read-ahead,
so the next files to be read for the tar archive are not pre-cached. Using
the cache flush on the tar file allows the read-aheads to go ahead.
The other nice thing that happens is that the size of the dirty pool tends
not to grow to the point that the periodic sync operations pause the
system.

We had an interesting problem with Solaris under VMware some years back,
where we were running IMAP servers as part of JES for testing a middleware
mail application. The IMAP writes would accumulate in the buffer cache and
performance would be wonderful, and the middleware performance was great;
then the must-flush-now threshold would be crossed and it would take 2
minutes to flush all of the accumulated writes out, and the middleware app
would block waiting on that to finish. In the end, as a quick hack, we did
the following: *while true; do sync; sleep 30; done*, which prevented the
stalls as it kept the flush time down. The flushes totally fill the disk
queues and will cause starvation for other apps.

I believe this is part of the block report stall problem in 4584.

On Tue, Apr 14, 2009 at 4:52 AM, Brian Bockelman wrote:

> Hey Jason,
>
> Thanks, I'll keep this on hand as I do more tests.  I now have a C, Java,
> and python version of my testing program ;)
>
> However, I particularly *like* the fact that there's caching going on -
> it'll help out our application immensely, I think.  I'll be looking at the
> performance both with and without the cache.
>
> Brian
>
>
> On Apr 14, 2009, at 12:01 AM, jason hadoop wrote:
>
>  The following very simple program will tell the VM to drop the pages being
>> cached for a file. I tend to spin this in a for loop when making large tar
>> files, or otherwise working with large files, and the system performance
>> really smooths out.
>> Since it uses open(path) it will churn through the inode cache and
>> directories.
>> Something like this might actually significantly speed up HDFS by running
>> over the blocks on the datanodes, for busy clusters.
>>
>>
>> #define _XOPEN_SOURCE 600
>> #define _GNU_SOURCE
>> #include <sys/types.h>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <unistd.h>
>> #include <errno.h>
>>
>> /** Simple program to dump buffered data for specific files from the
>> buffer
>> cache. Copyright Jason Venner 2009, License GPL*/
>>
>> int main( int argc, char** argv )
>> {
>>  int failCount = 0;
>>  int i;
>>  for( i = 1; i < argc; i++ ) {
>>   char* file = argv[i];
>>   int fd = open( file, O_RDONLY|O_LARGEFILE );
>>   if (fd == -1) {
>> perror( file );
>> failCount++;
>> continue;
>>   }
>>   if (posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED )!=0) {
>> fprintf( stderr, "Failed to flush cache for %s %s\n", argv[optind],
>> strerror( posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED ) ) );
>> failCount++;
>>   }
>>   close(fd);
>>  }
>>  exit( failCount );
>> }
>>
>>
>> On Mon, Apr 13, 2009 at 4:01 PM, Scott Carey wrote:
>>
>>
>>> On 4/12/09 9:41 PM, "Brian Bockelman"  wrote:
>>>
>>>  Ok, here's something perhaps even more strange.  I removed the "seek"
 part out of my timings, so I was only timing the "read" instead of the
 "seek + read" as in the first case.  I also turned the read-ahead down
 to 1-byte (aka, off).

 The jump *always* occurs at 128KB, exactly.

>>>
>>> Some random ideas:
>>>
>>> I have no idea how FUSE interops with the Linux block layer, but 128K
>>> happens to be the default 'readahead' value for block devices, which may
>>> just be a coincidence.
>>>
>>> For a disk 'sda', you check and set the value (in 512 byte blocks) with:
>>>
>>> /sbin/blockdev --getra /dev/sda
>>> /sbin/blockdev --setra [num blocks] /dev/sda
>>>
>>>
>>> I know on my file system tests, the OS readahead is not activated until a
>>> series of sequential reads go through the block device, so truly random
>>> access is not affected by this.  I've set it to 128MB and random iops
>>> does
>>> not change on a ext3 or xfs file system.  If this applies to FUSE too,
>>> there
>>> may be reasons that this behavior differs.
>>> Furthermore, one would not expect it to be slower to randomly read 4k
>>> than
>>> randomly read up to the readahead size itself even if it did.
>>>
>>> I also have no idea how much of the OS device queue and block device
>>> scheduler is involved with FUSE.  If those are involved, then there's 

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-14 Thread Brian Bockelman

Hey Jason,

Thanks, I'll keep this on hand as I do more tests.  I now have a C,  
Java, and python version of my testing program ;)


However, I particularly *like* the fact that there's caching going on  
- it'll help out our application immensely, I think.  I'll be looking  
at the performance both with and without the cache.


Brian

On Apr 14, 2009, at 12:01 AM, jason hadoop wrote:

The following very simple program will tell the VM to drop the pages being
cached for a file. I tend to spin this in a for loop when making large tar
files, or otherwise working with large files, and the system performance
really smooths out.
Since it uses open(path) it will churn through the inode cache and
directories.
Something like this might actually significantly speed up HDFS by running
over the blocks on the datanodes, for busy clusters.


#define _XOPEN_SOURCE 600
#define _GNU_SOURCE
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

/** Simple program to dump buffered data for specific files from the  
buffer

cache. Copyright Jason Venner 2009, License GPL*/

int main( int argc, char** argv )
{
 int failCount = 0;
 int i;
 for( i = 1; i < argc; i++ ) {
   char* file = argv[i];
   int fd = open( file, O_RDONLY|O_LARGEFILE );
   if (fd == -1) {
 perror( file );
 failCount++;
 continue;
   }
   if (posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED )!=0) {
  fprintf( stderr, "Failed to flush cache for %s %s\n", argv[optind],
strerror( posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED ) ) );
 failCount++;
   }
   close(fd);
 }
 exit( failCount );
}


On Mon, Apr 13, 2009 at 4:01 PM, Scott Carey wrote:




On 4/12/09 9:41 PM, "Brian Bockelman"  wrote:

Ok, here's something perhaps even more strange.  I removed the "seek"
part out of my timings, so I was only timing the "read" instead of the
"seek + read" as in the first case.  I also turned the read-ahead down
to 1-byte (aka, off).

The jump *always* occurs at 128KB, exactly.


Some random ideas:

I have no idea how FUSE interops with the Linux block layer, but 128K
happens to be the default 'readahead' value for block devices, which may
just be a coincidence.

For a disk 'sda', you check and set the value (in 512 byte blocks) with:

/sbin/blockdev --getra /dev/sda
/sbin/blockdev --setra [num blocks] /dev/sda

I know on my file system tests, the OS readahead is not activated until a
series of sequential reads go through the block device, so truly random
access is not affected by this.  I've set it to 128MB and random iops does
not change on an ext3 or xfs file system.  If this applies to FUSE too,
there may be reasons that this behavior differs.
Furthermore, one would not expect it to be slower to randomly read 4k than
randomly read up to the readahead size itself even if it did.

I also have no idea how much of the OS device queue and block device
scheduler is involved with FUSE.  If those are involved, then there's a
bunch of stuff to tinker with there as well.

Lastly, an FYI if you don't already know the following.  If the OS is
caching pages, there is a way to flush these in Linux to evict the cache.
See /proc/sys/vm/drop_caches .





I'm a bit befuddled.  I know we say that HDFS is optimized for large,
sequential reads, not random reads - but it seems that it's one bug-fix
away from being a good general-purpose system.  Heck if I can find
what's causing the issues though...

Brian








--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422




Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread jason hadoop
The following very simple program will tell the VM to drop the pages being
cached for a file. I tend to spin this in a for loop when making large tar
files, or otherwise working with large files, and the system performance
really smooths out.
Since it uses open(path) it will churn through the inode cache and
directories.
Something like this might actually significantly speed up HDFS by running
over the blocks on the datanodes, for busy clusters.


#define _XOPEN_SOURCE 600
#define _GNU_SOURCE
#include <sys/types.h>   /* header names restored; the archive stripped them */
#include <fcntl.h>       /* open(), posix_fadvise(), O_RDONLY, O_LARGEFILE */
#include <stdio.h>       /* perror(), fprintf() */
#include <stdlib.h>      /* exit() */
#include <string.h>      /* strerror() */
#include <unistd.h>      /* close(), optind */
#include <errno.h>

/** Simple program to dump buffered data for specific files from the buffer
cache. Copyright Jason Venner 2009, License GPL*/

int main( int argc, char** argv )
{
  int failCount = 0;
  int i;
  for( i = 1; i < argc; i++ ) {
char* file = argv[i];
int fd = open( file, O_RDONLY|O_LARGEFILE );
if (fd == -1) {
  perror( file );
  failCount++;
  continue;
}
if (posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED )!=0) {
  fprintf( stderr, "Failed to flush cache for %s %s\n", file,
strerror( posix_fadvise( fd, 0, 0, POSIX_FADV_DONTNEED ) ) );
  failCount++;
}
close(fd);
  }
  exit( failCount );
}


On Mon, Apr 13, 2009 at 4:01 PM, Scott Carey wrote:

>
> On 4/12/09 9:41 PM, "Brian Bockelman"  wrote:
>
> > Ok, here's something perhaps even more strange.  I removed the "seek"
> > part out of my timings, so I was only timing the "read" instead of the
> > "seek + read" as in the first case.  I also turned the read-ahead down
> > to 1-byte (aka, off).
> >
> > The jump *always* occurs at 128KB, exactly.
>
> Some random ideas:
>
> I have no idea how FUSE interops with the Linux block layer, but 128K
> happens to be the default 'readahead' value for block devices, which may
> just be a coincidence.
>
> For a disk 'sda', you check and set the value (in 512 byte blocks) with:
>
> /sbin/blockdev --getra /dev/sda
> /sbin/blockdev --setra [num blocks] /dev/sda
>
>
> I know on my file system tests, the OS readahead is not activated until a
> series of sequential reads go through the block device, so truly random
> access is not affected by this.  I've set it to 128MB and random iops does
> not change on a ext3 or xfs file system.  If this applies to FUSE too,
> there
> may be reasons that this behavior differs.
> Furthermore, one would not expect it to be slower to randomly read 4k than
> randomly read up to the readahead size itself even if it did.
>
> I also have no idea how much of the OS device queue and block device
> scheduler is involved with FUSE.  If those are involved, then there's a
> bunch of stuff to tinker with there as well.
>
> Lastly, an FYI if you don't already know the following.  If the OS is
> caching pages, there is a way to flush these in Linux to evict the cache.
> See /proc/sys/vm/drop_caches .
>
>
>
> >
> > I'm a bit befuddled.  I know we say that HDFS is optimized for large,
> > sequential reads, not random reads - but it seems that it's one bug-
> > fix away from being a good general-purpose system.  Heck if I can find
> > what's causing the issues though...
> >
> > Brian
> >
> >
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread Scott Carey

On 4/12/09 9:41 PM, "Brian Bockelman"  wrote:

> Ok, here's something perhaps even more strange.  I removed the "seek"
> part out of my timings, so I was only timing the "read" instead of the
> "seek + read" as in the first case.  I also turned the read-ahead down
> to 1-byte (aka, off).
> 
> The jump *always* occurs at 128KB, exactly.

Some random ideas:

I have no idea how FUSE interops with the Linux block layer, but 128K
happens to be the default 'readahead' value for block devices, which may
just be a coincidence.

For a disk 'sda', you check and set the value (in 512 byte blocks) with:

/sbin/blockdev --getra /dev/sda
/sbin/blockdev --setra [num blocks] /dev/sda


I know on my file system tests, the OS readahead is not activated until a
series of sequential reads go through the block device, so truly random
access is not affected by this.  I've set it to 128MB and random iops does
not change on an ext3 or xfs file system.  If this applies to FUSE too, there
may be reasons that this behavior differs.
Furthermore, one would not expect it to be slower to randomly read 4k than
randomly read up to the readahead size itself even if it did.

I also have no idea how much of the OS device queue and block device
scheduler is involved with FUSE.  If those are involved, then there's a
bunch of stuff to tinker with there as well.

Lastly, an FYI if you don't already know the following.  If the OS is
caching pages, there is a way to flush these in Linux to evict the cache.
See /proc/sys/vm/drop_caches .



> 
> I'm a bit befuddled.  I know we say that HDFS is optimized for large,
> sequential reads, not random reads - but it seems that it's one bug-
> fix away from being a good general-purpose system.  Heck if I can find
> what's causing the issues though...
> 
> Brian
> 
> 



Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread Brian Bockelman


Hey Todd,

Been playing more this morning after thinking about it for the night  
-- I think the culprit is not the network, but actually the cache.   
Here's the output of your script adjusted to do the same calls as I  
was doing (you had left out the random I/O part).


[br...@red tmp]$ java hdfs_tester
Mean value for reads of size 0: 0.0447
Mean value for reads of size 16384: 10.4872
Mean value for reads of size 32768: 10.82925
Mean value for reads of size 49152: 6.2417
Mean value for reads of size 65536: 7.0511003
Mean value for reads of size 81920: 9.411599
Mean value for reads of size 98304: 9.378799
Mean value for reads of size 114688: 8.99065
Mean value for reads of size 131072: 5.1378503
Mean value for reads of size 147456: 6.1324
Mean value for reads of size 163840: 17.1187
Mean value for reads of size 180224: 6.5492
Mean value for reads of size 196608: 8.45695
Mean value for reads of size 212992: 7.4292
Mean value for reads of size 229376: 10.7843
Mean value for reads of size 245760: 9.29095
Mean value for reads of size 262144: 6.57865

Copy of the script below.

So, without the FUSE layer, we don't see much (if any) patterns here.   
The overhead of randomly skipping through the file is higher than the  
overhead of reading out the data.


Upon further inspection, the biggest factor affecting the FUSE layer  
is actually the Linux VFS caching -- if you notice, the bandwidth in  
the given graph for larger read sizes is *higher* than 1Gbps, which is  
the limit of the network on that particular node.  If I go in the  
opposite direction - starting with the largest reads first, then going  
down to the smallest reads, the graph entirely smooths out for the  
small values - everything is read from the filesystem cache in the  
client RAM.  Graph attached.


So, on the upside, mounting through FUSE gives us the opportunity to  
speed up reads for very complex, non-sequential patterns - for free,  
thanks to the hardworking Linux kernel.  On the downside, it's  
incredibly difficult to come up with simple cases to demonstrate  
performance for an application -- the cache performance and size  
depends on how much activity there's on the client, the previous file  
system activity that the application did, and the amount of concurrent  
activity on the server.  I can give you results for performance, but  
it's not going to be the performance you see in real life.  (Gee, if  
only file systems were easy...)


Ok, sorry for the list noise -- it seems I'm going to have to think  
more about this problem before I can come up with something coherent.


Brian





import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.net.URI;
import java.util.Random;

public class hdfs_tester {
 public static void main(String[] args) throws Exception {
   URI uri = new URI("hdfs://hadoop-name:9000/");
   FileSystem fs = FileSystem.get(uri, new Configuration());
   Path path = new Path("/user/uscms01/pnfs/unl.edu/data4/cms/store/phedex_monarctest/Nebraska/LoadTest07_Nebraska_33");
   FSDataInputStream dis = fs.open(path);
   Random rand = new Random();
   FileStatus status = fs.getFileStatus(path);
   long file_len = status.getLen();
   int iters = 20;
   for (int size=0; size < 1024*1024; size += 4*4096) {
 long csum = 0;
 for (int i = 0; i < iters; i++) {
   int pos = rand.nextInt((int)((file_len-size-1)/8))*8;
   byte buf[] = new byte[size];
   if (pos < 0)
 pos = 0;
   long st = System.nanoTime();
   dis.read(pos, buf, 0, size);
   long et = System.nanoTime();
   csum += et-st;
   //System.out.println(String.valueOf(size) + "\t" + String.valueOf(pos) + "\t" + String.valueOf(et - st));
 }
 float csum2 = csum; csum2 /= iters;
 System.out.println("Mean value for reads of size " + size + ": "  
+ (csum2/1000/1000));

   }
   fs.close();
 }
}


On Apr 13, 2009, at 3:14 AM, Todd Lipcon wrote:

On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon wrote:



Hey Brian,

This is really interesting stuff. I'm curious - have you tried  
these same
experiments using the Java API? I'm wondering whether this is FUSE- 
specific
or inherent to all HDFS reads. I'll try to reproduce this over here  
as well.


This smells sort of nagle-related to me... if you get a chance, you may
want to edit DFSClient.java and change TCP_WINDOW_SIZE to 256 * 1024, and
see if the magic number jumps up to 256KB. If so, I think it should be a
pretty easy bugfix.



Oops - spoke too fast there... looks like TCP_WINDOW_SIZE isn't actually
used for any socket configuration, so I don't think that will make a
difference... still think networking might be the culprit, though.

-Todd




On Sun, Apr 12, 2009 at 9:41 PM, Brian Bockelman wrote:


Ok, here's something perhaps even more strange.  I remov

Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread Todd Lipcon
On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon  wrote:

> Hey Brian,
>
> This is really interesting stuff. I'm curious - have you tried these same
> experiments using the Java API? I'm wondering whether this is FUSE-specific
> or inherent to all HDFS reads. I'll try to reproduce this over here as well.
>

I just tried this on a localhost single-node cluster with the following test
program:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.net.URI;

public class Test {
  public static void main(String[] args) throws Exception {
URI uri = new URI("hdfs://localhost:8020/");
FileSystem fs = FileSystem.get(uri, new Configuration());
Path path = new Path("/testfile");
FSDataInputStream dis = fs.open(path);

for (int size=0; size < 1024*1024; size += 4096) {
  for (int i = 0; i < 100; i++) {
long st = System.currentTimeMillis();
byte buf[] = new byte[size];
dis.read(0, buf, 0, size);
long et = System.currentTimeMillis();

System.out.println(String.valueOf(size) + "\t" + String.valueOf(et - st));
  }
}
fs.close();
  }
}


I didn't see the same behavior as you're reporting. Can you give this a try
on your cluster and see if it shows the 128K jump?

-Todd


Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread Todd Lipcon
On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon  wrote:

> Hey Brian,
>
> This is really interesting stuff. I'm curious - have you tried these same
> experiments using the Java API? I'm wondering whether this is FUSE-specific
> or inherent to all HDFS reads. I'll try to reproduce this over here as well.
>
> This smells sort of nagle-related to me... if you get a chance, you may
> want to edit DFSClient.java and change TCP_WINDOW_SIZE to 256 * 1024, and
> see if the magic number jumps up to 256KB. If so, I think it should be a
> pretty easy bugfix.
>

Oops - spoke too fast there... looks like TCP_WINDOW_SIZE isn't actually
used for any socket configuration, so I don't think that will make a
difference... still think networking might be the culprit, though.

-Todd


>
> On Sun, Apr 12, 2009 at 9:41 PM, Brian Bockelman wrote:
>
>> Ok, here's something perhaps even more strange.  I removed the "seek" part
>> out of my timings, so I was only timing the "read" instead of the "seek +
>> read" as in the first case.  I also turned the read-ahead down to 1-byte
>> (aka, off).
>>
>> The jump *always* occurs at 128KB, exactly.
>>
>> I'm a bit befuddled.  I know we say that HDFS is optimized for large,
>> sequential reads, not random reads - but it seems that it's one bug-fix away
>> from being a good general-purpose system.  Heck if I can find what's causing
>> the issues though...
>>
>> Brian
>>
>>
>>
>>
>>
>> On Apr 12, 2009, at 8:53 PM, Brian Bockelman wrote:
>>
>>  Hey all,
>>>
>>> I was doing some research on I/O patterns of our applications, and I
>>> noticed the attached pattern.  In case if the mail server strips out
>>> attachments, I also uploaded it:
>>>
>>> http://t2.unl.edu/store/Hadoop_64KB_ra.png
>>> http://t2.unl.edu/store/Hadoop_1024KB_ra.png
>>>
>>> This was taken using the FUSE mounts of Hadoop; the first one was with a
>>> 64KB read-ahead and the second with a 1MB read-ahead.  This was taken from a
>>> 2GB file and randomly 'seek'ed in the file.  This was performed 20 times for
>>> each read size, advancing in 4KB increments.  Each blue dot is the read time
>>> of one experiment; the red dot is the median read time for the read size.
>>>  The graphs show the absolute read time.
>>>
>>> There's very interesting behavior - it seems that there is a change in
>>> behavior around reads of size of 800KB.  The time for the reads go down
>>> significantly when you read *larger* files.  I thought this was just an
>>> artifact of the 64KB read-ahead I set in FUSE, so I upped the read-ahead
>>> significantly, to 1MB.  In this case, the difference between the the small
>>> read sizes and large read sizes are *very* pronounced.  If it was an
>>> artifact from FUSE, I'd expect the place where the change occurred would be
>>> a function of the readahead-size.
>>>
>>> Anyone out there who knows the code have any ideas?  What could I be
>>> doing wrong?
>>>
>>> Brian
>>>
>>> 
>>>
>>> 
>>>
>>
>>
>>
>


Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-13 Thread Todd Lipcon
Hey Brian,

This is really interesting stuff. I'm curious - have you tried these same
experiments using the Java API? I'm wondering whether this is FUSE-specific
or inherent to all HDFS reads. I'll try to reproduce this over here as well.

This smells sort of nagle-related to me... if you get a chance, you may want
to edit DFSClient.java and change TCP_WINDOW_SIZE to 256 * 1024, and see if
the magic number jumps up to 256KB. If so, I think it should be a pretty
easy bugfix.
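Not the actual DFSClient patch, but a minimal standalone sketch of the two
socket knobs being discussed here -- Nagle's algorithm and the receive
buffer size -- using plain java.net (the host name below is a placeholder;
50010 is the usual datanode port):

import java.net.InetSocketAddress;
import java.net.Socket;

public class SocketTuningSketch {
  public static void main(String[] args) throws Exception {
    Socket s = new Socket();
    // Disable Nagle's algorithm so small writes are not held back
    // waiting to be coalesced into larger packets.
    s.setTcpNoDelay(true);
    // Request a larger receive buffer before connecting (the kernel may
    // clamp it); 256 KB mirrors the 256 * 1024 value suggested above.
    s.setReceiveBufferSize(256 * 1024);
    s.connect(new InetSocketAddress("datanode.example.com", 50010), 10000);
    System.out.println("negotiated receive buffer: " + s.getReceiveBufferSize());
    s.close();
  }
}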

-Todd


On Sun, Apr 12, 2009 at 9:41 PM, Brian Bockelman wrote:

> Ok, here's something perhaps even more strange.  I removed the "seek" part
> out of my timings, so I was only timing the "read" instead of the "seek +
> read" as in the first case.  I also turned the read-ahead down to 1-byte
> (aka, off).
>
> The jump *always* occurs at 128KB, exactly.
>
> I'm a bit befuddled.  I know we say that HDFS is optimized for large,
> sequential reads, not random reads - but it seems that it's one bug-fix away
> from being a good general-purpose system.  Heck if I can find what's causing
> the issues though...
>
> Brian
>
>
>
>
>
> On Apr 12, 2009, at 8:53 PM, Brian Bockelman wrote:
>
>  Hey all,
>>
>> I was doing some research on I/O patterns of our applications, and I
>> noticed the attached pattern.  In case if the mail server strips out
>> attachments, I also uploaded it:
>>
>> http://t2.unl.edu/store/Hadoop_64KB_ra.png
>> http://t2.unl.edu/store/Hadoop_1024KB_ra.png
>>
>> This was taken using the FUSE mounts of Hadoop; the first one was with a
>> 64KB read-ahead and the second with a 1MB read-ahead.  This was taken from a
>> 2GB file and randomly 'seek'ed in the file.  This was performed 20 times for
>> each read size, advancing in 4KB increments.  Each blue dot is the read time
>> of one experiment; the red dot is the median read time for the read size.
>>  The graphs show the absolute read time.
>>
>> There's very interesting behavior - it seems that there is a change in
>> behavior around reads of size of 800KB.  The time for the reads go down
>> significantly when you read *larger* files.  I thought this was just an
>> artifact of the 64KB read-ahead I set in FUSE, so I upped the read-ahead
>> significantly, to 1MB.  In this case, the difference between the the small
>> read sizes and large read sizes are *very* pronounced.  If it was an
>> artifact from FUSE, I'd expect the place where the change occurred would be
>> a function of the readahead-size.
>>
>> Anyone out there who knows the code have any ideas?  What could I be doing
>> wrong?
>>
>> Brian
>>
>> 
>>
>> 
>>
>
>
>


Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-12 Thread Brian Bockelman
Ok, here's something perhaps even more strange.  I removed the "seek"  
part out of my timings, so I was only timing the "read" instead of the  
"seek + read" as in the first case.  I also turned the read-ahead down  
to 1-byte (aka, off).


The jump *always* occurs at 128KB, exactly.

I'm a bit befuddled.  I know we say that HDFS is optimized for large,  
sequential reads, not random reads - but it seems that it's one bug- 
fix away from being a good general-purpose system.  Heck if I can find  
what's causing the issues though...


Brian





On Apr 12, 2009, at 8:53 PM, Brian Bockelman wrote:


Hey all,

I was doing some research on I/O patterns of our applications, and I  
noticed the attached pattern.  In case if the mail server strips out  
attachments, I also uploaded it:


http://t2.unl.edu/store/Hadoop_64KB_ra.png
http://t2.unl.edu/store/Hadoop_1024KB_ra.png

This was taken using the FUSE mounts of Hadoop; the first one was  
with a 64KB read-ahead and the second with a 1MB read-ahead.  This  
was taken from a 2GB file and randomly 'seek'ed in the file.  This  
was performed 20 times for each read size, advancing in 4KB  
increments.  Each blue dot is the read time of one experiment; the  
red dot is the median read time for the read size.  The graphs show  
the absolute read time.
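(A rough sketch of this kind of FUSE-side measurement in Java, for anyone
who wants to reproduce it -- the mount point, file name, and sizes below
are placeholders, not the exact harness used here: it seeks to a random
offset in the mounted file and times a full read for each read size.)

import java.io.RandomAccessFile;
import java.util.Random;

public class FuseReadSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical fuse-dfs mount point and test file.
    RandomAccessFile f = new RandomAccessFile("/mnt/hdfs/testfile", "r");
    long fileLen = f.length();
    Random rand = new Random();
    int iters = 20;

    // Read sizes advance in 4 KB increments; each size is sampled 'iters' times.
    for (int size = 4096; size <= 1024 * 1024; size += 4096) {
      byte[] buf = new byte[size];
      long total = 0;
      for (int i = 0; i < iters; i++) {
        long pos = (long) (rand.nextDouble() * (fileLen - size));
        long st = System.nanoTime();
        f.seek(pos);          // "seek + read", as in the first set of graphs
        f.readFully(buf);
        total += System.nanoTime() - st;
      }
      System.out.println(size + "\t" + (total / (double) iters / 1e6) + " ms");
    }
    f.close();
  }
}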


There's very interesting behavior - it seems that there is a change  
in behavior around reads of size of 800KB.  The time for the reads  
go down significantly when you read *larger* files.  I thought this  
was just an artifact of the 64KB read-ahead I set in FUSE, so I  
upped the read-ahead significantly, to 1MB.  In this case, the  
difference between the the small read sizes and large read sizes are  
*very* pronounced.  If it was an artifact from FUSE, I'd expect the  
place where the change occurred would be a function of the readahead- 
size.


Anyone out there who knows the code have any ideas?  What could I be  
doing wrong?


Brian








Re: how to mount specification-path of hdfs with fuse-dfs

2009-03-26 Thread jacky_ji

yes, thanks for your response.

Craig Macdonald wrote:
> 
> Hi Jacky,
> 
> Pleased to hear that fuse-dfs is working for you.
> 
> Do you mean that you want to mount dfs://localhost:9000/users at /mnt/hdfs
> ?
> 
> If so, fuse-dfs doesn't currently support this, but it would be a good 
> idea for a future improvement.
> 
> Craig
> 
> jacky_ji wrote:
>> I can use fuse-dfs to mount HDFS, just like this: ./fuse-dfs
>> dfs://localhost:9000 /mnt/hdfs -d
>> But I want to mount a specific path within HDFS now, and I have no idea
>> how to do it; any advice will be appreciated.
>>   
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/how-to-mount-specification-path-of-hdfs-with-fuse-dfs-tp22716393p22734500.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: how to mount specification-path of hdfs with fuse-dfs

2009-03-26 Thread Craig Macdonald

Hi Jacky,

Pleased to hear that fuse-dfs is working for you.

Do you mean that you want to mount dfs://localhost:9000/users at /mnt/hdfs ?

If so, fuse-dfs doesn't currently support this, but it would be a good 
idea for a future improvement.


Craig

jacky_ji wrote:

I can use fuse-dfs to mount HDFS, just like this: ./fuse-dfs
dfs://localhost:9000 /mnt/hdfs -d
But I want to mount a specific path within HDFS now, and I have no idea
how to do it; any advice will be appreciated.
  




how to mount specification-path of hdfs with fuse-dfs

2009-03-25 Thread jacky_ji

I can use fuse-dfs to mount HDFS, just like this: ./fuse-dfs
dfs://localhost:9000 /mnt/hdfs -d
But I want to mount a specific path within HDFS now, and I have no idea
how to do it; any advice will be appreciated.
-- 
View this message in context: 
http://www.nabble.com/how-to-mount-specification-path-of-hdfs-with-fuse-dfs-tp22716393p22716393.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: where and how to get fuse-dfs?

2009-03-17 Thread jason hadoop
fuse_dfs is a contrib package that is part of the standard hadoop
distribution tarball, but it is not pre-compiled and does not compile
without some special ant flags.

There is a README at src/contrib/fuse-dfs/README in the distribution that
walks you through the process of compiling and using fuse_dfs.



On Tue, Mar 17, 2009 at 1:11 AM, jacky_ji  wrote:

>
> fuse-dfs is a component of Hadoop, but I can't find it. I want to use it to
> mount HDFS.
> Where and how do I get it?
> Who can help me? Thanks very much.
> --
> View this message in context:
> http://www.nabble.com/where-and-how-to-get-fuse-dfs--tp22554228p22554228.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


where and how to get fuse-dfs?

2009-03-17 Thread jacky_ji

fuse-dfs is a component of Hadoop, but I can't find it. I want to use it to
mount HDFS.
Where and how do I get it?
Who can help me? Thanks very much.
-- 
View this message in context: 
http://www.nabble.com/where-and-how-to-get-fuse-dfs--tp22554228p22554228.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Hadoop+s3 & fuse-dfs

2009-01-29 Thread Brian Bockelman

Hey all,

This is a long-shot, but I've noticed before that libhdfs doesn't load  
hadoop-site.xml *unless* hadoop-site.xml is in your local directory.


As a last try, maybe cd $HADOOP_HOME/conf and try running it from there?

Brian

On Jan 28, 2009, at 7:20 PM, Craig Macdonald wrote:


Hi Roopa,

Glad it worked :-)

Please file JIRA issues against the fuse-dfs / libhdfs components  
that would have made it easier to mount the S3 filesystem.


Craig

Roopa Sudheendra wrote:
Thanks, Yes a setup with fuse-dfs and hdfs works fine.I think the  
mount point was bad for whatever reason and was failing with that  
error .I created another mount point for mounting which resolved   
the transport end point error.


Also i had -d option on my command..:)


Roopa


On Jan 28, 2009, at 6:35 PM, Craig Macdonald wrote:


Hi Roopa,

Firstly, can you get the fuse-dfs working for an instance HDFS?
There is also a debug mode for fuse: enable this by adding -d on  
the command line.


C

Roopa Sudheendra wrote:

Hey Craig,
I tried the way u suggested..but i get this transport endpoint  
not connected. Can i see the logs anywhere? I dont see anything  
in /var/log/messages either
looks like it tries to create the file system in hdfs.c but not  
sure where it fails.

I have the hadoop home set so i believe it gets the config info.

any idea?

Thanks,
Roopa
On Jan 28, 2009, at 1:59 PM, Craig Macdonald wrote:


In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:

* libhdfs takes a host and port number as input when connecting,  
but not a scheme (hdfs etc). The easiest option would be to set  
the S3 as your default file system in your hadoop-site.xml, then  
use the host of "default". That should get libhdfs to use the S3  
file system. i.e. set fuse-dfs to mount dfs://default:0/ and all  
should work as planned.


* libhdfs also casts the FileSystem to a DistributedFileSystem  
for the df command. This would fail in your case. This issue is  
currently being worked on - see HADOOP-4368

https://issues.apache.org/jira/browse/HADOOP-4368.

C


Roopa Sudheendra wrote:

Thanks for the response craig.
I looked at fuse-dfs c code and looks like it does not like  
anything other than "dfs:// " so with the fact that hadoop can  
connect to S3 file system ..allowing s3 scheme should solve my  
problem?


Roopa

On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:


Hi Roopa,

I cant comment on the S3 specifics. However, fuse-dfs is based  
on a C interface called libhdfs which allows C programs (such  
as fuse-dfs) to connect to the Hadoop file system Java API.  
This being the case, fuse-dfs should (theoretically) be able  
to connect to any file system that Hadoop can. Your mileage  
may vary, but if you find issues, please do report them  
through the normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem  
as one of our backup storage solution. Just the hadoop and  
s3(block based since it overcomes the 5gb limit) so far seems  
to be fine.
My problem is that i want to mount this filesystem using fuse- 
dfs ( since i don't have to worry about how the file is  
written on the system ) . Since the namenode does not get  
started with s3 backed hadoop system how can i connect fuse- 
dfs to this setup.


Appreciate your help.
Thanks,
Roopa
















Re: Hadoop+s3 & fuse-dfs

2009-01-28 Thread Craig Macdonald

Hi Roopa,

Glad it worked :-)

Please file JIRA issues against the fuse-dfs / libhdfs components for
anything that would have made it easier to mount the S3 filesystem.


Craig

Roopa Sudheendra wrote:
Thanks, Yes a setup with fuse-dfs and hdfs works fine.I think the 
mount point was bad for whatever reason and was failing with that 
error .I created another mount point for mounting which resolved  the 
transport end point error.


Also i had -d option on my command..:)


Roopa


On Jan 28, 2009, at 6:35 PM, Craig Macdonald wrote:


Hi Roopa,

Firstly, can you get the fuse-dfs working for an instance HDFS?
There is also a debug mode for fuse: enable this by adding -d on the 
command line.


C

Roopa Sudheendra wrote:

Hey Craig,
I tried the way u suggested..but i get this transport endpoint not 
connected. Can i see the logs anywhere? I dont see anything in 
/var/log/messages either
looks like it tries to create the file system in hdfs.c but not sure 
where it fails.

I have the hadoop home set so i believe it gets the config info.

any idea?

Thanks,
Roopa
On Jan 28, 2009, at 1:59 PM, Craig Macdonald wrote:


In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:

* libhdfs takes a host and port number as input when connecting, 
but not a scheme (hdfs etc). The easiest option would be to set the 
S3 as your default file system in your hadoop-site.xml, then use 
the host of "default". That should get libhdfs to use the S3 file 
system. i.e. set fuse-dfs to mount dfs://default:0/ and all should 
work as planned.


* libhdfs also casts the FileSystem to a DistributedFileSystem for 
the df command. This would fail in your case. This issue is 
currently being worked on - see HADOOP-4368

https://issues.apache.org/jira/browse/HADOOP-4368.

C


Roopa Sudheendra wrote:

Thanks for the response craig.
I looked at fuse-dfs c code and looks like it does not like 
anything other than "dfs:// " so with the fact that hadoop can 
connect to S3 file system ..allowing s3 scheme should solve my 
problem?


Roopa

On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:


Hi Roopa,

I cant comment on the S3 specifics. However, fuse-dfs is based on 
a C interface called libhdfs which allows C programs (such as 
fuse-dfs) to connect to the Hadoop file system Java API. This 
being the case, fuse-dfs should (theoretically) be able to 
connect to any file system that Hadoop can. Your mileage may 
vary, but if you find issues, please do report them through the 
normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem as 
one of our backup storage solution. Just the hadoop and s3(block 
based since it overcomes the 5gb limit) so far seems to be fine.
My problem is that i want to mount this filesystem using 
fuse-dfs ( since i don't have to worry about how the file is 
written on the system ) . Since the namenode does not get 
started with s3 backed hadoop system how can i connect fuse-dfs 
to this setup.


Appreciate your help.
Thanks,
Roopa
















Re: Hadoop+s3 & fuse-dfs

2009-01-28 Thread Roopa Sudheendra
Thanks. Yes, a setup with fuse-dfs and HDFS works fine. I think the
mount point was bad for whatever reason and was failing with that
error. I created another mount point for mounting, which resolved the
transport endpoint error.


Also, I had the -d option on my command. :)


Roopa


On Jan 28, 2009, at 6:35 PM, Craig Macdonald wrote:


Hi Roopa,

Firstly, can you get the fuse-dfs working for an instance HDFS?
There is also a debug mode for fuse: enable this by adding -d on the  
command line.


C

Roopa Sudheendra wrote:

Hey Craig,
I tried the way u suggested..but i get this transport endpoint not  
connected. Can i see the logs anywhere? I dont see anything in /var/ 
log/messages either
looks like it tries to create the file system in hdfs.c but not  
sure where it fails.

I have the hadoop home set so i believe it gets the config info.

any idea?

Thanks,
Roopa
On Jan 28, 2009, at 1:59 PM, Craig Macdonald wrote:


In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:

* libhdfs takes a host and port number as input when connecting,  
but not a scheme (hdfs etc). The easiest option would be to set  
the S3 as your default file system in your hadoop-site.xml, then  
use the host of "default". That should get libhdfs to use the S3  
file system. i.e. set fuse-dfs to mount dfs://default:0/ and all  
should work as planned.


* libhdfs also casts the FileSystem to a DistributedFileSystem for  
the df command. This would fail in your case. This issue is  
currently being worked on - see HADOOP-4368

https://issues.apache.org/jira/browse/HADOOP-4368.

C


Roopa Sudheendra wrote:

Thanks for the response craig.
I looked at fuse-dfs c code and looks like it does not like  
anything other than "dfs:// " so with the fact that hadoop can  
connect to S3 file system ..allowing s3 scheme should solve my  
problem?


Roopa

On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:


Hi Roopa,

I cant comment on the S3 specifics. However, fuse-dfs is based  
on a C interface called libhdfs which allows C programs (such as  
fuse-dfs) to connect to the Hadoop file system Java API. This  
being the case, fuse-dfs should (theoretically) be able to  
connect to any file system that Hadoop can. Your mileage may  
vary, but if you find issues, please do report them through the  
normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem  
as one of our backup storage solution. Just the hadoop and  
s3(block based since it overcomes the 5gb limit) so far seems  
to be fine.
My problem is that i want to mount this filesystem using fuse- 
dfs ( since i don't have to worry about how the file is written  
on the system ) . Since the namenode does not get started with  
s3 backed hadoop system how can i connect fuse-dfs to this setup.


Appreciate your help.
Thanks,
Roopa














Re: Hadoop+s3 & fuse-dfs

2009-01-28 Thread Craig Macdonald

Hi Roopa,

Firstly, can you get the fuse-dfs working for an instance HDFS?
There is also a debug mode for fuse: enable this by adding -d on the 
command line.


C

Roopa Sudheendra wrote:

Hey Craig,
 I tried the way u suggested..but i get this transport endpoint not 
connected. Can i see the logs anywhere? I dont see anything in 
/var/log/messages either
 looks like it tries to create the file system in hdfs.c but not sure 
where it fails.

I have the hadoop home set so i believe it gets the config info.

any idea?

Thanks,
Roopa
On Jan 28, 2009, at 1:59 PM, Craig Macdonald wrote:


In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:

* libhdfs takes a host and port number as input when connecting, but 
not a scheme (hdfs etc). The easiest option would be to set the S3 as 
your default file system in your hadoop-site.xml, then use the host 
of "default". That should get libhdfs to use the S3 file system. i.e. 
set fuse-dfs to mount dfs://default:0/ and all should work as planned.


* libhdfs also casts the FileSystem to a DistributedFileSystem for 
the df command. This would fail in your case. This issue is currently 
being worked on - see HADOOP-4368

https://issues.apache.org/jira/browse/HADOOP-4368.

C


Roopa Sudheendra wrote:

Thanks for the response craig.
I looked at fuse-dfs c code and looks like it does not like anything 
other than "dfs:// " so with the fact that hadoop can connect to S3 
file system ..allowing s3 scheme should solve my problem?


Roopa

On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:


Hi Roopa,

I cant comment on the S3 specifics. However, fuse-dfs is based on a 
C interface called libhdfs which allows C programs (such as 
fuse-dfs) to connect to the Hadoop file system Java API. This being 
the case, fuse-dfs should (theoretically) be able to connect to any 
file system that Hadoop can. Your mileage may vary, but if you find 
issues, please do report them through the normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem as 
one of our backup storage solution. Just the hadoop and s3(block 
based since it overcomes the 5gb limit) so far seems to be fine.
My problem is that i want to mount this filesystem using fuse-dfs 
( since i don't have to worry about how the file is written on the 
system ) . Since the namenode does not get started with s3 backed 
hadoop system how can i connect fuse-dfs to this setup.


Appreciate your help.
Thanks,
Roopa












Re: Hadoop+s3 & fuse-dfs

2009-01-28 Thread Roopa Sudheendra

Hey Craig,
 I tried the way you suggested, but I get this "transport endpoint not
connected" error. Can I see the logs anywhere? I don't see anything in
/var/log/messages either.
 It looks like it tries to create the file system in hdfs.c, but I'm not
sure where it fails.

I have the Hadoop home set, so I believe it gets the config info.

Any idea?

Thanks,
Roopa
On Jan 28, 2009, at 1:59 PM, Craig Macdonald wrote:


In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:

* libhdfs takes a host and port number as input when connecting, but  
not a scheme (hdfs etc). The easiest option would be to set the S3  
as your default file system in your hadoop-site.xml, then use the  
host of "default". That should get libhdfs to use the S3 file  
system. i.e. set fuse-dfs to mount dfs://default:0/ and all should  
work as planned.


* libhdfs also casts the FileSystem to a DistributedFileSystem for  
the df command. This would fail in your case. This issue is  
currently being worked on - see HADOOP-4368

https://issues.apache.org/jira/browse/HADOOP-4368.

C


Roopa Sudheendra wrote:

Thanks for the response craig.
I looked at fuse-dfs c code and looks like it does not like  
anything other than "dfs:// " so with the fact that hadoop can  
connect to S3 file system ..allowing s3 scheme should solve my  
problem?


Roopa

On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:


Hi Roopa,

I cant comment on the S3 specifics. However, fuse-dfs is based on  
a C interface called libhdfs which allows C programs (such as fuse- 
dfs) to connect to the Hadoop file system Java API. This being the  
case, fuse-dfs should (theoretically) be able to connect to any  
file system that Hadoop can. Your mileage may vary, but if you  
find issues, please do report them through the normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem as  
one of our backup storage solution. Just the hadoop and s3(block  
based since it overcomes the 5gb limit) so far seems to be fine.
My problem is that i want to mount this filesystem using fuse-dfs  
( since i don't have to worry about how the file is written on  
the system ) . Since the namenode does not get started with s3  
backed hadoop system how can i connect fuse-dfs to this setup.


Appreciate your help.
Thanks,
Roopa










Re: Hadoop+s3 & fuse-dfs

2009-01-28 Thread Craig Macdonald

In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:

* libhdfs takes a host and port number as input when connecting, but 
not a scheme (hdfs etc). The easiest option would be to set the S3 as 
your default file system in your hadoop-site.xml, then use the host of 
"default". That should get libhdfs to use the S3 file system. i.e. set 
fuse-dfs to mount dfs://default:0/ and all should work as planned.


* libhdfs also casts the FileSystem to a DistributedFileSystem for the 
df command. This would fail in your case. This issue is currently being 
worked on - see HADOOP-4368

https://issues.apache.org/jira/browse/HADOOP-4368.

C


Roopa Sudheendra wrote:

Thanks for the response craig.
I looked at fuse-dfs c code and looks like it does not like anything 
other than "dfs:// " so with the fact that hadoop can connect to S3 
file system ..allowing s3 scheme should solve my problem?


Roopa

On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:


Hi Roopa,

I cant comment on the S3 specifics. However, fuse-dfs is based on a C 
interface called libhdfs which allows C programs (such as fuse-dfs) 
to connect to the Hadoop file system Java API. This being the case, 
fuse-dfs should (theoretically) be able to connect to any file system 
that Hadoop can. Your mileage may vary, but if you find issues, 
please do report them through the normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem as one 
of our backup storage solution. Just the hadoop and s3(block based 
since it overcomes the 5gb limit) so far seems to be fine.
My problem is that i want to mount this filesystem using fuse-dfs ( 
since i don't have to worry about how the file is written on the 
system ) . Since the namenode does not get started with s3 backed 
hadoop system how can i connect fuse-dfs to this setup.


Appreciate your help.
Thanks,
Roopa








Re: Hadoop+s3 & fuse-dfs

2009-01-28 Thread Roopa Sudheendra

Thanks for the response, Craig.
I looked at the fuse-dfs C code and it looks like it does not accept anything  
other than "dfs://", so given that Hadoop can connect to the S3 file system,  
allowing the s3 scheme should solve my problem?


Roopa

On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:


Hi Roopa,

I cant comment on the S3 specifics. However, fuse-dfs is based on a  
C interface called libhdfs which allows C programs (such as fuse- 
dfs) to connect to the Hadoop file system Java API. This being the  
case, fuse-dfs should (theoretically) be able to connect to any file  
system that Hadoop can. Your mileage may vary, but if you find  
issues, please do report them through the normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem as  
one of our backup storage solution. Just the hadoop and s3(block  
based since it overcomes the 5gb limit) so far seems to be fine.
My problem is that i want to mount this filesystem using fuse-dfs  
( since i don't have to worry about how the file is written on the  
system ) . Since the namenode does not get started with s3 backed  
hadoop system how can i connect fuse-dfs to this setup.


Appreciate your help.
Thanks,
Roopa






Re: Hadoop+s3 & fuse-dfs

2009-01-28 Thread Craig Macdonald

Hi Roopa,

I can't comment on the S3 specifics. However, fuse-dfs is based on a C 
interface called libhdfs which allows C programs (such as fuse-dfs) to 
connect to the Hadoop file system Java API. This being the case, 
fuse-dfs should (theoretically) be able to connect to any file system 
that Hadoop can. Your mileage may vary, but if you find issues, please 
do report them through the normal channels.


Craig


Roopa Sudheendra wrote:
I am experimenting with Hadoop backed by Amazon s3 filesystem as one 
of our backup storage solution. Just the hadoop and s3(block based 
since it overcomes the 5gb limit) so far seems to be fine.
My problem is that i want to mount this filesystem using fuse-dfs ( 
since i don't have to worry about how the file is written on the 
system ) . Since the namenode does not get started with s3 backed 
hadoop system how can i connect fuse-dfs to this setup.


Appreciate your help.
Thanks,
Roopa




Hadoop+s3 & fuse-dfs

2009-01-28 Thread Roopa Sudheendra
I am experimenting with Hadoop backed by the Amazon S3 filesystem as one  
of our backup storage solutions. So far, plain Hadoop over S3 (block-based,  
since it overcomes the 5 GB limit) seems to be fine.
My problem is that I want to mount this filesystem using fuse-dfs  
(so that I don't have to worry about how the file is written on the  
system). Since the namenode does not get started with an S3-backed  
Hadoop system, how can I connect fuse-dfs to this setup?


Appreciate your help.
Thanks,
Roopa


Re: fuse-dfs : Transport endpoint is not connected

2008-12-29 Thread Brian Bockelman

Hey Amit,

The "transport endpoint is not connected" means that you have a FUSE  
endpoint mounted that crashed which you did not unmount before the  
current attempt.


Also, it's fairly pointless to run FUSE-DFS in 0.19.0 without this  
patch:


http://issues.apache.org/jira/secure/attachment/12394123/HADOOP-4616_0.19.txt

Brian
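
(A minimal sketch of that cleanup, with /mnt/hadoop standing in for whatever
mount point is stuck.)

  # Get rid of the crashed endpoint before mounting again
  fusermount -u /mnt/hadoop      # or, as root: umount -l /mnt/hadoop
  mount | grep fuse              # verify the old entry is gone
  ./fuse_dfs_wrapper.sh dfs://mydevserver.com:9000 /mnt/hadoop -d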

On Dec 29, 2008, at 9:09 AM, amit handa wrote:


Just to add , I am using hadoop-core-0.19.0 and fuse 2.7.4

On Mon, Dec 29, 2008 at 8:33 PM, amit handa  wrote:

Hi,

I get the following error when trying to mount the fuse dfs.

[fuse-dfs]$ ./fuse_dfs_wrapper.sh -d dfs://mydevserver.com:9000 /mnt/hadoop/

fuse-dfs ignoring option -d
port=9000,server=mydevserver.com
fuse-dfs didn't recognize /mnt/hadoop/,-2
fuse: bad mount point `/mnt/hadoop/': Transport endpoint is not  
connected


I followed all the steps at one of the similar threads -
http://www.nabble.com/fuse-dfs-to18849722.html#a18877009 but it  
didn't

resolve the issue.
I was able to build the fuse-dfs using  ant compile-contrib
-Dlibhdfs=1 -Dfusedfs=1

the /var/log/messages shows the following line:

Dec 28 07:41:10 mydevserver fuse_dfs: mounting dfs://mydevserver.com:9000/


Any pointers to debug this issue ?

Thanks,
Amit





Re: fuse-dfs : Transport endpoint is not connected

2008-12-29 Thread amit handa
Just to add , I am using hadoop-core-0.19.0 and fuse 2.7.4

On Mon, Dec 29, 2008 at 8:33 PM, amit handa  wrote:
> Hi,
>
> I get the following error when trying to mount the fuse dfs.
>
> [fuse-dfs]$ ./fuse_dfs_wrapper.sh -d dfs://mydevserver.com:9000 /mnt/hadoop/
> fuse-dfs ignoring option -d
> port=9000,server=mydevserver.com
> fuse-dfs didn't recognize /mnt/hadoop/,-2
> fuse: bad mount point `/mnt/hadoop/': Transport endpoint is not connected
>
> I followed all the steps at one of the similar threads -
> http://www.nabble.com/fuse-dfs-to18849722.html#a18877009 but it didn't
> resolve the issue.
> I was able to build the fuse-dfs using  ant compile-contrib
> -Dlibhdfs=1 -Dfusedfs=1
>
> the /var/log/messages shows the following line:
>
> Dec 28 07:41:10 mydevserver fuse_dfs: mounting dfs://mydevserver.com:9000/
>
> Any pointers to debug this issue ?
>
> Thanks,
> Amit
>


fuse-dfs : Transport endpoint is not connected

2008-12-29 Thread amit handa
Hi,

I get the following error when trying to mount the fuse dfs.

[fuse-dfs]$ ./fuse_dfs_wrapper.sh -d dfs://mydevserver.com:9000 /mnt/hadoop/
fuse-dfs ignoring option -d
port=9000,server=mydevserver.com
fuse-dfs didn't recognize /mnt/hadoop/,-2
fuse: bad mount point `/mnt/hadoop/': Transport endpoint is not connected

I followed all the steps at one of the similar threads -
http://www.nabble.com/fuse-dfs-to18849722.html#a18877009 but it didn't
resolve the issue.
I was able to build the fuse-dfs using  ant compile-contrib
-Dlibhdfs=1 -Dfusedfs=1

the /var/log/messages shows the following line:

Dec 28 07:41:10 mydevserver fuse_dfs: mounting dfs://mydevserver.com:9000/

Any pointers to debug this issue ?

Thanks,
Amit


RE: The error occurred when a lot of files created use fuse-dfs

2008-12-16 Thread zhuweimin
Brian

Thank you very much.

The version of Hadoop is 0.19.0; I think the HADOOP-4616 and HADOOP-4635 patches are necessary.

I will try it.



-Original Message-
From: Brian Bockelman [mailto:bbock...@cse.unl.edu] 
Sent: Monday, December 15, 2008 10:00 PM
To: core-user@hadoop.apache.org
Subject: Re: The error occurred when a lot of files created use fuse-dfs

Hey,

What version of Hadoop are you running?  Have you taken a look at  
HADOOP-4775?

https://issues.apache.org/jira/browse/HADOOP-4775

Basically, fuse-dfs is not usable on Hadoop 0.19.0 without a patch.

Brian

On Dec 15, 2008, at 12:24 AM, zhuweimin wrote:

> Dear fuse-dfs users
>
> I copy 1000 files into hadoop from local disk use fuse-dfs,
> Display the following error when the 600th files are copied:
>
> cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_33.dat':
> Input/output error
> cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_34.dat':
> Input/output error
> cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_35.dat':
> Input/output error
> cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_36.dat':
> Input/output error
> cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_37.dat':
> Input/output error
> cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_38.dat':
> Input/output error
> ...
>
> It is necessary to remount the fuse-dfs.
>
> Do you think about of the error.
>
> thanks
>
>




Re: The error occurred when a lot of files created use fuse-dfs

2008-12-15 Thread Brian Bockelman

Hey,

What version of Hadoop are you running?  Have you taken a look at  
HADOOP-4775?


https://issues.apache.org/jira/browse/HADOOP-4775

Basically, fuse-dfs is not usable on Hadoop 0.19.0 without a patch.

Brian
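
(For reference, applying a patch from JIRA to a source checkout goes roughly
like this; the patch file name is a placeholder -- use the actual attachment
from the issue above -- and -p0 assumes the patch was generated from the top
of the Hadoop tree, which is the usual convention.)

  cd hadoop-0.19.0                              # top of the source tree
  patch -p0 < HADOOP-XXXX.patch                 # placeholder name
  ant jar                                       # rebuild the core jar
  ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1   # rebuild libhdfs and fuse_dfs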

On Dec 15, 2008, at 12:24 AM, zhuweimin wrote:


Dear fuse-dfs users

I copy 1000 files into hadoop from local disk use fuse-dfs,
Display the following error when the 600th files are copied:

cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_33.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_34.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_35.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_36.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_37.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_38.dat':
Input/output error
...

It is necessary to remount the fuse-dfs.

Do you think about of the error.

thanks






The error occurred when a lot of files created use fuse-dfs

2008-12-14 Thread zhuweimin
Dear fuse-dfs users

I copied 1000 files into Hadoop from local disk using fuse-dfs.
The following error is displayed around the time the 600th file is copied:

cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_33.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_34.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_35.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_36.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_37.dat':
Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_38.dat':
Input/output error
...

After that it is necessary to remount fuse-dfs.

What do you think about this error?

thanks





Re: fuse-dfs

2008-08-08 Thread Pete Wyckoff

Hi Sebastian.

Setting of times doesn't work, but ls, rm, rmdir, mkdir, cp, etc. should
work.

Things that are not currently supported include:

Touch,  chown, chmod, permissions in general and obviously random writes for
which you would get an IO error.

This is what I get on 0.17 for df -h:

Filesystem    Size  Used Avail Use% Mounted on
fuse          XXXT  YYYT  ZZZT  AA% /export/hdfs

and the #s are right.

There is no unit test for df though (doh!), so it's quite possible the
libhdfs API has changed and fuse-dfs needs to update its code to match the
API. I will check that.

To be honest, we run on 0.17.1, so other than unit tests, I never run on
0.19 :(

-- pete

Ps I created: https://issues.apache.org/jira/browse/HADOOP-3928 to track
this.






On 8/8/08 3:34 AM, "Sebastian Vieira" <[EMAIL PROTECTED]> wrote:

> Hi Pete,
> 
> From within the 0.19 source i did:
> 
> ant jar
> ant metrics.jar
> ant test-core
> 
> This resulted in 3 jar files within $HADOOP_HOME/build :
> 
> [EMAIL PROTECTED] hadoop-0.19]# ls -l build/*.jar
> -rw-r--r-- 1 root root 2201651 Aug  8 08:26 build/hadoop-0.19.0-dev-core.jar
> -rw-r--r-- 1 root root 1096699 Aug  8 08:29 build/hadoop-0.19.0-dev-test.jar
> -rw-r--r-- 1 root root   55695 Aug  8 08:26
> build/hadoop-metrics-0.19.0-dev.jar
> 
> I've added these to be included in the CLASSPATH within the wrapper script:
> 
> for f in `ls $HADOOP_HOME/build/*.jar`; do
> export CLASSPATH=$CLASSPATH:$f
> done
> 
> This still produced the same error, so (thanks to the more detailed error
> output your patch provided) i renamed hadoop-0.19.0-dev-core.jar to
> hadoop-core.jar to match the regexp.
> 
> Then i figured out that i can't use dfs://master:9000 becaus in
> hadoop-site.xml i specified that dfs should run on port 54310 (doh!). So i
> issued this command:
> 
> ./fuse_dfs_wrapper.sh dfs://master:54310 /mnt/hadoop -d
> 
> Succes! Even though the output from df -h is .. weird :
> 
> fuse  512M 0  512M   0% /mnt/hadoop
> 
> I added some data:
> 
> for x in `seq 1 25`;do
> dd if=/dev/zero of=/mnt/hadoop/test-$x.raw bs=1MB count=10
> done
> 
> And now the output from df -h is:
> 
> fuse  512M -3.4G  3.9G   -  /mnt/hadoop
> 
> Note that my HDFS setup now consists of 20 nodes, exporting 15G each, so df is
> a little confused. Hadoop's status page (dfshealth.jsp) correctly displays the
> output though, evenly dividing the blocks over all the nodes.
> 
> What i didn't understand however, is why there's no fuse-dfs in the
> downloadable tarballs. Am i looking in the wrong place perhaps?
> 
> Anyway, now that i got things mounted, i come upon the next problem. I can't
> do much else than dd :)
> 
> [EMAIL PROTECTED] fuse-dfs]# touch /mnt/hadoop/test.tst
> touch: setting times of `/mnt/hadoop/test.tst': Function not implemented
> 
> 
> regards,
> 
> Sebastian
> 




Re: fuse-dfs

2008-08-08 Thread Sebastian Vieira
Hi Pete,

From within the 0.19 source I did:

ant jar
ant metrics.jar
ant test-core

This resulted in 3 jar files within $HADOOP_HOME/build :

[EMAIL PROTECTED] hadoop-0.19]# ls -l build/*.jar
-rw-r--r-- 1 root root 2201651 Aug  8 08:26 build/hadoop-0.19.0-dev-core.jar
-rw-r--r-- 1 root root 1096699 Aug  8 08:29 build/hadoop-0.19.0-dev-test.jar
-rw-r--r-- 1 root root   55695 Aug  8 08:26
build/hadoop-metrics-0.19.0-dev.jar

I've added these to be included in the CLASSPATH within the wrapper script:

for f in `ls $HADOOP_HOME/build/*.jar`; do
export CLASSPATH=$CLASSPATH:$f
done

This still produced the same error, so (thanks to the more detailed error
output your patch provided) I renamed hadoop-0.19.0-dev-core.jar to
hadoop-core.jar to match the regexp.

Then I figured out that I can't use dfs://master:9000 because in
hadoop-site.xml I specified that dfs should run on port 54310 (doh!). So I
issued this command:

./fuse_dfs_wrapper.sh dfs://master:54310 /mnt/hadoop -d

Success! Even though the output from df -h is a bit weird:

fuse  512M 0  512M   0% /mnt/hadoop

I added some data:

for x in `seq 1 25`;do
dd if=/dev/zero of=/mnt/hadoop/test-$x.raw bs=1MB count=10
done

And now the output from df -h is:

fuse  512M -3.4G  3.9G   -  /mnt/hadoop

Note that my HDFS setup now consists of 20 nodes, exporting 15G each, so df
is a little confused. Hadoop's status page (dfshealth.jsp) correctly
displays the output though, evenly dividing the blocks over all the nodes.

What I didn't understand, however, is why there's no fuse-dfs in the
downloadable tarballs. Am I looking in the wrong place, perhaps?

Anyway, now that I've got things mounted, I've come upon the next problem: I
can't do much more than dd :)

[EMAIL PROTECTED] fuse-dfs]# touch /mnt/hadoop/test.tst
touch: setting times of `/mnt/hadoop/test.tst': Function not implemented


regards,

Sebastian


Re: fuse-dfs

2008-08-07 Thread Pete Wyckoff

This just means your classpath is not set properly, so when fuse-dfs uses
libhdfs to try and connect to your server, it cannot instantiate hadoop
objects.

I have a JIRA open to improve error messaging when this happens:

https://issues.apache.org/jira/browse/HADOOP-3918

If you use the fuse_dfs_wrapper.sh, you should be able to set HADOOP_HOME
and it will create the classpath for you.

In retrospect, fuse_dfs_wrapper.sh should probably complain and exit if
HADOOP_HOME is not set.

-- pete
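
(Roughly what the wrapper does before launching fuse_dfs -- a hand-rolled
version that can help when debugging classpath problems; the paths assume the
source-tree layout used elsewhere in this thread.)

  export HADOOP_HOME=/usr/local/src/hadoop-core-trunk
  export CLASSPATH=$HADOOP_HOME/conf
  for f in $HADOOP_HOME/*.jar $HADOOP_HOME/lib/*.jar $HADOOP_HOME/build/*.jar; do
    CLASSPATH=$CLASSPATH:$f
  done
  export CLASSPATH
  ./fuse_dfs dfs://master:9000 /mnt/hadoop -d   # or let fuse_dfs_wrapper.sh do this for you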


On 8/7/08 2:35 PM, "Sebastian Vieira" <[EMAIL PROTECTED]> wrote:

> On Thu, Aug 7, 2008 at 4:25 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> 
>> 
>> Hi Sebastian,
>> 
>> Those 2 things are just warnings and shouldn't cause any problems.  What
>> happens when you ls /mnt/hadoop ?
> 
> 
> [EMAIL PROTECTED] fuse-dfs]# ls /mnt/hadoop
> ls: /mnt/hadoop: Transport endpoint is not connected
> 
> Also, this happens when i start fuse-dfs in one terminal, and do a df -h in
> another:
> 
> [EMAIL PROTECTED] fuse-dfs]# ./fuse_dfs_wrapper.sh dfs://master:9000 
> /mnt/hadoop
> -d
> port=9000,server=master
> fuse-dfs didn't recognize /mnt/hadoop,-2
> fuse-dfs ignoring option -d
> unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
> INIT: 7.8
> flags=0x0003
> max_readahead=0x0002
>INIT: 7.8
>flags=0x0001
>max_readahead=0x0002
>max_write=0x0010
>unique: 1, error: 0 (Success), outsize: 40
> unique: 2, opcode: STATFS (17), nodeid: 1, insize: 40
> 
> -now i do a df -h in the other term-
> 
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/conf/Configuration
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.conf.Configuration
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClassInternal(Unknown Source)
> 
> Then the output from df is:
> 
> df: `/mnt/hadoop': Software caused connection abort
> 
> 
> 
>>  And also what version of fuse-dfs are you
>> using? The handling of options is different in trunk than in the last
>> release.
> 
> 
> [EMAIL PROTECTED] fuse-dfs]# ./fuse_dfs --version
> ./fuse_dfs 0.1.0
> 
> I did a checkout of the latest svn and compiled using the command you gave
> in one of your previous mails.
> 
> 
>> 
>> You can also look in /var/log/messages.
>> 
> 
> Only one line:
> Aug  7 20:21:05 master fuse_dfs: mounting dfs://master:9000/
> 
> 
> Thanks for your time,
> 
> 
> Sebastian



Re: fuse-dfs

2008-08-07 Thread Sebastian Vieira
On Thu, Aug 7, 2008 at 4:25 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:

>
> Hi Sebastian,
>
> Those 2 things are just warnings and shouldn't cause any problems.  What
> happens when you ls /mnt/hadoop ?


[EMAIL PROTECTED] fuse-dfs]# ls /mnt/hadoop
ls: /mnt/hadoop: Transport endpoint is not connected

Also, this happens when i start fuse-dfs in one terminal, and do a df -h in
another:

[EMAIL PROTECTED] fuse-dfs]# ./fuse_dfs_wrapper.sh dfs://master:9000 /mnt/hadoop
-d
port=9000,server=master
fuse-dfs didn't recognize /mnt/hadoop,-2
fuse-dfs ignoring option -d
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.8
flags=0x0003
max_readahead=0x0002
   INIT: 7.8
   flags=0x0001
   max_readahead=0x0002
   max_write=0x0010
   unique: 1, error: 0 (Success), outsize: 40
unique: 2, opcode: STATFS (17), nodeid: 1, insize: 40

-- now I do a df -h in the other terminal --

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)

Then the output from df is:

df: `/mnt/hadoop': Software caused connection abort



>  And also what version of fuse-dfs are you
> using? The handling of options is different in trunk than in the last
> release.


[EMAIL PROTECTED] fuse-dfs]# ./fuse_dfs --version
./fuse_dfs 0.1.0

I did a checkout of the latest svn and compiled using the command you gave
in one of your previous mails.


>
> You can also look in /var/log/messages.
>

Only one line:
Aug  7 20:21:05 master fuse_dfs: mounting dfs://master:9000/


Thanks for your time,


Sebastian


Re: fuse-dfs

2008-08-07 Thread Pete Wyckoff

Hi Sebastian,

Those 2 things are just warnings and shouldn't cause any problems.  What
happens when you ls /mnt/hadoop ?  And also what version of fuse-dfs are you
using? The handling of options is different in trunk than in the last
release.

You can also look in /var/log/messages.

pete


On 8/7/08 7:12 AM, "Sebastian Vieira" <[EMAIL PROTECTED]> wrote:

> Thanks. After alot of experimenting (and ofcourse, right before you sent
> this reply) i figured it out. I also had to include the path to libhdfs.so
> in my ld.so.conf and update it before i was able to succesfully compile
> fuse_dfs. However when i try to mount the HDFS, it fails. I have tried both
> the wrapper script and the single binary. Both display the following error:
> 
> fuse-dfs didn't recognize /mnt/hadoop,-2
> fuse-dfs ignoring option -d
> 
> regards,
> 
> Sebastian
> 
> On Wed, Aug 6, 2008 at 5:29 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> 
>> 
>> Sorry - I see the problem now: should be:
>> 
>> Ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
>> 
>> Compile-contrib depends on compile-libhdfs which also requires the
>> -Dlibhdfs=1 property to be set.
>> 
>> pete
>> 
>> 
>> On 8/6/08 5:04 AM, "Sebastian Vieira" <[EMAIL PROTECTED]> wrote:
>> 
>>> Hi,
>>> 
>>> I have installed Hadoop on 20 nodes (data storage) and one master
>> (namenode)
>>> to which i want to add data. I have learned that this is possible through
>> a
>>> Java API or via the Hadoop shell. However, i would like to mount the HDFS
>>> using FUSE and i discovered that there's a contrib/fuse-dfs within the
>>> Hadoop tar.gz package. Now i read the README file and noticed that i was
>>> unable to compile using this command:
>>> 
>>> ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1
>>> 
>>> If i change the line to:
>>> 
>>> ant compile-contrib -Dcompile.c++=1 -Dlibhdfs-fuse=1
>>> 
>>> It goes a little bit further. It will now start the configure script, but
>>> still fails. I've tried alot of different things but i'm unable to
>> compile
>>> fuse-dfs. This is a piece of the error i get from ant:
>>> 
>>> compile:
>>>  [echo] contrib: fuse-dfs
>>> -snip-
>>>  [exec] Making all in src
>>>  [exec] make[1]: Entering directory
>>> `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
>>>  [exec] gcc  -Wall -O3
>> -L/usr/local/src/hadoop-core-trunk/build/libhdfs
>>> -lhdfs -L/usr/lib -lfuse -L/usr/java/jdk1.6.0_07/jre/lib/i386/server
>> -ljvm
>>> -o fuse_dfs  fuse_dfs.o
>>>  [exec] /usr/bin/ld: cannot find -lhdfs
>>>  [exec] collect2: ld returned 1 exit status
>>>  [exec] make[1]: *** [fuse_dfs] Error 1
>>>  [exec] make[1]: Leaving directory
>>> `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
>>>  [exec] make: *** [all-recursive] Error 1
>>> 
>>> BUILD FAILED
>>> /usr/local/src/hadoop-core-trunk/build.xml:413: The following error
>> occurred
>>> while executing this line:
>>> /usr/local/src/hadoop-core-trunk/src/contrib/build.xml:30: The following
>>> error occurred while executing this line:
>>> /usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/build.xml:40: exec
>>> returned: 2
>>> 
>>> 
>>> Could somebody shed some light on this?
>>> 
>>> 
>>> thanks,
>>> 
>>> Sebastian.
>> 
>> 



Re: fuse-dfs

2008-08-07 Thread Sebastian Vieira
Thanks. After a lot of experimenting (and of course, right before you sent
this reply) I figured it out. I also had to include the path to libhdfs.so
in my ld.so.conf and update it before I was able to successfully compile
fuse_dfs. However, when I try to mount the HDFS, it fails. I have tried both
the wrapper script and the single binary. Both display the following error:

fuse-dfs didn't recognize /mnt/hadoop,-2
fuse-dfs ignoring option -d

regards,

Sebastian

On Wed, Aug 6, 2008 at 5:29 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:

>
> Sorry - I see the problem now: should be:
>
> Ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
>
> Compile-contrib depends on compile-libhdfs which also requires the
> -Dlibhdfs=1 property to be set.
>
> pete
>
>
> On 8/6/08 5:04 AM, "Sebastian Vieira" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have installed Hadoop on 20 nodes (data storage) and one master
> (namenode)
> > to which i want to add data. I have learned that this is possible through
> a
> > Java API or via the Hadoop shell. However, i would like to mount the HDFS
> > using FUSE and i discovered that there's a contrib/fuse-dfs within the
> > Hadoop tar.gz package. Now i read the README file and noticed that i was
> > unable to compile using this command:
> >
> > ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1
> >
> > If i change the line to:
> >
> > ant compile-contrib -Dcompile.c++=1 -Dlibhdfs-fuse=1
> >
> > It goes a little bit further. It will now start the configure script, but
> > still fails. I've tried alot of different things but i'm unable to
> compile
> > fuse-dfs. This is a piece of the error i get from ant:
> >
> > compile:
> >  [echo] contrib: fuse-dfs
> > -snip-
> >  [exec] Making all in src
> >  [exec] make[1]: Entering directory
> > `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
> >  [exec] gcc  -Wall -O3
> -L/usr/local/src/hadoop-core-trunk/build/libhdfs
> > -lhdfs -L/usr/lib -lfuse -L/usr/java/jdk1.6.0_07/jre/lib/i386/server
> -ljvm
> > -o fuse_dfs  fuse_dfs.o
> >  [exec] /usr/bin/ld: cannot find -lhdfs
> >  [exec] collect2: ld returned 1 exit status
> >  [exec] make[1]: *** [fuse_dfs] Error 1
> >  [exec] make[1]: Leaving directory
> > `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
> >  [exec] make: *** [all-recursive] Error 1
> >
> > BUILD FAILED
> > /usr/local/src/hadoop-core-trunk/build.xml:413: The following error
> occurred
> > while executing this line:
> > /usr/local/src/hadoop-core-trunk/src/contrib/build.xml:30: The following
> > error occurred while executing this line:
> > /usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/build.xml:40: exec
> > returned: 2
> >
> >
> > Could somebody shed some light on this?
> >
> >
> > thanks,
> >
> > Sebastian.
>
>


Re: fuse-dfs

2008-08-06 Thread Pete Wyckoff

Sorry - I see the problem now: should be:

ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1

Compile-contrib depends on compile-libhdfs which also requires the
-Dlibhdfs=1 property to be set.

pete


On 8/6/08 5:04 AM, "Sebastian Vieira" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I have installed Hadoop on 20 nodes (data storage) and one master (namenode)
> to which i want to add data. I have learned that this is possible through a
> Java API or via the Hadoop shell. However, i would like to mount the HDFS
> using FUSE and i discovered that there's a contrib/fuse-dfs within the
> Hadoop tar.gz package. Now i read the README file and noticed that i was
> unable to compile using this command:
> 
> ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1
> 
> If i change the line to:
> 
> ant compile-contrib -Dcompile.c++=1 -Dlibhdfs-fuse=1
> 
> It goes a little bit further. It will now start the configure script, but
> still fails. I've tried alot of different things but i'm unable to compile
> fuse-dfs. This is a piece of the error i get from ant:
> 
> compile:
>  [echo] contrib: fuse-dfs
> -snip-
>  [exec] Making all in src
>  [exec] make[1]: Entering directory
> `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
>  [exec] gcc  -Wall -O3 -L/usr/local/src/hadoop-core-trunk/build/libhdfs
> -lhdfs -L/usr/lib -lfuse -L/usr/java/jdk1.6.0_07/jre/lib/i386/server -ljvm
> -o fuse_dfs  fuse_dfs.o
>  [exec] /usr/bin/ld: cannot find -lhdfs
>  [exec] collect2: ld returned 1 exit status
>  [exec] make[1]: *** [fuse_dfs] Error 1
>  [exec] make[1]: Leaving directory
> `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
>  [exec] make: *** [all-recursive] Error 1
> 
> BUILD FAILED
> /usr/local/src/hadoop-core-trunk/build.xml:413: The following error occurred
> while executing this line:
> /usr/local/src/hadoop-core-trunk/src/contrib/build.xml:30: The following
> error occurred while executing this line:
> /usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/build.xml:40: exec
> returned: 2
> 
> 
> Could somebody shed some light on this?
> 
> 
> thanks,
> 
> Sebastian.



Re: fuse-dfs

2008-08-06 Thread Pete Wyckoff

Hi Sebastian,

The problem is that libhdfs.so is supposed to be in build/libhdfs but for some
reason isn't.

Have you tried doing an ant compile-libhdfs -Dlibhdfs=1 ?


And then checked whether libhdfs.so is in build/libhdfs ?

Thanks, pete
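
(Putting Pete's check and Sebastian's earlier ld.so.conf remark together, a
sketch; the source-tree and JDK paths are the ones that appear elsewhere in
this thread and will likely differ on your machine.)

  cd /usr/local/src/hadoop-core-trunk
  ant compile-libhdfs -Dlibhdfs=1
  ls build/libhdfs/                 # libhdfs.so should show up here

  # At runtime fuse_dfs also needs the loader to find libhdfs.so and libjvm.so;
  # either add both directories to /etc/ld.so.conf and run ldconfig, or:
  export LD_LIBRARY_PATH=/usr/local/src/hadoop-core-trunk/build/libhdfs:/usr/java/jdk1.6.0_07/jre/lib/i386/server:$LD_LIBRARY_PATH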

On 8/6/08 5:04 AM, "Sebastian Vieira" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I have installed Hadoop on 20 nodes (data storage) and one master (namenode)
> to which i want to add data. I have learned that this is possible through a
> Java API or via the Hadoop shell. However, i would like to mount the HDFS
> using FUSE and i discovered that there's a contrib/fuse-dfs within the
> Hadoop tar.gz package. Now i read the README file and noticed that i was
> unable to compile using this command:
> 
> ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1
> 
> If i change the line to:
> 
> ant compile-contrib -Dcompile.c++=1 -Dlibhdfs-fuse=1
> 
> It goes a little bit further. It will now start the configure script, but
> still fails. I've tried alot of different things but i'm unable to compile
> fuse-dfs. This is a piece of the error i get from ant:
> 
> compile:
>  [echo] contrib: fuse-dfs
> -snip-
>  [exec] Making all in src
>  [exec] make[1]: Entering directory
> `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
>  [exec] gcc  -Wall -O3 -L/usr/local/src/hadoop-core-trunk/build/libhdfs
> -lhdfs -L/usr/lib -lfuse -L/usr/java/jdk1.6.0_07/jre/lib/i386/server -ljvm
> -o fuse_dfs  fuse_dfs.o
>  [exec] /usr/bin/ld: cannot find -lhdfs
>  [exec] collect2: ld returned 1 exit status
>  [exec] make[1]: *** [fuse_dfs] Error 1
>  [exec] make[1]: Leaving directory
> `/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
>  [exec] make: *** [all-recursive] Error 1
> 
> BUILD FAILED
> /usr/local/src/hadoop-core-trunk/build.xml:413: The following error occurred
> while executing this line:
> /usr/local/src/hadoop-core-trunk/src/contrib/build.xml:30: The following
> error occurred while executing this line:
> /usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/build.xml:40: exec
> returned: 2
> 
> 
> Could somebody shed some light on this?
> 
> 
> thanks,
> 
> Sebastian.



fuse-dfs

2008-08-06 Thread Sebastian Vieira
Hi,

I have installed Hadoop on 20 nodes (data storage) and one master (namenode)
to which I want to add data. I have learned that this is possible through a
Java API or via the Hadoop shell. However, I would like to mount the HDFS
using FUSE, and I discovered that there's a contrib/fuse-dfs within the
Hadoop tar.gz package. I read the README file and noticed that I was
unable to compile using this command:

ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1

If I change the line to:

ant compile-contrib -Dcompile.c++=1 -Dlibhdfs-fuse=1

It goes a little bit further. It will now start the configure script, but
still fails. I've tried a lot of different things but I'm unable to compile
fuse-dfs. This is a piece of the error I get from ant:

compile:
 [echo] contrib: fuse-dfs
-snip-
 [exec] Making all in src
 [exec] make[1]: Entering directory
`/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
 [exec] gcc  -Wall -O3 -L/usr/local/src/hadoop-core-trunk/build/libhdfs
-lhdfs -L/usr/lib -lfuse -L/usr/java/jdk1.6.0_07/jre/lib/i386/server -ljvm
-o fuse_dfs  fuse_dfs.o
 [exec] /usr/bin/ld: cannot find -lhdfs
 [exec] collect2: ld returned 1 exit status
 [exec] make[1]: *** [fuse_dfs] Error 1
 [exec] make[1]: Leaving directory
`/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/src'
 [exec] make: *** [all-recursive] Error 1

BUILD FAILED
/usr/local/src/hadoop-core-trunk/build.xml:413: The following error occurred
while executing this line:
/usr/local/src/hadoop-core-trunk/src/contrib/build.xml:30: The following
error occurred while executing this line:
/usr/local/src/hadoop-core-trunk/src/contrib/fuse-dfs/build.xml:40: exec
returned: 2


Could somebody shed some light on this?


thanks,

Sebastian.


RE: How to compile fuse-dfs

2008-03-11 Thread xavier.quintuna
Yes, all those commands work fine, but I need to copy a file to HDFS.


Thanks again.

Xavier
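
(Until writes through the mount work, the Hadoop shell is a workable stand-in
for cp; a sketch using the destination path from Xavier's error log further
down this digest -- the local source path is made up.)

  # Copy into HDFS through the Hadoop shell instead of through the fuse mount
  $HADOOP_HOME/bin/hadoop dfs -copyFromLocal /local/movies/le_plat.flv /user/xavier/movie/
  $HADOOP_HOME/bin/hadoop dfs -ls /user/xavier/movie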

-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 11, 2008 11:26 AM
To: core-user@hadoop.apache.org
Subject: Re: How to compile fuse-dfs


But, to be clear, you can do  mv, rm, mkdir, rmdir.


On 3/11/08 10:24 AM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> Thanks Pete. I'll be waiting for 0.17 then



Re: How to compile fuse-dfs

2008-03-11 Thread Pete Wyckoff

But, to be clear, you can do  mv, rm, mkdir, rmdir.


On 3/11/08 10:24 AM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> Thanks Pete. I'll be waiting for 0.17 then



RE: How to compile fuse-dfs

2008-03-11 Thread xavier.quintuna

Thanks Pete. I'll be waiting for 0.17 then

Xavier
 

-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 11, 2008 11:16 AM
To: QUINTUNA Xavier RD-ILAB-SSF; core-user@hadoop.apache.org
Subject: Re: How to compile fuse-dfs


Oh sorry xavier - you can't write to DFS - although you shouldn't be
getting an exception. It should return an IO error but create an empty
file.

Fuse_dfs relies on appends working in DFS and since this didn't make it
into 16, we'll have to wait for 0.17 for this to work.

I will look at this error though.

-- pete


On 3/11/08 10:03 AM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> dfs,138549488,'FLV',4096) Exception in thread "Thread-7"
> java.nio.BufferOverflowException



Re: How to compile fuse-dfs

2008-03-11 Thread Pete Wyckoff

Oh sorry xavier - you can't write to DFS - although you shouldn't be getting
an exception. It should return an IO error but create an empty file.

Fuse_dfs relies on appends working in DFS and since this didn't make it into
16, we'll have to wait for 0.17 for this to work.

I will look at this error though.

-- pete


On 3/11/08 10:03 AM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> dfs,138549488,'FLV',4096) Exception in thread "Thread-7"
> java.nio.BufferOverflowException



RE: How to compile fuse-dfs

2008-03-11 Thread xavier.quintuna
Hi Pete, 

I was able to compile fuse_dfs.c. Thanks. But now I have another
question for you.
I'm able to read a file, but I'm not able to copy a file to HDFS. I
wonder if you have solved this problem, and how?
In my logs I have this message:

hdfsWrite(dfs,138549488,'FLV',4096) Exception in thread "Thread-7"
java.nio.BufferOverflowException
  at java.nio.Buffer.nextPutIndex(Buffer.java:425)
  at java.nio.HeapByteBuffer.putInt(HeapByteBuffer.java:347)
  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$Packet.writeInt(DFSClient.java:1537)
  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2128)
  at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
  at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
  at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
  at java.io.DataOutputStream.write(DataOutputStream.java:90)
  at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
Call to org.apache.hadoop.fs.FSDataOutputStream::write failed!
./fuse_dfs[14561]: ERROR: fuse problem - could not write all the bytes
for /user/xavier/movie/le_plat.flv -1!=4096 fuse_dfs.c:702
08/03/11 10:57:50 WARN fs.DFSClient: DataStreamer Exception: java.io.IOException:
BlockSize 0 is smaller than data size.  Offset of packet in block 0
Aborting file /user/xavier/movie/le_plat.flv
---

I really appreciate your help

Xavier


-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 10, 2008 7:43 PM
To: core-user@hadoop.apache.org
Subject: Re: How to compile fuse-dfs


Hi Xavier,

If you run ./bootsrap.sh does it not create a Makefile for you?  There
is a bug in the Makefile that hardcodes it to amd64. I will look at
this.

What kernel are you using and what HW?

--pete


On 3/10/08 2:23 PM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> Hi everybody,
> 
> I'm trying to compile fuse-dfs but I have problems. I don't have a lot

> of experience with C++.
> I would like to know:
> Is it a clear readme file with the instructions to compile, install 
> fuse-dfs?
> Do I need to replace  fuse_dfs.c with the one in 
> fuse-dfs/src/fuse_dfs.c?
> Do I need to set up different flag if I'm using a i386 or 86 machine?
> Which one and Where?
> Which make file do I need to use to compile the code?
> 
> 
> 
> Thanks
> 
> Xavier
> 
> 
> 



Re: How to compile fuse-dfs

2008-03-10 Thread Pete Wyckoff

Hi Xavier,

If you run ./bootstrap.sh, does it not create a Makefile for you?  There is a
bug in the Makefile that hardcodes it to amd64. I will look at this.

What kernel are you using and what HW?

--pete


On 3/10/08 2:23 PM, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:

> Hi everybody,
> 
> I'm trying to compile fuse-dfs but I have problems. I don't have a lot
> of experience with C++.
> I would like to know:
> Is it a clear readme file with the instructions to compile, install
> fuse-dfs?
> Do I need to replace  fuse_dfs.c with the one in
> fuse-dfs/src/fuse_dfs.c?
> Do I need to set up different flag if I'm using a i386 or 86 machine?
> Which one and Where?
> Which make file do I need to use to compile the code?
> 
> 
> 
> Thanks 
> 
> Xavier
> 
> 
> 



How to compile fuse-dfs

2008-03-10 Thread xavier.quintuna
Hi everybody,

I'm trying to compile fuse-dfs but I have problems. I don't have a lot
of experience with C++.
I would like to know:
Is there a clear README file with the instructions to compile and install
fuse-dfs?
Do I need to replace fuse_dfs.c with the one in
fuse-dfs/src/fuse_dfs.c?
Do I need to set up a different flag if I'm using an i386 or 86 machine?
Which one, and where?
Which makefile do I need to use to compile the code?



Thanks 

Xavier