Re: Hadoop+s3 fuse-dfs

2009-01-29 Thread Brian Bockelman

Hey all,

This is a long shot, but I've noticed before that libhdfs doesn't load hadoop-site.xml *unless* hadoop-site.xml is in your local directory.


As a last try, maybe cd to $HADOOP_HOME/conf and run it from there?
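For example, something along these lines (the wrapper script path and the /mnt/hadoop mount point are just examples from a contrib/fuse-dfs build and may differ in your setup):

    cd $HADOOP_HOME/conf
    $HADOOP_HOME/src/contrib/fuse-dfs/src/fuse_dfs_wrapper.sh dfs://default:0 /mnt/hadoop -d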

Brian


Hadoop+s3 fuse-dfs

2009-01-28 Thread Roopa Sudheendra
I am experimenting with Hadoop backed by the Amazon S3 filesystem as one of our backup storage solutions. So far, Hadoop with S3 (the block-based filesystem, since it overcomes the 5 GB object limit) seems to be fine.
My problem is that I want to mount this filesystem using fuse-dfs (so that I don't have to worry about how the files are written on the system). Since the namenode does not get started with an S3-backed Hadoop system, how can I connect fuse-dfs to this setup?


Appreciate your help.
Thanks,
Roopa


Re: Hadoop+s3 fuse-dfs

2009-01-28 Thread Craig Macdonald

Hi Roopa,

I can't comment on the S3 specifics. However, fuse-dfs is based on a C interface called libhdfs, which allows C programs (such as fuse-dfs) to connect to the Hadoop file system Java API. This being the case, fuse-dfs should (theoretically) be able to connect to any file system that Hadoop can. Your mileage may vary, but if you find issues, please do report them through the normal channels.
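To give a feel for what sits underneath, here is a minimal libhdfs sketch (an illustration only, assuming the hdfs.h header from libhdfs and a CLASSPATH carrying the Hadoop jars plus your hadoop-site.xml; whichever filesystem fs.default.name points at is the one you get back):

    #include <stdio.h>
    #include "hdfs.h"   /* C header shipped with libhdfs */

    int main(void) {
        /* Host "default" with port 0 asks libhdfs to use fs.default.name
           from the Hadoop configuration -- the same trick fuse-dfs relies on. */
        hdfsFS fs = hdfsConnect("default", 0);
        if (!fs) {
            fprintf(stderr, "hdfsConnect failed\n");
            return 1;
        }

        /* List the root of whatever filesystem the config pointed us at. */
        int n = 0;
        hdfsFileInfo *entries = hdfsListDirectory(fs, "/", &n);
        for (int i = 0; i < n; i++)
            printf("%s\n", entries[i].mName);
        if (entries)
            hdfsFreeFileInfo(entries, n);

        hdfsDisconnect(fs);
        return 0;
    }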


Craig



Re: Hadoop+s3 fuse-dfs

2009-01-28 Thread Roopa Sudheendra

Thanks for the response, Craig.
I looked at the fuse-dfs C code, and it looks like it does not accept anything other than dfs://. Given that Hadoop can connect to the S3 file system, would allowing the s3 scheme solve my problem?


Roopa


Re: Hadoop+s3 fuse-dfs

2009-01-28 Thread Craig Macdonald

In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:

* libhdfs takes a host and port number as input when connecting, but not a scheme (hdfs etc.). The easiest option would be to set S3 as your default file system in your hadoop-site.xml, then use "default" as the host. That should get libhdfs to use the S3 file system, i.e. set fuse-dfs to mount dfs://default:0/ and all should work as planned (see the sketch below).


* libhdfs also casts the FileSystem to a DistributedFileSystem for the df command. This would fail in your case. This issue is currently being worked on; see HADOOP-4368: https://issues.apache.org/jira/browse/HADOOP-4368.
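As a rough sketch of the first point (the bucket name and AWS keys are placeholders; the property names are the ones used by the block-based S3 filesystem), the hadoop-site.xml would contain something like:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>s3://your-backup-bucket</value>
      </property>
      <property>
        <name>fs.s3.awsAccessKeyId</name>
        <value>YOUR_AWS_ACCESS_KEY_ID</value>
      </property>
      <property>
        <name>fs.s3.awsSecretAccessKey</name>
        <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
      </property>
    </configuration>

and the mount command would then look roughly like:

    fuse_dfs_wrapper.sh dfs://default:0 /mnt/hadoop -d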

C



Re: Hadoop+s3 fuse-dfs

2009-01-28 Thread Roopa Sudheendra

Hey Craig,
I tried it the way you suggested, but I get a "transport endpoint not connected" error. Can I see the logs anywhere? I don't see anything in /var/log/messages either.
It looks like it tries to create the file system in hdfs.c, but I'm not sure where it fails.

I have the Hadoop home set, so I believe it gets the config info.

Any ideas?

Thanks,
Roopa

Re: Hadoop+s3 fuse-dfs

2009-01-28 Thread Craig Macdonald

Hi Roopa,

Firstly, can you get fuse-dfs working against a plain HDFS instance?
There is also a debug mode for FUSE: enable it by adding -d on the command line.


C


Re: Hadoop+s3 fuse-dfs

2009-01-28 Thread Roopa Sudheendra
Thanks. Yes, a setup with fuse-dfs and HDFS works fine. I think the mount point was bad for whatever reason and was failing with that error. I created another mount point, which resolved the "transport endpoint" error.
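In case anyone else hits this: a stale FUSE mount is the usual cause of the "transport endpoint is not connected" error, so cleaning it up (or pointing at a fresh directory) before remounting is normally enough. Mount point names below are just examples:

    fusermount -u /mnt/hadoop       # unmount the stale mount point
    mkdir -p /mnt/hadoop2           # or reuse any fresh, empty directory
    fuse_dfs_wrapper.sh dfs://default:0 /mnt/hadoop2 -d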


Also, I had the -d option on my command. :)


Roopa



Re: Hadoop+s3 fuse-dfs

2009-01-28 Thread Craig Macdonald

Hi Roopa,

Glad it worked :-)

Please file JIRA issues against the fuse-dfs / libhdfs components for anything that would have made it easier to mount the S3 filesystem.


Craig
