Re: Python access to HDFS

2008-02-21 Thread Steve Sapovits

Roddy Lindsay wrote:


I do it the old fashioned way:

(w, r) = os.popen2("%s/bin/hadoop dfs -cat %s" % (hadoop_home.rstrip('/'), filename))


I considered this but ultimately it probably won't scale for our data volume.

I'll probably continue building on the SWIG base since that's working pretty
well so far ... there's just the SWIG learning curve for complicated interface
mappings.
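
To give a feel for what the SWIG route looks like, here is a rough sketch against
the libhdfs C API (hdfsConnect, hdfsOpenFile, hdfsWrite, hdfsCloseFile and
hdfsDisconnect are the calls from hdfs.h; the Python module name pyhdfs and the
string/length typemap are assumptions about how the wrapper ends up generated):

import os
import pyhdfs   # assumed name for the SWIG-generated module

fs = pyhdfs.hdfsConnect("namenode-host", 9000)          # connect to the NameNode
out = pyhdfs.hdfsOpenFile(fs, "/user/steve/out.dat",
                          os.O_WRONLY, 0, 0, 0)         # default buffer/replication/block size
data = "some bytes to store"
pyhdfs.hdfsWrite(fs, out, data, len(data))              # needs a (char *, length) typemap
pyhdfs.hdfsCloseFile(fs, out)
pyhdfs.hdfsDisconnect(fs)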

--
Steve Sapovits
Invite Media  -  http://www.invitemedia.com
[EMAIL PROTECTED]



RE: Python access to HDFS

2008-02-21 Thread dhruba Borthakur
Hi Pete,

If you are referring to the ability to re-open a file and append to it,
then this feature is not in 0.16. Please see:
http://issues.apache.org/jira/browse/HADOOP-1700

Thanks,
dhruba

-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 21, 2008 4:09 PM
To: core-user@hadoop.apache.org
Subject: Re: Python access to HDFS


We're profiling and tuning read performance for fuse dfs and have writes
implemented, but I haven't been able to test it, as I haven't tried 0.16 yet.
It requires the ability to create the file, close it, and then re-open it to
start writing, which can't be done until 0.16.


--pete



On 2/21/08 3:50 PM, "Steve Sapovits" <[EMAIL PROTECTED]> wrote:

> Jeff Hammerbacher wrote:
> 
>> maybe the dfs could expose a thrift interface in future releases?
> 
> ThruDB exposes Lucene via Thrift but not the underlying HDFS.   I just
> need HDFS access in Python for now.
> 
>> you could also use the FUSE module to mount the dfs and just write to it
>> like any other filesystem...
> 
> Good point.  I'll check that avenue.  Would FUSE add much overhead for
> writing lots of data?   I see a Python binding for it.



RE: Python access to HDFS

2008-02-21 Thread Roddy Lindsay
I do it the old fashioned way:

(w, r) = os.popen2("%s/bin/hadoop dfs -cat %s" % (hadoop_home.rstrip('/'), filename))
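
In full, a read done this way looks roughly like the following sketch
(hadoop_home and filename stand in for whatever the caller actually passes):

import os

hadoop_home = "/usr/local/hadoop"        # placeholder install path
filename = "/user/data/part-00000"       # placeholder HDFS path

# Shell out to the dfs command; r is the child's stdout, w is its stdin.
(w, r) = os.popen2("%s/bin/hadoop dfs -cat %s" % (hadoop_home.rstrip('/'), filename))
contents = r.read()
w.close()
r.close()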



-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED]
Sent: Thu 2/21/2008 4:08 PM
To: core-user@hadoop.apache.org
Subject: Re: Python access to HDFS
 

We're profiling and tuning read performance for fuse dfs and have writes
implemented, but I haven't been able to test it, as I haven't tried 0.16 yet.
It requires the ability to create the file, close it, and then re-open it to
start writing, which can't be done until 0.16.


--pete



On 2/21/08 3:50 PM, "Steve Sapovits" <[EMAIL PROTECTED]> wrote:

> Jeff Hammerbacher wrote:
> 
>> maybe the dfs could expose a thrift interface in future releases?
> 
> ThruDB exposes Lucene via Thrift but not the underlying HDFS.   I just
> need HDFS access in Python for now.
> 
>> you could also use the FUSE module to mount the dfs and just write to it
>> like any other filesystem...
> 
> Good point.  I'll check that avenue.  Would FUSE add much overhead for
> writing lots of data?   I see a Python binding for it.




Re: Python access to HDFS

2008-02-21 Thread Pete Wyckoff

We're profiling and tuning read performance for fuse dfs and have writes
implemented, but I haven't been able to test it, as I haven't tried 0.16 yet.
It requires the ability to create the file, close it, and then re-open it to
start writing, which can't be done until 0.16.


--pete



On 2/21/08 3:50 PM, "Steve Sapovits" <[EMAIL PROTECTED]> wrote:

> Jeff Hammerbacher wrote:
> 
>> maybe the dfs could expose a thrift interface in future releases?
> 
> ThruDB exposes Lucene via Thrift but not the underlying HDFS.   I just
> need HDFS access in Python for now.
> 
>> you could also use the FUSE module to mount the dfs and just write to it
>> like any other filesystem...
> 
> Good point.  I'll check that avenue.  Would FUSE add much overhead for
> writing lots of data?   I see a Python binding for it.



Re: Python access to HDFS

2008-02-21 Thread Steve Sapovits

Jeff Hammerbacher wrote:


maybe the dfs could expose a thrift interface in future releases?


ThruDB exposes Lucene via Thrift but not the underlying HDFS.   I just
need HDFS access in Python for now.


you could also use the FUSE module to mount the dfs and just write to it
like any other filesystem...


Good point.  I'll check that avenue.  Would FUSE add much overhead for
writing lots of data?   I see a Python binding for it.

--
Steve Sapovits
Invite Media  -  http://www.invitemedia.com
[EMAIL PROTECTED]


Re: Python access to HDFS

2008-02-21 Thread Jeff Hammerbacher
maybe the dfs could expose a thrift interface in future releases?

you could also use the FUSE module to mount the dfs and just write to it
like any other filesystem...
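
something along these lines, assuming the dfs ends up mounted at /mnt/hdfs
(a made-up mount point) and that fuse dfs write support is in place:

# Plain file I/O against a FUSE-mounted dfs; /mnt/hdfs is an assumed mount point.
out = open("/mnt/hdfs/user/steve/events.log", "w")
for record in ("event-1", "event-2", "event-3"):
    out.write(record + "\n")
out.close()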

On Thu, Feb 21, 2008 at 1:23 PM, Steve Sapovits <[EMAIL PROTECTED]>
wrote:

>
> Are there any existing HDFS access packages out there for Python?
>
> I've had some success using SWIG and the C HDFS code, as documented
> here:
>
> 
> http://www.stat.purdue.edu/~sguha/code.html
>
> (halfway down the page) but it's slow adding support for some of the more
> complex functions.  If there's anything out there I missed, I'd like to
> hear about it.
>
> --
> Steve Sapovits
> Invite Media  -  http://www.invitemedia.com
> [EMAIL PROTECTED]
>
>
>