Hi, I originally posted this on the dumbo forum, but it seems to be a more general Hadoop scripting issue.
When testing a simple script that creates some local files and then copies them to HDFS with

    os.system("hadoop dfs -put /home/havard/bio_sci/file.json /tmp/bio_sci/file.json")

the tasks fail with an out-of-heap-memory error. The files are tiny, and I have already tried increasing the heap size. If I skip the hadoop dfs -put step, the tasks do not fail.

Is it wrong to call hadoop dfs -put from inside a script that is itself running as a Hadoop task? Should I instead transfer the files at the end with a combiner, or simply mount HDFS locally and write to it directly? Any general suggestions?
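For reference, here is a stripped-down sketch of the kind of script that fails. The paths and the hadoop dfs -put call are the real ones from my job; the record handling around them is simplified for illustration.

    #!/usr/bin/env python
    # Minimal sketch of the failing task script (simplified; the
    # "hadoop dfs -put" call and the paths are the real ones).
    import os
    import sys

    def main():
        for line in sys.stdin:
            # 1. Create a small local file from the input record.
            local_path = "/home/havard/bio_sci/file.json"
            with open(local_path, "w") as out:
                out.write(line)

            # 2. Copy it to HDFS from inside the running task.
            #    This is the step after which the task dies with an
            #    out-of-heap-memory error; without it the job succeeds.
            rc = os.system(
                "hadoop dfs -put /home/havard/bio_sci/file.json "
                "/tmp/bio_sci/file.json")
            if rc != 0:
                sys.stderr.write("hadoop dfs -put exited with %d\n" % rc)

    if __name__ == "__main__":
        main()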
--
Håvard Wahl Kongsgård
NTNU
http://havard.security-review.net/