Hi, I originally posted this on the dumbo forum, but it seems to be a more general Hadoop scripting issue.
When testing a simple script that creates some local files and then copies them to HDFS with

    os.system("hadoop dfs -put /home/havard/bio_sci/file.json /tmp/bio_sci/file.json")

the tasks fail with an out-of-heap-memory error. The files are tiny, and I have already tried increasing the heap size. If I skip the hadoop dfs -put step, the tasks do not fail.

Is it wrong to call hadoop dfs -put from inside a script that is itself running as a Hadoop task? Should I instead transfer the files at the end with a combiner, or simply mount HDFS locally and write to it directly? Any general suggestions?
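For reference, here is a stripped-down sketch of the kind of script that fails. The paths and the hadoop dfs -put call are the real ones from my job; the record handling around them is simplified for illustration.

    #!/usr/bin/env python
    # Minimal sketch of the failing task script (simplified; the
    # "hadoop dfs -put" call and the paths are the real ones).
    import os
    import sys

    def main():
        for line in sys.stdin:
            # 1. Create a small local file from the input record.
            local_path = "/home/havard/bio_sci/file.json"
            with open(local_path, "w") as out:
                out.write(line)

            # 2. Copy it to HDFS from inside the running task.
            #    This is the step after which the task dies with an
            #    out-of-heap-memory error; without it the job succeeds.
            rc = os.system(
                "hadoop dfs -put /home/havard/bio_sci/file.json "
                "/tmp/bio_sci/file.json")
            if rc != 0:
                sys.stderr.write("hadoop dfs -put exited with %d\n" % rc)

    if __name__ == "__main__":
        main()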
--
Håvard Wahl Kongsgård
NTNU
http://havard.security-review.net/