HDFS - many files, small size

2014-10-02 Thread Roger Maillist
Hi there, I have millions of rather small PDF files which I want to load into HDFS for later analysis. I also need to re-encode them as a base64 stream to get the MR job for parsing to work. Is there any better/faster method than just calling the 'put' function in a huge (bash) loop? Maybe I could
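One common answer to the small-files problem is to pack many small files into a few large ones before they reach HDFS, so the NameNode tracks a handful of big files instead of millions of tiny ones. A minimal sketch of that idea in Python (function names and the one-record-per-line format are illustrative, not from the thread): each line holds a file name and the base64 of its contents, which also happens to be the encoding the poster wants for the MR job.

```python
import base64
import os

def pack_files(paths, out_path):
    """Pack many small files into one text file, one record per line:
    '<file-name>\t<base64 of file contents>'.
    One large line-oriented file is far friendlier to HDFS than millions
    of tiny files, and each line is an independent record for MapReduce."""
    with open(out_path, "w") as out:
        for p in paths:
            with open(p, "rb") as f:
                data = base64.b64encode(f.read()).decode("ascii")
            out.write(f"{os.path.basename(p)}\t{data}\n")

def unpack_record(line):
    """Inverse operation: recover (name, raw bytes) from one packed line."""
    name, b64 = line.rstrip("\n").split("\t", 1)
    return name, base64.b64decode(b64)
```

The packed file can then be uploaded with a single `hadoop fs -put`. Hadoop's own SequenceFile or HAR archives serve the same purpose with better tooling; this sketch just shows the shape of the approach.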

Re: HDFS - many files, small size

2014-10-02 Thread Mirko Kämpf
Hi Roger, you can use Apache Flume to ingest these files into your cluster. Store them in an HBase table for fast random access and extract the metadata on the fly using morphlines (See: http://kitesdk.org/docs/0.11.0/kite-morphlines/index.html). Even the base64 conversion can be done on the fly if

TestDFSIO with FS other than defaultFS

2014-10-02 Thread Jeffrey Denton
Hello all, I'm trying to run TestDFSIO using a file system other than the configured defaultFS, and it doesn't work for me: $ hadoop org.apache.hadoop.fs.TestDFSIO -Dtest.build.data=ofs://test/user/$USER/TestDFSIO -write -nrFiles 1 -fileSize 10240 14/10/02 11:24:19 INFO fs.TestDFSIO:

Re: Hadoop shuffling traffic

2014-10-02 Thread Abdul Navaz
Hello Pramod, This is great work! Thank you for sharing the report. Thanks Regards, Abdul Navaz Research Assistant University of Houston Main Campus, Houston TX Ph: 281-685-0388 From: Pramod Biligiri pramodbilig...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, October 2,

Hadoop on Stand-alone PC

2014-10-02 Thread Roger Maillist
Hi, For learning purposes I am trying to set up my own Hadoop/HDFS system at home. I am running openSUSE 13 and Hadoop 2.5.1. I followed the explanations in the Single Node Setup: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html My problem is, the data
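For reference, the pseudo-distributed configuration in that Single Node Setup guide comes down to two small config files (values as given in the Hadoop 2.x documentation):

```xml
<!-- etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After editing these, the guide has you format the NameNode (`bin/hdfs namenode -format`) and start the daemons with `sbin/start-dfs.sh`.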

Re: TestDFSIO with FS other than defaultFS

2014-10-02 Thread Jay Vyas
Hi Jeff. "Wrong FS" means that your configuration doesn't know how to bind ofs to the OrangeFS file system class. You can debug the configuration using fs.dumpConfiguration(), and you will likely see references to hdfs in there. By the way, have you tried our Bigtop HCFS tests yet? We now
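The binding Jay describes is a `fs.<scheme>.impl` entry mapping the URI scheme to a FileSystem implementation class. A sketch of what that looks like in core-site.xml; the class name below is a placeholder, so use the one documented for the OrangeFS Hadoop client jar you have, and make sure that jar is on Hadoop's classpath:

```xml
<!-- core-site.xml: bind the ofs:// scheme to its FileSystem implementation.
     Class name is illustrative; check your OrangeFS client documentation. -->
<property>
  <name>fs.ofs.impl</name>
  <value>org.apache.hadoop.fs.ofs.OrangeFileSystem</value>
</property>
```

With that in place, paths like `ofs://test/user/$USER/...` resolve without the "Wrong FS" error even when defaultFS points elsewhere.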

Oozie workflow for Sqoop Incremental update

2014-10-02 Thread Preya Shah
Hi, I am trying to get updated or newly added data from a relational database using Sqoop. The Sqoop command works fine, but when I try to execute it through the Oozie workflow it does not work. It gives me the error below: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
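`exit code [1]` from SqoopMain usually means the Sqoop launcher itself failed inside the action, and a very common cause is a missing JDBC driver jar: it must be in the workflow's `lib/` directory (or the Oozie sharelib, with `oozie.use.system.libpath=true` in job.properties). For orientation, a minimal Sqoop incremental-import action looks roughly like this (connection string, column names, and paths are hypothetical):

```xml
<action name="sqoop-incremental">
  <sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <command>import --connect jdbc:mysql://db.example.com/sales --table orders
             --incremental append --check-column id --last-value 0
             --target-dir /user/preya/orders</command>
  </sqoop>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

The real error detail is in the launcher job's task logs, which show Sqoop's own stack trace rather than just the exit code.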

Re: Hadoop on Stand-alone PC

2014-10-02 Thread Roger Maillist
In case anyone wants to know: There was trash in the /tmp dir. I stopped all nodes, formatted the HDFS and then re-started the nodes. That seems to have solved the problem. 2014-10-02 20:58 GMT+02:00 Roger Maillist darkchanterl...@gmail.com: Hi For learning purposes, I am trying to set up my

hdfs: a C API call to getFileSize() through libhdfs or libhdfs3?

2014-10-02 Thread Demai Ni
hi, folks, To get the size of an HDFS file, the Java API has FileSystem#getFileStatus(PATH)#getLen(); now I am trying to use a C client to do the same thing. For a file on the local file system, I can grab the info like this: fseeko(file, 0, SEEK_END); size = ftello(file); But I can't find the SEEK_END

Re: No space when running a hadoop job

2014-10-02 Thread Abdul Navaz
Hello, As you suggested, I have changed the hdfs-site.xml file of the datanodes and namenode as below and formatted the namenode. </property> <property> <name>dfs.datanode.data.dir</name> <value>/mnt</value> <description>Comma separated list of paths. Use the list of directories from $DFS_DATA_DIR.

Block placement without rack aware

2014-10-02 Thread SF Hadoop
What is the block placement policy Hadoop follows when rack awareness is not enabled? Does it just round-robin? Thanks.

RE: Block placement without rack aware

2014-10-02 Thread Liu, Yi A
It’s still random. If rack aware is not enabled, all nodes are in “default-rack”, and random nodes are chosen for block replications. Regards, Yi Liu From: SF Hadoop [mailto:sfhad...@gmail.com] Sent: Friday, October 03, 2014 7:12 AM To: user@hadoop.apache.org Subject: Block placement without

Re: Block placement without rack aware

2014-10-02 Thread Pradeep Gollakota
It appears to be randomly chosen. I just came across this blog post from Lars George about HBase file locality in HDFS http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html On Thu, Oct 2, 2014 at 4:12 PM, SF Hadoop sfhad...@gmail.com wrote: What is the block placement policy hadoop

unsubscribe

2014-10-02 Thread Igor Gatis

Re: unsubscribe

2014-10-02 Thread Ted Yu
Please see http://hadoop.apache.org/mailing_lists.html#User On Oct 2, 2014, at 7:37 PM, Igor Gatis igorga...@gmail.com wrote:

Re: Block placement without rack aware

2014-10-02 Thread SF Hadoop
Thanks for the info. Exactly what I needed. Cheers. On Thu, Oct 2, 2014 at 4:21 PM, Pradeep Gollakota pradeep...@gmail.com wrote: It appears to be randomly chosen. I just came across this blog post from Lars George about HBase file locality in HDFS