Beginners Hadoop question

2014-03-03 Thread goi cto
Hi, I am sorry for the beginner's question, but... I have a Spark Java job which reads a file (c:\my-input.csv), processes it, and writes an output file (my-output.csv). Now I want to run it on Hadoop in a distributed environment. 1) Should my input be one big file or separate smaller files? 2) if

Re: Beginners Hadoop question

2014-03-03 Thread Alonso Isidoro Roman
Hi, I am a beginner too, but as I have learned, Hadoop works better with big files: at least 64 MB or 128 MB (typical HDFS block sizes), or even more. I think you need to aggregate all the files into one new big file. Then you must copy it to HDFS using this command: hadoop fs -put MYFILE /YOUR_ROUTE_ON_HDFS/MYFILE. hadoop just
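The merge-then-upload step described above can be sketched in the shell. The file names and the HDFS destination path here are placeholders for illustration, not anything from the thread, and the final `hadoop fs -put` requires a Hadoop client on the PATH plus a reachable cluster, so it is shown commented out:

```shell
# Suppose the input arrived as many small CSV part files
# (names are illustrative):
printf 'row1,a\n' > part-00000.csv
printf 'row2,b\n' > part-00001.csv

# HDFS works best with a few large files rather than many small ones,
# so merge the parts into a single input file first:
cat part-*.csv > my-input-big.csv

# Then copy the merged file into HDFS (needs a running Hadoop setup):
# hadoop fs -put my-input-big.csv /YOUR_ROUTE_ON_HDFS/my-input-big.csv
```

Once the file is in HDFS, the Spark job can read it by its HDFS path instead of the local c:\ path.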