You could attach the hadoop dfs command via a bootstrap action or a step.
http://stackoverflow.com/questions/12055595/emr-how-to-join-files-into-one
BR,
Alex
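For instance, the merge could be attached as a final step on the cluster rather than run by hand on the master node. A minimal sketch using the AWS CLI and script-runner, where the cluster id, bucket, and script name are placeholders, not values from this thread:

```shell
# Hypothetical: add a post-processing step that runs a merge script.
# j-XXXXXXXXXXXXX, s3://my-bucket, and merge-output.sh are placeholders.
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps \
  Type=CUSTOM_JAR,Name=MergeOutput,ActionOnFailure=CONTINUE,\
Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=[s3://my-bucket/scripts/merge-output.sh]
```

The script itself would contain the `hadoop dfs -getmerge` invocation; because it is added as a step, it runs after the preceding processing steps complete, which matches the "step to be run after initial processing" requirement.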
On 23 Feb 2015, at 08:10, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
Thanks Alex. Where would that command be placed: in a mapper, in a reducer, or run as a standalone command? Here at work we are looking to use Amazon EMR to do our number crunching, and we have access to the master node but not really to the rest of the cluster. Can this be added as a step to be run after the initial processing?
---
Regards,
Jonathan Aquilina
Founder Eagle Eye T
On 2015-02-23 08:05, Alexander Alten-Lorenz wrote:
Hi,
You can use a single reducer
(http://wiki.apache.org/hadoop/HowManyMapsAndReduces) for smaller datasets,
or `getmerge`: hadoop dfs -getmerge /hdfs/path local_file_name
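As a sketch, both options look like this; the job name, input/output paths, and local file name are placeholders, not values from this thread:

```shell
# Option 1 (hypothetical paths): force a single reducer so the job
# itself writes one output file (only sensible for smaller datasets).
hadoop jar my-job.jar MyJob -D mapred.reduce.tasks=1 /input /output

# Option 2 (hypothetical paths): let the job run with many reducers,
# then concatenate all part-* files into one local file afterwards.
hadoop dfs -getmerge /output merged_result.txt
```

Note that `getmerge` writes to the local filesystem of the machine it runs on, so on EMR it would produce the merged file on the master node.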
BR,
Alex
On 23 Feb 2015, at 08:00, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
Hey all,
I understand that the purpose of splitting files is to distribute the data
to multiple core and task nodes in a cluster. My question is: after the
output is complete, is there a way to combine all the parts into a single
file?
--
Regards,
Jonathan Aquilina
Founder Eagle Eye T