It might sound like an outdated approach, but can't you move the data physically? From what I understand, it is a one-shot load and not "streaming", so it could be a good method if you have the access, of course.
Regards

Bertrand

On Tue, Oct 30, 2012 at 11:07 AM, sumit ghosh <sumi...@yahoo.com> wrote:

> Hi,
>
> I have data on a remote machine accessible over ssh. I have Hadoop CDH4
> installed on RHEL. I am planning to load quite a few petabytes of data onto
> HDFS.
>
> Which will be the fastest method to use and are there any projects around
> Hadoop which can be used as well?
>
> I cannot install Hadoop-Client on the remote machine.
>
> Have a great Day Ahead!
> Sumit.
>
> ---------------
> Here I am attaching my previous discussion on CDH-user to avoid
> duplication.
> ---------------
> On Wed, Oct 24, 2012 at 9:29 PM, Alejandro Abdelnur <t...@cloudera.com>
> wrote:
> in addition to jarcec's suggestions, you could use httpfs. then you'd only
> need to poke a single host:port in your firewall as all the traffic goes
> thru it.
> thx
> Alejandro
>
> On Oct 24, 2012, at 8:28 AM, Jarek Jarcec Cecho <jar...@cloudera.com>
> wrote:
> > Hi Sumit,
> > there are plenty of ways to achieve that. Please find my feedback
> > below:
> >
> >> Does Sqoop support loading flat files to HDFS?
> >
> > No, Sqoop only supports moving data from external databases and
> > warehouse systems. Copying files is not supported at the moment.
> >
> >> Can use distcp?
> >
> > No. Distcp can be used only to copy data between HDFS filesystems.
> >
> >> How do we use the core-site.xml file on the remote machine to use
> >> copyFromLocal?
> >
> > Yes, you can install the hadoop binaries on your machine (with no hadoop
> > services running) and use the hadoop binary to upload data. The installation
> > procedure is described in the CDH4 installation guide [1] (follow the
> > "client" installation).
> >
> > Another way that I can think of is leveraging WebHDFS [2] or maybe
> > hdfs-fuse [3]?
> >
> > Jarcec
> >
> > Links:
> > 1: https://ccp.cloudera.com/display/CDH4DOC/CDH4+Installation
> > 2: https://ccp.cloudera.com/display/CDH4DOC/Deploying+HDFS+on+a+Cluster#DeployingHDFSonaCluster-EnablingWebHDFS
> > 3: https://ccp.cloudera.com/display/CDH4DOC/Mountable+HDFS
> >
> > On Wed, Oct 24, 2012 at 01:33:29AM -0700, Sumit Ghosh wrote:
> >>
> >> Hi,
> >>
> >> I have data on a remote machine accessible over ssh. What is the fastest
> >> way to load data onto HDFS?
> >>
> >> Does Sqoop support loading flat files to HDFS?
> >> Can use distcp?
> >> How do we use the core-site.xml file on the remote machine to use
> >> copyFromLocal?
> >>
> >> Which will be the best to use and are there any other open source projects
> >> around Hadoop which can be used as well?
> >> Have a great Day Ahead!
> >> Sumit

--
Bertrand Dechoux
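
For reference, the WebHDFS route mentioned in the quoted thread needs no Hadoop client on the remote machine, only HTTP access to the cluster. Below is a minimal Python sketch of the two-step upload, not a tested recipe for this setup. It assumes WebHDFS is enabled on the cluster and that there is no Kerberos security (the user is passed as user.name). The host name, port 50070, paths and user in the usage note are placeholders, and the helper names webhdfs_create_url and upload are made up for illustration.

```python
import http.client
import os
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_create_url(host, port, hdfs_path, user, overwrite=False):
    """Build the WebHDFS CREATE URL for an absolute HDFS path."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,  # simple (non-Kerberos) authentication is assumed
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def upload(local_path, host, port, hdfs_path, user):
    """Upload one local file in two steps: NameNode redirect, then DataNode write."""
    url = urlsplit(webhdfs_create_url(host, port, hdfs_path, user))

    # Step 1: PUT without a body. The NameNode is expected to answer with
    # 307 and a Location header that points at a DataNode.
    conn = http.client.HTTPConnection(url.hostname, url.port)
    conn.request("PUT", f"{url.path}?{url.query}")
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    if resp.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {resp.status}")

    # Step 2: send the file content to the URL from the redirect.
    target = urlsplit(location)
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(os.path.getsize(local_path)),
    }
    conn = http.client.HTTPConnection(target.hostname, target.port)
    with open(local_path, "rb") as f:
        conn.request("PUT", f"{target.path}?{target.query}", body=f, headers=headers)
        resp = conn.getresponse()
        resp.read()
    conn.close()
    if resp.status != 201:
        raise RuntimeError(f"upload failed with HTTP status {resp.status}")
```

A call would look like upload("/tmp/part-0001.dat", "namenode.example.com", 50070, "/data/in/part-0001.dat", "hdfs"), with all of those values being placeholders. Note that the second request goes directly to a DataNode, so the remote machine must be able to reach the DataNodes as well. With HttpFS, as suggested in the quoted thread, all traffic goes through a single host:port instead, but the request details differ slightly and are not covered by this sketch.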