I just looked at the javadocs, but it is unclear to me what the difference is 
between a TFile and a SequenceFile.  It also looks like you need to append 
the data in much the same way as with normal sequence files.
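
If append is the only way in, the local-disk step at least looks avoidable. A rough sketch, not a tested job: it assumes the FTP/SCP client can hand back a plain java.io.InputStream, and the class name RemoteFileBuffer and the method readFully are made up for illustration, not part of any Hadoop API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: buffers a remote stream in memory instead of on local disk.
final class RemoteFileBuffer {

    // Reads the stream until end-of-stream and returns everything that was read.
    // The caller is responsible for closing the stream.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

The returned bytes could then be wrapped in a BytesWritable and passed to writer.append(), as in the snippet quoted below. This only makes sense while each file fits comfortably in the task's heap, which should hold for small files.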

On Mar 12, 2010, at 2:15 PM, Hong Tang wrote:

> Have you looked at TFile?
> 
> On Mar 12, 2010, at 5:22 AM, Scott Whitecross wrote:
> 
>> Hi -
>> 
>> I'd like to create a job that pulls small files from a remote server  
>> (using FTP, SCP, etc.) and stores them directly to sequence files on  
>> HDFS.  Looking at the sequence file API, I don't see an obvious way  
>> to do this.  It looks like what I have to do is pull the remote file  
>> to disk, then read the file into memory to place in the sequence  
>> file.  Is there a better way?
>> 
>> Looking at the API, am I forced to use the append method?
>> 
>>           FileSystem hdfs = FileSystem.get(context.getConfiguration());
>>           FSDataOutputStream outputStream = hdfs.create(new Path(outputPath));
>>           writer = SequenceFile.createWriter(context.getConfiguration(),
>>               outputStream, Text.class, BytesWritable.class, null, null);
>>
>>           // read in file to remotefilebytes
>>
>>           writer.append(filekey, remotefilebytes);
>> 
>> 
>> The alternative would be to have one job pull the remote files, and  
>> a secondary job write them into sequence files.
>> 
>> I'm using the latest Cloudera release, which I believe is Hadoop 0.20.1
>> 
>> Thanks.
>> 
>> 
>> 
>> 
> 
