Small file problem and GenMRFileSink1

David Ginzburg Wed, 29 Jun 2011 08:34:16 -0700



Hi,
I'm not sure whether this belongs on hive-dev or hive-user.
I have a folder with many small files.
I would like to reduce the number of files the way Hive merges its output.
I tried to understand from the source of
org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1 how to use the API to
submit a job that merges output files.
I think I was able to identify

  private void createMergeJob(FileSinkOperator fsOp, GenMRProcContext ctx,
      String finalName) throws SemanticException

as the entry point to the logic that performs the operation, but I did not
find any documentation on how to use it.

Is there an example that simulates the use of this API call?
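
To make the goal concrete, here is a minimal sketch of the effect being asked for, written with plain java.nio against a local directory. It does not call createMergeJob or any other Hive or Hadoop API; the class name SmallFileMerger and the method mergeDirectory are hypothetical, and plain byte concatenation is only assumed to be valid for formats such as delimited text where files can simply be appended to one another.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hive: concatenates the small files
// in one local directory into a single output file.
public class SmallFileMerger {

    /**
     * Concatenates every regular file directly inside inputDir into target,
     * in file-name order, and returns the number of files merged.
     * Assumes the file format tolerates plain byte concatenation.
     */
    public static int mergeDirectory(Path inputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path p : stream) {
                // Skip subdirectories, and skip the target if it lives in inputDir.
                boolean isTarget = p.toAbsolutePath().equals(target.toAbsolutePath());
                if (Files.isRegularFile(p) && !isTarget) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts); // deterministic order, e.g. part-00000, part-00001, ...

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out); // append this file's bytes to the merged output
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SmallFileMerger <inputDir> <targetFile>");
            return;
        }
        int merged = mergeDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + merged + " files into " + args[1]);
    }
}
```

Hive's own merge runs as a follow-up job over the output directory rather than as a local copy like this, so the sketch only illustrates the intended result (many small files in, one larger file out), not how GenMRFileSink1 achieves it.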
