Here is the pull request: https://github.com/apache/hbase/pull/60
2017-09-26 17:16 GMT+08:00 ShaoFeng Shi <[email protected]>:

> Hello gentlemen,
>
> This is Shaofeng Shi from the Apache Kylin community. We use HBase as the
> storage engine, and we use an MR job to generate HFiles before bulk load.
> We received a user report that, when S3 is configured as the output
> location for HFiles, the files are generated in the "_temporary" folder
> and are never committed to the target path. As a result, no data is
> loaded in the end. We can reproduce this problem easily. The original
> report is in [1].
>
> Kylin uses HBase's HFileOutputFormat2.java to configure the MR job. After
> some investigation, I found that this class always uses the default
> "FileOutputCommitter" (see [2]), regardless of the job's configuration, so
> it always writes to the "_temporary" folder. Since AWS EMR is configured
> to use DirectOutputCommitter for S3, the problem occurs: Hadoop expects to
> see the files directly under the output path, while the RecordWriter
> generates them in the "_temporary" folder.
>
> Have you received such a report before? I have a temporary fix in my fork
> for now. I am just wondering what you think about it; if it is okay, I
> will file a JIRA. Thanks!
>
> [1] https://issues.apache.org/jira/browse/KYLIN-2788
> [2] https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L193
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋
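
For readers who want to see the mismatch from the quoted report in isolation, here is a minimal, self-contained sketch. It does not use any Hadoop or HBase classes and is not the patch from the pull request: `Committer`, `DEFAULT_COMMITTER`, `DIRECT_COMMITTER`, and both `...WriterDir` methods are hypothetical stand-ins, the `s3://bucket/out` path is made up, and the real "_temporary" layout has more nesting than modeled here.

```java
public class CommitterMismatchSketch {

    /** Hypothetical stand-in for an output committer: maps the job output dir to the dir a task writes into. */
    public interface Committer {
        String workPath(String outputDir);
    }

    /** Models the default committer: tasks write under "_temporary" and files are moved on commit. */
    public static final Committer DEFAULT_COMMITTER = out -> out + "/_temporary";

    /** Models a direct committer: tasks write straight into the output dir, nothing is moved later. */
    public static final Committer DIRECT_COMMITTER = out -> out;

    /** Behavior described in the report: the writer ignores the configured committer. */
    public static String hardCodedWriterDir(String outputDir, Committer configured) {
        return DEFAULT_COMMITTER.workPath(outputDir);
    }

    /** Assumed alternative: the writer asks the configured committer where to write. */
    public static String configAwareWriterDir(String outputDir, Committer configured) {
        return configured.workPath(outputDir);
    }

    public static void main(String[] args) {
        String out = "s3://bucket/out"; // made-up output location

        // With a direct committer configured, the job expects files directly under `out`.
        String expected = DIRECT_COMMITTER.workPath(out);

        String hardCoded = hardCodedWriterDir(out, DIRECT_COMMITTER);
        String configAware = configAwareWriterDir(out, DIRECT_COMMITTER);

        System.out.println("expected by job:    " + expected);
        System.out.println("hard-coded writer:  " + hardCoded + " (match: " + hardCoded.equals(expected) + ")");
        System.out.println("config-aware writer: " + configAware + " (match: " + configAware.equals(expected) + ")");
    }
}
```

In this model the hard-coded writer ends up in `s3://bucket/out/_temporary` while the job looks in `s3://bucket/out`, and since a direct committer never moves files on commit, the output stays stranded. With the default committer configured, both writers agree, which would explain why the problem only shows up on setups like the one described.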
