A JIRA has been created, and a patch is attached: https://issues.apache.org/jira/browse/HBASE-18885
Please review and merge it; we need this in a future version. Thanks.

2017-09-26 19:00 GMT+08:00 ShaoFeng Shi:

> Here is the pull request:
>
> https://github.com/apache/hbase/pull/60
>
> 2017-09-26 17:16 GMT+08:00 ShaoFeng Shi:
>
>> Hello gentlemen,
>>
>> This is Shaofeng Shi from the Apache Kylin community. We use HBase as the
>> storage engine, and we use an MR job to generate HFiles before bulk
>> loading them. We received a user report that, when S3 is configured as
>> the output location for the HFiles, the files are generated in the
>> "_temporary" folder and are never committed to the target path. As a
>> result, no data is loaded at all. We can reproduce this problem easily.
>> The original report is in [1].
>>
>> Kylin uses HBase's HFileOutputFormat2.java to configure the MR job. After
>> some investigation, I found that this class always uses the default
>> "FileOutputCommitter" (see [2]), regardless of the job's configuration,
>> so it always writes to the "_temporary" folder. Since AWS EMR is
>> configured to use DirectOutputCommitter for S3, the problem occurs:
>> Hadoop expects to find the files directly under the output path, while
>> the RecordWriter generates them in the "_temporary" folder.
>>
>> Have you received such a report before? I have a temporary fix in my
>> fork now. I'm just wondering what you think about it; if it's okay, I
>> will report a JIRA.
>> Thanks!
>>
>> [1] https://issues.apache.org/jira/browse/KYLIN-2788
>> [2] https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L193
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋
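To make the committer mismatch described above concrete, here is a small self-contained simulation. It does not use the real Hadoop or HBase APIs: `Committer`, `TemporaryDirCommitter`, `DirectCommitter` and `runJob` are hypothetical stand-ins that only model where files are written and where they end up after commit. The assumption is the behavior described in the thread: the default FileOutputCommitter writes under "_temporary" and moves files on commit, while a direct committer writes to the output path and does nothing on commit.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class CommitterMismatchSketch {

    // Hypothetical stand-in for an output committer (NOT the Hadoop API):
    // it decides where tasks write and where committed files end up.
    interface Committer {
        Path workPath(Path outputPath);
        // Returns the final location of a file written under workPath().
        Path commit(Path outputPath, Path writtenFile);
    }

    // Models the default FileOutputCommitter as described in the thread:
    // tasks write under "_temporary", and commit moves files to the output path.
    static final class TemporaryDirCommitter implements Committer {
        public Path workPath(Path outputPath) {
            return outputPath.resolve("_temporary").resolve("attempt_0");
        }
        public Path commit(Path outputPath, Path writtenFile) {
            // Keep the file name, relocate it directly under outputPath.
            return outputPath.resolve(writtenFile.getFileName());
        }
    }

    // Models a direct committer: tasks write straight to the output path,
    // and commit leaves files where they are.
    static final class DirectCommitter implements Committer {
        public Path workPath(Path outputPath) {
            return outputPath;
        }
        public Path commit(Path outputPath, Path writtenFile) {
            return writtenFile;
        }
    }

    // The writer picks its work path either from a hard-coded default
    // committer (the reported behavior) or from the configured one,
    // but the job always commits with the configured committer.
    static Path runJob(Committer configured, boolean hardcodeDefault, Path outputPath) {
        Committer forWriting = hardcodeDefault ? new TemporaryDirCommitter() : configured;
        Path written = forWriting.workPath(outputPath).resolve("hfile-0");
        return configured.commit(outputPath, written);
    }

    public static void main(String[] args) {
        Path out = Paths.get("/output");
        Path buggy = runJob(new DirectCommitter(), true, out);
        Path fixed = runJob(new DirectCommitter(), false, out);
        System.out.println("hard-coded default committer: " + buggy);
        System.out.println("configured committer:         " + fixed);
    }
}
```

With a direct committer configured, the hard-coded default leaves the file stranded under "_temporary", while asking the configured committer for the work path puts it directly under the output path, which is where the bulk load expects it.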
