Here is the pull request:

https://github.com/apache/hbase/pull/60

2017-09-26 17:16 GMT+08:00 ShaoFeng Shi <[email protected]>:

> Hello gentlemen,
>
> This is Shaofeng Shi from the Apache Kylin community. We use HBase as the
> storage engine, and we use an MR job to generate HFiles before bulk load. We
> received a user report that, when S3 is configured as the output
> location for the HFiles, the files are generated in the "_temporary" folder
> and are never committed to the target path, so no data is loaded in the
> end. We can reproduce this problem easily. The original report
> is in [1].
>
> Kylin uses HBase's HFileOutputFormat2.java to configure the MR job. After
> some investigation, I found that this class always uses the default
> "FileOutputCommitter" (see [2]), regardless of the job's configuration, so
> it always writes to the "_temporary" folder. Since AWS EMR is configured to
> use DirectOutputCommitter for S3, the problem occurs: Hadoop expects to
> see the files directly under the output path, while the RecordWriter
> generates them in the "_temporary" folder.
>
> Have you received such a report before? I have a temporary fix in my fork
> now. I am just wondering what you think about it; if it is okay, I will
> file a JIRA. Thanks!
>
> [1] https://issues.apache.org/jira/browse/KYLIN-2788
> [2] https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L193
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
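To make the mismatch described in the quoted message concrete, here is a toy model in plain Java. It does not use the Hadoop or HBase APIs and is not the code in the pull request; the class and method names (CommitterMismatchSketch, writeViaTemporary, writeViaCommitter, visibleFiles) are hypothetical, and the staging layout is simplified to a single "_temporary" subfolder. It only shows why a writer that hard-codes the staging folder leaves nothing visible to a committer that expects files directly under the output path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CommitterMismatchSketch {

    // Hypothetical stand-in for a record writer that always stages files
    // under "<output>/_temporary", whatever committer the job configured.
    static Path writeViaTemporary(Path output, String name) throws IOException {
        Path staging = Files.createDirectories(output.resolve("_temporary"));
        return Files.write(staging.resolve(name), new byte[] {1});
    }

    // Hypothetical stand-in for a writer that uses the work path supplied
    // by the configured committer instead of assuming a staging folder.
    static Path writeViaCommitter(Path workPath, String name) throws IOException {
        Files.createDirectories(workPath);
        return Files.write(workPath.resolve(name), new byte[] {1});
    }

    // What a "direct" committer would find: regular files sitting directly
    // under the output path (subfolders such as "_temporary" are skipped).
    static List<String> visibleFiles(Path output) throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(output)) {
            entries.filter(Files::isRegularFile)
                   .forEach(p -> names.add(p.getFileName().toString()));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path hardCoded = Files.createTempDirectory("hfile-out");
        writeViaTemporary(hardCoded, "hfile-0");
        System.out.println("hard-coded staging folder: " + visibleFiles(hardCoded));

        // Assumption: for a direct committer, the work path is the output path itself.
        Path direct = Files.createTempDirectory("hfile-out");
        writeViaCommitter(direct, "hfile-0");
        System.out.println("committer's work path:     " + visibleFiles(direct));
    }
}
```

In this model the first listing is empty and the second contains hfile-0, which mirrors the symptom reported above: the files exist, but not where the configured committer looks for them.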


-- 
Best regards,

Shaofeng Shi 史少锋
