guilload opened a new pull request #1104:
URL: https://github.com/apache/iceberg/pull/1104
Hello,
This is another attempt at implementing a MR v1 input format (mapred) for
Iceberg. For context, when I started working on this PR, #933 had been inactive
for about a month. There's been new activity since then, but since I'm finished
I thought I'd still push this branch to offer an alternative.
In this PR, I've tried to address the main concerns raised in #933, mostly
about reusing the input format, record reader, and split classes implemented
for MR v2.
I've also modified the input format test suite to be able to run against
both MR input formats.
A Hive input format can easily be built on top of this MR v1 input format.
The `IcebergSplit` class can be wrapped into a `FileSplit` if necessary (see a
comment from @massdosage on #933). I believe the nice test suite from #933
could also be reused:
```java
public class IcebergWritable extends Container<Record> {}

public class IcebergInputFormat extends
    MapredIcebergInputFormat<IcebergWritable, Record> implements
    CombineHiveInputFormat.AvoidSplitCombination {

  @Override
  public boolean shouldSkipCombine(Path path, Configuration conf) {
    return true;
  }
}
```
The PR is split into multiple commits:
Refactor:
- bd4c090: Move ConfigBuilder and InMemoryDataModel out of IcebergInputFormat
- ae01439: Move IcebergRecordReader and IcebergSplit out of IcebergInputFormat
- 4805ead: Refactor TestIcebergInputFormat, mostly factoring out duplicate code
Rename:
- 547ebad: Rename TestIcebergInputFormat class: `TestIcebergInputFormat` ->
  `TestIcebergInputFormatS`
Feature:
- 4253ef9: Implement MR v1 (mapred) input format, wrapping the v2 classes and
  introducing a `Container` class to deal with the cumbersome MR v1 API
  (`createValue`)
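
To illustrate why a `Container` helps with `createValue`, here is a minimal, self-contained sketch. It is not the code in this PR: `V2StyleReader`, `V1StyleReader`, `ListReader`, and `ContainerDemo` are hypothetical stand-ins for the Hadoop reader interfaces, and this `Container` is a simplified assumption about what such a class might look like. The idea is that an MR v1 reader must fill a value object allocated up front by `createValue`, whereas an MR v2 reader returns the current value itself, so a mutable holder bridges the two styles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Mutable holder: a v1-style reader fills a caller-supplied value instead of returning one. */
final class Container<T> {
  private T value;

  T get() { return value; }

  void set(T newValue) { this.value = newValue; }
}

/** Hypothetical stand-in for a v2-style reader: advance first, then fetch the current value. */
interface V2StyleReader<T> {
  boolean nextKeyValue();

  T getCurrentValue();
}

/** Hypothetical stand-in for a v1-style reader that delegates to a v2-style one. */
final class V1StyleReader<T> {
  private final V2StyleReader<T> inner;

  V1StyleReader(V2StyleReader<T> inner) { this.inner = inner; }

  /** Called before any record is read, so it can only return an empty holder. */
  Container<T> createValue() { return new Container<>(); }

  /** Advances the wrapped reader and copies its current value into the holder. */
  boolean next(Container<T> value) {
    if (!inner.nextKeyValue()) {
      return false;
    }
    value.set(inner.getCurrentValue());
    return true;
  }
}

/** Toy v2-style reader over an in-memory list, used only to drive the example. */
final class ListReader<T> implements V2StyleReader<T> {
  private final Iterator<T> items;
  private T current;

  ListReader(List<T> items) { this.items = items.iterator(); }

  @Override
  public boolean nextKeyValue() {
    if (!items.hasNext()) {
      return false;
    }
    current = items.next();
    return true;
  }

  @Override
  public T getCurrentValue() { return current; }
}

final class ContainerDemo {
  /** Typical v1 read loop: one holder is allocated once and reused for every record. */
  static List<String> readAll(V1StyleReader<String> reader) {
    List<String> out = new ArrayList<>();
    Container<String> value = reader.createValue();
    while (reader.next(value)) {
      out.add(value.get());
    }
    return out;
  }
}
```

In this sketch, `ContainerDemo.readAll(new V1StyleReader<>(new ListReader<>(Arrays.asList("a", "b"))))` collects the records in order while reusing a single holder, which is the calling pattern the v1 API imposes.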
@rdblue @rdsr
@cmathiesen @massdosage