[ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076640#comment-14076640 ]
Josh Wills commented on CRUNCH-450:
-----------------------------------
Wow, that is a phenomenal amount of work. Thanks for sending it along! A couple
of high-level questions:
1) What does OrcTypeFamily buy me? We've flirted with expanding the set of
TypeFamilies beyond Avro and Writable in the past, but have always been cautious
about actually doing it because the two-typefamily assumption is baked into so
many things in the system. If everything in Orc ultimately compiles down to some
kind of Writable, would it still work as a collection of derived PTypes on top
of the WritableTypeFamily?
2) We also try to avoid large and complex external dependencies in
crunch-core. Could we move this into a new submodule, crunch-hive, which would
contain all of our Hive dependency stuff? I think there's more of it that we
want to include (e.g., CRUNCH-340) and a few other things I wouldn't mind
having down the line, but I don't want to introduce the dependency complexity
for pipelines that don't actually make use of Hive stuff.
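
To make the "derived PTypes" idea in question 1 concrete, here is a minimal,
dependency-free sketch of the pattern: a user-facing type is layered over a
base type that the framework already knows how to serialize, using a pair of
conversion functions. Every name below (BaseType, DerivedType, Person,
personType) is hypothetical and chosen for illustration only. None of it is
Crunch's or ORC's actual API, and a real version would build on Crunch's own
derived-type support over Writables rather than on these stand-ins.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class DerivedTypeSketch {

    /** Hypothetical stand-in for a base type the framework can already serialize. */
    public interface BaseType<S> {
        byte[] write(S value);
        S read(byte[] bytes);
    }

    /** A derived type: exposes T to users, but stores values via a base type S. */
    public static final class DerivedType<S, T> {
        private final BaseType<S> base;
        private final Function<S, T> inputFn;   // base representation -> user type
        private final Function<T, S> outputFn;  // user type -> base representation

        public DerivedType(BaseType<S> base, Function<S, T> inputFn, Function<T, S> outputFn) {
            this.base = base;
            this.inputFn = inputFn;
            this.outputFn = outputFn;
        }

        public byte[] write(T value) {
            return base.write(outputFn.apply(value));
        }

        public T read(byte[] bytes) {
            return inputFn.apply(base.read(bytes));
        }
    }

    /** Base type for strings, encoded as UTF-8 bytes. */
    public static final BaseType<String> STRINGS = new BaseType<String>() {
        @Override
        public byte[] write(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String read(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    /** Hypothetical record-like class standing in for a struct row. */
    public static final class Person {
        public final String name;
        public final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Builds a Person type on top of the string base type (tab-separated fields). */
    public static DerivedType<String, Person> personType() {
        return new DerivedType<String, Person>(
            STRINGS,
            s -> {
                String[] parts = s.split("\t", 2);
                return new Person(parts[0], Integer.parseInt(parts[1]));
            },
            p -> p.name + "\t" + p.age);
    }

    public static void main(String[] args) {
        DerivedType<String, Person> type = personType();
        byte[] bytes = type.write(new Person("ada", 36));
        Person copy = type.read(bytes);
        System.out.println(copy.name + " " + copy.age);
    }
}
```

The point of the sketch is that no new type family is needed: only the two
conversion functions are specific to the new format, and everything else
reuses the base type's serialization.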
> Adding ORC file format support in Crunch
> ----------------------------------------
>
> Key: CRUNCH-450
> URL: https://issues.apache.org/jira/browse/CRUNCH-450
> Project: Crunch
> Issue Type: New Feature
> Components: Core, IO
> Reporter: Wang Zhong
> Assignee: Josh Wills
> Attachments: CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding input source and output target for ORC
> 2. Adding a new type family - OrcTypeFamily to serialize / deserialize
> objects into OrcStruct
> 3. Supporting column pruning optimization