[ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076738#comment-14076738 ]
Wang Zhong commented on CRUNCH-450:
-----------------------------------
Thanks for your review, Josh! For your questions:
1) I implemented OrcTypeFamily because the low-level file layout of ORC is
distinct enough to warrant its own type family. OrcStruct is also an unusual
Writable implementation: it doesn't actually support write()/readFields().
To keep ORC separate from (and not mixed with) other Writable formats, I
created a standalone type family for it.
2) I think it is a good idea to have a crunch-hive submodule for now. The Hive
team is also refactoring the Hive dependencies to make them more concise and
modular (HIVE-7423). Once this component has a modularized dependency, I hope
we can move the ORC support into Crunch trunk.
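
To illustrate point 1, here is a minimal, self-contained sketch of why a record type that rejects write()/readFields() cannot go through a generic Writable serialization path. It is not the code from the patch. The Writable interface below is a local stand-in that mirrors the two methods of Hadoop's Writable contract, and IntValue, StructRecord, and canSerialize are hypothetical names made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class OrcWritableSketch {

    // Local stand-in (assumption) mirroring the two methods of Hadoop's Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // An ordinary writable: it serializes itself to and from a byte stream.
    static final class IntValue implements Writable {
        int value;
        IntValue(int value) { this.value = value; }
        @Override public void write(DataOutput out) throws IOException { out.writeInt(value); }
        @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    // Hypothetical struct-like record that, like the OrcStruct described above,
    // implements the interface but rejects both serialization methods.
    static final class StructRecord implements Writable {
        final Object[] fields;
        StructRecord(Object... fields) { this.fields = fields; }
        @Override public void write(DataOutput out) {
            throw new UnsupportedOperationException("write unsupported");
        }
        @Override public void readFields(DataInput in) {
            throw new UnsupportedOperationException("readFields unsupported");
        }
    }

    // Returns true if the value survives the generic Writable write path.
    static boolean canSerialize(Writable w) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            w.write(out);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new IntValue(42)));         // prints true
        System.out.println(canSerialize(new StructRecord("a", 1))); // prints false
    }
}
```

Since the generic path fails for the struct-like record, such a type needs its own conversion logic, which is the role a dedicated type family plays in this sketch's framing.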
> Adding ORC file format support in Crunch
> ----------------------------------------
>
> Key: CRUNCH-450
> URL: https://issues.apache.org/jira/browse/CRUNCH-450
> Project: Crunch
> Issue Type: New Feature
> Components: Core, IO
> Reporter: Wang Zhong
> Assignee: Josh Wills
> Attachments: CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
>
> 1. Adding input source and output target for ORC
> 2. Adding a new type family, OrcTypeFamily, to serialize objects into and
> deserialize them from OrcStruct
> 3. Supporting column pruning optimization
--
This message was sent by Atlassian JIRA
(v6.2#6252)