[
https://issues.apache.org/jira/browse/CRUNCH-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chao Shi updated CRUNCH-340:
----------------------------
Attachment: crunch-340-v2.patch
Here is the v2 patch.
ChangeLog:
- add equals() and hashCode()
- temporarily hack WritableDeepCopier to get HCatRecord to work (*)
- more test cases on multiple input sources
(*) The problem is that HCatRecord is an interface; DefaultHCatRecord and
LazyHCatRecord are its implementations, and both are Writable. The change to
WritableDeepCopier is only a temporary hack. I think a better approach is to
specify a custom DeepCopier for the WritableType of HCatRecord.
Of course, the fallback is to use HCatSource<DefaultHCatRecord> everywhere,
but HCatSource<HCatRecord> definitely looks better. I'm also considering
HCatSource<HCatRecordable>, which looks like an even cleaner interface.
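
To make the custom DeepCopier idea concrete, here is a minimal, self-contained
sketch of the pattern. It is not taken from the patch and does not use the
real Crunch or HCatalog classes: SimpleWritable, TableRecord,
DefaultTableRecord, Copier and RecordCopier are hypothetical stand-ins for
Writable, HCatRecord, DefaultHCatRecord, DeepCopier and the proposed custom
copier. The point it tries to show is that a copier written for the interface
type can serialize whatever implementation it receives and read the bytes back
into one known concrete class, so nothing ever has to instantiate the
interface itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's Writable contract.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for HCatRecord: an interface, so it cannot be
// created with a no-arg constructor the way a concrete Writable can.
interface TableRecord extends SimpleWritable {
    List<String> fields();
}

// Hypothetical stand-in for DefaultHCatRecord: a concrete implementation.
class DefaultTableRecord implements TableRecord {
    private final List<String> values = new ArrayList<>();

    DefaultTableRecord() {}

    DefaultTableRecord(List<String> initial) {
        values.addAll(initial);
    }

    @Override
    public List<String> fields() {
        return values;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
    }
}

// Hypothetical stand-in for a deep-copier interface.
interface Copier<T> {
    T deepCopy(T source);
}

// Custom copier for the interface type: serialize the source, whatever its
// implementation, and deserialize into a known concrete class.
class RecordCopier implements Copier<TableRecord> {
    @Override
    public TableRecord deepCopy(TableRecord source) {
        if (source == null) {
            return null;
        }
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            DefaultTableRecord copy = new DefaultTableRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class DeepCopyDemo {
    public static void main(String[] args) {
        TableRecord source = new DefaultTableRecord(Arrays.asList("a", "b"));
        TableRecord copy = new RecordCopier().deepCopy(source);
        source.fields().set(0, "changed"); // must not affect the copy
        System.out.println(copy.fields());
    }
}
```

With a copier like this attached to the type, HCatSource<HCatRecord> could
keep the interface in its signature while the copy step always produces the
default implementation. Whether that trade-off is acceptable for lazy records
is an open question this sketch does not answer.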
> HCatSource
> ----------
>
> Key: CRUNCH-340
> URL: https://issues.apache.org/jira/browse/CRUNCH-340
> Project: Crunch
> Issue Type: New Feature
> Reporter: Chao Shi
> Attachments: crunch-340-v2.patch, crunch-340.patch
>
>
> This patch adds HCatSource, which enables a Crunch pipeline to read from Hive
> tables. This is the very first version, leaving a few TODOs in the code.
> It adds a new dependency from crunch-core on hcatalog (as well as several hive
> components). Perhaps we should create a new subproject (e.g.
> crunch-hcatalog) rather than add it to crunch-core.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)