[
https://issues.apache.org/jira/browse/CRUNCH-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282636#comment-16282636
]
Micah Whitacre commented on CRUNCH-340:
---------------------------------------
Some comments:
* It'd be nice to redo the test setup to follow what crunch-core does to avoid
you having to deal with custom failsafe/surefire config in the project. (e.g.
make the suite end with "IT.java" and rename the individual tests in that suite
to something that doesn't match on either failsafe or surefire (e.g. "spec"?)
* In your tests you call p.done() after a bunch of asserts. This can lead to
some stranded pipelines/intermediate files if any test fails. try to ensure
calling pipeline.done()
* In HCatTestSuite, you have a variable named "hbaseTmpDir" but nothing in that
class actually deals with HBase. Should just be hadoopTmpDir.
* In HCatTestSuite, "databaseLocation" make that a temporary folder to ensure
it doesn't exist under the project to accidentally be checked in/linger between
runs and also the JUnit TemporaryFolder will ensure the data will be cleaned up
at the end of the suite. It'll also assume less about the launching/running
location. Also then in your cleanup you can do the cleanup using the folder vs
a specific file. Also why cleanup using Hadoop FileSystem vs Java File objects?
* Javadoc for FromHCat seems to indicate the default db is "database" when it
should be "default"
* Can likely remove the site.xml from the project. It is not the right file
but also looks like that only exists in certain projects.
Other than that +1 from me.
> Create HCatSource and HCatTarget
> --------------------------------
>
> Key: CRUNCH-340
> URL: https://issues.apache.org/jira/browse/CRUNCH-340
> Project: Crunch
> Issue Type: New Feature
> Reporter: Chao Shi
> Assignee: Stephen Durfey
> Attachments: 0001-CRUNCH-340-added-HCatSource-HCatTarget.patch,
> CRUNCH-340.patch, crunch-340-v2.patch, crunch-340-v3.patch,
> crunch-340-v4.patch, crunch-340.patch
>
>
> This patch adds HCatSource, which enables crunch pipeline to read from Hive
> tables. This is the very first version, leaving a few TODOs in code.
> It adds new dependency from crunch-core to hcatalog (as well as several hive
> components). I guess maybe we should create a new subproject (e.g.
> crunch-hcatalog) rather than add it into crunch-core.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)