[ https://issues.apache.org/jira/browse/HCATALOG-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225369#comment-13225369 ]
jirapos...@reviews.apache.org commented on HCATALOG-287: -------------------------------------------------------- bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/HCatWriter.java, line 35 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87844#file87844line35> bq. > bq. > Should this pass an iterator or a Collection? A Collection seems more natural. Also, I think there should be a write for single records as some writers may not want to buffer records. Its much easier to control the lifecycle of writes if its an iterator, since then we are in control of how writes are done. If we reverse it and accept individual writes of record then construction of RecordReader instanstiation, close gets complex. Further, most of the setup/teardown calls are right now within write(), if we accept individual writes, the external framework has to understand semantics of all those calls, do those calls, manage state etc. So, complexity is for both user as well as for HCatalog. So, I went with Iterator approach, which is simple and easy. But, if there really is a use-case for collection / single writes I can work on that, but probably in a follow-up patch. What do you think? bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/ReaderInfo.java, line 14 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87847#file87847line14> bq. > bq. > Should this be Serializable or Writable? HCatInputFormat implements Writable. It might make sense to stick with that. This is intended to be used by frameworks other MR who don't have concept of Writable. Serializable is standard for any based platform. Perhaps, we can make it implement both Serializable and Writable. bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/WriterInfo.java, line 8 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87850#file87850line8> bq. > bq. > Same question on Writable vs Serializable as on ReaderInfo Same answer as above : ) bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsReader.java, line 30 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87851#file87851line30> bq. > bq. > What is HDFS specific about this reader? It is just using HCatInputFormat, so anything that works with that should work, right? Correct. May be I am over-designing. I was thinking of HCatalog to be a federation system, which can federate read/writes even for other storage system for which HCatIF/OF are not used. bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsReader.java, line 60 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87851#file87851line60> bq. > bq. > I don't see how the conf values set in prepareRead get propagated to this read, since prepareRead will be called on the master and read on the slaves. This happens via construction semantics of HCatReader. DataTransferFactory requires Conf to instantiate HCatReader for slaves. So, HCatReader has this information at the time of this call. bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, line 55 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line55> bq. > bq. > Same question as above on the reader as to how the conf object gets passed to this method. Same answer as above : ) bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, line 77 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line77> bq. > bq. > Same question as above about how the conf object gets here. Same answer as above. bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, line 89 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line89> bq. > bq. > Same question as above on how the conf object gets here. Same answer as above. bq. On 2012-03-07 01:31:16, Alan Gates wrote: bq. > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, line 66 bq. > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line66> bq. > bq. > Don't you need to call committer.abortTask in the case the task fails? Good catch. I am not sure when though. Need to think over it. - Ashutosh ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4163/#review5658 ----------------------------------------------------------- On 2012-03-03 00:16:31, Ashutosh Chauhan wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4163/ bq. ----------------------------------------------------------- bq. bq. (Updated 2012-03-03 00:16:31) bq. bq. bq. Review request for hcatalog. bq. bq. bq. Summary bq. ------- bq. bq. This is more of a protype to garner some initial feedback. There might be bugs hiding in implementation. But, currently looking for design level comments. bq. I have added two classes DataReadApi.java and DataWriteApi.java which exercise these new interfaces and represent idiomatic usage of this functionality. bq. bq. bq. This addresses bug HCATALOG-287. bq. https://issues.apache.org/jira/browse/HCATALOG-287 bq. bq. bq. Diffs bq. ----- bq. bq. trunk/src/java/org/apache/hcatalog/data/transfer/ReadEntity.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/ReadEntityBuilder.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/ReaderInfo.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/WriteEntity.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/Entity.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/EntityBuilder.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/HCatReader.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/HCatWriter.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/WriteEntityBuilder.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/WriterInfo.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsReader.java PRE-CREATION bq. trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java PRE-CREATION bq. trunk/src/test/e2e/hcatalog/udfs/java/org/apache/hcatalog/utils/DataReadApi.java PRE-CREATION bq. trunk/src/test/e2e/hcatalog/udfs/java/org/apache/hcatalog/utils/DataWriteApi.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4163/diff bq. bq. bq. Testing bq. ------- bq. bq. None. bq. bq. bq. Thanks, bq. bq. Ashutosh bq. bq. > Add data api to HCatalog > ------------------------ > > Key: HCATALOG-287 > URL: https://issues.apache.org/jira/browse/HCATALOG-287 > Project: HCatalog > Issue Type: New Feature > Reporter: Ashutosh Chauhan > Assignee: Ashutosh Chauhan > Fix For: 0.5 > > > HCatalog does a great job providing metadata querying facilities. It also has > {{HCatInputFormat}} and {{HCatOutputFormat}} which lets you read and write > data via {{MapReduce}} framework. This jira is about adding Read/Write > capabilities into Hadoop cluster independent of any framework. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira