[ 
https://issues.apache.org/jira/browse/HCATALOG-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225369#comment-13225369
 ] 

jirapos...@reviews.apache.org commented on HCATALOG-287:
--------------------------------------------------------



bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/HCatWriter.java, line 35
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87844#file87844line35>
bq.  >
bq.  >     Should this pass an iterator or a Collection?  A Collection seems 
more natural.  Also, I think there should be a write for single records as some 
writers may not want to buffer records.

Its much easier to control the lifecycle of writes if its an iterator, since 
then we are in control of how writes are done. If we reverse it and accept 
individual writes of record then construction of RecordReader instanstiation, 
close gets complex. Further, most of the setup/teardown calls are right now 
within write(), if we accept individual writes, the external framework has to 
understand semantics of all those calls, do those calls, manage state etc. So, 
complexity is for both user as well as for HCatalog. So, I went with Iterator 
approach, which is simple and easy. But, if there really is a use-case for 
collection / single writes I can work on that, but probably in a follow-up 
patch. What do you think? 


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/ReaderInfo.java, line 14
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87847#file87847line14>
bq.  >
bq.  >     Should this be Serializable or Writable?  HCatInputFormat implements 
Writable.  It might make sense to stick with that.

This is intended to be used by frameworks other MR who don't have concept of 
Writable. Serializable is standard for any based platform. Perhaps, we can make 
it implement both Serializable and Writable.


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/WriterInfo.java, line 8
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87850#file87850line8>
bq.  >
bq.  >     Same question on Writable vs Serializable as on ReaderInfo

Same answer as above : )


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsReader.java, 
line 30
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87851#file87851line30>
bq.  >
bq.  >     What is HDFS specific about this reader?  It is just using 
HCatInputFormat, so anything that works with that should work, right?

Correct. May be I am over-designing. I was thinking of HCatalog to be a 
federation system, which can federate read/writes even for other storage system 
for which HCatIF/OF are not used. 


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsReader.java, 
line 60
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87851#file87851line60>
bq.  >
bq.  >     I don't see how the conf values set in prepareRead get propagated to 
this read, since prepareRead will be called on the master and read on the 
slaves.

This happens via construction semantics of HCatReader. DataTransferFactory 
requires Conf to instantiate HCatReader for slaves. So, HCatReader has this 
information at the time of this call.


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, 
line 55
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line55>
bq.  >
bq.  >     Same question as above on the reader as to how the conf object gets 
passed to this method.

Same answer as above : )


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, 
line 77
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line77>
bq.  >
bq.  >     Same question as above about how the conf object gets here.

Same answer as above.


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, 
line 89
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line89>
bq.  >
bq.  >     Same question as above on how the conf object gets here.

Same answer as above.


bq.  On 2012-03-07 01:31:16, Alan Gates wrote:
bq.  > trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java, 
line 66
bq.  > <https://reviews.apache.org/r/4163/diff/1/?file=87852#file87852line66>
bq.  >
bq.  >     Don't you need to call committer.abortTask in the case the task 
fails?

Good catch. I am not sure when though. Need to think over it.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4163/#review5658
-----------------------------------------------------------


On 2012-03-03 00:16:31, Ashutosh Chauhan wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4163/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-03 00:16:31)
bq.  
bq.  
bq.  Review request for hcatalog.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This is more of a protype to garner some initial feedback. There might be 
bugs hiding in implementation. But, currently looking for design level comments.
bq.  I have added two classes DataReadApi.java and DataWriteApi.java which 
exercise these new interfaces and represent idiomatic usage of this 
functionality.
bq.  
bq.  
bq.  This addresses bug HCATALOG-287.
bq.      https://issues.apache.org/jira/browse/HCATALOG-287
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/ReadEntity.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/ReadEntityBuilder.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/ReaderInfo.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/WriteEntity.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/Entity.java 
PRE-CREATION 
bq.    
trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/EntityBuilder.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/HCatReader.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/HCatWriter.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/WriteEntityBuilder.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/WriterInfo.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsReader.java 
PRE-CREATION 
bq.    trunk/src/java/org/apache/hcatalog/data/transfer/impl/HdfsWriter.java 
PRE-CREATION 
bq.    
trunk/src/test/e2e/hcatalog/udfs/java/org/apache/hcatalog/utils/DataReadApi.java
 PRE-CREATION 
bq.    
trunk/src/test/e2e/hcatalog/udfs/java/org/apache/hcatalog/utils/DataWriteApi.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4163/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  None.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ashutosh
bq.  
bq.


                
> Add data api to HCatalog
> ------------------------
>
>                 Key: HCATALOG-287
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-287
>             Project: HCatalog
>          Issue Type: New Feature
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.5
>
>
> HCatalog does a great job providing metadata querying facilities. It also has 
> {{HCatInputFormat}} and {{HCatOutputFormat}} which lets you read and write 
> data via {{MapReduce}} framework. This jira is about adding Read/Write 
> capabilities into Hadoop cluster independent of any framework.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to