
[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-11 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967400#comment-13967400
 ] 

Lefty Leverenz commented on HIVE-5687:
--

All's well.  Thanks Alan.

> Streaming support in Hive
> -
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, 
> HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, 
> HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, 
> HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 
> patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API 
> - Transaction support: Clients should be able to periodically commit a batch 
> of records atomically
> - Immediate visibility: Records should be immediately visible to queries on 
> commit
> - Should not overload HDFS with too many small files
> Use Cases:
>  - Streaming logs into HIVE via Flume
>  - Streaming results of computations from Storm
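The transaction requirement above (commit a batch of records atomically, with the records visible to queries as soon as the commit happens) can be illustrated with a small in-memory model. This is a hypothetical sketch, not the Hive streaming API. `BatchSketch`, `Table`, `Txn` and their methods are invented for illustration. The real feature writes delta files to HDFS and coordinates with the metastore's transaction manager.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory model of atomic batch commits; not Hive code. */
final class BatchSketch {
  private BatchSketch() {}

  /** Stand-in for a table: readers only ever see committed records. */
  static final class Table {
    private final List<String> committed = new ArrayList<>();

    synchronized List<String> read() {
      return Collections.unmodifiableList(new ArrayList<>(committed));
    }

    // Holding the lock while appending means a reader sees all of a batch or none of it.
    synchronized void publish(List<String> records) {
      committed.addAll(records);
    }
  }

  /** Stand-in for one open transaction: writes are buffered until commit. */
  static final class Txn {
    private final Table table;
    private final List<String> pending = new ArrayList<>();
    private boolean open = true;

    Txn(Table table) {
      this.table = table;
    }

    void write(String record) {
      checkOpen();
      pending.add(record);
    }

    void commit() {
      checkOpen();
      table.publish(pending);
      open = false;
    }

    void abort() {
      checkOpen();
      pending.clear(); // nothing buffered here ever becomes visible
      open = false;
    }

    private void checkOpen() {
      if (!open) {
        throw new IllegalStateException("transaction already committed or aborted");
      }
    }
  }
}
```

Records written to a `Txn` stay invisible to `Table.read()` until `commit()`. After `abort()`, none of them ever appear.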



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965646#comment-13965646
 ] 

Lars Francke commented on HIVE-5687:


Thanks Alan for the follow-up!


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965644#comment-13965644
 ] 

Alan Gates commented on HIVE-5687:
--

Filed HIVE-6885 to address the style and docs feedback.


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965544#comment-13965544
 ] 

Alan Gates commented on HIVE-5687:
--

[~leftylev] sorry, this is my fault.  I meant to file a JIRA to address your 
and Lars's style comments and forgot.  I'll do that shortly.
[~lars_francke] the latest patch differed from the one Owen +1'd only in a few 
small packaging details.  Sorry if that wasn't clear.  I pushed it in because I 
know Harish is anxious to get a release candidate for 0.13 and this was one of 
the last blockers.


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965143#comment-13965143
 ] 

Lars Francke commented on HIVE-5687:


I don't understand why this was rushed. There were only a couple of hours to 
review the final patch.


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965129#comment-13965129
 ] 

Roshan Naik commented on HIVE-5687:
---

[~leftylev] Yes, it looks like your comments went unnoticed due to the short 
time frame. For some reason I never got a notification of your review. We can 
get the fixes in via another patch, but it appears to be too late for this 
release.

[~orahive] You can query the data while it is being streamed into Hive. Queries 
will always see a consistent view of the data, because this feature relies on 
the new ACID support in Hive. Queries will not see new data that was committed 
after they began executing.
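The visibility rule described above (a query sees only data committed before it started) can be sketched like this. Each query captures the set of committed transaction IDs when it begins, then filters records by that snapshot. This is a simplified, hypothetical model: `SnapshotSketch` and its methods are invented here. Hive's actual ACID readers use a more compact representation than copying the whole committed set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of snapshot reads over transactional writes; not Hive code. */
final class SnapshotSketch {

  /** A record tagged with the id of the transaction that wrote it. */
  static final class Row {
    final long txnId;
    final String value;

    Row(long txnId, String value) {
      this.txnId = txnId;
      this.value = value;
    }
  }

  private final List<Row> rows = new ArrayList<>();
  private final Set<Long> committedTxns = new HashSet<>();
  private long nextTxnId = 1;

  synchronized long openTxn() {
    return nextTxnId++;
  }

  synchronized void write(long txnId, String value) {
    rows.add(new Row(txnId, value));
  }

  synchronized void commit(long txnId) {
    committedTxns.add(txnId);
  }

  /** Captured when a query starts: the transactions whose data it may see. */
  synchronized Set<Long> takeSnapshot() {
    return Collections.unmodifiableSet(new HashSet<>(committedTxns));
  }

  /** Returns only rows written by transactions that were committed when the snapshot was taken. */
  synchronized List<String> query(Set<Long> snapshot) {
    List<String> result = new ArrayList<>();
    for (Row row : rows) {
      if (snapshot.contains(row.txnId)) {
        result.add(row.value);
      }
    }
    return result;
  }
}
```

A query that keeps using an old snapshot never sees rows from later commits. A query that starts after those commits takes a new snapshot and sees them.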

FLUME-1734 consumes this API to implement a Flume sink that streams data 
continuously into Hive.


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-10 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965071#comment-13965071
 ] 

Lefty Leverenz commented on HIVE-5687:
--

[~roshan_naik], were my comments on the review board too late?  Most of them 
were trivial edits of the javadocs, but several seemed worth fixing.  (For 
example, some methods' javadocs got an exception name wrong.)  Maybe I should 
have raised issues, but I figured you're the best judge of which changes should 
be made.


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-09 Thread SS (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965061#comment-13965061
 ] 

SS commented on HIVE-5687:
--

Will you support live Hive queries on streaming data, as Esper and other CEP 
engines do, in this or a future patch, or will this just be an API to persist 
data in Hadoop?


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-08 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963820#comment-13963820
 ] 

Roshan Naik commented on HIVE-5687:
---

I have posted the revised patch on RB.


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-08 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963560#comment-13963560
 ] 

Roshan Naik commented on HIVE-5687:
---

Owen: Thanks a lot for revising package.html


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-08 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962755#comment-13962755
 ] 

Lars Francke commented on HIVE-5687:


Thanks, could you put a new version up on RB?


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-07 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961928#comment-13961928
 ] 

Lars Francke commented on HIVE-5687:


To add to Owen's point about style guidelines: just loading this patch into my 
IDE gives me a lot of warnings and errors.

Things like:
* missorted modifiers (static final private -> private static final)
* Unnecessary package-level visibility
* Redundant exceptions in throws clauses
* Some very weird formatting
* Calls to simple getters from within the class
* for loops without an initializer that could be while loops
* Unused variables
* Conditions that are always true or false
* Empty Javadoc tags
* Unnecessary "this"
* Missing @Override annotations
* StringBuffer usage
* Modifiers in interfaces (public)

etc.

I'm happy to do a full review on ReviewBoard but these are all things that 
Eclipse and IntelliJ can show you out of the box. So I'd appreciate it if you 
could set your IDE up to show these things and fix them in addition to using 
proper code formatting. Contact me if I can help in any way.
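Several items on this list have mechanical fixes. The hypothetical classes below are not taken from the patch. They apply a few of those fixes together: conventional modifier order, no redundant `public` in an interface, `@Override` on implementations, `StringBuilder` instead of `StringBuffer` for a builder that is not shared between threads, and a `while` loop where a `for` loop had no initializer.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical example of common IDE inspection fixes; not code from the patch. */
interface Formatter {
  // Interface methods are implicitly public and abstract, so no modifiers are needed.
  String format(List<String> fields);
}

final class CsvFormatter implements Formatter {
  // Conventional order: access modifier, then static, then final.
  private static final char SEPARATOR = ',';

  @Override // lets the compiler catch signature mismatches
  public String format(List<String> fields) {
    // StringBuilder is unsynchronized and preferred when the builder is thread-confined.
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = fields.iterator();
    // Was: for (; it.hasNext(); ) { ... }  A while loop says the same thing more plainly.
    while (it.hasNext()) {
      sb.append(it.next());
      if (it.hasNext()) {
        sb.append(SEPARATOR);
      }
    }
    return sb.toString();
  }
}
```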


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-06 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961469#comment-13961469
 ] 

Owen O'Malley commented on HIVE-5687:
-

Initial comments
* Please create a package-info.java (or package.html) for the entire package 
that has the text from the design document, but without the example.
* I believe the API will go through some iterations and use before it becomes 
stable. We should warn users that it will likely evolve in future versions of 
Hive and won't necessarily be backwards compatible. The package-info.java is 
probably the best place to put the warning.
* The current API requires users to implement a RecordWriter wrapper for each 
SerDe they want to use. In Hive 0.14, I think we need to revisit this and 
switch to just requiring a serde class name and a string to string map of serde 
properties. This way, any of a user's current SerDes can be used to parse the 
byte[] and there can be a generic method for constructing the object using 
reflection.
* The code shouldn't depend on OrcOutputFormat, but instead find the 
OutputFormat of the table/partition and use that. The streaming code should 
only require that it implement AcidOutputFormat.
* The RecordWriter should be passed the HiveConf rather than create it. It will 
make it easier to do unit tests.
* The StreamingIntegrationTester needs to print the exception's getMessage to 
stderr if the options don't parse correctly. Otherwise, the user doesn't get 
any clue as to which parameter they forgot.
* I don't see how the column reordering can be invoked. The SerDe is using the 
table properties from the table in the MetaStore to define the columns it 
returns, so the two should always be the same. My suggestion is to remove all 
of the column reordering code.
* If you don't remove the column reordering code, you should deserialize and 
then reorder the columns rather than the current strategy of deserialize, 
reorder, serialize, and deserialize.
* Revert the change that adds startMetaStore. It isn't called and thus 
shouldn't be added.
* The method writeImpl(byte[]) doesn't add any value and should just be inlined.
* Why do you use DDL to create partitions rather than the MetaStoreClient API 
that you use everywhere else?
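The SerDe suggestion for Hive 0.14 could look roughly like the sketch below. The writer is configured with a parser class name plus a string-to-string property map, and the object is constructed by reflection. `RecordParser`, `DelimitedParser`, `ParserFactory` and the use of the `field.delim` key are hypothetical stand-ins. A real version would instantiate Hive SerDes, whose `initialize` method takes a Hadoop `Configuration` and `java.util.Properties` rather than a `Map`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical parser contract standing in for a Hive SerDe. */
interface RecordParser {
  void initialize(Map<String, String> properties);

  String[] parse(byte[] record);
}

/** Example implementation: splits UTF-8 records on a configurable delimiter. */
final class DelimitedParser implements RecordParser {
  private String delimiter = ",";

  @Override
  public void initialize(Map<String, String> properties) {
    // "field.delim" is used here for illustration; default to a comma if absent.
    String configured = properties.get("field.delim");
    if (configured != null) {
      delimiter = configured;
    }
  }

  @Override
  public String[] parse(byte[] record) {
    String line = new String(record, StandardCharsets.UTF_8);
    // Pattern.quote treats the delimiter literally; -1 keeps trailing empty fields.
    return line.split(Pattern.quote(delimiter), -1);
  }
}

/** Builds any RecordParser from its class name, so no per-format writer wrapper is needed. */
final class ParserFactory {
  private ParserFactory() {}

  static RecordParser create(String className, Map<String, String> properties) {
    try {
      Class<? extends RecordParser> cls =
          Class.forName(className).asSubclass(RecordParser.class);
      RecordParser parser = cls.getDeclaredConstructor().newInstance();
      parser.initialize(properties);
      return parser;
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot instantiate parser " + className, e);
    }
  }
}
```

With this shape, supporting a new input format means writing one parser class and naming it in configuration, rather than writing another writer wrapper.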

Some style guidelines:
* Please split lines that are longer than 100 characters; it is even better to 
keep them under 80 characters.
* Ensure your if statements have a space before the parenthesis.
* Remove the commented out code.
* Please remove the private uncalled functions (eg. 
HiveEndPoint.newConnection(String proxyUser, ...) )
* You've defined a lot of exceptions in this API. Is the user expected to 
handle each exception separately? Your throws declarations don't list the exact 
exceptions that are thrown. You'd be better off with different exceptions only 
when the user is expected to be able to handle a specific error. Otherwise, you 
might as well use StreamingException with a descriptive error message for 
everything.
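The last point suggests one base exception, with subclasses kept only where a caller can take a distinct recovery action. The sketch below shows that shape under stated assumptions: `StreamingException` is the name mentioned above, but `RetriableStreamingException`, `StreamingAction`, `RetryingClient` and the retry policy are hypothetical.

```java
/** Base exception: most failures carry only a descriptive message. */
class StreamingException extends Exception {
  private static final long serialVersionUID = 1L;

  StreamingException(String message) {
    super(message);
  }

  StreamingException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical subclass, kept only because callers can act on it by retrying. */
final class RetriableStreamingException extends StreamingException {
  private static final long serialVersionUID = 1L;

  RetriableStreamingException(String message) {
    super(message);
  }
}

/** Hypothetical unit of streaming work that may fail. */
interface StreamingAction {
  void run() throws StreamingException;
}

final class RetryingClient {
  private RetryingClient() {}

  /** Retries only failures the caller can handle; returns the number of attempts used. */
  static int runWithRetries(StreamingAction action, int maxAttempts)
      throws StreamingException {
    for (int attempt = 1; ; attempt++) {
      try {
        action.run();
        return attempt;
      } catch (RetriableStreamingException e) {
        if (attempt >= maxAttempts) {
          throw e;
        }
        // A real client would back off here before retrying.
      }
      // Any other StreamingException propagates immediately with its message.
    }
  }
}
```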



[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-04-04 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960948#comment-13960948
 ] 

Alan Gates commented on HIVE-5687:
--

After discussing this with [~ashutoshc] I think this should move from 
hive/streaming to hive/hcatalog/streaming.  This is a non-SQL interface for 
Hive, and that is the area HCatalog covers.  This should be a simple refactor 
and shouldn't require a re-review.


[jira] [Commented] (HIVE-5687) Streaming support in Hive

2014-03-17 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938187#comment-13938187
 ] 

Alan Gates commented on HIVE-5687:
--

A few comments:
* Right now you're building one lock request and re-using it.  That won't work. 
 A new lock needs to be constructed with each transaction and then associated 
with that transaction so that the transaction manager knows to release the lock 
when the transaction is committed or aborted.  This should be done in 
beginNextTxn().
* The partition name is currently being built incorrectly.  It is just using 
the values.  It should be constructed using Warehouse.makePartName.
* The lock components are being constructed incorrectly.  You are building a 
component for every key/value pair in the partition.  You should only build one 
component for each partition you want to lock.  So in your case, each lock 
request will have exactly one lock component.
* The file is being written to the table location instead of the partition 
location.  When I run this with table foo and partition bar I get files in 
/hive/warehouse/foo instead of /hive/warehouse/foo/bar
* The metaStoreClient is being prematurely closed.  It certainly shouldn't be 
closed in createPartition.  I'm not sure it should be closed at all.
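The first four points can be illustrated together. The sketch builds the partition name from key/value pairs and places files under the partition's own directory. It also constructs a fresh lock request for every transaction, containing exactly one component for the target partition. `PartitionLockSketch`, `LockComponent`, `LockRequest` and `makePartName` here are hypothetical stand-ins. Real code should call `Warehouse.makePartName`, which also escapes special characters (this sketch skips that), and should use the metastore's lock API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of per-transaction partition locking; not the metastore API. */
final class PartitionLockSketch {
  private PartitionLockSketch() {}

  /** One lockable resource: a single partition of a table. */
  static final class LockComponent {
    final String db;
    final String table;
    final String partitionName;

    LockComponent(String db, String table, String partitionName) {
      this.db = db;
      this.table = table;
      this.partitionName = partitionName;
    }
  }

  /** A lock request bound to exactly one transaction. */
  static final class LockRequest {
    final long txnId;
    final List<LockComponent> components;

    LockRequest(long txnId, List<LockComponent> components) {
      this.txnId = txnId;
      this.components = Collections.unmodifiableList(new ArrayList<>(components));
    }
  }

  /** Builds "k1=v1/k2=v2"; unlike Warehouse.makePartName, values are not escaped. */
  static String makePartName(List<String> keys, List<String> values) {
    if (keys.size() != values.size()) {
      throw new IllegalArgumentException("keys and values differ in length");
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < keys.size(); i++) {
      if (i > 0) {
        sb.append('/');
      }
      sb.append(keys.get(i)).append('=').append(values.get(i));
    }
    return sb.toString();
  }

  /** Files for a partition belong under its own directory, not the table's. */
  static String partitionLocation(String tableLocation, String partName) {
    return tableLocation + "/" + partName;
  }

  /** Call once per transaction (e.g. from beginNextTxn()); never reuse a request. */
  static LockRequest newLockRequest(long txnId, String db, String table, String partName) {
    // Exactly one component: the whole partition, not one per key/value pair.
    return new LockRequest(txnId,
        Collections.singletonList(new LockComponent(db, table, partName)));
  }
}
```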


