[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967400#comment-13967400 ]

Lefty Leverenz commented on HIVE-5687:

All's well. Thanks Alan.

> Streaming support in Hive
> -------------------------
>
> Key: HIVE-5687
> URL: https://issues.apache.org/jira/browse/HIVE-5687
> Project: Hive
> Issue Type: Sub-task
> Reporter: Roshan Naik
> Assignee: Roshan Naik
> Labels: ACID, Streaming
> Fix For: 0.13.0
>
> Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API
> - Transaction support: Clients should be able to periodically commit a batch of records atomically
> - Immediate visibility: Records should be immediately visible to queries on commit
> - Should not overload HDFS with too many small files
> Use Cases:
> - Streaming logs into HIVE via Flume
> - Streaming results of computations from Storm

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965646#comment-13965646 ]

Lars Francke commented on HIVE-5687:

Thanks Alan for the follow-up!
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965644#comment-13965644 ]

Alan Gates commented on HIVE-5687:

Filed HIVE-6885 to address the style and docs feedback.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965544#comment-13965544 ]

Alan Gates commented on HIVE-5687:

[~leftylev] Sorry, this is my fault. I meant to file a JIRA to address your and Lars's style comments and forgot. I'll do that shortly.

[~lars_francke] The latest patch differed from the one Owen +1'd only in a few small packaging details. Sorry if that wasn't clear. I pushed it in because I know Harish is anxious to get a release candidate for 0.13, and this was one of the last blockers.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965143#comment-13965143 ]

Lars Francke commented on HIVE-5687:

I don't understand why this was rushed. There were only a couple of hours to review the final patch.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965129#comment-13965129 ]

Roshan Naik commented on HIVE-5687:

[~leftylev] Yes, it looks like your review went unnoticed due to the short time frame; for some reason I never got a notification of it. We can get the fixes in via another patch, but it appears to be too late to get them into this release.

[~orahive] You can query the data while it is being streamed into Hive. Queries always see a consistent view of the data because this feature relies on the new ACID support in Hive, so a query will not see data that was committed after it began executing.

FLUME-1734 uses this API to implement a Flume sink that streams data continuously into Hive.
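The visibility rule Roshan describes (a query sees only batches committed before it started) can be illustrated with a toy snapshot model. This is a hypothetical, in-memory sketch: `SnapshotDemo`, `Row`, `Table`, and `snapshot()` are invented for illustration and are not Hive classes. Hive's real mechanism tracks transactions through the metastore and is considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapshotDemo {
    // A row tagged with the id of the transaction that wrote it.
    static final class Row {
        final long txnId;
        final String value;

        Row(long txnId, String value) {
            this.txnId = txnId;
            this.value = value;
        }
    }

    // Hypothetical in-memory table: written rows plus the ids of committed transactions.
    static final class Table {
        private final List<Row> rows = new ArrayList<>();
        private final Set<Long> committed = new HashSet<>();

        void write(long txnId, String value) {
            rows.add(new Row(txnId, value));
        }

        void commit(long txnId) {
            committed.add(txnId);
        }

        // A query copies the committed set once, when it starts.
        Set<Long> snapshot() {
            return new HashSet<>(committed);
        }

        // Only rows from transactions in the snapshot are visible to the query.
        List<String> read(Set<Long> snapshot) {
            List<String> out = new ArrayList<>();
            for (Row r : rows) {
                if (snapshot.contains(r.txnId)) {
                    out.add(r.value);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.write(1, "a");
        table.commit(1);                            // first batch committed
        Set<Long> querySnapshot = table.snapshot(); // a query starts here
        table.write(2, "b");
        table.commit(2);                            // second batch committed after the query began
        System.out.println(table.read(querySnapshot));    // [a]
        System.out.println(table.read(table.snapshot())); // [a, b]
    }
}
```

The running query keeps reading through its original snapshot, so the second batch only becomes visible to queries that start after it commits.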
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965071#comment-13965071 ]

Lefty Leverenz commented on HIVE-5687:

[~roshan_naik], were my comments on the review board too late? Most of them were trivial edits of the javadocs, but several seemed worth fixing. (For example, some methods' javadocs got an exception name wrong.) Maybe I should have raised issues, but I figured you're the best judge of which changes should be made.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965061#comment-13965061 ]

SS commented on HIVE-5687:

Will you support live Hive queries on streaming data, like Esper or other CEP engines, in this or a future patch, or will there just be an API to persist data in Hadoop?
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963820#comment-13963820 ]

Roshan Naik commented on HIVE-5687:

I have posted the revised patch on RB.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963560#comment-13963560 ]

Roshan Naik commented on HIVE-5687:

Owen: Thanks a lot for revising package.html.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962755#comment-13962755 ]

Lars Francke commented on HIVE-5687:

Thanks, could you put a new version up on RB?
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961928#comment-13961928 ]

Lars Francke commented on HIVE-5687:

To add to Owen's point about style guidelines: just loading this patch into my IDE produces a lot of warnings and errors, for example:

* Missorted modifiers (static final private -> private static final)
* Unnecessary package-level visibility
* Redundant exceptions in throws clauses
* Some very weird formatting
* Calls to simple getters from within the class
* for loops without an initializer that could be while loops
* Unused variables
* Conditions that are always true or false
* Empty Javadoc tags
* Unnecessary "this"
* Missing @Override annotations
* StringBuffer usage
* Modifiers in interfaces (public), etc.

I'm happy to do a full review on ReviewBoard, but these are all things that Eclipse and IntelliJ can show you out of the box. So I'd appreciate it if you could set up your IDE to show these things and fix them, in addition to using proper code formatting. Contact me if I can help in any way.
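Several of the IDE findings Lars lists are easier to recognize in code. The snippet below is purely illustrative and not taken from the HIVE-5687 patch; `StyleExample`, `Formatter`, and `JoiningFormatter` are hypothetical names. It shows conventional modifier order, an interface method without a redundant `public`, an `@Override` annotation, a local `StringBuilder` in place of `StringBuffer`, and a `while` loop in place of an initializer-less `for`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StyleExample {
    // Interface methods are implicitly public, so no "public" modifier is written.
    interface Formatter {
        String format(List<String> items);
    }

    // Modifiers in the conventional order: private static final.
    private static final String SEPARATOR = ",";

    static final class JoiningFormatter implements Formatter {
        @Override // the compiler verifies that this really implements format()
        public String format(List<String> items) {
            // StringBuilder instead of StringBuffer: the buffer is local, so no locking is needed.
            StringBuilder sb = new StringBuilder();
            Iterator<String> it = items.iterator();
            // A while loop instead of "for (; it.hasNext(); )".
            while (it.hasNext()) {
                sb.append(it.next());
                if (it.hasNext()) {
                    sb.append(SEPARATOR);
                }
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Formatter f = new JoiningFormatter();
        System.out.println(f.format(Arrays.asList("a", "b", "c"))); // a,b,c
    }
}
```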
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961469#comment-13961469 ]

Owen O'Malley commented on HIVE-5687:

Initial comments:

* Please create a package-info.java (or package.html) for the entire package that has the text from the design document, but without the example.
* I believe the API will go through some iterations and use before it becomes stable. We should warn users that it will likely evolve in future versions of Hive and won't necessarily be backwards compatible. The package-info.java is probably the best place to put the warning.
* The current API requires users to implement a RecordWriter wrapper for each SerDe they want to use. In Hive 0.14, I think we need to revisit this and switch to just requiring a SerDe class name and a string-to-string map of SerDe properties. This way, any of a user's current SerDes can be used to parse the byte[], and there can be a generic method for constructing the object using reflection.
* The code shouldn't depend on OrcOutputFormat, but instead find the OutputFormat of the table/partition and use that. The streaming code should only require that it implement AcidOutputFormat.
* The RecordWriter should be passed the HiveConf rather than create it. That will make unit testing easier.
* The StreamingIntegrationTester needs to print the exception's getMessage to stderr if the options don't parse correctly. Otherwise, the user doesn't get any clue as to which parameter they forgot.
* I don't see how the column reordering can be invoked. The SerDe uses the table properties from the table in the MetaStore to define the columns it returns, so the two should always be the same. My suggestion is to remove all of the column reordering code.
* If you don't remove the column reordering code, you should deserialize and then reorder the columns, rather than the current strategy of deserialize, reorder, serialize, and deserialize.
* Revert the change that adds startMetaStore. It isn't called and thus shouldn't be added.
* The method writeImpl(byte[]) doesn't add any value and should just be inlined.
* Why do you use DDL to create partitions rather than the MetaStoreClient API that you use everywhere else?

Some style guidelines:

* Please split lines that are longer than 100 characters; it is even better if they are less than 80 characters.
* Ensure your if statements have a space before the parenthesis.
* Remove the commented-out code.
* Please remove the private uncalled functions (e.g. HiveEndPoint.newConnection(String proxyUser, ...)).
* You've defined a lot of exceptions in this API. Is the user expected to handle each exception separately? Your throws declarations don't list the exact exceptions that are thrown. You'd be better off with distinct exceptions only where the user is expected to be able to handle a specific error. Otherwise, you might as well use StreamingException with a descriptive error message for everything.
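Owen's suggestion of configuring a parser from just a class name plus a string-to-string property map can be sketched with plain Java reflection. This is a simplified illustration under stated assumptions: `RecordParser` and `DelimitedParser` are hypothetical stand-ins, not Hive's SerDe interface, and the `"field.delim"` key is chosen only for the example. The point is that one generic `createParser` method works for any implementation with a no-argument constructor.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class SerDeByNameSketch {
    // Hypothetical stand-in for a SerDe: turns raw bytes into a row of fields.
    public interface RecordParser {
        void initialize(Map<String, String> properties);
        String[] parse(byte[] record);
    }

    // Hypothetical example implementation: splits on a configurable delimiter.
    public static class DelimitedParser implements RecordParser {
        private String delimiter = ",";

        @Override
        public void initialize(Map<String, String> properties) {
            delimiter = properties.getOrDefault("field.delim", ",");
        }

        @Override
        public String[] parse(byte[] record) {
            String line = new String(record, StandardCharsets.UTF_8);
            return line.split(Pattern.quote(delimiter), -1); // -1 keeps trailing empty fields
        }
    }

    // Generic construction: only a class name and a property map are needed.
    static RecordParser createParser(String className, Map<String, String> props) {
        try {
            Class<? extends RecordParser> cls =
                Class.forName(className).asSubclass(RecordParser.class);
            RecordParser parser = cls.getDeclaredConstructor().newInstance();
            parser.initialize(props);
            return parser;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot create parser " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("field.delim", "|");
        RecordParser p = createParser(DelimitedParser.class.getName(), props);
        String[] row = p.parse("1|alice|2014-04-10".getBytes(StandardCharsets.UTF_8));
        System.out.println(row.length + " fields, second = " + row[1]); // 3 fields, second = alice
    }
}
```

Wrapping reflection failures in a single unchecked exception with a descriptive message also echoes Owen's last point about not multiplying exception types the caller cannot act on.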
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960948#comment-13960948 ]

Alan Gates commented on HIVE-5687:

After discussing this with [~ashutoshc], I think this should move from hive/streaming to hive/hcatalog/streaming. This is a non-SQL interface for Hive, and that is the area HCatalog covers. This should be a simple refactor and shouldn't require a re-review.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938187#comment-13938187 ]

Alan Gates commented on HIVE-5687:

A few comments:

* Right now you're building one lock request and re-using it. That won't work. A new lock needs to be constructed with each transaction and then associated with that transaction, so that the transaction manager knows to release the lock when the transaction is committed or aborted. This should be done in beginNextTxn().
* The partition name is currently being built incorrectly: it uses only the values. It should be constructed using Warehouse.makePartName.
* The lock components are being constructed incorrectly. You are building a component for every key/value pair in the partition. You should only build one component for each partition you want to lock, so in your case each lock request will have exactly one lock component.
* The file is being written to the table location instead of the partition location. When I run this with table foo and partition bar, I get files in /hive/warehouse/foo instead of /hive/warehouse/foo/bar.
* The metaStoreClient is being closed prematurely. It certainly shouldn't be closed in createPartition. I'm not sure it should be closed at all.
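Alan's first three points can be summarized in a small sketch: build the partition name as `key=value` pairs joined by `/`, create one lock component per partition rather than per key/value pair, and build a fresh lock request for every transaction. Everything here is hypothetical and simplified: `partName` only approximates `Warehouse.makePartName` (which also escapes characters that are unsafe in paths), `LockComponent`/`LockRequest` are toy classes rather than the metastore API, and real transaction ids come from the metastore rather than a local counter. The `beginNextTxn` name mirrors the method Alan mentions, but this is not the patch's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionLockSketch {
    // Simplified "key1=val1/key2=val2" naming; no escaping of special characters.
    static String partName(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(keys.get(i)).append('=').append(values.get(i));
        }
        return sb.toString();
    }

    // Hypothetical lock component: one per partition, not one per key/value pair.
    static final class LockComponent {
        final String db;
        final String table;
        final String partition;

        LockComponent(String db, String table, String partition) {
            this.db = db;
            this.table = table;
            this.partition = partition;
        }
    }

    // Hypothetical lock request bound to exactly one transaction.
    static final class LockRequest {
        final long txnId;
        final List<LockComponent> components;

        LockRequest(long txnId, List<LockComponent> components) {
            this.txnId = txnId;
            this.components = components;
        }
    }

    private long nextTxnId = 1; // stand-in for ids handed out by the metastore

    // A fresh request per transaction, so its lock can be released on commit or abort.
    LockRequest beginNextTxn(String db, String table, String partition) {
        long txnId = nextTxnId++;
        return new LockRequest(txnId,
            Collections.singletonList(new LockComponent(db, table, partition)));
    }

    public static void main(String[] args) {
        String name = partName(Arrays.asList("continent", "country"),
                               Arrays.asList("Asia", "India"));
        System.out.println(name);                          // continent=Asia/country=India
        System.out.println("/hive/warehouse/foo/" + name); // partition location, not the table location
        PartitionLockSketch writer = new PartitionLockSketch();
        LockRequest first = writer.beginNextTxn("default", "foo", name);
        LockRequest second = writer.beginNextTxn("default", "foo", name);
        System.out.println(first.txnId + " " + second.txnId + " " + first.components.size()); // 1 2 1
    }
}
```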