-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/#review201640
-----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2499 (patched)
<https://reviews.apache.org/r/66645/#comment283134>

    Heartbeats for each txn should be sent at ConfVars.HIVE_TXN_TIMEOUT/2
    intervals. HIVE_TXN_TIMEOUT is what the server uses to time out a txn;
    sending heartbeats more frequently loads the metastore unnecessarily. See
    DbTxnManager.getHeartbeatInterval() for an example. This also guarantees
    that server and client are in sync.


streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java
Lines 54 (patched)
<https://reviews.apache.org/r/66645/#comment283128>

    Why is it useful to make this distinction? Since the implementation is
    able to route all events to the appropriate partition, why should it not
    do that all the time?


streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java
Lines 68 (patched)
<https://reviews.apache.org/r/66645/#comment283129>

    If isStaticPartitioning() == true, can isDynamicPartitioning() == true?
    I.e., why two methods?


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 121 (patched)
<https://reviews.apache.org/r/66645/#comment283133>

    If all operations on the HMS client are mutexed via txnBatchLock, why
    have two HMS clients? More generally, having one thread per connection
    may be rather heavyweight. Each thread will send one message every few
    minutes, i.e. it will be mostly idle. DbTxnManager uses a static pool of
    five or so threads which does heartbeats for the whole VM (HS2). I think
    that would be a good model here as well.


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 309 (patched)
<https://reviews.apache.org/r/66645/#comment283143>

    Should each StreamingConnection have a UUID to include here? I can
    imagine multiple instances writing to the same table which are not
    distinguishable by db.table.


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 326 (patched)
<https://reviews.apache.org/r/66645/#comment283140>

    This should be in a try/catch. SessionState.start() can sometimes throw,
    but after it has already created a Session object and attached it to a
    ThreadLocal, which then makes every user of that thread fail. There is an
    Apache bug describing some situations where this can happen.


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 337 (patched)
<https://reviews.apache.org/r/66645/#comment283141>

    Should include db.table in all exceptions/messages.


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 422 (patched)
<https://reviews.apache.org/r/66645/#comment283144>

    The API in general doesn't expose 'catalog' - should it?


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 538 (patched)
<https://reviews.apache.org/r/66645/#comment283146>

    Why should the client know this? This is an implementation detail that
    should be hidden.


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 549 (patched)
<https://reviews.apache.org/r/66645/#comment283145>

    Why should the client know this? This is an implementation detail that
    should be hidden.


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 574 (patched)
<https://reviews.apache.org/r/66645/#comment283147>

    Include db.table in the message?


streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 650 (patched)
<https://reviews.apache.org/r/66645/#comment283148>

    What if someone calls this after the close() above? It seems isClosed()
    should be a property of the connection itself, not of the current batch.
    The original API had a serious flaw: when something threw an exception,
    the connection was not made fully unusable, and some clients ignored
    exceptions and continued trying to write, corrupting files. I think the
    flow should be: if any 'unexpected' error happens, the connection should
    abort any remaining txns in the batch and make all subsequent methods on
    it throw rather than propagate the call further.
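The fail-fast behavior asked for in the comment on line 650 could look
roughly like the sketch below. FailFastConnection and its members are
hypothetical names for illustration, not the API in this patch:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: once the connection is closed (or hits an
// unexpected error), every subsequent call throws instead of writing on,
// so a client that swallows exceptions cannot keep corrupting delta files.
public class FailFastConnection {

    private final AtomicBoolean closed = new AtomicBoolean(false);

    public void write(byte[] record) {
        checkOpen();
        // ... route the record to the current open txn ...
    }

    public void close() {
        if (closed.compareAndSet(false, true)) {
            // ... abort any remaining (unused) txns in the current batch ...
        }
    }

    // Any unexpected error permanently poisons the connection.
    void markFailed() {
        close();
    }

    private void checkOpen() {
        if (closed.get()) {
            // db.table belongs in the message, per the comments above
            throw new IllegalStateException(
                "Streaming connection to db.table is closed");
        }
    }

    // Small demo: writing after close() must throw.
    static String demo() {
        FailFastConnection conn = new FailFastConnection();
        conn.close();
        try {
            conn.write(new byte[0]);
            return "no-throw";
        } catch (IllegalStateException expected) {
            return "threw-after-close";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping the closed flag on the connection (rather than on the batch) means
the guard survives batch rotation, which is the point of the comment.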
streaming/src/java/org/apache/hive/streaming/TransactionBatch.java
Line 25 (original), 25 (patched)
<https://reviews.apache.org/r/66645/#comment283125>

    Why does this need to be public at all? If you are hiding the concept of
    a TransactionBatch from the end user, I'd say the Connection object
    should have open/commit/abort. The first open() opens N txns at once (as
    currently), etc., but the end user doesn't need these implementation
    details. In particular, for blob stores without an append operation, we'd
    have to make the batch size 1. Connection.close() would abort any
    remaining (unused) txns in the current batch.


streaming/src/java/org/apache/hive/streaming/TransactionBatch.java
Line 88 (original), 88 (patched)
<https://reviews.apache.org/r/66645/#comment283126>

    There is no way to have a txn in a batch that is not
    open/committed/aborted. In the old API you make a metastore call to open
    all 'batch-size' txns at once so that you know the x and y of delta_x_y
    to create.


streaming/src/java/org/apache/hive/streaming/TransactionBatch.java
Line 92 (original), 93 (patched)
<https://reviews.apache.org/r/66645/#comment283127>

    Why is this needed?


- Eugene Koifman


On April 18, 2018, 7:36 p.m., Prasanth_J wrote:

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66645/
> -----------------------------------------------------------
>
> (Updated April 18, 2018, 7:36 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Eugene Koifman.
>
>
> Bugs: HIVE-19211
>     https://issues.apache.org/jira/browse/HIVE-19211
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-19211: New streaming ingest API and support for dynamic partitioning
>
>
> Diffs
> -----
>
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 73492ff
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 82ba775
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java PRE-CREATION
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java a66c135
>   serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION
>   streaming/pom.xml b58ec01
>   streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java 25998ae
>   streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb
>   streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java PRE-CREATION
>   streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java 898b3f9
>   streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520
>   streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137
>   streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java PRE-CREATION
>   streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java 23e17e7
>   streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14
>   streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804
>   streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d
>   streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java PRE-CREATION
>   streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java 762f5f8
>   streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java 5f9aca6
>   streaming/src/java/org/apache/hive/streaming/PartitionHandler.java PRE-CREATION
>   streaming/src/java/org/apache/hive/streaming/QueryFailedException.java ccd3ae0
>   streaming/src/java/org/apache/hive/streaming/RecordWriter.java dc6d70e
>   streaming/src/java/org/apache/hive/streaming/SerializationError.java a57ba00
>   streaming/src/java/org/apache/hive/streaming/StreamingConnection.java 2f760ea
>   streaming/src/java/org/apache/hive/streaming/StreamingException.java a7f84c1
>   streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java 0dfbfa7
>   streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java PRE-CREATION
>   streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 0077913
>   streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java c0b7324
>   streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 2b05771
>   streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java a8c8cd4
>   streaming/src/java/org/apache/hive/streaming/TransactionError.java a331b20
>   streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java f0843a1
>   streaming/src/test/org/apache/hive/streaming/TestStreaming.java 6f63bfb
>   streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/66645/diff/4/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Prasanth_J
>
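The heartbeat model suggested in the review comments above (a static,
JVM-wide pool heartbeating each open txn at HIVE_TXN_TIMEOUT/2) might be
sketched as follows. SharedHeartbeater and its members are illustrative
names, not the DbTxnManager implementation, and the timeout constant is an
assumed default rather than the value read from ConfVars.HIVE_TXN_TIMEOUT:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: one small static pool heartbeats every open txn
// in the VM, instead of one mostly-idle dedicated thread per connection.
public class SharedHeartbeater {

    // Assumed default; the real value comes from ConfVars.HIVE_TXN_TIMEOUT.
    static final long TXN_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(300);

    // DbTxnManager-style: ~5 daemon threads shared by the whole VM (HS2).
    private static final ScheduledExecutorService POOL =
        Executors.newScheduledThreadPool(5, r -> {
            Thread t = new Thread(r, "streaming-heartbeat");
            t.setDaemon(true);
            return t;
        });

    // Heartbeat at half the server-side timeout, so the txn cannot expire
    // between two beats, and no more often, to avoid loading the metastore.
    static long heartbeatIntervalMs(long txnTimeoutMs) {
        return txnTimeoutMs / 2;
    }

    static ScheduledFuture<?> schedule(Runnable heartbeat, long intervalMs) {
        return POOL.scheduleAtFixedRate(
            heartbeat, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(heartbeatIntervalMs(TXN_TIMEOUT_MS));

        // Demo with a short interval: wait for two "heartbeats" to fire.
        CountDownLatch beats = new CountDownLatch(2);
        ScheduledFuture<?> f = schedule(beats::countDown, 20);
        boolean delivered = beats.await(5, TimeUnit.SECONDS);
        f.cancel(false);
        System.out.println(delivered ? "heartbeats-delivered" : "timeout");
    }
}
```

Because the pool is static, adding connections only adds scheduled tasks,
not threads, which is what makes the per-connection-thread design in the
patch unnecessary.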