-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/#review201640
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2499 (patched)
<https://reviews.apache.org/r/66645/#comment283134>

    Heartbeat for each txn should be sent at ConfVars.HIVE_TXN_TIMEOUT/2 
intervals.  HIVE_TXN_TIMEOUT is what the server uses to time out a txn.  Sending 
heartbeats more frequently loads the metastore unnecessarily.  See 
DbTxnManager.getHeartbeatInterval() for an example.  This also guarantees that 
server and client are in sync.
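A minimal sketch of the interval rule being suggested, mirroring what DbTxnManager.getHeartbeatInterval() does; the class and method names here are illustrative, not the patch's API:

```java
// Sketch: derive the client heartbeat interval from the server-side txn
// timeout. Heartbeating at half the timeout means a beat is never late,
// while beating more often only loads the metastore.
public class HeartbeatIntervalSketch {
    /** Heartbeat at HIVE_TXN_TIMEOUT / 2 (both values in milliseconds). */
    static long heartbeatIntervalMs(long txnTimeoutMs) {
        return txnTimeoutMs / 2;
    }

    public static void main(String[] args) {
        long txnTimeoutMs = 300_000L; // e.g. hive.txn.timeout = 300s
        System.out.println("heartbeat every "
            + heartbeatIntervalMs(txnTimeoutMs) + " ms");
    }
}
```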



streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java
Lines 54 (patched)
<https://reviews.apache.org/r/66645/#comment283128>

    Why is it useful to make this distinction?  Since the implementation is 
able to route all events to the appropriate partition, why should it not do 
that all the time?



streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java
Lines 68 (patched)
<https://reviews.apache.org/r/66645/#comment283129>

    If isStaticPartitioning() == true, can isDynamicPartitioning() == true?  
I.e., why 2 methods?



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 121 (patched)
<https://reviews.apache.org/r/66645/#comment283133>

    If all operations on the hms client are mutexed via txnBatchLock, why have 
2 hms clients?
    
    More generally, having 1 thread per connection may be rather heavyweight.  
Each thread will send 1 msg every few minutes, i.e. will be mostly idle.  
DbTxnManager uses a static pool of 5 or so threads which does heartbeats for 
the whole VM (HS2).  I think that would be a good model here as well.
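A sketch of the shared-pool model being suggested, loosely modeled on DbTxnManager's approach; all names are illustrative, and the real pool size/naming would come from configuration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: one small static heartbeat pool for the whole JVM instead of a
// dedicated, mostly-idle thread per streaming connection.
public class HeartbeatPoolSketch {
    private static final ScheduledExecutorService POOL =
        Executors.newScheduledThreadPool(5, r -> {
            Thread t = new Thread(r, "streaming-heartbeat");
            t.setDaemon(true); // do not keep the VM alive for heartbeats
            return t;
        });

    /** Schedule a connection's heartbeat; the connection cancels it on close(). */
    static ScheduledFuture<?> schedule(Runnable heartbeat, long intervalMs) {
        return POOL.scheduleAtFixedRate(
            heartbeat, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```

Each connection holds only its ScheduledFuture, so hundreds of connections share the same 5 threads.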



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 309 (patched)
<https://reviews.apache.org/r/66645/#comment283143>

    Should each StreamingConnection have a UUID to include here?  I can imagine 
multiple instances writing to the same table which are not distinguishable by 
db.table.



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 326 (patched)
<https://reviews.apache.org/r/66645/#comment283140>

    This should be in try/catch.  SessionState.start() can sometimes throw, but 
only after it has created a Session object and attached it to a ThreadLocal, 
which then makes every user of that thread fail.  There is an Apache bug 
describing some situations where this can happen.
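A sketch of the guard being asked for. SessionState here is a simplified stand-in for the real Hive class (which attaches the session to a ThreadLocal before the point of failure); the cleanup in the catch block mirrors detaching the session so the thread stays usable:

```java
// Sketch: a start() that attaches state to a ThreadLocal and may then
// throw, plus a guarded wrapper that detaches on failure so later users
// of the same thread are not poisoned.
public class SessionStartSketch {
    static final ThreadLocal<Object> SESSION = new ThreadLocal<>();

    /** Stand-in for SessionState.start(): attaches first, then may throw. */
    static void start(boolean fail) {
        SESSION.set(new Object());              // attached before the failure point
        if (fail) throw new IllegalStateException("start failed");
    }

    /** Guarded start: on failure, detach the session and report false. */
    static boolean safeStart(boolean fail) {
        try {
            start(fail);
            return true;
        } catch (RuntimeException e) {
            SESSION.remove();                   // leave the thread clean
            return false;
        }
    }
}
```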



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 337 (patched)
<https://reviews.apache.org/r/66645/#comment283141>

    Should include db.table in all exception messages.



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 422 (patched)
<https://reviews.apache.org/r/66645/#comment283144>

    The API in general doesn't expose 'catalog' - should it?



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 538 (patched)
<https://reviews.apache.org/r/66645/#comment283146>

    Why should the client know this?  This is an implementation detail that 
should be hidden.



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 549 (patched)
<https://reviews.apache.org/r/66645/#comment283145>

    Why should the client know this?  This is an implementation detail that 
should be hidden.



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 574 (patched)
<https://reviews.apache.org/r/66645/#comment283147>

    include db.table in message?



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 650 (patched)
<https://reviews.apache.org/r/66645/#comment283148>

    What if someone calls this after the above close()?  Seems like isClosed() 
should be a property of the connection itself, not the current batch.
    
    The original API had a serious flaw: when something threw an exception, the 
connection was not made fully unusable, and some clients ignored exceptions and 
continued to try to write, corrupting files.  I think the flow should be: if 
any 'unexpected' error happens, the connection should abort any remaining txns 
in the batch and make all subsequent methods on it throw rather than propagate 
the call further.
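A minimal sketch of that fail-fast behavior, assuming a hypothetical connection class (names and methods are illustrative, not the patch's API): once closed or hit by an unexpected error, every later call throws instead of reaching the writer.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a connection that becomes permanently unusable after close()
// or after any unexpected error, so ignored exceptions cannot lead to
// further writes against corrupt state.
public class FailFastConnectionSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    private void checkOpen() {
        if (closed.get()) throw new IllegalStateException("connection is closed");
    }

    public void write(byte[] record) {
        checkOpen();
        try {
            // ... hand the record to the underlying record writer ...
        } catch (RuntimeException e) {
            closed.set(true);   // unexpected error: poison the connection
            throw e;
        }
    }

    public void close() {
        if (closed.compareAndSet(false, true)) {
            // ... abort any remaining txns in the current batch ...
        }
    }

    public boolean isClosed() { return closed.get(); }
}
```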



streaming/src/java/org/apache/hive/streaming/TransactionBatch.java
Line 25 (original), 25 (patched)
<https://reviews.apache.org/r/66645/#comment283125>

    Why does this need to be public at all?  If you are hiding the concept of 
TB from the end user, I'd say the Connection object should have 
open/commit/abort.  The 1st open() opens N txns at once (as currently), etc., 
but the end user doesn't need these implementation details.  In particular, for 
blob stores w/o an append operation, we'd have to make batch size = 1.
    Connection.close() would abort any remaining (unused) txns in the current 
batch.
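A sketch of how the batch could stay an internal detail behind the connection; everything here is a hypothetical illustration, not the patch's classes:

```java
// Sketch: the caller only ever calls begin(); the connection silently
// opens 'batchSize' txns in one metastore round trip whenever the
// current batch is exhausted.
public class BatchingConnectionSketch {
    private final int batchSize;
    private int remainingInBatch = 0;
    private int txnsOpened = 0;       // total txns opened, for illustration

    public BatchingConnectionSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Consume one txn, transparently opening a new batch when needed. */
    public void begin() {
        if (remainingInBatch == 0) {
            // one metastore call would open batchSize txns here
            remainingInBatch = batchSize;
            txnsOpened += batchSize;
        }
        remainingInBatch--;
    }

    public int txnsOpened() { return txnsOpened; }
}
```

With batchSize = 1 (e.g. for blob stores without append), each begin() maps to exactly one metastore call, with no API change for the caller.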



streaming/src/java/org/apache/hive/streaming/TransactionBatch.java
Line 88 (original), 88 (patched)
<https://reviews.apache.org/r/66645/#comment283126>

    There is no way to have a txn in a batch that is not 
open/committed/aborted.  In the old API you make a metastore call to open all 
'batch-size' txns at once so that you know the x and y of the delta_x_y to 
create.



streaming/src/java/org/apache/hive/streaming/TransactionBatch.java
Line 92 (original), 93 (patched)
<https://reviews.apache.org/r/66645/#comment283127>

    Why is this needed?


- Eugene Koifman


On April 18, 2018, 7:36 p.m., Prasanth_J wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66645/
> -----------------------------------------------------------
> 
> (Updated April 18, 2018, 7:36 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Eugene Koifman.
> 
> 
> Bugs: HIVE-19211
>     https://issues.apache.org/jira/browse/HIVE-19211
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-19211: New streaming ingest API and support for dynamic partitioning
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 73492ff 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  82ba775 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java 
> PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java 
> a66c135 
>   serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION 
>   streaming/pom.xml b58ec01 
>   streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java 
> 25998ae 
>   streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb 
>   streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java 
> 898b3f9 
>   streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520 
>   streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137 
>   streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java 
> 23e17e7 
>   streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14 
>   streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804 
>   streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d 
>   streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java 
> 762f5f8 
>   streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java 
> 5f9aca6 
>   streaming/src/java/org/apache/hive/streaming/PartitionHandler.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/QueryFailedException.java 
> ccd3ae0 
>   streaming/src/java/org/apache/hive/streaming/RecordWriter.java dc6d70e 
>   streaming/src/java/org/apache/hive/streaming/SerializationError.java 
> a57ba00 
>   streaming/src/java/org/apache/hive/streaming/StreamingConnection.java 
> 2f760ea 
>   streaming/src/java/org/apache/hive/streaming/StreamingException.java 
> a7f84c1 
>   streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java 
> 0dfbfa7 
>   
> streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 0077913 
>   streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java c0b7324 
>   streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 2b05771 
>   
> streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java 
> a8c8cd4 
>   streaming/src/java/org/apache/hive/streaming/TransactionError.java a331b20 
>   streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java 
> f0843a1 
>   streaming/src/test/org/apache/hive/streaming/TestStreaming.java 6f63bfb 
>   
> streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66645/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Prasanth_J
> 
>
