NiXuebing commented on issue #206: FLUME-2956 - hive sink not sending heartbeat correctly
URL: https://github.com/apache/flume/pull/206#issuecomment-469562180
 
 
   > > As far as I can tell, `setupHeartBeatTimer()` in HiveSink.java only sets `timeToSendHeartBeat` to `true`; the heartbeat is actually sent only by the flush that runs when events come in. If no events arrive for a long time, the transaction still gets dropped automatically.
   > > My workaround was to call `writer.heartBeat()` directly inside `HiveSink.setupHeartBeatTimer()`; I am not sure whether that causes any problems. @hejiang2000
   > 
   > Better not to do that. setupHeartBeatTimer() would then send the heartbeat asynchronously on a timer over TCP (by calling txnBatch.heartbeat()), so it could collide with the operations that send data on the same TCP connection (txnBatch.commit() and so on) and cause problems. @NiXuebing
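   
   For context, here is a minimal sketch of the flag-based scheme described above (illustrative only, not the actual HiveSink source; `HiveWriterLike` is a stand-in): the timer thread only flips a flag, and the heartbeat RPC itself is issued from the sink thread that also writes and commits, so it cannot race with `txnBatch.commit()` on the same connection.
   
```java
// Sketch of the flag-based heartbeat scheme (illustrative, not HiveSink itself).
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

class HeartbeatFlagSketch {
  private final AtomicBoolean timeToSendHeartBeat = new AtomicBoolean(false);
  private final Timer heartBeatTimer = new Timer("hive-sink-heartbeat", true);

  // Analogous to setupHeartBeatTimer(): the timer only marks that a heartbeat is due.
  void setupHeartBeatTimer(long intervalMs) {
    heartBeatTimer.schedule(new TimerTask() {
      @Override public void run() {
        timeToSendHeartBeat.set(true);
      }
    }, intervalMs, intervalMs);
  }

  // Called from the sink thread that also writes/commits, e.g. around flush time.
  void maybeHeartBeat(HiveWriterLike writer) throws Exception {
    if (timeToSendHeartBeat.compareAndSet(true, false)) {
      writer.heartBeat();  // runs on the sink thread, not the timer thread
    }
  }

  interface HiveWriterLike {
    void heartBeat() throws Exception;
  }
}
```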
   
   I found that my actual problem is that the per-batch loop in `drainOneBatch(Channel channel)`, which processes and writes up to `batchSize` events, takes too long. With `sink.batchSize` set to 10000, each batch takes around 300 seconds, so by the time the `flush` happens the transaction has already been dropped.
   
   ```
   2019-03-05 14:43:20,543 INFO org.apache.flume.sink.hive.HiveSink: batch event write = 277142
   2019-03-05 14:43:20,543 INFO org.apache.flume.sink.hive.HiveWriter: Committing Txn 205240 on EndPoint: {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05, 2019-03-05-14] }
   2019-03-05 14:48:36,292 INFO org.apache.flume.sink.hive.HiveSink: batch event write = 315524
   2019-03-05 14:48:36,293 INFO org.apache.flume.sink.hive.HiveWriter: Sending heartbeat on batch TxnIds=[205240...205299] on endPoint = {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05, 2019-03-05-14] }
   2019-03-05 14:48:37,124 WARN org.apache.flume.sink.hive.HiveWriter: Unable to send heartbeat on Txn Batch TxnIds=[205240...205299] on endPoint = {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05, 2019-03-05-14] }
   org.apache.hive.hcatalog.streaming.HeartBeatFailure: Heart beat error. InvalidTxns: [205243, 205242, 205247, 205246, 205245, 205244, 205251, 205250, 205249, 205248, 205255, 205254, 205253, 205252, 205259, 205258, 205257, 205256, 205263, 205262, 205261, 205260, 205267, 205266, 205265, 205264, 205271, 205270, 205269, 205268, 205275, 205274, 205273, 205272, 205279, 205278, 205277, 205276, 205283, 205282, 205281, 205280, 205287, 205286, 205285, 205284, 205291, 205290, 205289, 205288, 205295, 205294, 205293, 205292, 205299, 205298, 205297, 205296]. AbortedTxns: [205241]
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.heartbeat(HiveEndPoint.java:953)
        at org.apache.flume.sink.hive.HiveWriter$2.call(HiveWriter.java:240)
        at org.apache.flume.sink.hive.HiveWriter$2.call(HiveWriter.java:236)
        at org.apache.flume.sink.hive.HiveWriter$11.call(HiveWriter.java:431)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   2019-03-05 14:48:37,133 INFO org.apache.flume.sink.hive.HiveWriter: Committing Txn 205241 on EndPoint: {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05, 2019-03-05-14] }
   2019-03-05 14:48:37,209 ERROR org.apache.hive.hcatalog.streaming.HiveEndPoint: Fatal error on TxnIds=[205240...205299] on endPoint = {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05, 2019-03-05-14] }; cause Unable to abort invalid transaction id : 205241: No such transaction txnid:205241
   org.apache.hive.hcatalog.streaming.TransactionError: Unable to abort invalid transaction id : 205241: No such transaction txnid:205241
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.abortImpl(HiveEndPoint.java:936)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.abort(HiveEndPoint.java:894)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.markDead(HiveEndPoint.java:753)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.commit(HiveEndPoint.java:853)
        at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:346)
        at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:343)
        at org.apache.flume.sink.hive.HiveWriter$11.call(HiveWriter.java:431)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: NoSuchTxnException(message:No such transaction txnid:205241)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$abort_txn_result$abort_txn_resultStandardScheme.read(ThriftHiveMetastore.java)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$abort_txn_result$abort_txn_resultStandardScheme.read(ThriftHiveMetastore.java)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$abort_txn_result.read(ThriftHiveMetastore.java)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_abort_txn(ThriftHiveMetastore.java:4484)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.abort_txn(ThriftHiveMetastore.java:4471)
   ```
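   
   Reading the timestamps above, roughly five minutes pass between the two "batch event write" lines, and the heartbeat is only attempted at flush/commit time, by which point the server has already invalidated or aborted the transactions. One direction that would keep a slow batch alive, while still keeping the heartbeat on the sink thread as suggested above, is to heartbeat from inside the per-batch loop. This is purely a sketch under that assumption; `EventSource` and `WriterLike` are illustrative stand-ins, not Flume classes.
   
```java
// Hypothetical sketch, not the real drainOneBatch(): send the heartbeat from
// inside the per-batch loop, on the same thread that writes and commits, so a
// batch that takes longer than the heartbeat interval still keeps its open
// transactions alive. One could equally check the timeToSendHeartBeat flag
// here instead of the elapsed time.
class InLoopHeartbeatSketch {
  interface EventSource { byte[] take(); }   // stand-in for the Flume Channel
  interface WriterLike {                     // stand-in for the Hive writer
    void write(byte[] event) throws Exception;
    void heartBeat() throws Exception;
    void flush() throws Exception;
  }

  void drainOneBatch(int batchSize, EventSource channel, WriterLike writer,
                     long heartBeatIntervalMs) throws Exception {
    long lastHeartBeat = System.currentTimeMillis();
    for (int i = 0; i < batchSize; i++) {
      byte[] event = channel.take();
      if (event == null) {
        break;                               // channel drained
      }
      writer.write(event);
      long now = System.currentTimeMillis();
      if (now - lastHeartBeat >= heartBeatIntervalMs) {
        writer.heartBeat();                  // same thread, so no TCP conflict
        lastHeartBeat = now;
      }
    }
    writer.flush();                          // commit still happens after the loop
  }
}
```
   
   Alternatively, simply lowering `sink.batchSize` so that each batch drains well within the transaction timeout would avoid the problem without touching the sink code.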
   
   
