hudi-bot opened a new issue, #15064:
URL: https://github.com/apache/hudi/issues/15064

   Scenario: multi-writer test, one writer doing ingesting with Deltastreamer 
continuous mode, COW, inserts, async clustering and cleaning (partitions under 
2022/1, 2022/2), another writer with Spark datasource doing backfills to 
different partitions (2021/12).  
   
   0.10.0 no MT, clustering instant is inflight (failing it in the middle 
before upgrade) ➝ 0.11 MT, with multi-writer configuration the same as before.
   
   The clustering/replace instant cannot make progress due to marker creation 
failure, failing the DS ingestion as well.  Need to investigate if this is 
timeline-server-based marker related or MT related.
   {code:java}
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in 
stage 46.0 failed 1 times, most recent failure: Lost task 2.0 in stage 46.0 
(TID 277) (192.168.70.231 executor driver): java.lang.RuntimeException: 
org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
   Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
       at 
org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
       at 
scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
       at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
       at scala.collection.Iterator.foreach(Iterator.scala:943)
       at scala.collection.Iterator.foreach$(Iterator.scala:943)
       at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
       at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
       at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
       at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
       at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
       at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
       at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
       at scala.collection.AbstractIterator.to(Iterator.scala:1431)
       at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
       at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
       at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
       at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
       at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
       at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
       at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
       at 
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:131)
       at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
   Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
       at 
org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:94)
       at 
org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:37)
       at 
org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
       ... 30 more
   Caused by: org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
   Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
       at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:160)
       at 
org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:90)
       ... 32 more
   Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file 
2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
   Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
       at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:154)
       ... 33 more
   Caused by: org.apache.hudi.exception.HoodieRemoteException: Failed to create 
marker file 
2022/1/24/aa2f24d3-882f-4d48-b20e-9fcd3540c7a7-0_2-46-277_20220314101326706.parquet.marker.CREATE
   Connect to localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] 
failed: Connection refused (Connection refused)
       at 
org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.create(TimelineServerBasedWriteMarkers.java:149)
       at org.apache.hudi.table.marker.WriteMarkers.create(WriteMarkers.java:64)
       at 
org.apache.hudi.io.HoodieWriteHandle.createMarkerFile(HoodieWriteHandle.java:181)
       at 
org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:99)
       at 
org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:73)
       at 
org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:46)
       at 
org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:83)
       at 
org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
       at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
       at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:134)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       ... 3 more
   Caused by: org.apache.http.conn.HttpHostConnectException: Connect to 
localhost:26754 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: 
Connection refused (Connection refused)
       at 
org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:156)
       at 
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
       at 
org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
       at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
       at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
       at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
       at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
       at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
       at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
       at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
       at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
       at org.apache.http.client.fluent.Request.execute(Request.java:151)
       at 
org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.executeRequestToTimelineServer(TimelineServerBasedWriteMarkers.java:177)
       at 
org.apache.hudi.table.marker.TimelineServerBasedWriteMarkers.create(TimelineServerBasedWriteMarkers.java:145)
       ... 13 more
   Caused by: java.net.ConnectException: Connection refused (Connection refused)
       at java.net.PlainSocketImpl.socketConnect(Native Method)
       at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
       at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
       at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
       at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
       at java.net.Socket.connect(Socket.java:607)
       at 
org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:75)
       at 
org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
       ... 26 more {code}
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-3636
   - Type: Bug
   - Epic: https://issues.apache.org/jira/browse/HUDI-5425
   - Fix version(s):
     - 1.1.0
   
   
   ---
   
   
   ## Comments
   
   16/Mar/22 17:54;guoyihua;This also happens in a test with deltastreamer 
continuous mode writing COW table with async clustering and cleaning.;;;
   
   ---
   
   04/Apr/22 13:56;zhangyue19921010;Have a try but couldn't reproduce this 
error based on master branch. 
   According to above error logs, DS failed during creating marker file using 
timeline-server `connection refused`. Nothing to do with MDT I believe.;;;
   
   ---
   
   24/Jul/23 01:33;shivnarayan;* In Deltastreamer, we re-instantiate 
WriteClient whenever schema changes. Same write client is used by all async 
table services as well. This poses an issue, because the new write client when 
re-instantiated is intimated to the async table service. but if the async table 
service is in the middle of compaction, uses a local copy of write client. and 
hence may not be able to reach the timeline server and will run into connection 
issues. We are fixing this in this patch.
    * Spark streaming sink flow: We start a new write client during first batch 
and close it at the end. But keep re-using the same instance of writeClient for 
subsequent batches. Only core entity that is impacted here was the embedded 
timeline server since we were closing it when write client was closed. So, 
after batch1, if timeline server was enabled, pipeline will fail since timeline 
server is shutdown. 
    *  ;;;


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to