pengxianzi commented on issue #12585:
URL: https://github.com/apache/hudi/issues/12585#issuecomment-2574674708

   > For a bucketed table, are you referring to the bucket index of a MOR table? One 
fact to know is that the writer writes pure Avro logs at first, so the 
streaming reader also reads these logs.
   > 
   > For streaming read, we have an option value named "earliest" for the 
`read.start-commit` option, which is more straightforward.
   > 
   > It looks like the warning log is normal because of the explicitly specified 
read start commit; this log shows up when the commit to read has already 
been archived.
   
   Thank you for your help! We followed your suggestion and used the following 
configuration:
   
   options.put(FlinkOptions.READ_START_COMMIT.key(), "earliest");
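
For context, here is a minimal sketch of how the streaming-read options could be collected in a plain map before being passed to the Hudi Flink source. The string keys `read.start-commit` and `read.streaming.enabled` are assumed to match the corresponding `FlinkOptions` keys, and the class and method names (`StreamingReadOptions`, `build`) are hypothetical, used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: collects streaming-read options as plain strings.
class StreamingReadOptions {

    // Assumed to be the same string returned by FlinkOptions.READ_START_COMMIT.key().
    static final String READ_START_COMMIT = "read.start-commit";
    // Assumed key for enabling streaming read on the Flink source.
    static final String READ_AS_STREAMING = "read.streaming.enabled";

    static Map<String, String> build() {
        Map<String, String> options = new HashMap<>();
        options.put(READ_AS_STREAMING, "true");
        // "earliest" replaces an explicit commit timestamp as the start position.
        options.put(READ_START_COMMIT, "earliest");
        return options;
    }

    public static void main(String[] args) {
        // Print the collected options for inspection.
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Using the string key directly keeps the sketch free of the Hudi dependency; in a real job, `FlinkOptions.READ_START_COMMIT.key()` as shown above avoids typos in the key.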
   
   This configuration indeed resolved the read task lag issue, and we were able 
to read the Hudi table and write to the Kudu table normally. However, the task 
stopped after running for a while and threw the following error:
   
   org.apache.flink.runtime.executiongraph.ExecutionGraph [] - split_reader -> 
Sink:Unnamed(1/1) switched from INITIALIZING to FAILED on container_e30_xxx
   
   org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job switched 
from state RUNNING to FAILED
   
   org.apache.flink.runtime.JobException: Recovery is suppressed by 
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=3, 
backoffTimeMs=60000)
   
   Caused by: org.apache.hudi.exception.HoodieException: Get reader error for 
path: hdfs://nameservice1:xxx.parquet
   
   We then tried to skip compaction and clustering instants in the streaming read by using the following configurations:
   
   options.put("read.streaming.skip_clustering", "true");  
   options.put("read.streaming.skip_compaction", "true");  
   
   We also used the following clean policy:
   
   options.put("hoodie.clean.automatic", "true");
   options.put("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS");
   options.put("hoodie.cleaner.commits.retained", "5");
   options.put("hoodie.clean.async", "true");
   
   However, the issue persists with these settings.
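
To make the combination easier to review, the skip and cleaner settings above can be gathered in one place. This is a minimal sketch that uses the option keys exactly as listed above; `ReaderAndCleanerOptions` and `build` are hypothetical names, and the retained-commit count is a parameter only so that different values can be tried. Whether the retention count is related to the `Get reader error` is not established here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: gathers the reader and cleaner options in one map.
class ReaderAndCleanerOptions {

    static Map<String, String> build(int commitsRetained) {
        if (commitsRetained < 1) {
            throw new IllegalArgumentException("commitsRetained must be at least 1");
        }
        // LinkedHashMap keeps insertion order, so printed output matches the listing order.
        Map<String, String> options = new LinkedHashMap<>();

        // Streaming-read options: skip clustering and compaction instants.
        options.put("read.streaming.skip_clustering", "true");
        options.put("read.streaming.skip_compaction", "true");

        // Cleaner options, as listed in the comment above.
        options.put("hoodie.clean.automatic", "true");
        options.put("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS");
        options.put("hoodie.cleaner.commits.retained", String.valueOf(commitsRetained));
        options.put("hoodie.clean.async", "true");
        return options;
    }

    public static void main(String[] args) {
        // 5 is the value used in the configuration above.
        build(5).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```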


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
