[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-06 Thread GitBox


zherenyu831 commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-814540364


   @satishkotha 
   Thank you for the explanation.
   Totally understood.
   Since insert_overwrite_table creates a new version of the files on every run,
   in this case KEEP_LATEST_COMMITS makes more sense than KEEP_LATEST_VERSIONS.
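   
   For anyone hitting the same thing, a minimal sketch of that cleaner setup
   (the table name, base path, and retained-commit count are placeholders):
   
   ```scala
   // Minimal sketch, assuming a Spark session and an input DataFrame `df`;
   // the table name, base path, and retained-commit count are placeholders.
   import org.apache.spark.sql.{DataFrame, SaveMode}
   
   def writeWithCommitBasedCleaning(df: DataFrame, basePath: String): Unit = {
     df.write
       .format("hudi")
       .option("hoodie.table.name", "my_table")
       .option("hoodie.datasource.write.operation", "insert_overwrite_table")
       // retain the file slices needed by the last N commits, instead of a
       // fixed number of file versions per file group
       .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")
       .option("hoodie.cleaner.commits.retained", "10")
       .mode(SaveMode.Append)
       .save(basePath)
   }
   ```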
   
   






[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-06 Thread GitBox


zherenyu831 commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-814157136


   @satishkotha cc @bvaradar 
   I found why the cleaner keeps only one file version:
   
   https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L216-L217
   
   I think it breaks the isolation; do you have any plan to address it?
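   
   For context, a rough sketch of how I read the retention pass there (the names
   below are illustrative, not Hudi's actual CleanPlanner code): replaced file
   groups fall into the same keep-N-versions loop, so with one retained version
   every older slice is scheduled for deletion.
   
   ```scala
   // Illustrative sketch only; NOT Hudi's actual CleanPlanner code.
   // Under a keep-latest-N-versions policy, every file slice beyond the newest
   // N of each file group is selected for deletion, replaced data included,
   // which is why only one file version survives with N = 1.
   case class FileSlice(fileId: String, commitTime: String)
   
   def selectSlicesToClean(slices: Seq[FileSlice], versionsRetained: Int): Seq[FileSlice] =
     slices
       .groupBy(_.fileId)                       // one group per file id
       .values
       .flatMap(_.sortBy(_.commitTime)          // oldest first
                 .dropRight(versionsRetained))  // keep only the newest N
       .toSeq
   ```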

   






[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-03-31 Thread GitBox


zherenyu831 commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-811565897


   @jsbali 
   To make everything clear:
   
   On archiving: since the first replacecommit has 0 partitionToReplaceFileIds,
   archiving failed at
   
   https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java#L73
   
   with the error `Positive number of partitions required`.
   
   The solution is to either manually delete the first replacecommit, or skip the
   deletion step in the code when a replacecommit has 0 partitionToReplaceFileIds,
   as you mentioned on your ticket. A sketch of the second option follows.
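   
   A minimal sketch of that guard, assuming it sits wherever the archival step
   hands the replace metadata to the deletion job (the names here are
   illustrative, not the actual ReplaceArchivalHelper code):
   
   ```scala
   // Illustrative guard only; the real fix would live in ReplaceArchivalHelper.
   // Skip the replaced-file-group deletion when the replacecommit carries no
   // partitions, instead of handing an empty list to the deletion job.
   def deleteReplacedFileGroupsSafely(
       partitionToReplaceFileIds: Map[String, Seq[String]],
       delete: Map[String, Seq[String]] => Boolean): Boolean =
     if (partitionToReplaceFileIds.isEmpty)
       true // nothing was replaced by this commit, so deletion is a no-op success
     else
       delete(partitionToReplaceFileIds)
   ```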
   
   After I deleted the first replacecommit, the archiving finished successfully.
   But I got another issue:
   https://github.com/apache/hudi/issues/2707#issuecomment-804831651
   
   This is not related to what I deleted; the commit time is different.
   Once the commits have been deleted by the archiving process, Hudi tries to load
   the timeline again without refreshing it.
   
   I didn't debug further, since I want to ask the developers how they want to
   deal with replaced-file deletion.
   It seems someone wants to use the cleaner to handle it:
   https://github.com/apache/hudi/issues/2707#issuecomment-804849028
   
   Unfortunately, I also found the cleaner does not work well with
   insert_overwrite_table: it keeps only one file group.
   
   
   






[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-03-23 Thread GitBox


zherenyu831 commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-805372384


   @satishkotha 
   Sorry, I pasted the wrong log; I have updated it, please check again.
   
   When archiving finished, it seems the metaclient didn't reload, so it tried to
   read the archived commit, and then we got the error.
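   
   If that reading is right, re-reading the timeline after archival should avoid
   it. A minimal sketch, assuming we can reach the writer's HoodieTableMetaClient:
   
   ```scala
   // Minimal sketch, assuming access to the writer's HoodieTableMetaClient.
   // After archiving moves instants out of the active timeline, re-read it so
   // the file-system view no longer points at archived replacecommit files.
   import org.apache.hudi.common.table.HoodieTableMetaClient
   
   def refreshAfterArchive(metaClient: HoodieTableMetaClient): Unit = {
     // reloadActiveTimeline() re-lists .hoodie and rebuilds the cached timeline
     metaClient.reloadActiveTimeline()
   }
   ```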






[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-03-23 Thread GitBox


zherenyu831 commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-804849028


   Also saw some comments and found the link below:
   https://issues.apache.org/jira/browse/HUDI-1518
   It seems you are going to use the cleaner to delete the replaced file groups,
   but currently, if I use the cleaner, only one file group will be kept.






[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-03-23 Thread GitBox


zherenyu831 commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-804831651


   But I got a new error:
   
   ```
   User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/.hoodie/20210317155538.replacecommit
   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
   at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
   at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
   at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
   at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
   at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
   at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
   at org.apache.hudi.client.AbstractHoodieClient.stopEmbeddedServerView(AbstractHoodieClient.java:90)
   at org.apache.hudi.client.AbstractHoodieClient.close(AbstractHoodieClient.java:82)
   at org.apache.hudi.client.AbstractHoodieWriteClient.close(AbstractHoodieWriteClient.java:912)
   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:464)
   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
   at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
   at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.write(HudiDataWriter.scala:272)
   at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.insertOverrideTable(HudiDataWriter.scala:161)
   at jp.ne.paypay.daas.dataloader.FileSystemJob$.mainProcedure(FileSystemJob.scala:107)
   at jp.ne.paypay.daas.dataloader.FileSystemJob$.main(FileSystemJob.scala:38)
   at jp.ne.paypay.daas.dataloader.FileSystemJob.main(FileSystemJob.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
   ```

[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-03-23 Thread GitBox


zherenyu831 commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-804762627


   A simple workaround is to delete the first replacecommit file; a sketch follows.
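   
   A minimal sketch of that deletion, assuming the instant time of the offending
   replacecommit is known (the base path and instant below are placeholders):
   
   ```scala
   // Minimal sketch using the Hadoop FileSystem API; the base path and the
   // instant time are placeholders for the table's actual values.
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}
   
   def deleteReplaceCommit(basePath: String, instantTime: String): Boolean = {
     val commitPath = new Path(s"$basePath/.hoodie/$instantTime.replacecommit")
     val fs: FileSystem = commitPath.getFileSystem(new Configuration())
     fs.delete(commitPath, false) // non-recursive: this is a single metadata file
   }
   
   // e.g. deleteReplaceCommit("s3://xxx", "<first-replacecommit-instant>")
   ```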

