[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
zherenyu831 commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-814540364 @satishkotha Thank you for the explanation; totally understood. Since insert_overwrite_table creates a new version of the files on every run, KEEP_LATEST_COMMITS makes more sense here than KEEP_LATEST_FILE_VERSIONS.
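For reference, a minimal sketch of what that cleaner-policy switch could look like on the write path, assuming a Spark Scala job similar to the one in the stack trace further down; the table name, base path, key fields, and retention count below are placeholders, while the `hoodie.*` keys are standard Hudi options:

```scala
// Minimal sketch: insert_overwrite_table with a commit-based cleaner policy.
// Table name, base path, key/precombine fields, and the retained-commit
// count are illustrative placeholders, not values from this issue.
import org.apache.spark.sql.{DataFrame, SaveMode}

def writeOverwriteTable(df: DataFrame): Unit = {
  df.write
    .format("hudi")
    .option("hoodie.table.name", "my_table")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .option("hoodie.datasource.write.operation", "insert_overwrite_table")
    // Retain file groups by commit age rather than by file-version count,
    // since each insert_overwrite_table run rewrites the whole table.
    .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")
    .option("hoodie.cleaner.commits.retained", "4")
    .mode(SaveMode.Append)
    .save("s3://my-bucket/my_table")
}
```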
[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
zherenyu831 commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-814157136 @satishkotha cc @bvaradar I found why the cleaner keeps only one file version: https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L216-L217 I think it breaks the isolation; do you have any plan to address it?
[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
zherenyu831 commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-811565897 @jsbali To make everything clear:

On archiving: since the first replacecommit has 0 partitionToReplaceFileIds, archiving failed at https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/ReplaceArchivalHelper.java#L73 with the error `Positive number of partitions required`. The solution is either to manually delete the first replacecommit, or to skip deletion in the code for replacecommits with 0 partitionToReplaceFileIds, as you mentioned on your ticket (see the sketch below). After I deleted the first replacecommit, archiving finished successfully.

But then I hit another issue: https://github.com/apache/hudi/issues/2707#issuecomment-804831651 It is not related to the commit I deleted; the commit time is different. Once commits have been deleted by the archiving process, Hudi tries to load the timeline again without refreshing it. I didn't debug further, because I want to ask the developers how they intend to handle replaced-file deletion; it seems someone wants the cleaner to handle it: https://github.com/apache/hudi/issues/2707#issuecomment-804849028 Unfortunately, I also found that the cleaner does not work well with insert_overwrite_table: it keeps only one file group.
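A minimal sketch of the guard suggested above, written in Scala for illustration (the actual helper is a Java class); the method name and signature are hypothetical, and only the empty-map check mirrors the proposed fix:

```scala
// Hypothetical sketch: skip replaced-file deletion for a replacecommit
// whose partitionToReplaceFileIds map is empty, so the underlying Spark
// parallelize call never receives zero partitions and never fails with
// "Positive number of partitions required".
def deleteReplacedFileGroups(partitionToReplaceFileIds: Map[String, Seq[String]]): Boolean = {
  if (partitionToReplaceFileIds.isEmpty) {
    // Nothing was replaced by this commit (e.g. the very first
    // insert_overwrite_table on an empty table): treat it as a
    // successful no-op instead of failing archival.
    return true
  }
  // ... the existing deletion logic would run here ...
  true
}
```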
[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
zherenyu831 commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-805372384 @satishkotha Sorry, I pasted the wrong log; I have updated it, please check again. When archiving finished, it seems the meta client did not reload, so it tried to read the already-archived commit, and then we got the error.
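A minimal sketch of the missing refresh; `reloadActiveTimeline()` is an existing HoodieTableMetaClient method, but where exactly it would need to be called inside the archival path is an assumption on my part:

```scala
// Minimal sketch, assuming the caller holds a HoodieTableMetaClient:
// re-scan the .hoodie directory after archiving so later timeline reads
// do not reference instants that were just moved to the archived timeline.
import org.apache.hudi.common.table.HoodieTableMetaClient

def refreshAfterArchiving(metaClient: HoodieTableMetaClient): Unit = {
  // Reload the active timeline from storage; any file-system view built
  // afterwards should be constructed from this refreshed timeline.
  metaClient.reloadActiveTimeline()
}
```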
[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
zherenyu831 commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-804849028 I also saw some comments and found the link below: https://issues.apache.org/jira/browse/HUDI-1518 It seems you are going to use the cleaner to delete the replaced file groups, but currently, if I use the cleaner, only one file group is kept.
[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
zherenyu831 commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-804831651 But I got a new error:
```
User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/.hoodie/20210317155538.replacecommit
  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
  at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
  at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
  at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
  at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
  at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
  at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
  at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
  at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
  at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
  at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
  at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
  at org.apache.hudi.client.AbstractHoodieClient.stopEmbeddedServerView(AbstractHoodieClient.java:90)
  at org.apache.hudi.client.AbstractHoodieClient.close(AbstractHoodieClient.java:82)
  at org.apache.hudi.client.AbstractHoodieWriteClient.close(AbstractHoodieWriteClient.java:912)
  at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:464)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
  at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.write(HudiDataWriter.scala:272)
  at jp.ne.paypay.daas.dataloader.writer.HudiDataWriter.insertOverrideTable(HudiDataWriter.scala:161)
  at jp.ne.paypay.daas.dataloader.FileSystemJob$.mainProcedure(FileSystemJob.scala:107)
  at jp.ne.paypay.daas.dataloader.FileSystemJob$.main(FileSystemJob.scala:38)
  at jp.ne.paypay.daas.dataloader.FileSystemJob.main(FileSystemJob.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at ...
```
[GitHub] [hudi] zherenyu831 commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
zherenyu831 commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-804762627 A simple workaround is to delete the first commit file (the empty replacecommit); see the sketch below.
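A minimal sketch of that workaround using the Hadoop FileSystem API; the bucket, table path, and instant timestamp below are placeholders for the empty first replacecommit in your own timeline, so verify the instant (and back the file up) before deleting:

```scala
// Hypothetical sketch: remove the empty first replacecommit from the
// active timeline. The bucket and instant are placeholders; inspect the
// .hoodie folder and back the file up before deleting anything.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def deleteFirstReplaceCommit(): Unit = {
  val hoodieDir = new Path("s3://my-bucket/my_table/.hoodie")
  val fs = FileSystem.get(hoodieDir.toUri, new Configuration())
  // e.g. 20210101000000.replacecommit -- the first (empty) replacecommit
  val firstReplaceCommit = new Path(hoodieDir, "20210101000000.replacecommit")
  fs.delete(firstReplaceCommit, false) // non-recursive delete of one file
}
```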