Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
yunlou11 commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2434551384

@rahil-c please check whether the following is correct: `listWithPrefix` may list many unexpected files. For example, with two tables, `sample` and `sample_part`, `matchingFiles` will contain:

```text
...
DeleteOrphanFilesSparkAction: match files: s3a://ice-lake/warehouse/ice_db/sample/metadata/snap-4835616794401450947-1-f5bb6d24-162f-4c9d-a426-893c07cac506.avro
DeleteOrphanFilesSparkAction: match files: s3a://ice-lake/warehouse/ice_db/sample_part/data/ds=20240806/0-3-0c28a14d-186c-489d-aeb8-f0a949a67297-0-2.parquet
...
```

So the `sample_part` table will lose its metadata files.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
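The concern above can be reproduced without Spark or S3. The following is a minimal sketch, assuming the listing treats the table location as a plain string prefix (which is what the log output suggests). `TablePrefix` and its methods are hypothetical names for illustration and are not part of Iceberg or of this PR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Iceberg: illustrates the prefix-matching pitfall only.
final class TablePrefix {
  private TablePrefix() {}

  // Ensure the location ends with "/" so ".../sample" cannot match ".../sample_part/...".
  static String asDirectoryPrefix(String location) {
    return location.endsWith("/") ? location : location + "/";
  }

  // Keep only the listed paths that sit under the table's own directory.
  static List<String> underTable(String location, List<String> listedPaths) {
    String prefix = asDirectoryPrefix(location);
    return listedPaths.stream()
        .filter(path -> path.startsWith(prefix))
        .collect(Collectors.toList());
  }
}
```

With the location `s3a://ice-lake/warehouse/ice_db/sample`, a raw `startsWith` check accepts paths under both `sample/` and `sample_part/`; after the trailing slash is added, only paths under `sample/` remain. Whether the fix belongs in the action or in the `FileIO` implementation is a separate design question.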
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
MonkeyCanCode commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2427982933

I can confirm the issue is still there. After manually setting `spark.hadoop.fs.s3.impl` to S3A, it works if the client has S3 credentials with the needed access. However, with credential vending from Polars it can fail (in that case the client doesn't have S3 credentials).
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
yunlou11 commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2421517117

```sql
CALL nessie.system.remove_orphan_files(table => 'nessie.robot_dev.robot_data')
```

```text
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
```

@RussellSpitzer
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
github-actions[bot] commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2408258018

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
github-actions[bot] closed pull request #7914: Use SupportsPrefixOperations for Remove OrphanFile Procedure
URL: https://github.com/apache/iceberg/pull/7914
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
flyrain commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736840318

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ##

@@ -330,11 +345,18 @@ private Dataset listedFileDS() {
     Broadcast conf = sparkContext().broadcast(hadoopConf);
     ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
     JavaRDD matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
-
     JavaRDD completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD);
     return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING());
   }

+  private Dataset listedFileDS() {

Review Comment:
I think it is called by the method `actualFileIdentDS()`.

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ##

@@ -330,11 +345,18 @@ private Dataset listedFileDS() {
     Broadcast conf = sparkContext().broadcast(hadoopConf);
     ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
     JavaRDD matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
-
     JavaRDD completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD);
     return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING());
   }

+  private Dataset listedFileDS() {
+if (table.io() instanceof SupportsPrefixOperations) {

Review Comment:
+1 for fallback
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
flyrain commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736836729

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ##

@@ -299,7 +300,21 @@ private Dataset actualFileIdentDS() {
     }
   }

-  private Dataset listedFileDS() {
+  private Dataset listWithPrefix() {
+List matchingFiles = Lists.newArrayList();
+Iterator iterator =
+((SupportsPrefixOperations) table.io()).listPrefix(location).iterator();

Review Comment:
Should we fall back in case `table.io()` doesn't support the `SupportsPrefixOperations` interface?
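The fallback asked about here can be sketched as a capability check before the cast. This is an illustration only: `SimpleFileIO`, `SimplePrefixListing`, `OrphanFileLister`, `PlainIO`, and `PrefixCapableIO` are stand-in names invented for this sketch, and the real Iceberg interfaces have different signatures. The fallback body is a placeholder for the existing recursive directory listing.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; the real interfaces live in org.apache.iceberg.io.
interface SimpleFileIO {}

interface SimplePrefixListing {
  List<String> listPrefix(String prefix);
}

// A FileIO without prefix support.
final class PlainIO implements SimpleFileIO {}

// A FileIO that also supports prefix listing.
final class PrefixCapableIO implements SimpleFileIO, SimplePrefixListing {
  @Override
  public List<String> listPrefix(String prefix) {
    List<String> files = new ArrayList<>();
    files.add(prefix + "/listed-by-prefix");
    return files;
  }
}

final class OrphanFileLister {
  private final SimpleFileIO io;
  private final String location;

  OrphanFileLister(SimpleFileIO io, String location) {
    this.io = io;
    this.location = location;
  }

  // Choose the listing strategy based on what the FileIO can do, instead of casting blindly.
  List<String> listedFiles() {
    if (io instanceof SimplePrefixListing) {
      return new ArrayList<>(((SimplePrefixListing) io).listPrefix(location));
    }
    return listWithoutPrefix();
  }

  // Placeholder for the existing recursive directory listing.
  private List<String> listWithoutPrefix() {
    List<String> files = new ArrayList<>();
    files.add(location + "/listed-by-fallback");
    return files;
  }
}
```

Having two small `FileIO` stand-ins, one with and one without the capability, also gives a test a way to exercise both branches.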
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
steveloughran commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2288374116

HadoopFileIO (and therefore the local fs) supports listPrefix. It'll need a CustomFileIO as with similar tests
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
steveloughran commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1715740496

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ##

@@ -330,11 +345,18 @@ private Dataset listedFileDS() {
     Broadcast conf = sparkContext().broadcast(hadoopConf);
     ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
     JavaRDD matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
-
     JavaRDD completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD);
     return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING());
   }

+  private Dataset listedFileDS() {

Review Comment:
Is this actually used? I can't see it being invoked.

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ##

@@ -299,7 +300,21 @@ private Dataset actualFileIdentDS() {
     }
   }

-  private Dataset listedFileDS() {
+  private Dataset listWithPrefix() {
+List matchingFiles = Lists.newArrayList();

Review Comment:
This is going to have fantastic speedups with S3 and any Hadoop FS which does deep listing (s3, gcs).
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2216488472

> The basic issue is we want to make sure we have a test which uses both the supportsPrefix enabled FS and one where it is not enabled so we are sure that both implementations remain correct.

I see, thank you for the clarification @RussellSpitzer @amogh-jahagirdar
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
RussellSpitzer commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2214536758

The basic issue is we want to make sure we have a test which uses both the supportsPrefix enabled FS and one where it is not enabled, so we are sure that both implementations remain correct.
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2212775830

@RussellSpitzer @amogh-jahagirdar I wanted to understand what actual test is needed for this change. I saw this comment:

```
We also need a test which exercises this code path, (Does HadoopFS do this by default? If So do we have a test for the other path:noprefix)
```

However, based on the diff of the PR, the actual logic change is on the list-with-prefix path. `TestRemoveOrphanFilesAction` uses `DeleteOrphanFilesSparkAction`, so my assumption is that it already tests list prefix: the test uses `HadoopTables`, which uses `HadoopFileIO`, which implements the `SupportsPrefixOperations` interface.

```
@Override
public Iterable<FileInfo> listPrefix(String prefix) {
  Path prefixToList = new Path(prefix);
  FileSystem fs = Util.getFs(prefixToList, hadoopConf.get());

  return () -> {
    try {
      return Streams.stream(
              new AdaptingIterator<>(fs.listFiles(prefixToList, true /* recursive */)))
          .map(
              fileStatus ->
                  new FileInfo(
                      fileStatus.getPath().toString(),
                      fileStatus.getLen(),
                      fileStatus.getModificationTime()))
          .iterator();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  };
}
```
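For readers without a Hadoop setup, the shape of that method (a recursive listing returned as an `Iterable`, with the checked `IOException` wrapped in `UncheckedIOException`) can be tried on the local file system with only the JDK. This is a rough analogue under stated assumptions, not the Iceberg or Hadoop implementation: `LocalPrefixLister` is a hypothetical name, it returns path strings rather than `FileInfo` objects, and it collects results eagerly so the directory stream can be closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem analogue for illustration; not the Iceberg or Hadoop implementation.
final class LocalPrefixLister {
  private LocalPrefixLister() {}

  // Recursively list regular files under root; the walk runs each time iterator() is called.
  static Iterable<String> listPrefix(Path root) {
    return () -> {
      try (Stream<Path> paths = Files.walk(root)) {
        List<String> files =
            paths.filter(Files::isRegularFile).map(Path::toString).collect(Collectors.toList());
        return files.iterator();
      } catch (IOException e) {
        // Iterable.iterator() cannot throw checked exceptions, so wrap it.
        throw new UncheckedIOException(e);
      }
    };
  }
}
```

One recursive call replaces a directory-by-directory traversal, which is the property the prefix-based listing in this PR relies on.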
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2208107487

Hi all, sorry for the delay on this issue. I have been busy with internal work and did not get time to revisit it.

I originally hit this issue while working on a very specific feature, the AWS Lake Formation and Iceberg integration, and opened this PR to solve it. Several people seem to be hitting issues with the remove orphan files procedure, but I am unsure whether they are exactly the same issue I described in the overview.

Regarding `No FileSystem for scheme "s3"`: my understanding is that the remove orphan files procedure invokes the Hadoop file system, and when a user tries to read an `s3` path, Hadoop does not know by default which file system handles that scheme.
https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java#L356

The mitigation would likely be to add the `hadoop-aws` jar and configure Spark with the appropriate Hadoop AWS settings. From the Iceberg AWS docs: https://github.com/apache/iceberg/blob/main/docs/docs/aws.md#hadoop-s3a-filesystem

```
Add [hadoop-aws](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws) as a runtime dependency of your compute engine. Configure AWS settings based on [hadoop-aws documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html) (make sure you check the version, S3A configuration varies a lot based on the version you use).
```

I think users can try adding `"spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"` to their Spark configuration, as I saw a similar thread here: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1656918500567629

As for landing this PR, I will see if I can add tests based on @RussellSpitzer's feedback.
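Why the `fs.s3.impl` setting matters can be shown with a deliberately simplified model of a scheme lookup. The sketch below is not Hadoop's actual code: `SchemeLookup` is a hypothetical class, it uses a plain map in place of the Hadoop configuration, and it ignores the other ways Hadoop can discover file system implementations. It assumes only that Spark passes `spark.hadoop.*` properties through to the Hadoop configuration with the `spark.hadoop.` prefix removed, so `spark.hadoop.fs.s3.impl` becomes `fs.s3.impl`.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model for illustration; this is not Hadoop's actual lookup code.
final class SchemeLookup {
  private final Map<String, String> conf = new HashMap<>();

  void set(String key, String value) {
    conf.put(key, value);
  }

  // Resolve the implementation class name registered for a URI scheme.
  String implFor(String scheme) {
    String impl = conf.get("fs." + scheme + ".impl");
    if (impl == null) {
      // Mirrors the wording of the error reported in this thread.
      throw new IllegalStateException("No FileSystem for scheme \"" + scheme + "\"");
    }
    return impl;
  }
}
```

In this model, a configuration that only sets `fs.s3a.impl` still fails for paths that start with `s3://`, which matches the reports in this thread where `spark.hadoop.fs.s3a.impl` was configured but table locations used the `s3` scheme.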
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
schobe commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2207301578

Hi, I am also facing the same issue while running orphan file cleanup via Nessie REST. Auto-compaction and snapshot expiry work, but the orphan file cleanup procedure gives the same error. Is there any ETA on this fix?

_java.io.UncheckedIOException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"_
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2036615576

@rahil-c is there any possibility to continue working on this PR? Many of us would really appreciate it.
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
nastra commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006792641

@carlosescura the issue itself hasn't been solved yet. I'm not sure if @rahil-c is actively working on this issue. If not, maybe someone else from the community is interested in working on this.
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006763129

@lokeshrdy Still doesn't work using Spark `3.5.0` and Iceberg `1.5.0` and Glue as catalog with the following config:

```
SPARK_SETTINGS = [
    (
        "spark.jars",
        """
        /opt/spark/jars/iceberg-aws-bundle-1.5.0.jar,
        /opt/spark/jars/iceberg-spark-runtime-3.5_2.12-1.5.0.jar,
        /opt/spark/jars/aws-java-sdk-bundle-1.12.262.jar,
        /opt/spark/jars/hadoop-aws-3.3.4.jar
        """,
    ),
    ("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"),
    ("spark.hadoop.com.amazonaws.services.s3.enableV4", "true"),
    (
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    ),
    (
        "spark.sql.catalog.main_catalog",
        "org.apache.iceberg.spark.SparkCatalog",
    ),
    (
        "spark.sql.catalog.main_catalog.catalog-impl",
        "org.apache.iceberg.aws.glue.GlueCatalog",
    ),
    (
        "spark.sql.catalog.main_catalog.io-impl",
        "org.apache.iceberg.aws.s3.S3FileIO",
    ),
    (
        "spark.sql.catalog.main_catalog.warehouse",
        ICEBERG_CATALOG_WHAREHOUSE,
    ),
]
```

I had to add `hadoop-aws-3.3.4.jar` to be able to download some CSVs and load them as Spark DF.

When calling the `remove_orphan_files` procedure I get the following exception:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o46.sql.
: java.io.UncheckedIOException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
	at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.listDirRecursively(DeleteOrphanFilesSparkAction.java:386)
	at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.listedFileDS(DeleteOrphanFilesSparkAction.java:311)
	at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.actualFileIdentDS(DeleteOrphanFilesSparkAction.java:296)
	at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.doExecute(DeleteOrphanFilesSparkAction.java:247)
	at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:59)
	at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:51)
	at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:130)
	at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.execute(DeleteOrphanFilesSparkAction.java:223)
	at org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure.lambda$call$3(RemoveOrphanFilesProcedure.java:185)
	at org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:107)
	at org.apache.iceberg.spark.procedures.BaseProcedure.withIcebergTable(BaseProcedure.java:96)
	at org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure.call(RemoveOrphanFilesProcedure.java:139)
	at org.apache.spark.sql.execution.datasources.v2.CallExec.run(CallExec.scala:34)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWi
```
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
lokeshrdy commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1980642069

Same issue here. Has anyone solved this with the latest version? @carlosescura @domonkosbalogh-seon @rahil-c
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1939197029

Same issue here. I can't run the `remove_orphan_files` procedure using Glue and S3 😢
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
domonkosbalogh-seon commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1877212931

Ran into a similar issue (same as in https://github.com/apache/iceberg/issues/8368) using the Glue Catalog. Is there a workaround for this, or would this PR be the only fix?
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
lyohar commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1822629812

I got a similar issue with Iceberg 1.4.2 and Spark 3.5. My Iceberg catalog in Spark is configured to use the `org.apache.iceberg.aws.s3.S3FileIO` file system, and I store files using the `s3` prefix. However, when trying to clean up files I get the error `org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"`.