Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-24 Thread via GitHub


yunlou11 commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2434551384

   @rahil-c, please check whether the following is correct:
   `listWithPrefix` may list too many unexpected files. For example, with two
   tables, `sample` and `sample_part`, `matchingFiles` will contain:
   ```text
   ...
   DeleteOrphanFilesSparkAction: match files: s3a://ice-lake/warehouse/ice_db/sample/metadata/snap-4835616794401450947-1-f5bb6d24-162f-4c9d-a426-893c07cac506.avro
   DeleteOrphanFilesSparkAction: match files: s3a://ice-lake/warehouse/ice_db/sample_part/data/ds=20240806/0-3-0c28a14d-186c-489d-aeb8-f0a949a67297-0-2.parquet
   ...
   ```
   so when cleaning `sample`, the files of `sample_part` would be treated as
   orphans, and the `sample_part` table will lose its metadata files.
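
The over-matching described above can be reproduced without S3. The following is a minimal sketch, assuming the listing behaves like a raw string-prefix match on object keys; the helper names `list_prefix` and `list_table_files` and the shortened keys are hypothetical and are not part of the PR.

```python
from typing import Iterable, List

# Hypothetical object keys modelled on the two tables in the log above.
KEYS = [
    "s3a://ice-lake/warehouse/ice_db/sample/metadata/snap-1.avro",
    "s3a://ice-lake/warehouse/ice_db/sample/data/00000-0.parquet",
    "s3a://ice-lake/warehouse/ice_db/sample_part/data/ds=20240806/0-3.parquet",
]


def list_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """Raw string-prefix listing, the way an object store matches keys."""
    return [key for key in keys if key.startswith(prefix)]


def list_table_files(keys: Iterable[str], table_location: str) -> List[str]:
    """List only keys under the table directory by forcing a trailing slash."""
    prefix = table_location.rstrip("/") + "/"
    return list_prefix(keys, prefix)


location = "s3a://ice-lake/warehouse/ice_db/sample"
unsafe = list_prefix(KEYS, location)     # also picks up sample_part
safe = list_table_files(KEYS, location)  # only files of table "sample"
```

Normalizing the location to end with a separator before listing is one possible guard; whether the PR should do this in the action or in the FileIO is left open here.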


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-21 Thread via GitHub


MonkeyCanCode commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2427982933

   I can confirm the issue is still there. After manually setting
   `spark.hadoop.fs.s3.impl` to S3A, it works if the client has S3 credentials
   with the needed access. However, with credential vending from Polaris it can
   fail (in this case, the client doesn't have S3 credentials).





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-17 Thread via GitHub


yunlou11 commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2421517117

   ```sql
   CALL nessie.system.remove_orphan_files(table => 'nessie.robot_dev.robot_data')
   ```
   ```text
   Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
       at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
       at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
       at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
       at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
   ```
   
   @RussellSpitzer
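
The trace above fails in `FileSystem.getFileSystemClass`, which resolves an implementation from the path's scheme. The following is a simplified model of that lookup, assuming only the `fs.<scheme>.impl` configuration key is consulted (real Hadoop also considers file systems registered through its service loader); the function and exception names are hypothetical.

```python
from typing import Dict
from urllib.parse import urlparse


class UnsupportedFileSystemError(Exception):
    """Toy counterpart of Hadoop's UnsupportedFileSystemException."""


def file_system_class(conf: Dict[str, str], path: str) -> str:
    """Resolve the implementation class name via the fs.<scheme>.impl key."""
    scheme = urlparse(path).scheme
    impl = conf.get(f"fs.{scheme}.impl")
    if impl is None:
        raise UnsupportedFileSystemError(f'No FileSystem for scheme "{scheme}"')
    return impl


# Only the s3a scheme is mapped here (an assumed setup for illustration).
conf = {"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"}
```

In this model, an `s3a://` path resolves while an `s3://` path raises, which matches the shape of the error in the trace: the table location uses a scheme that the Hadoop configuration does not map to any implementation.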





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-11 Thread via GitHub


github-actions[bot] commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2408258018

   This pull request has been closed due to lack of activity. This is not a 
judgement on the merit of the PR in any way. It is just a way of keeping the PR 
queue manageable. If you think that is incorrect, or the pull request requires 
review, you can revive the PR at any time.





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-11 Thread via GitHub


github-actions[bot] closed pull request #7914: Use SupportsPrefixOperations for 
Remove OrphanFile Procedure
URL: https://github.com/apache/iceberg/pull/7914





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-29 Thread via GitHub


flyrain commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736840318


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -330,11 +345,18 @@ private Dataset<String> listedFileDS() {
     Broadcast<SerializableConfiguration> conf = sparkContext().broadcast(hadoopConf);
     ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
     JavaRDD<String> matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
-
     JavaRDD<String> completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD);
     return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING());
   }
 
+  private Dataset<String> listedFileDS() {

Review Comment:
   I think it is called by method `actualFileIdentDS()`.



##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -330,11 +345,18 @@ private Dataset<String> listedFileDS() {
     Broadcast<SerializableConfiguration> conf = sparkContext().broadcast(hadoopConf);
     ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
     JavaRDD<String> matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
-
     JavaRDD<String> completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD);
     return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING());
   }
 
+  private Dataset<String> listedFileDS() {
+    if (table.io() instanceof SupportsPrefixOperations) {

Review Comment:
   +1 for fallback






Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-29 Thread via GitHub


flyrain commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736836729


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -299,7 +300,21 @@ private Dataset<String> actualFileIdentDS() {
     }
   }
 
-  private Dataset<String> listedFileDS() {
+  private Dataset<String> listWithPrefix() {
+    List matchingFiles = Lists.newArrayList();
+    Iterator iterator =
+        ((SupportsPrefixOperations) table.io()).listPrefix(location).iterator();

Review Comment:
   Should we fall back in case `table.io()` doesn't support interface 
`SupportsPrefixOperations`?
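
One way to picture the fallback being asked about is a capability check before choosing the listing strategy. The following is a minimal sketch in Python rather than the PR's Java; `SupportsPrefixListing`, `PrefixIO`, `PlainIO`, and `list_recursively` are hypothetical stand-ins, not Iceberg classes.

```python
from typing import Iterator, List, Protocol, runtime_checkable


@runtime_checkable
class SupportsPrefixListing(Protocol):
    """Hypothetical stand-in for Iceberg's SupportsPrefixOperations."""

    def list_prefix(self, prefix: str) -> Iterator[str]: ...


class PrefixIO:
    """Toy file IO that can list by prefix."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = keys

    def list_prefix(self, prefix: str) -> Iterator[str]:
        return (k for k in self.keys if k.startswith(prefix))


class PlainIO:
    """Toy file IO without prefix listing."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = keys


def list_recursively(io: PlainIO, location: str) -> List[str]:
    """Placeholder for the existing directory-walking code path."""
    return [k for k in io.keys if k.startswith(location)]


def listed_files(io, location: str) -> List[str]:
    # Use the prefix listing only when the IO advertises the capability.
    if isinstance(io, SupportsPrefixListing):
        return list(io.list_prefix(location))
    return list_recursively(io, location)
```

With this shape, an IO that lacks the capability never hits an unchecked cast and simply takes the older path.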






Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-14 Thread via GitHub


steveloughran commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2288374116

   HadoopFileIO (and therefore the local fs) supports listPrefix. It'll need a 
CustomFileIO as with similar tests
   





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-13 Thread via GitHub


steveloughran commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1715740496


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -330,11 +345,18 @@ private Dataset<String> listedFileDS() {
     Broadcast<SerializableConfiguration> conf = sparkContext().broadcast(hadoopConf);
     ListDirsRecursively listDirs = new ListDirsRecursively(conf, olderThanTimestamp, pathFilter);
     JavaRDD<String> matchingLeafFileRDD = subDirRDD.mapPartitions(listDirs);
-
     JavaRDD<String> completeMatchingFileRDD = matchingFileRDD.union(matchingLeafFileRDD);
     return spark().createDataset(completeMatchingFileRDD.rdd(), Encoders.STRING());
   }
 
+  private Dataset<String> listedFileDS() {

Review Comment:
   is this actually used? I can't see it being invoked



##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -299,7 +300,21 @@ private Dataset<String> actualFileIdentDS() {
     }
   }
 
-  private Dataset<String> listedFileDS() {
+  private Dataset<String> listWithPrefix() {
+    List matchingFiles = Lists.newArrayList();

Review Comment:
   this is going to have fantastic speedups with S3 and any Hadoop FS which 
does deep listing (s3, gcs). 
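
A rough way to see where the speedup would come from is to count LIST requests. The following is a back-of-the-envelope model, not a benchmark: it assumes a directory walk issues at least one request per directory, while a deep listing issues one request per page of up to 1000 keys (the S3 ListObjectsV2 page limit). The table layout and function names are hypothetical.

```python
import math
from typing import List, Set

PAGE_SIZE = 1000  # S3 ListObjectsV2 returns at most 1000 keys per request


def directories(keys: List[str], root: str) -> Set[str]:
    """All directory-like prefixes at or below root implied by the keys."""
    dirs = {root}
    for key in keys:
        parts = key[len(root):].split("/")[:-1]  # drop the file name
        current = root
        for part in parts:
            current = current + part + "/"
            dirs.add(current)
    return dirs


def tree_walk_requests(keys: List[str], root: str) -> int:
    """Directory-by-directory walk: at least one LIST call per directory."""
    return len(directories(keys, root))


def deep_listing_requests(keys: List[str], root: str) -> int:
    """Single recursive listing: one LIST call per page of results."""
    matching = [k for k in keys if k.startswith(root)]
    return max(1, math.ceil(len(matching) / PAGE_SIZE))


# Hypothetical table: 365 daily partitions with 4 data files each.
root = "warehouse/db/tbl/"
keys = [
    f"{root}data/ds={day:03d}/part-{n}.parquet"
    for day in range(365)
    for n in range(4)
]
```

Under these assumptions the walk needs one request per partition directory, while the deep listing needs only a couple of pages. The model ignores that the existing action spreads the walk across Spark tasks, so it describes request counts and not wall-clock time.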
   






Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-08 Thread via GitHub


rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2216488472

   > The basic issue is we want to make sure we have a test which uses both the
   > supportsPrefix enabled FS and one where it is not enabled, so we are sure
   > that both implementations remain correct.
   
   I see, thank you for the clarification @RussellSpitzer @amogh-jahagirdar 





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-08 Thread via GitHub


RussellSpitzer commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2214536758

   The basic issue is we want to make sure we have a test which uses both the
   supportsPrefix enabled FS and one where it is not enabled, so we are sure
   that both implementations remain correct.
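
The kind of test being asked for could run one scenario through both listing strategies and require identical results. The following is a minimal sketch using in-memory stand-ins; every name here (`list_with_prefix`, `list_by_walking`, `find_orphans`, the sample paths) is hypothetical and does not come from the Iceberg test suite.

```python
from typing import Callable, Dict, List, Set

# Hypothetical table contents: files on storage vs. files referenced by metadata.
STORED = ["t/data/a.parquet", "t/data/b.parquet", "t/data/orphan.parquet"]
REFERENCED = {"t/data/a.parquet", "t/data/b.parquet"}


def list_with_prefix(keys: List[str], location: str) -> List[str]:
    """Stand-in for the listPrefix-based path."""
    return [k for k in keys if k.startswith(location)]


def list_by_walking(keys: List[str], location: str) -> List[str]:
    """Stand-in for the directory-walking path: descend one level at a time."""
    found: List[str] = []
    pending = [location]
    while pending:
        directory = pending.pop()
        children = {
            k[len(directory):].split("/", 1)[0]
            for k in keys
            if k.startswith(directory)
        }
        for child in sorted(children):
            path = directory + child
            if path in keys:
                found.append(path)          # a file
            else:
                pending.append(path + "/")  # a sub-directory
    return found


def find_orphans(lister: Callable[[List[str], str], List[str]], location: str) -> Set[str]:
    """Files present on storage but not referenced by table metadata."""
    return set(lister(STORED, location)) - REFERENCED


def check_both_paths() -> Dict[str, Set[str]]:
    """Run the same scenario through both listing strategies."""
    results = {
        "prefix": find_orphans(list_with_prefix, "t/"),
        "walk": find_orphans(list_by_walking, "t/"),
    }
    assert results["prefix"] == results["walk"], "listing strategies disagree"
    return results
```

In the real suite this would presumably be a parameterized test over two FileIO implementations, one of which does not implement the prefix interface.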





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-07 Thread via GitHub


rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2212775830

   @RussellSpitzer @amogh-jahagirdar I wanted to understand what test is
   actually needed for this change. I saw this comment:
   ```
   We also need a test which exercises this code path, (Does HadoopFS do this by default? If So do we have a test for the other path:noprefix)
   ```
   However, based on the diff of the PR, the actual logic change is on the
   list-with-prefix path.
   
   When checking `TestRemoveOrphanFilesAction`, which uses
   `DeleteOrphanFilesSparkAction`, my assumption is that it already tests the
   prefix listing, as this test uses `HadoopTables`, which uses `HadoopFileIO`,
   which implements the `SupportsPrefixOperations` interface.
   
   ```
   @Override
   public Iterable<FileInfo> listPrefix(String prefix) {
     Path prefixToList = new Path(prefix);
     FileSystem fs = Util.getFs(prefixToList, hadoopConf.get());

     return () -> {
       try {
         return Streams.stream(
                 new AdaptingIterator<>(fs.listFiles(prefixToList, true /* recursive */)))
             .map(
                 fileStatus ->
                     new FileInfo(
                         fileStatus.getPath().toString(),
                         fileStatus.getLen(),
                         fileStatus.getModificationTime()))
             .iterator();
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     };
   }
   ```





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-03 Thread via GitHub


rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2208107487

   Hi all, sorry for the delay on this issue. I have been busy with internal
   work and did not get time to revisit this.
   
   When I originally encountered this issue, I was working on a very specific
   feature integrating AWS Lake Formation with Iceberg, hence I opened this PR
   to solve that issue. It seems several people have since been hitting issues
   around the Remove OrphanFile Procedure, but I am unsure whether it is exactly
   the same issue that I mentioned in the overview.
   
   Regarding the error `No FileSystem for scheme "s3".`, my understanding is
   that the remove orphan files procedure invokes the Hadoop file system, and
   if a user is trying to read an s3 path, Hadoop does not know by default how
   to handle this scheme.
   https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java#L356
   
   The mitigation would likely be to use the `hadoop-aws` jar and configure
   Spark with the appropriate Hadoop AWS configuration. From the Iceberg AWS docs:
   https://github.com/apache/iceberg/blob/main/docs/docs/aws.md#hadoop-s3a-filesystem
   ```
   Add [hadoop-aws](https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws) as a runtime dependency of your compute engine.
   Configure AWS settings based on [hadoop-aws documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html) (make sure you check the version, S3A configuration varies a lot based on the version you use).
   ```
   I think users can try adding
   `"spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"` to
   their Spark configuration, as I saw a similar thread here:
   https://apache-iceberg.slack.com/archives/C03LG1D563F/p1656918500567629
   
   As for landing this PR, I will see if I can add tests based on
   @RussellSpitzer's feedback.
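
The suggested setting can be sketched as follows, in the list-of-tuples style used elsewhere in this thread. This is an illustration only: the helper `with_s3_override` and the `base` settings are hypothetical, and the mapping only helps when `hadoop-aws` is on the classpath and the client itself holds usable S3 credentials.

```python
from typing import Dict, List, Tuple

# Setting quoted in the comment above: route the bare "s3" scheme to S3A.
S3_SCHEME_OVERRIDE: Tuple[str, str] = (
    "spark.hadoop.fs.s3.impl",
    "org.apache.hadoop.fs.s3a.S3AFileSystem",
)


def with_s3_override(settings: List[Tuple[str, str]]) -> Dict[str, str]:
    """Return the settings as a dict with the s3 -> S3A mapping added."""
    conf = dict(settings)
    conf.setdefault(*S3_SCHEME_OVERRIDE)  # keep an explicit user value if present
    return conf


# Hypothetical existing settings for illustration.
base = [("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")]
conf = with_s3_override(base)

# With PySpark installed, the entries could be applied roughly like this:
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
```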
   





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-03 Thread via GitHub


schobe commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2207301578

   Hi, I am also facing the same issue while running orphan file cleanup via
   Nessie REST. Auto-compaction and snapshot expiry work, but the orphan file
   cleanup procedure gives the same error. Is there any ETA on this fix?
   
   _java.io.UncheckedIOException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"_
   





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-04-04 Thread via GitHub


carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2036615576

   @rahil-c is there any possibility to continue working on this PR? Many of us 
would really appreciate it.





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-03-19 Thread via GitHub


nastra commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006792641

   @carlosescura the issue itself hasn't been solved yet. I'm not sure if
   @rahil-c is actively working on this issue. If not, maybe someone else from
   the community is interested in working on this.





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-03-19 Thread via GitHub


carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006763129

   @lokeshrdy It still doesn't work using Spark `3.5.0` and Iceberg `1.5.0`
   with Glue as the catalog and the following config:
   
   ```
   SPARK_SETTINGS = [
       (
           "spark.jars",
           """
           /opt/spark/jars/iceberg-aws-bundle-1.5.0.jar,
           /opt/spark/jars/iceberg-spark-runtime-3.5_2.12-1.5.0.jar,
           /opt/spark/jars/aws-java-sdk-bundle-1.12.262.jar,
           /opt/spark/jars/hadoop-aws-3.3.4.jar
           """,
       ),
       ("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"),
       ("spark.hadoop.com.amazonaws.services.s3.enableV4", "true"),
       (
           "spark.sql.extensions",
           "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
       ),
       (
           "spark.sql.catalog.main_catalog",
           "org.apache.iceberg.spark.SparkCatalog",
       ),
       (
           "spark.sql.catalog.main_catalog.catalog-impl",
           "org.apache.iceberg.aws.glue.GlueCatalog",
       ),
       (
           "spark.sql.catalog.main_catalog.io-impl",
           "org.apache.iceberg.aws.s3.S3FileIO",
       ),
       (
           "spark.sql.catalog.main_catalog.warehouse",
           ICEBERG_CATALOG_WHAREHOUSE,
       ),
   ]
   ```
   
   I had to add `hadoop-aws-3.3.4.jar` to be able to download some CSVs and 
load them as Spark DF.
   
   When calling the `remove_orphan_files` procedure I get the following 
exception:
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o46.sql.
   : java.io.UncheckedIOException: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
       at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.listDirRecursively(DeleteOrphanFilesSparkAction.java:386)
       at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.listedFileDS(DeleteOrphanFilesSparkAction.java:311)
       at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.actualFileIdentDS(DeleteOrphanFilesSparkAction.java:296)
       at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.doExecute(DeleteOrphanFilesSparkAction.java:247)
       at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:59)
       at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:51)
       at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:130)
       at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.execute(DeleteOrphanFilesSparkAction.java:223)
       at org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure.lambda$call$3(RemoveOrphanFilesProcedure.java:185)
       at org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:107)
       at org.apache.iceberg.spark.procedures.BaseProcedure.withIcebergTable(BaseProcedure.java:96)
       at org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure.call(RemoveOrphanFilesProcedure.java:139)
       at org.apache.spark.sql.execution.datasources.v2.CallExec.run(CallExec.scala:34)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
       at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
       at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWi

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-03-06 Thread via GitHub


lokeshrdy commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1980642069

   Same issue here. Let me know if anyone has solved this with the latest
   version. @carlosescura @domonkosbalogh-seon @rahil-c 





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-02-12 Thread via GitHub


carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1939197029

   Same issue here. I can't run the `remove_orphan_files` procedure using Glue 
and S3 😢 





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-01-04 Thread via GitHub


domonkosbalogh-seon commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1877212931

   I ran into a similar issue (same as in
   https://github.com/apache/iceberg/issues/8368) using the Glue Catalog. Is
   there maybe a workaround for this, or would this PR be the only fix?





Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2023-11-22 Thread via GitHub


lyohar commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1822629812

   I got a similar issue in 1.4.2 with Spark 3.5.
   
   My Iceberg catalog in Spark is configured with
   org.apache.iceberg.aws.s3.S3FileIO as the file IO, and I store files using
   the s3 prefix. However, when trying to clean up files I get the error
   org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3".

