Re: [I] Spark rewrite Files Action OOM [iceberg]
manuzhang commented on issue #10054: URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2218139056 I've created a [draft PR](https://github.com/apache/iceberg/pull/10667) which has been verified in our environment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Spark rewrite Files Action OOM [iceberg]
pdames commented on issue #10054: URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2121567721 Any updates here @Zhanxiao-Ma? Would love to take a look at what you've implemented if you've got a pending PR to link back to this issue, and see if there's an opportunity to work together to improve the state of affairs here!
Re: [I] Spark rewrite Files Action OOM [iceberg]
manuzhang commented on issue #10054: URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2100175845

> I have implemented a disk-based map to solve this problem. Is this what Iceberg expects? If so, I will submit the code.

@Zhanxiao-Ma I think it will be valuable to the community. Please open a PR.
Re: [I] Spark rewrite Files Action OOM [iceberg]
nk1506 commented on issue #10054: URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2041538414

> @nk1506 Echoing Russell's comments, how many small files are there in your OOM case? How much memory do you set up?

I didn't use the Spark engine for compaction; I was using the Java client API, so my queries might distract from the original problem. My requirement, though, is to compact very large datasets (say, 10K data files) in a single commit. Always using [RewriteFiles](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/RewriteFiles.java#L171) might cause an OOM, so I am looking for something that can help manage manifest files more intelligently. I think I will start a different thread to discuss that problem.
Re: [I] Spark rewrite Files Action OOM [iceberg]
Zhanxiao-Ma commented on issue #10054: URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2041085307

> @nk1506 Echoing Russell's comments, how many small files are there in your OOM case? How much memory do you set up?

@RussellSpitzer I believe increasing memory is not a good solution for dealing with excessive delete information, because it is impossible to predict how much memory would be appropriate.
Re: [I] Spark rewrite Files Action OOM [iceberg]
manuzhang commented on issue #10054: URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2024928341 Deleting a large number of records is not forbidden, but it can increase the memory required in the driver. If you are using position deletes, there's `rewrite_position_delete_files`. As for equality deletes, there was https://github.com/apache/iceberg/pull/2364 to rewrite equality deletes as position deletes, but it was not merged.
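
For readers who want to try the `rewrite_position_delete_files` procedure mentioned above, here is a minimal sketch of how the call might be built and issued from PySpark. The catalog name `my_catalog` and the table `db.sample` are placeholders, not names from this thread, and the helper function is hypothetical. The snippet only builds and prints the statement, so it runs without Spark.

```python
def rewrite_position_deletes_sql(catalog: str, table: str) -> str:
    """Build the CALL statement for Iceberg's rewrite_position_delete_files procedure.

    `catalog` and `table` are caller-supplied names; this helper is illustrative
    and not part of Iceberg itself.
    """
    return f"CALL {catalog}.system.rewrite_position_delete_files(table => '{table}')"


# Placeholder names (assumptions, not taken from the thread).
statement = rewrite_position_deletes_sql("my_catalog", "db.sample")
print(statement)

# With a SparkSession already configured for the Iceberg catalog,
# the statement would be executed as:
# spark.sql(statement).show()
```

Whether this reduces driver memory pressure in a given OOM case depends on how many position delete files the table carries; it does not address equality deletes.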