Re: [I] Spark rewrite Files Action OOM [iceberg]

2024-07-09 Thread via GitHub


manuzhang commented on issue #10054:
URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2218139056

   I've created a [draft PR](https://github.com/apache/iceberg/pull/10667) 
which has been verified in our environment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Spark rewrite Files Action OOM [iceberg]

2024-05-20 Thread via GitHub


pdames commented on issue #10054:
URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2121567721

   Any updates here @Zhanxiao-Ma? Would love to take a look at what you've 
implemented if you've got a pending PR to link back to this issue, and see if 
there's an opportunity to work together to improve the state of affairs here!




Re: [I] Spark rewrite Files Action OOM [iceberg]

2024-05-08 Thread via GitHub


manuzhang commented on issue #10054:
URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2100175845

   > I have implemented a disk-based map to solve this problem. Is this what 
Iceberg expects? If so, I will submit the code.
   @Zhanxiao-Ma I think it will be valuable to the community. Please open a PR.
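The thread does not show the disk-based map that was implemented. As a rough illustration of the idea only, here is a minimal Python sketch of a map whose values are pickled to a file on local disk, so they are not held on the heap. It uses the standard-library `shelve` module. The class name `DiskBackedMap` and the example file paths are hypothetical, and none of this is an Iceberg API.

```python
import os
import shelve
import tempfile
from typing import Iterator


class DiskBackedMap:
    """Hypothetical str -> object map whose values live in a dbm file on disk.

    Values are pickled and written to disk, not kept on the heap. Depending on
    which dbm backend shelve picks, an index of keys may still live in memory.
    """

    def __init__(self, directory: str, name: str = "spill-map") -> None:
        # flag="n" always creates a new, empty database file
        self._db = shelve.open(os.path.join(directory, name), flag="n")

    def put(self, key: str, value: object) -> None:
        # writeback is off by default, so each put goes straight to the file
        self._db[key] = value

    def get(self, key: str, default: object = None) -> object:
        return self._db.get(key, default)

    def __contains__(self, key: str) -> bool:
        return key in self._db

    def __len__(self) -> int:
        return len(self._db)

    def keys(self) -> Iterator[str]:
        return iter(self._db.keys())

    def close(self) -> None:
        self._db.close()

    def __enter__(self) -> "DiskBackedMap":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()


if __name__ == "__main__":
    # Hypothetical usage: track rewritten data files by path
    with tempfile.TemporaryDirectory() as tmp:
        with DiskBackedMap(tmp) as rewritten:
            for i in range(1000):
                rewritten.put(f"s3://bucket/data/file-{i}.parquet", i)
            print(len(rewritten), rewritten.get("s3://bucket/data/file-42.parquet"))
```

The trade-off is that every lookup becomes disk I/O plus unpickling. That is slower than an in-memory map, but the footprint no longer grows with the number of entries in the way the OOM reports describe.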




Re: [I] Spark rewrite Files Action OOM [iceberg]

2024-04-07 Thread via GitHub


nk1506 commented on issue #10054:
URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2041538414

   > @nk1506 Echoing Russell's comments, how many small files are there in your 
OOM case? How much memory do you set up?
   
   I didn't use the Spark engine for compaction; I was using the Java client API, 
so my questions might distract from the original problem. My requirement, 
though, is to compact very large datasets (say, 10K data files) in a single 
commit, and using 
[RewriteFiles](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/RewriteFiles.java#L171)
 for that always risks an OOM. So I am looking for something that can manage 
manifest files more intelligently. I think I will start a different thread to 
discuss that problem.




Re: [I] Spark rewrite Files Action OOM [iceberg]

2024-04-06 Thread via GitHub


Zhanxiao-Ma commented on issue #10054:
URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2041085307

   > @nk1506 Echoing Russell's comments, how many small files are there in your 
OOM case? How much memory do you set up?
   
   @RussellSpitzer I believe increasing memory is not a good solution for 
dealing with an excessive amount of delete information, because it is 
impossible to predict how much memory would be appropriate.




Re: [I] Spark rewrite Files Action OOM [iceberg]

2024-03-28 Thread via GitHub


manuzhang commented on issue #10054:
URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2024928341

   Deleting many records isn't forbidden, but it can increase the memory 
required on the driver. If you are using position deletes, there's the 
`rewrite_position_delete_files` procedure. For equality deletes, 
https://github.com/apache/iceberg/pull/2364 proposed rewriting equality deletes 
as position deletes, but it was not merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org