[GitHub] [hudi] tooptoop4 edited a comment on issue #1833: [SUPPORT] 100% update on 10mn keys in single partition slow

2020-07-20 Thread GitBox


tooptoop4 edited a comment on issue #1833:
URL: https://github.com/apache/hudi/issues/1833#issuecomment-660715533


   @bvaradar I noticed a "There is insufficient memory for the Java Runtime 
Environment to continue." error, so I reduced SPARK_WORKER_MEMORY (i.e. to leave 
more room for OS memory). The timings I get now are: 43 mins with 
hoodie.bloom.index.bucketized.checking = false and 59 mins with 
hoodie.bloom.index.bucketized.checking = true.
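
To put the two timings side by side, here is a small arithmetic sketch. The 43 and 59 minute figures are the ones reported above; the helper name `slowdown_pct` is just an illustrative choice.

```python
def slowdown_pct(baseline_mins: float, other_mins: float) -> float:
    """Return how much slower `other_mins` is than `baseline_mins`, in percent."""
    return (other_mins - baseline_mins) / baseline_mins * 100.0


# Timings reported in this comment (minutes).
bucketized_false = 43
bucketized_true = 59

# (59 - 43) / 43 is roughly 0.372, so the bucketized run is about 37% slower here.
print(f"{slowdown_pct(bucketized_false, bucketized_true):.1f}% slower with bucketized checking")
```

This is a single pair of runs on one setup, so the ratio is indicative only.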
   
   **hoodie.bloom.index.bucketized.checking = false**
   
   
![image](https://user-images.githubusercontent.com/33283496/87885750-b4829b80-ca0f-11ea-99d9-195b3a6cc562.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885776-dda32c00-ca0f-11ea-9f8e-e9c15ead96c2.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885794-fad7fa80-ca0f-11ea-8d16-b5a290676525.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885812-1e9b4080-ca10-11ea-9ac7-e3a487f4a8b7.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885847-5dc99180-ca10-11ea-9a13-fbef57f240b3.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885876-91a4b700-ca10-11ea-906b-563cd0d25d55.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885894-bb5dde00-ca10-11ea-977a-681a3c7b4d1c.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885907-d6305280-ca10-11ea-8f2d-aeec67b1916b.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885922-f19b5d80-ca10-11ea-8359-5fc0adecb8cb.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885930-07a91e00-ca11-11ea-84f5-379f1953ad67.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885947-1e4f7500-ca11-11ea-81cb-977a289eba53.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885961-4343e800-ca11-11ea-9f7d-bea8d5a47012.png)
   
![image](https://user-images.githubusercontent.com/33283496/87885972-5eaef300-ca11-11ea-82a2-3dcc70474d5c.png)
   
   
   
   
   **hoodie.bloom.index.bucketized.checking = true**
   
   
   
   
![image](https://user-images.githubusercontent.com/33283496/87886008-a03f9e00-ca11-11ea-9a23-acccedbcae29.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886021-bd746c80-ca11-11ea-986f-ce83b8430869.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886046-e85ec080-ca11-11ea-99d0-52fe4d7bdc2d.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886069-09271600-ca12-11ea-8bab-e06ccb503e80.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886091-2bb92f00-ca12-11ea-9d00-561ef63bcabf.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886110-4be8ee00-ca12-11ea-9eb3-d17de793bb9b.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886117-63c07200-ca12-11ea-97fa-7655500c3848.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886131-79ce3280-ca12-11ea-898b-bbaca156fd91.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886152-95393d80-ca12-11ea-8b90-c6f6c52bff94.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886164-ac782b00-ca12-11ea-8231-e147ad4376b5.png)
   
![image](https://user-images.githubusercontent.com/33283496/87886171-bc900a80-ca12-11ea-9ef6-a7b680d2943a.png)
   
   
   I wonder if https://issues.apache.org/jira/browse/SPARK-27734 is causing the 
memory issues.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] tooptoop4 edited a comment on issue #1833: [SUPPORT] 100% update on 10mn keys in single partition slow

2020-07-16 Thread GitBox


tooptoop4 edited a comment on issue #1833:
URL: https://github.com/apache/hudi/issues/1833#issuecomment-659327491


   Actually, I was using 0.4.6-SNAPSHOT, before the bucketized.checking code 
landed. I changed hoodie.bloom.index.bucketized.checking to false on Hudi 0.5.3 
and the time went down to 107 mins :) 
   
   Hudi 0.5.3 in local mode with hoodie.bloom.index.bucketized.checking set to 
false takes 122 mins.
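
For anyone wanting to try the same toggle, a minimal sketch of how the option might be passed to a Spark datasource write. Only the key `hoodie.bloom.index.bucketized.checking` comes from this thread; the helper name `bloom_index_options` and the commented-out write call (with its `df` and `base_path`) are assumptions for illustration, not the exact job used here.

```python
def bloom_index_options(bucketized_checking: bool) -> dict:
    """Build writer options toggling bucketized bloom index checking.

    Hypothetical helper; only the config key itself is taken from the thread.
    """
    # Datasource options are passed as strings, so render the bool as "true"/"false".
    return {
        "hoodie.bloom.index.bucketized.checking": str(bucketized_checking).lower(),
    }


opts = bloom_index_options(False)
print(opts)

# Assumed usage with an existing DataFrame `df` and table path `base_path` (not run here):
# df.write.format("org.apache.hudi").options(**opts).mode("append").save(base_path)
```

The other write options a real job needs (record key, precombine field, table name and so on) are left out on purpose.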
   
   
   






