[GitHub] [hudi] satishkotha edited a comment on issue #1866: [SUPPORT]Clean up does not seem to happen on MOR table

2020-07-24 Thread GitBox


satishkotha edited a comment on issue #1866:
URL: https://github.com/apache/hudi/issues/1866#issuecomment-663683323


   > Is there a possibility that commits get archived before the clean job 
runs, resulting in a no-op? I will continue to monitor.
   
   Clean and archival are somewhat independent today, so this no-op should not 
happen.
   
   > Also, can you confirm whether I can run a clean job in a separate Spark job 
concurrently while a streaming write is happening? I guess it should be fine, as 
compaction runs have that ability.
   
   Why are you considering a separate Spark job for clean? Are you seeing clean 
take a lot of time? You can run clean concurrently with the write by 
setting 'hoodie.clean.async' to true. (This runs clean in the same job, but 
concurrently with the write.)
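
As a rough sketch of the setting above, the writer options could be assembled 
like this before they are handed to the Spark datasource writer. The helper 
name `with_async_clean` and the table name are made up for illustration; only 
'hoodie.clean.async' comes from this thread.

```python
def with_async_clean(base_options):
    """Return a copy of the writer options with async cleaning switched on."""
    options = dict(base_options)
    # From the comment above: run clean in the same job, concurrently with the write.
    options["hoodie.clean.async"] = "true"
    return options


# Hypothetical base options; the table name is a placeholder.
base = {"hoodie.table.name": "my_table"}
hudi_options = with_async_clean(base)
print(hudi_options["hoodie.clean.async"])
```

With PySpark, a dict like this would typically be passed to the DataFrame 
writer via `.options(**hudi_options)`.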
   
   I don't know of anyone using a separate Spark job to run clean. Theoretically, 
I think it is possible, but you would have to do some testing because, as far as 
I know, it isn't used this way.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] satishkotha edited a comment on issue #1866: [SUPPORT]Clean up does not seem to happen on MOR table

2020-07-23 Thread GitBox


satishkotha edited a comment on issue #1866:
URL: https://github.com/apache/hudi/issues/1866#issuecomment-663298411


   Hi @luffyd  
   
   By default, upserts on MOR tables create 'deltacommits'. 
[Compaction](https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture#DesignAndArchitecture-Compaction) 
needs to run to convert deltacommits into commits. Clean works only after 
compaction has run and commits have been created. Clean also does not remove 
file groups that have a pending compaction. Can you set up inline compaction 
[using the instructions here](https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-HowdoIruncompactionforaMORdataset) 
for testing and see if that helps?
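
For a quick test, inline compaction could be switched on with writer options 
along these lines. This is only a sketch: the helper name is hypothetical, and 
the config keys are written as I recall them from the Hudi configuration docs, 
so please check them against the version you are running.

```python
def inline_compaction_options(max_delta_commits=1):
    """Build writer options that compact a MOR table inline after N deltacommits."""
    return {
        # Assumed key names; verify against your Hudi version.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.compact.inline": "true",
        # Compact after this many deltacommits (1 = after every write, testing only).
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


compaction_options = inline_compaction_options(max_delta_commits=1)
for key, value in sorted(compaction_options.items()):
    print(f"{key}={value}")
```

A low deltacommit threshold makes commits appear quickly, which is convenient 
for checking whether clean starts to kick in; it is probably too aggressive 
for a production table.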
   
   If that doesn't work, can you share a screenshot of the files in the .hoodie 
folder in 'getHudiPath'?
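
If a screenshot is awkward, a small script along these lines could summarize 
the folder instead. It assumes the timeline files are named 
`<instant time>.<action>` (for example `.deltacommit`, `.commit`, `.clean`), 
which is my understanding of the layout; the function name and the demo file 
names are made up for illustration.

```python
import os
import tempfile
from collections import Counter


def summarize_timeline(hoodie_dir):
    """Count files in a .hoodie folder by the action suffix after the instant time."""
    counts = Counter()
    for name in os.listdir(hoodie_dir):
        instant, _, action = name.partition(".")
        # Keep only "<digits>.<action>" files; skips hoodie.properties and sub-folders.
        if not instant.isdigit() or not action:
            continue
        if not os.path.isfile(os.path.join(hoodie_dir, name)):
            continue
        counts[action] += 1
    return counts


# Demo against a throwaway folder with made-up instant times.
with tempfile.TemporaryDirectory() as demo_dir:
    for fake in ["20200724101530.deltacommit", "20200724102030.deltacommit",
                 "20200724102500.commit", "20200724102600.clean",
                 "hoodie.properties"]:
        open(os.path.join(demo_dir, fake), "w").close()
    summary = summarize_timeline(demo_dir)

print(dict(summary))
```

Many deltacommits with no commit or clean entries would be consistent with 
compaction not having run yet.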


