[GitHub] [hudi] ssomuah edited a comment on issue #1852: [SUPPORT]

2020-07-23 Thread GitBox


ssomuah edited a comment on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-663189699


   What do you mean by "runs serially with ingestion"? My understanding was 
that inline compaction happens in the same flow as writing, so an inline 
compaction would simply slow down ingestion. 
   
   Does INLINE_COMPACT_NUM_DELTA_COMMITS_PROP refer to the number of commits 
retained in general, or the number of commits for a record? 
   
   In the timeline I see several clean.requested and clean.inflight files. How 
can I get these to actually complete?
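
   To narrow down which cleans never finished, here is a minimal sketch that 
scans a listing of the `.hoodie` folder. It assumes the timeline files are named 
`<instant>.clean`, `<instant>.clean.requested` and `<instant>.clean.inflight`; 
the function name and the sample listing are hypothetical, not taken from the 
attached timeline.

   ```python
   import re
   from collections import defaultdict

   # Assumed naming: "<instant>.clean" (completed), "<instant>.clean.requested",
   # "<instant>.clean.inflight". Anything else in the listing is ignored.
   CLEAN_FILE = re.compile(r"^(\d+)\.clean(?:\.(requested|inflight))?$")


   def pending_cleans(timeline_files):
       """Map each clean instant without a completed file to the states seen."""
       states = defaultdict(set)
       for name in timeline_files:
           match = CLEAN_FILE.match(name)
           if match:
               # No suffix after ".clean" means the clean completed.
               states[match.group(1)].add(match.group(2) or "completed")
       return {
           instant: sorted(seen)
           for instant, seen in sorted(states.items())
           if "completed" not in seen
       }


   # Hypothetical listing, for illustration only.
   sample = [
       "20200721032203.commit",
       "20200721032203.clean",
       "20200721032203.clean.requested",
       "20200721032203.clean.inflight",
       "20200721040000.clean.requested",
       "20200721040000.clean.inflight",
       "20200721050000.clean.requested",
   ]
   print(pending_cleans(sample))
   ```

   Instants that show up in the result have a requested or inflight file but no 
completed one.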
   
   What determines how many log files are created in each batch for a MOR table?
   
   EDIT:
   Is it possible to force a compaction of the existing log files?
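
   For reference, a sketch of the settings these questions are about, written 
as a plain options dictionary. It assumes INLINE_COMPACT_NUM_DELTA_COMMITS_PROP 
maps to the key `hoodie.compact.inline.max.delta.commits`; the values are 
placeholders, not the ones used on this table.

   ```python
   # Option keys as I understand them from Hudi's compaction and cleaning
   # configuration; every value below is a placeholder for illustration.
   hudi_write_options = {
       # Run compaction in the same flow as the write.
       "hoodie.compact.inline": "true",
       # Assumed key behind INLINE_COMPACT_NUM_DELTA_COMMITS_PROP.
       "hoodie.compact.inline.max.delta.commits": "5",
       # Cleaner retention, a separate setting from the compaction trigger.
       "hoodie.cleaner.commits.retained": "10",
   }

   for key, value in sorted(hudi_write_options.items()):
       print(f"{key}={value}")
   ```

   With the Spark datasource these would typically be passed to the writer as 
options alongside the rest of the Hudi write configuration.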



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ssomuah edited a comment on issue #1852: [SUPPORT]

2020-07-23 Thread GitBox


ssomuah edited a comment on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-663153261


   @bvaradar I think the issue I'm facing is due to configuration, but I can't 
pinpoint what it is. 
   
   I'm ending up with an extremely large number of files for a single-partition 
merge-on-read table. 
   
   I have tens of thousands of log files which I would have thought would get 
compacted into parquet at some point. 
   
   What volume of updates works well for merge-on-read tables today?







[GitHub] [hudi] ssomuah edited a comment on issue #1852: [SUPPORT]

2020-07-21 Thread GitBox


ssomuah edited a comment on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-661970919


   I don't see any exceptions in the driver logs or executor logs. 
   
   I see these two warnings in the driver logs:
   ```
   20/07/21 13:12:28 WARN IncrementalTimelineSyncFileSystemView: Incremental 
Sync of timeline is turned off or deemed unsafe. Will revert to full syncing
   ```
   ```
   20/07/21 13:12:29 WARN CleanPlanner: Incremental Cleaning mode is enabled. 
Looking up partition-paths that have since changed since last cleaned at 
20200721032203. New Instant to retain : 
Option{val=[20200721032203__commit__COMPLETED]}
   ```
   
   These are the contents of the timeline:
   
[dot_hoodie_folder.txt](https://github.com/apache/hudi/files/4954820/dot_hoodie_folder.txt)
   
   The timeline only has files from the current day, but I see log files in the 
data folder from over a week ago. Do you have any idea what might be causing so 
many log files?
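
   To quantify the build-up, here is a minimal sketch that counts log files per 
file ID and base instant from a listing of the data folder. It assumes log files 
are named `.<fileId>_<baseInstant>.log.<version>_<writeToken>`; the function 
name and the sample names are hypothetical.

   ```python
   import re
   from collections import Counter

   # Assumed naming: ".<fileId>_<baseInstant>.log.<version>[_<writeToken>]".
   LOG_FILE = re.compile(
       r"^\.(?P<file_id>.+)_(?P<base_instant>\d+)\.log\.(?P<version>\d+)(?:_.+)?$"
   )


   def log_files_per_file_slice(file_names):
       """Count log files for each (file ID, base instant) pair."""
       counts = Counter()
       for name in file_names:
           match = LOG_FILE.match(name)
           if match:
               counts[(match["file_id"], match["base_instant"])] += 1
       return counts


   # Hypothetical listing, for illustration only.
   sample = [
       ".abc-0_20200714010101.log.1_1-0-1",
       ".abc-0_20200714010101.log.2_1-0-1",
       ".def-0_20200721032203.log.1_2-0-1",
       "abc-0_1-0-1_20200714010101.parquet",
   ]
   for (file_id, base_instant), count in log_files_per_file_slice(sample).most_common():
       print(file_id, base_instant, count)
   ```

   Pairs with an old base instant and a high count are the ones that have gone 
longest without being compacted into parquet.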


