[GitHub] [hudi] ssomuah edited a comment on issue #1852: [SUPPORT]
ssomuah edited a comment on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-663189699

What do you mean by "runs serially with ingestion"? My understanding was that inline compaction happens in the same flow as writing, so an inline compaction would simply slow down ingestion.

Does INLINE_COMPACT_NUM_DELTA_COMMITS_PROP refer to the number of commits retained in general, or the number of commits for a record?

I see several clean.requested and clean.inflight files in the timeline; how can I get these to actually complete?

What determines how many log files are created in each batch for a MOR table?

EDIT: Is it possible to force a compaction of the existing log files?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
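For context on the questions above, a minimal sketch of the Hudi write options involved, assuming the configuration key names from Hudi's docs (defaults can differ by version). `hoodie.compact.inline.max.delta.commits` (the config behind INLINE_COMPACT_NUM_DELTA_COMMITS_PROP) counts delta commits accumulated on the table since the last compaction, not commits per record and not retention:

```python
# Sketch of Hudi write options discussed in this thread. Key names are
# assumed from Hudi's configuration reference; verify against your version.
hudi_options = {
    # Run compaction inline with ingestion: a write that triggers
    # compaction blocks until the compaction finishes.
    "hoodie.compact.inline": "true",
    # Number of delta commits on the table since the last compaction
    # before a new compaction is scheduled. NOT a per-record count and
    # NOT the number of commits retained.
    "hoodie.compact.inline.max.delta.commits": "5",
    # Retention is controlled separately by the cleaner.
    "hoodie.cleaner.commits.retained": "10",
}

# With a Spark DataFrame `df` (hypothetical), these would be passed as
# df.write.format("hudi").options(**hudi_options)...
```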
[GitHub] [hudi] ssomuah edited a comment on issue #1852: [SUPPORT]
ssomuah edited a comment on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-663153261

@bvaradar I think the issue I'm facing is due to configuration, but I can't pinpoint what it is. I'm ending up with an extremely large number of files for a single-partition merge-on-read table. I have tens of thousands of log files, which I would have thought would get compacted into parquet at some point. What volume of updates is working well for merge-on-read tables today?
[GitHub] [hudi] ssomuah edited a comment on issue #1852: [SUPPORT]
ssomuah edited a comment on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-661970919

I don't see any exceptions in the driver logs or executor logs. I do see these two warnings in the driver logs:

```
20/07/21 13:12:28 WARN IncrementalTimelineSyncFileSystemView: Incremental Sync of timeline is turned off or deemed unsafe. Will revert to full syncing
```

```
20/07/21 13:12:29 WARN CleanPlanner: Incremental Cleaning mode is enabled. Looking up partition-paths that have since changed since last cleaned at 20200721032203. New Instant to retain : Option{val=[20200721032203__commit__COMPLETED]}
```

These are the contents of the timeline: [dot_hoodie_folder.txt](https://github.com/apache/hudi/files/4954820/dot_hoodie_folder.txt)

The timeline only has files from the current day, but I see log files in the data folder from over a week ago. Do you have any idea what might be causing so many log files?
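A quick way to summarize a timeline dump like the attached one is to count instants by action and state. This is a sketch assuming Hudi's timeline file naming convention of `<instant_time>.<action>[.<state>]` (e.g. `20200721032203.clean.requested`), where a file with no trailing state suffix is a completed instant:

```python
import os
from collections import Counter

def summarize_timeline(hoodie_dir: str) -> Counter:
    """Count timeline instants in a .hoodie folder by action/state.

    Assumes Hudi's naming convention <instant_time>.<action>[.<state>];
    many pending clean.requested/clean.inflight entries with few completed
    .clean files would indicate cleans that never finish.
    """
    counts = Counter()
    for name in os.listdir(hoodie_dir):
        parts = name.split(".")
        # Skip non-instant files such as hoodie.properties or aux folders.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        counts[".".join(parts[1:])] += 1
    return counts
```

Comparing `counts["clean.requested"]` and `counts["clean.inflight"]` against completed `counts["clean"]` would show whether cleaning is keeping up with ingestion.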