[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r878878012 ## rfc/rfc-50/rfc-50.md: ## @@ -0,0 +1,93 @@ + + +# RFC-50: Improve Timeline Server + +## Proposers +- @yuzhaojing + +## Approvers + - @xushiyan + - @danny0405 + +## Abstract + +Support clients in obtaining the timeline from the timeline server. + +## Background + +At its core, Hudi maintains a timeline of all actions performed on the table at different instants of time. Before each operation is performed on the Hudi table, the table's state needs to be obtained through the timeline. At present, there are two ways to obtain the Hudi timeline: +- Create a MetaClient and get the complete timeline through MetaClient#getActiveTimeline, which directly scans the metadata directory on HDFS +- Get the timeline through FileSystemView#getTimeline. This is a cached timeline obtained by requesting the Embedded timeline service, so there is no need to repeatedly scan the metadata directory on HDFS, but it only contains completed instants + +### Problem description + +- Hudi designed the Timeline service to process and cache metadata access, but currently not all metadata access converges to the Timeline service; for example, acquiring a complete timeline does not. +- As the number of write tasks increases, the large number of repeated metadata accesses leads to a high volume of HDFS NameNode requests, putting greater pressure on the NameNode and making it hard to scale. + +### Spark and Flink write flow comparison diagram + +Since Hudi is designed around the Spark micro-batch model, in the Spark write process all operations on the timeline are completed on the driver side and then distributed to the executor side, which starts the write operation. + +But for Flink, write tasks are resident services due to its pure streaming model. 
There is also no highly reliable communication mechanism between the user-side JM and the TM in Flink, so the TM needs to obtain the latest instant for writing by polling the timeline. + +![](ComparisonDiagram.png) + +### Current + +![](CurrentDesign.png) + +The current design has two main problems with converging the timeline: +- Since the timeline of the task is pulled from the Embedded timeline service, the refresh mechanism of the Embedded timeline service itself does not work +- MetaClient and HoodieTable are decoupled: the timeline is obtained in MetaClient, and then the Embedded timeline service is requested, through the FileSystemViewManager in HoodieTable combined with the timeline, to obtain file-related information. This creates circular dependencies, and there are problems when MetaClient is used alone without creating a HoodieTable + +## Implementation + +### Design target + +The goal of this solution is to converge the acquisition of timelines so that they are obtained uniformly through the Embedded timeline service. The timeline is pulled from HDFS only when the Embedded timeline service is not started. + +### Converge the request to loop instant in Flink to JM + +- Store the latest instant on the Embedded Timeline Server. Every time the JM modifies the instant state, it actively syncs it to the Embedded Timeline Server +- Return the latest instant directly when a task pulls the latest instant + +### Converge the request to pull instant in meta client initialization to JM + +- Abstract the timeline-related acquisition methods into a new interface, TableTimelineView, create the corresponding TimelineViewManager in MetaClient, and obtain the timeline through the TimelineViewManager. + +![](Design.png) + +### Flink optimization before and after schematic diagram + +![](SchematicDiagram.png) + +## Rollout/Adoption Plan + +- What impact (if any) will there be on existing users? 
+ +- Since the Embedded Timeline Service is used to pull the timeline, users who use Flink to write to Hudi will observe that file system requests are greatly reduced, thereby reducing the pressure on the file system. Review Comment: Okay, this will not affect Spark users, because Spark only gets the timeline from the driver, and the behavior of the timeline accessed by the driver is almost the same as before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
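The proposal above can be pictured with a small model: the coordinator (JM) syncs the latest instant to the embedded timeline server whenever it changes the instant state, and the many polling write tasks read from that cache instead of listing the metadata directory. This is an illustrative sketch, not Hudi's actual classes:

```python
class EmbeddedTimelineServer:
    """Illustrative model: caches the latest instant pushed by the coordinator."""

    def __init__(self):
        self._latest_instant = None
        self.fs_scans = 0  # how often we had to fall back to a FS listing

    def sync_latest_instant(self, instant):
        # Called by the JM every time it modifies the instant state.
        self._latest_instant = instant

    def get_latest_instant(self, fs_listing):
        # Serve from cache; only list the filesystem if nothing was synced yet.
        if self._latest_instant is None:
            self.fs_scans += 1
            self._latest_instant = max(fs_listing) if fs_listing else None
        return self._latest_instant

server = EmbeddedTimelineServer()
server.sync_latest_instant("20220524101500")  # JM starts a new instant
# 100 polling write tasks hit the cache, not the filesystem
results = [server.get_latest_instant(fs_listing=[]) for _ in range(100)]
assert all(r == "20220524101500" for r in results)
assert server.fs_scans == 0
```

With the old flow every poll would have been a metadata listing; here the NameNode-equivalent cost stays constant regardless of task parallelism.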
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r878705378 ## rfc/rfc-50/rfc-50.md: ## +## Rollout/Adoption Plan + +- What impact (if any) will there be on existing users? +- Since the Embedded Timeline Service is used to pull the timeline, users who use Flink to write to Hudi will observe that file system requests are greatly reduced, thereby reducing the pressure on the file system. +- However, in a scenario with a relatively high degree of parallelism, it may be necessary to increase the JM's resources to keep responses timely +- If we are changing behavior, how will we phase out the older behavior? +- Add a configuration to control this behavior +- If we need special migration tools, describe them here. +- No special migration tools will be necessary +- When will we remove the existing behavior? +- In subsequent releases (1.0 or later) +## Test Plan + +No additional regression testing is required, as the behavior of MetaClient's active timeline has not been changed Review Comment: We can enable the config for existing test cases to cover the new code path, e.g. by changing Test to ParameterizedTest.
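The rollout plan's "add a configuration to control this behavior" could look like the following sketch: one flag decides whether the timeline comes from the embedded timeline server or from the legacy direct filesystem scan, with a fallback when the service is unreachable. The config key and function are hypothetical, not actual Hudi names:

```python
def load_timeline(config, server_fetch, fs_scan):
    """Return the timeline via the server when enabled, else via direct FS scan."""
    if config.get("hoodie.timeline.server.based.enabled", False):  # hypothetical key
        try:
            return server_fetch()
        except ConnectionError:
            pass  # embedded service not started: fall back to scanning
    return fs_scan()

timeline = ["001.commit", "002.deltacommit"]

# Flag off (default): old behavior, direct metadata scan
assert load_timeline({}, server_fetch=lambda: [], fs_scan=lambda: timeline) == timeline

# Flag on: served by the timeline server; the scan path is never taken
assert load_timeline(
    {"hoodie.timeline.server.based.enabled": True},
    server_fetch=lambda: timeline,
    fs_scan=lambda: ["should-not-be-used"],
) == timeline
```

Defaulting the flag to off preserves existing behavior, which matches the plan to remove the old path only in a later release.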
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r878704899 ## rfc/rfc-50/rfc-50.md: ## +- Abstract the timeline-related acquisition methods into the new interface TableTimelineView, and create the corresponding TimelineViewManager in MetaClient, and obtain the timeline through TimelineViewManager. Review Comment: The name TimelineViewManager was chosen to be consistent with FileSystemViewManager, because by design the two are the same abstraction.
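The TableTimelineView / TimelineViewManager pairing discussed above can be sketched as follows. The interface and manager names come from the RFC, but the bodies and helper classes are illustrative; the point is the mirror of FileSystemViewManager, with a remote view backed by the timeline server and a local view that scans storage:

```python
from abc import ABC, abstractmethod

class TableTimelineView(ABC):
    """RFC's proposed interface for timeline acquisition."""
    @abstractmethod
    def get_timeline(self):
        ...

class RemoteTimelineView(TableTimelineView):
    def __init__(self, server):
        self.server = server
    def get_timeline(self):
        return self.server.request_timeline()  # one RPC, no NameNode listing

class LocalTimelineView(TableTimelineView):
    def __init__(self, storage):
        self.storage = storage
    def get_timeline(self):
        return sorted(self.storage)  # direct metadata listing, ordered by instant

class TimelineViewManager:
    """Picks remote when the embedded service runs, else local (like FileSystemViewManager)."""
    def __init__(self, server, storage):
        self.view = RemoteTimelineView(server) if server else LocalTimelineView(storage)

class FakeServer:
    def request_timeline(self):
        return ["001.commit", "002.commit"]

assert TimelineViewManager(FakeServer(), None).view.get_timeline() == ["001.commit", "002.commit"]
assert TimelineViewManager(None, ["002.commit", "001.commit"]).view.get_timeline() == ["001.commit", "002.commit"]
```

MetaClient then always goes through the manager, so the caller never needs to know whether the timeline came from the server or from storage.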
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r878704647 ## rfc/rfc-50/rfc-50.md: ## +### Converge the request to loop instant in Flink to JM Review Comment: Okay, I will update these in another PR.
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r878704620 ## rfc/rfc-50/rfc-50.md: ## +- Store the latest instant on the Embedded Timeline Server. Every time JM modifies the instant state, it actively performs a sync to Embedded Timeline Server Review Comment: Only commit / delta commit; other instant types are unnecessary for Flink requests to the timeline service.
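The comment above restricts the instant sync to commit and delta commit actions. A minimal sketch of that filter (the action names follow Hudi's timeline conventions; the function itself is illustrative):

```python
# Only write commits are synced to the timeline server for Flink writers;
# other instant types (clean, compaction, rollback, ...) are skipped.
SYNCED_ACTIONS = {"commit", "deltacommit"}

def should_sync(action: str) -> bool:
    """Return True only for the instant actions Flink tasks poll for."""
    return action in SYNCED_ACTIONS

assert should_sync("commit")
assert should_sync("deltacommit")
assert not should_sync("clean")
assert not should_sync("compaction")
```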
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r875580178 ## rfc/rfc-50/rfc-50.md: ## +- Create a MetaClient and get the complete timeline through MetaClient #getActiveTimeline, which will directly scan the HDFS directory of metadata Review Comment: fixed
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r875446437 ## rfc/rfc-50/rfc-50.md: ## +The current design implementation has two main problems with the convergence timeline Review Comment: I know this implementation; should we maintain a separate timeline each for the timeline view and the fs view?
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r875414766 ## rfc/rfc-50/rfc-50.md: ## +The current design implementation has two main problems with the convergence timeline Review Comment: As you said, the meta client only triggers the timeline sync/refresh; the fs view is synced lazily on the next request to it. We can record the last instant of the previous request in the fs view, and sync only if there is a change.
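The lazy-sync idea in the comment above can be modeled in a few lines: the fs view remembers the last instant it saw, and on each request performs the (expensive) sync only when the timeline's latest instant has changed. A hypothetical sketch, not Hudi's actual FileSystemView:

```python
class LazyFsView:
    """Illustrative model: sync the fs view only when the timeline advanced."""

    def __init__(self):
        self.last_synced_instant = None
        self.sync_count = 0

    def request(self, timeline):
        latest = timeline[-1] if timeline else None
        if latest != self.last_synced_instant:  # timeline changed since last request
            self.sync_count += 1                # do the expensive sync exactly once
            self.last_synced_instant = latest
        return self.last_synced_instant

view = LazyFsView()
timeline = ["001.commit"]
view.request(timeline)
view.request(timeline)           # no new instant -> no re-sync
assert view.sync_count == 1
timeline.append("002.commit")
view.request(timeline)           # new instant -> one more sync
assert view.sync_count == 2
```

Keying the sync on the last-seen instant is what keeps repeated requests from an unchanged timeline free of metadata traffic.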
[GitHub] [hudi] yuzhaojing commented on a diff in pull request #5392: [HUDI-3942] [RFC-50] Improve Timeline Server
yuzhaojing commented on code in PR #5392: URL: https://github.com/apache/hudi/pull/5392#discussion_r875408492 ## rfc/rfc-50/rfc-50.md: ## +At present, there are two ways to obtain the timeline of HUDI : Review Comment: Update it.