[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384171#comment-17384171 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
> Add support to disable meta column to BulkInsert Row Writer path
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
>
> The objective is to allow all meta columns to be disabled, to avoid their storage cost.
> Write latency on the row writer path may also improve, since no special handling is then required at the RowCreateHandle layer.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
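To make the objective concrete, here is a minimal Java sketch of what disabling meta columns could mean for the rows a row writer emits. The class and method names are hypothetical and this is not Hudi's HoodieRowCreateHandle; it assumes the five standard Hudi meta columns are written ahead of the data columns when populated. With the flag off, the incoming row passes through unchanged, which is where both the storage saving and the reduced per-row handling would come from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch (not Hudi's HoodieRowCreateHandle): how a row writer could
// lay out output rows with or without Hudi's meta columns.
final class MetaColumnLayout {

  // Hudi's meta columns, assumed here to be written ahead of the data columns.
  static final List<String> META_COLUMNS = Collections.unmodifiableList(Arrays.asList(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name"));

  private MetaColumnLayout() {}

  /** Output schema: meta columns (only if populated) followed by the data columns. */
  static List<String> outputSchema(List<String> dataColumns, boolean populateMetaColumns) {
    List<String> schema = new ArrayList<>();
    if (populateMetaColumns) {
      schema.addAll(META_COLUMNS);
    }
    schema.addAll(dataColumns);
    return schema;
  }

  /**
   * Builds one output row. When meta columns are disabled, the data values are
   * copied through as-is and no meta values are stored. The meta values are
   * illustrative caller-supplied parameters.
   */
  static List<Object> outputRow(List<Object> dataValues, boolean populateMetaColumns,
      String commitTime, String commitSeqNo, String recordKey,
      String partitionPath, String fileName) {
    if (!populateMetaColumns) {
      return new ArrayList<>(dataValues);
    }
    List<Object> row = new ArrayList<>(Arrays.<Object>asList(
        commitTime, commitSeqNo, recordKey, partitionPath, fileName));
    row.addAll(dataValues);
    return row;
  }

  public static void main(String[] args) {
    List<String> data = Arrays.asList("id", "ts", "value");
    System.out.println(outputSchema(data, true));  // five meta columns, then id, ts, value
    System.out.println(outputSchema(data, false)); // [id, ts, value]
  }
}
```

The design point the sketch isolates is that the "disabled" branch does no extra work per row, which matches the issue's expectation of lower write latency.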
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384131#comment-17384131 ] ASF GitHub Bot commented on HUDI-2161: -- codecov-commenter edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384132#comment-17384132 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
           .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
           .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
           .setPartitionColumns(partitionColumns)
+          .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
           .initTable(sparkContext.hadoopConfiguration, path.get)
         tableConfig = tableMetaClient.getTableConfig
+      } else {
+        // validate table properties
+        val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
I understand that we need one place to do the validation. As of now, any caller that uses WriteClient directly has to make an additional call to do the validation. I can think of two other options.

1. Change all startCommit() methods in WriteClient to take in the operationType and do the validation there. But there are quite a few callers (200 places) that I would need to fix. With this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path). Once we have consensus, I can make the changes. Once fixed, callers don't need to make any additional calls to do validations.
2. I also thought we could add the validation to MetaClient and call it from within [getTableAndInitCtx()](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L405) in SparkRDDWriteClient. But we need the raw properties from the user. If we call it from within getTableAndInitCtx(), we might fetch properties from writeConfig, which would have read the table props already, so I'm not sure we can go with this approach. But if we can get this in neatly, no changes are required for those using writeClient directly. For the row writer path, we need to make one additional call from within [DataSourceInternalWriterHelper](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java#L68) to explicitly validateProps using metaClient.
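A minimal Java sketch of the validation under discussion, assuming it reduces to comparing the user's explicitly supplied populate-meta-columns value with the value persisted in the existing table's config. Class and method names are hypothetical, not Hudi APIs; the key and default are passed in because the PR obtains them from `HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS`. The last method models, as an assumption, a write config that has already read the table props, to show why option 2 needs the raw user properties.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of validating a user's populate-meta-columns option against an
// existing table's config. Key and default are parameters; in the PR they come from
// HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key() and defaultValue().
final class PopulateMetaColumnsCheck {

  private PopulateMetaColumnsCheck() {}

  /** Like Scala's String.toBoolean: only "true"/"false" (any case) are accepted. */
  static boolean parseStrict(String s) {
    switch (s.toLowerCase(Locale.ROOT)) {
      case "true":
        return true;
      case "false":
        return false;
      default:
        throw new IllegalArgumentException("For input string: \"" + s + "\"");
    }
  }

  /** Mirrors parameters.getOrElse(key, default).toBoolean from the diff hunk. */
  static boolean resolve(Map<String, String> props, String key, boolean defaultValue) {
    String raw = props.get(key);
    return raw == null ? defaultValue : parseStrict(raw);
  }

  /** Throws if the user explicitly requested a value that differs from the table's. */
  static void validate(Map<String, String> rawUserProps, Map<String, String> tableProps,
      String key, boolean defaultValue) {
    if (!rawUserProps.containsKey(key)) {
      return; // nothing requested explicitly, so the table's setting applies
    }
    boolean requested = resolve(rawUserProps, key, defaultValue);
    boolean existing = resolve(tableProps, key, defaultValue);
    if (requested != existing) {
      throw new IllegalArgumentException(key + " is " + requested
          + " in the write options but " + existing + " in the table config");
    }
  }

  /** Assumed merge order for a write config that has already read the table props. */
  static Map<String, String> tablePropsOverUserProps(Map<String, String> userProps,
      Map<String, String> tableProps) {
    Map<String, String> merged = new HashMap<>(userProps);
    merged.putAll(tableProps); // table values win, hiding the user's original choice
    return merged;
  }
}
```

With the raw options, a user asking for `false` on a table created with `true` gets an error; validating the merged map instead passes silently, which is the concern raised in option 2.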
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384118#comment-17384118 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan merged pull request #3247: URL: https://github.com/apache/hudi/pull/3247
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384012#comment-17384012 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383971#comment-17383971 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383970#comment-17383970 ] ASF GitHub Bot commented on HUDI-2161: -- codecov-commenter edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383958#comment-17383958 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan merged pull request #3247: URL: https://github.com/apache/hudi/pull/3247
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383675#comment-17383675 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan merged pull request #3247: URL: https://github.com/apache/hudi/pull/3247
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383672#comment-17383672 ] ASF GitHub Bot commented on HUDI-2161: -- codecov-commenter edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (129b2e6) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `0.03%`.
> The diff coverage is `54.30%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff             @@
##             master    #3247      +/-   ##
- Coverage     47.78%   47.74%   -0.04%
- Complexity     5557     5591      +34
  Files           936      938       +2
  Lines         41596    41815     +219
  Branches       4185     4213      +28
+ Hits          19877    19965      +88
- Misses        19949    20063     +114
- Partials       1770     1787      +17
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.97% <ø> (ø)` | |
| hudiclient | `34.55% <47.22%> (+0.03%)` | :arrow_up: |
| hudicommon | `48.65% <23.52%> (-0.04%)` | :arrow_down: |
| hudiflink | `59.44% <ø> (+0.08%)` | :arrow_up: |
| hudihadoopmr | `52.02% <ø> (ø)` | |
| hudisparkdatasource | `67.18% <75.40%> (-0.19%)` | :arrow_down: |
| hudisync | `55.97% <ø> (ø)` | |
| huditimelineservice | `64.07% <ø> (ø)` | |
| hudiutilities | `59.77% <ø> (+0.50%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
| [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
| [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
| [.
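The headline percentages in this report can be re-derived from its raw counts. The Java sketch below assumes coverage is hits divided by lines, truncated (not rounded) to two decimals; that rule is inferred from the numbers here, which it reproduces for both master and the PR, rather than taken from Codecov's documentation. It also checks that each column's lines split exactly into hits, misses, and partials.

```java
// Sketch: re-derives the Codecov percentages from the raw counts in the report.
// Assumption (inferred from the numbers, not from Codecov docs): coverage is
// hits / lines, truncated to two decimal places.
final class CoverageCheck {

  private CoverageCheck() {}

  /** Coverage in hundredths of a percent, truncated (4774 means 47.74%). */
  static long coverageHundredths(long hits, long lines) {
    return hits * 10_000L / lines;
  }

  /** Formats a non-negative hundredths value as a percentage string. */
  static String format(long hundredths) {
    return String.format("%d.%02d%%", hundredths / 100, hundredths % 100);
  }

  /** A report's lines should split exactly into hits + misses + partials. */
  static boolean linesAddUp(long hits, long misses, long partials, long lines) {
    return hits + misses + partials == lines;
  }

  public static void main(String[] args) {
    long[] master = {19877, 19949, 1770, 41596}; // hits, misses, partials, lines
    long[] pr = {19965, 20063, 1787, 41815};
    long before = coverageHundredths(master[0], master[3]);
    long after = coverageHundredths(pr[0], pr[3]);
    System.out.println(format(before) + " -> " + format(after)); // 47.78% -> 47.74%
    System.out.println(after - before);                          // -4, i.e. -0.04%
    System.out.println(linesAddUp(pr[0], pr[1], pr[2], pr[3]));  // true
  }
}
```

Rounding instead of truncating would give 47.79% for master and 47.75% for the PR, which is why truncation is the assumed rule.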
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383666#comment-17383666 ] ASF GitHub Bot commented on HUDI-2161: -- codecov-commenter edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (129b2e6) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `3.79%`.
> The diff coverage is `54.30%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff             @@
##             master    #3247      +/-   ##
- Coverage     47.78%   43.99%   -3.80%
+ Complexity     5557     5173     -384
  Files           936      938       +2
  Lines         41596    41815     +219
  Branches       4185     4213      +28
- Hits          19877    18396    -1481
+ Misses        19949    21770    +1821
+ Partials       1770     1649     -121
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.97% <ø> (ø)` | |
| hudiclient | `34.55% <47.22%> (+0.03%)` | :arrow_up: |
| hudicommon | `48.65% <23.52%> (-0.04%)` | :arrow_down: |
| hudiflink | `59.44% <ø> (+0.08%)` | :arrow_up: |
| hudihadoopmr | `52.02% <ø> (ø)` | |
| hudisparkdatasource | `67.18% <75.40%> (-0.19%)` | :arrow_down: |
| hudisync | `55.97% <ø> (ø)` | |
| huditimelineservice | `64.07% <ø> (ø)` | |
| hudiutilities | `8.99% <ø> (-50.27%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
| [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
| [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383661#comment-17383661 ] ASF GitHub Bot commented on HUDI-2161: -- codecov-commenter edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (129b2e6) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `30.23%`.
> The diff coverage is `47.22%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)

```diff
@@             Coverage Diff              @@
##             master    #3247       +/-   ##
=============================================
- Coverage     47.78%   17.55%   -30.24%
+ Complexity     5557      905     -4652
=============================================
  Files           936      390      -546
  Lines         41596    15579    -26017
  Branches       4185     1381     -2804
=============================================
- Hits          19877     2735    -17142
+ Misses        19949    12658     -7291
+ Partials       1770      186     -1584
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `21.19% <47.22%> (-13.33%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `8.99% <ø> (-50.27%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
| [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
| [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
| [...che/hudi/io/storage/row/HoodieRowCreateHandle.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383656#comment-17383656 ] ASF GitHub Bot commented on HUDI-2161: -- codecov-commenter edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (129b2e6) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `44.96%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)

```diff
@@            Coverage Diff             @@
##             master    #3247      +/-   ##
- Coverage     47.78%    2.82%  -44.97%
+ Complexity     5557       85    -5472
  Files           936      284     -652
  Lines         41596    11873   -29723
  Branches       4185      986    -3199
- Hits          19877      335   -19542
+ Misses        19949    11512    -8437
+ Partials       1770       26    -1744
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `8.99% <ø> (-50.27%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comme
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383565#comment-17383565 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 129b2e6b3c374ca973ea7d9f1ef3d33dbc884aa9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1024)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383543#comment-17383543 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)
* 129b2e6b3c374ca973ea7d9f1ef3d33dbc884aa9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1024)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383541#comment-17383541 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)
* 129b2e6b3c374ca973ea7d9f1ef3d33dbc884aa9 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383408#comment-17383408 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
 .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
 .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
 .setPartitionColumns(partitionColumns)
+ .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
 .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+ } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
I understand that we need one place to do the validation. As of now, any caller that uses WriteClient directly has to make another call to do the validation. I can think of two other options.

1. Fix all startCommit() methods in WriteClient to take in the operationType and do the validation there. But there are quite a few callers (200 places) that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path). Once we have consensus, I can make the changes. Once fixed, callers don't need to make any additional calls to do validations.
2. I also thought we could add the validation to MetaClient and call it from within [getTableAndInitCtx()](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L405) in SparkRDDWriteClient. But we need the raw properties from the user. If we call it from within getTableAndInitCtx(), we might fetch properties from the writeConfig, which would already have read the table props. So I am not sure we can go with this approach. But if we can get it in neatly, no changes are required for those using the writeClient directly. For the row writer path, we would need to make one additional call from within [DataSourceInternalWriterHelper](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java#L68) to explicitly validateProps using the metaClient.
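Option 2 hinges on validating the user's raw options rather than a config that has already absorbed the table's properties. The following is a minimal Java sketch of that check, assuming the only property being compared is the populate-meta-columns flag. `MetaColumnsValidator`, `validate`, and the placeholder key are hypothetical and are not Hudi's API; the real key would come from `HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key()`.

```java
import java.util.Locale;
import java.util.Properties;

/**
 * Hypothetical sketch: validate raw user write options against the
 * populate-meta-columns value persisted with an existing table.
 */
public class MetaColumnsValidator {

    // Placeholder standing in for HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key().
    static final String POPULATE_META_KEY = "<populate-meta-columns-key>";

    /**
     * @param rawUserProps options exactly as the user supplied them (not a config
     *                     that has already merged in the table's properties)
     * @param tableValue   the value stored in the existing table's config
     */
    static void validate(Properties rawUserProps, boolean tableValue) {
        String requested = rawUserProps.getProperty(POPULATE_META_KEY);
        if (requested == null) {
            return; // not set by the user, so the table's persisted value applies
        }
        String normalized = requested.trim().toLowerCase(Locale.ROOT);
        if (!normalized.equals("true") && !normalized.equals("false")) {
            throw new IllegalArgumentException("Not a boolean: " + requested);
        }
        boolean requestedValue = Boolean.parseBoolean(normalized);
        if (requestedValue != tableValue) {
            throw new IllegalArgumentException("Requested populate meta columns = " + requestedValue
                + ", but the existing table was created with " + tableValue);
        }
    }

    public static void main(String[] args) {
        Properties userProps = new Properties();
        userProps.setProperty(POPULATE_META_KEY, "false");
        validate(userProps, false); // matches the table: passes
        try {
            validate(userProps, true); // mismatch: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The early `return` for an unset key is why raw options matter: once the table's value has been merged into the write config, the check can no longer tell an explicit user setting apart from the table's own value.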
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382928#comment-17382928 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671605915

## File path: hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
##
@@ -21,21 +21,24 @@ package org.apache.hudi
 import org.apache.avro.Schema
 import org.apache.avro.generic.GenericRecord
 import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hudi.client.utils.SparkRowSerDe

Review comment:
Moved HoodieSparkUtils, SparkAdaptorSupport and SparkAdaptor from the hudi-spark module to the hudi-spark-client module, since we wanted to access SparkAdaptor from within BuiltInKeygen. If we don't want to move the entire HoodieSparkUtils to a different module, I can move just SparkAdaptor and SparkAdaptorSupport to the hudi-spark-client module, create another class similar to HoodieSparkUtils locally in this module, and expose createSparkRowSerDe.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382916#comment-17382916 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
 .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
 .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
 .setPartitionColumns(partitionColumns)
+ .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
 .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+ } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
I understand that we need one place to do the validation. As of now, any caller that uses WriteClient directly has to make another call to do the validation. I can think of another option: fix all startCommit() methods in WriteClient to take in the operationType and do the validation there. But there are quite a few callers (200 places) that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path). Once we have consensus, I can make the changes. I feel this is the right approach. Once fixed, callers don't need to make any additional calls to do validations.
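The comment above is really about where validation lives. Under the startCommit() option, the check moves inside the commit entry point, so every caller gets it automatically, at the cost of updating every call site. Below is a minimal Java sketch of that shape. It assumes a hypothetical rule that a table with meta columns disabled accepts only bulk inserts. The issue scopes the feature to the bulk insert row writer path, but the exact rule is an assumption. `StartCommitSketch` and its local `OperationType` enum are illustrative stand-ins, not Hudi's WriteClient.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: startCommit() takes the operation type and validates
 * internally, so callers need no separate validation call.
 */
public class StartCommitSketch {

    // Local stand-in for an operation-type enum.
    enum OperationType { INSERT, UPSERT, BULK_INSERT }

    private final boolean tablePopulatesMetaColumns;
    private long nextInstant = 1;

    StartCommitSketch(boolean tablePopulatesMetaColumns) {
        this.tablePopulatesMetaColumns = tablePopulatesMetaColumns;
    }

    /** Validates the operation against the table config, then returns a new instant id. */
    String startCommit(OperationType operationType) {
        Objects.requireNonNull(operationType, "operationType");
        validate(operationType); // throws before any instant is allocated
        // Hudi instants are timestamp-based; a counter keeps this sketch deterministic.
        return String.valueOf(nextInstant++);
    }

    // ASSUMPTION for illustration: with meta columns disabled, only bulk insert is allowed.
    private void validate(OperationType operationType) {
        if (!tablePopulatesMetaColumns && operationType != OperationType.BULK_INSERT) {
            throw new IllegalStateException(
                operationType + " is not supported when meta columns are disabled");
        }
    }

    public static void main(String[] args) {
        StartCommitSketch client = new StartCommitSketch(false);
        System.out.println("started instant " + client.startCommit(OperationType.BULK_INSERT));
        try {
            client.startCommit(OperationType.UPSERT);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The trade-off mirrors the comment: validation cannot be forgotten by a caller, but the signature change has to be made at each existing startCommit() call site.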
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382915#comment-17382915 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
 .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
 .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
 .setPartitionColumns(partitionColumns)
+ .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
 .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+ } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
I understand that we need one place to do the validation. As of now, any caller that uses WriteClient directly has to make another call to do the validation. I can think of another option: fix all startCommit() methods in WriteClient to take in the operationType and do the validation there. But there are quite a few callers (30 places) that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path). Once we have consensus, I can make the changes. I feel this is the right approach. Once fixed, callers don't need to make any additional calls to do validations.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382914#comment-17382914 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
 .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
 .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
 .setPartitionColumns(partitionColumns)
+ .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
 .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+ } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
I understand that we need one place to do the validation. As of now, any caller that uses WriteClient directly has to make another call to do the validation. I can think of another option: fix all startCommit() methods in WriteClient to take in the operationType and do the validation there. But there are quite a few callers that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path). Once we have consensus, I can make the changes. I feel this is the right approach. Once fixed, callers don't need to make any additional calls to do validations.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382912#comment-17382912 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
 .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
 .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
 .setPartitionColumns(partitionColumns)
+ .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
 .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+ } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
I understand that we need one place to do the validation. As of now, any caller that uses WriteClient directly has to make another call to do the validation. I can think of another option: fix all startCommit() methods in WriteClient to take in the operationType and do the validation there. But there are quite a few callers that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path). Once we have consensus, I can make the changes. I feel this is the right approach.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382487#comment-17382487 ] ASF GitHub Bot commented on HUDI-2161:
--
codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5b5c7c2) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `0.10%`.
> The diff coverage is `54.73%`.
```diff
@@            Coverage Diff             @@
##           master    #3247      +/-   ##
- Coverage   47.78%   47.68%    -0.11%
- Complexity   5557     5583       +26
  Files         936      938        +2
  Lines       41596    41763      +167
  Branches     4185     4204       +19
+ Hits        19877    19913       +36
- Misses      19949    20066      +117
- Partials     1770     1784       +14
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.97% <ø> (ø)` | |
| hudiclient | `34.55% <46.72%> (+0.02%)` | :arrow_up: |
| hudicommon | `48.67% <26.66%> (-0.03%)` | :arrow_down: |
| hudiflink | `59.36% <ø> (ø)` | |
| hudihadoopmr | `52.02% <ø> (ø)` | |
| hudisparkdatasource | `67.12% <73.52%> (-0.26%)` | :arrow_down: |
| hudisync | `55.97% <ø> (ø)` | |
| huditimelineservice | `64.07% <ø> (ø)` | |
| hudiutilities | `59.26% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
| [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
| [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
| [...che/hudi/io/storage/row/Hoodie
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382478#comment-17382478 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382468#comment-17382468 ] ASF GitHub Bot commented on HUDI-2161:
--
codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5b5c7c2) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `1.76%`.
> The diff coverage is `62.06%`.
```diff
@@            Coverage Diff             @@
##           master    #3247      +/-   ##
- Coverage   47.78%   46.01%    -1.77%
+ Complexity   5557     4763      -794
  Files         936      832      -104
  Lines       41596    38058     -3538
  Branches     4185     3809      -376
- Hits        19877    17514     -2363
+ Misses      19949    18920     -1029
+ Partials     1770     1624      -146
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.97% <ø> (ø)` | |
| hudiclient | `23.00% <0.00%> (-11.53%)` | :arrow_down: |
| hudicommon | `48.67% <26.66%> (-0.03%)` | :arrow_down: |
| hudiflink | `59.36% <ø> (ø)` | |
| hudihadoopmr | `52.02% <ø> (ø)` | |
| hudisparkdatasource | `67.12% <73.52%> (-0.26%)` | :arrow_down: |
| hudisync | `55.97% <ø> (ø)` | |
| huditimelineservice | `64.07% <ø> (ø)` | |
| hudiutilities | `59.26% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
| [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `62.36% <0.00%> (-2.39%)` | :arrow_down: |
| [...n/java/org/apache/hudi/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0RlZmF1bHRTb3VyY2UuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...org/apache/hudi/spark3/internal/DefaultSource.java](https://codecov.io/gh/apach
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382464#comment-17382464 ]
ASF GitHub Bot commented on HUDI-2161:
codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382463#comment-17382463 ]
ASF GitHub Bot commented on HUDI-2161:
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
* 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382455#comment-17382455 ]
ASF GitHub Bot commented on HUDI-2161:
codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5e7e02e) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `32.03%`.
> The diff coverage is `0.00%`.

> :exclamation: Current head 5e7e02e differs from pull request most recent head 5b5c7c2. Consider uploading reports for the commit 5b5c7c2 to get more accurate results

```diff
@@             Coverage Diff              @@
##             master    #3247       +/-   ##
=============================================
- Coverage     47.78%   15.75%   -32.04%
+ Complexity     5557      493     -5064
=============================================
  Files           936      284      -652
  Lines         41596    11832    -29764
  Branches       4185      981     -3204
=============================================
- Hits          19877     1864    -18013
+ Misses        19949     9805    -10144
+ Partials       1770      163     -1607
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `59.26% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
| [...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` |
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382447#comment-17382447 ]
ASF GitHub Bot commented on HUDI-2161:
codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5b5c7c2) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `44.95%`.
> The diff coverage is `0.00%`.

```diff
@@             Coverage Diff              @@
##             master    #3247       +/-   ##
- Coverage     47.78%    2.83%   -44.96%
+ Complexity     5557       85     -5472
  Files           936      284      -652
  Lines         41596    11832    -29764
  Branches       4185      981     -3204
- Hits          19877      335    -19542
+ Misses        19949    11471     -8478
+ Partials       1770       26     -1744
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comme
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382445#comment-17382445 ]
ASF GitHub Bot commented on HUDI-2161:
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
* 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382444#comment-17382444 ]
ASF GitHub Bot commented on HUDI-2161:
nsivabalan commented on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816784

Sorry, I squashed all the commits into one so that it's easier for me to rebase: I had 10+ files in conflict when I rebased with master. I also renamed quite a few files and moved some of them (to reuse the Spark SerDe class), so I went ahead and squashed. Sorry if you were planning to review just the latest commit.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382443#comment-17382443 ]
ASF GitHub Bot commented on HUDI-2161:
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
* 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
* 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382438#comment-17382438 ]
ASF GitHub Bot commented on HUDI-2161:
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671606366

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
           .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
           .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
           .setPartitionColumns(partitionColumns)
+          .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
           .initTable(sparkContext.hadoopConfiguration, path.get)
         tableConfig = tableMetaClient.getTableConfig
+      } else {
+        // validate table properties
+        val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
Added a private method here in HoodieSparkSqlWriter for the parameters, and added a method in HoodieTableMetaClient to validate table properties.

## File path: hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
##
@@ -21,21 +21,24 @@ package org.apache.hudi
 import org.apache.avro.Schema
 import org.apache.avro.generic.GenericRecord
 import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hudi.client.utils.SparkRowSerDe

Review comment:
Moved HoodieSparkUtils, SparkAdaptorSupport and SparkAdaptor from the hudi-spark module to the hudi-spark-client module, since we wanted to access SparkAdaptor from within BuiltInKeygen.
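The new `.setPopulateMetaColumns(...)` line in this diff resolves the option with `getOrElse(key, default).toBoolean`. Below is a minimal Java analogue, sketched for readers following the Java side of the codebase. The key string `example.populate.meta.columns` and the default of `true` are placeholders chosen for illustration, because the diff only names the constant `HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS`. `PopulateMetaColumnsOption` and `resolve` are likewise hypothetical names, not Hudi APIs. Scala's `toBoolean` accepts only "true" or "false" (ignoring case) and throws otherwise. Java's `Boolean.parseBoolean`, by contrast, silently maps any other string to `false`, so the sketch keeps the strict behavior to stop a typo from quietly disabling meta columns.

```java
import java.util.Locale;
import java.util.Map;

public class PopulateMetaColumnsOption {

    // Hypothetical key: the PR only names the constant
    // HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS, not its string value.
    static final String KEY = "example.populate.meta.columns";
    // Assumed default: meta columns are populated unless explicitly disabled.
    static final String DEFAULT = "true";

    // Java analogue of parameters.getOrElse(KEY, DEFAULT).toBoolean.
    // Like Scala's toBoolean (and unlike Boolean.parseBoolean), only
    // "true"/"false" in any letter case are accepted; anything else is rejected.
    static boolean resolve(Map<String, String> parameters) {
        String raw = parameters.getOrDefault(KEY, DEFAULT);
        switch (raw.toLowerCase(Locale.ROOT)) {
            case "true":
                return true;
            case "false":
                return false;
            default:
                throw new IllegalArgumentException("Not a boolean for " + KEY + ": \"" + raw + "\"");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));              // true (default applies)
        System.out.println(resolve(Map.of(KEY, "FALSE")));  // false
        try {
            resolve(Map.of(KEY, "no"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```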
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382439#comment-17382439 ]
ASF GitHub Bot commented on HUDI-2161:
codecov-commenter commented on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5e7e02e) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `44.95%`.
> The diff coverage is `0.00%`.

```diff
@@             Coverage Diff              @@
##             master    #3247       +/-   ##
- Coverage     47.78%    2.83%   -44.96%
+ Complexity     5557       85     -5472
  Files           936      284      -652
  Lines         41596    11832    -29764
  Branches       4185      981     -3204
- Hits          19877      335    -19542
+ Misses        19949    11471     -8478
+ Partials       1770       26     -1744
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382436#comment-17382436 ]
ASF GitHub Bot commented on HUDI-2161:
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
* 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382435#comment-17382435 ]
ASF GitHub Bot commented on HUDI-2161:
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
* 5e7e02ec3da3137c31e2124c88cff815bc299875 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382433#comment-17382433 ]
ASF GitHub Bot commented on HUDI-2161:
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382426#comment-17382426 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671600987 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -128,14 +128,35 @@ object HoodieSparkSqlWriter { .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY)) .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null)) .setPartitionColumns(partitionColumns) + .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean) .initTable(sparkContext.hadoopConfiguration, path.get) tableConfig = tableMetaClient.getTableConfig + } else { +// validate table properties +val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build() Review comment: I am planning to add the validation within HoodieTableMetaClient, because a) we don't instantiate a WriteClient at all in the row writer path as of now, and b) table properties are available once HoodieTableMetaClient is instantiated. Even if we were to place it within WriteClient, we would have to instantiate a meta client to read the table properties, so we might as well place the validation method in HoodieTableMetaClient.
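The comment above places the populate-meta-columns check where the persisted table properties are already loaded. A minimal sketch of what such a check might look like, in plain Java over property maps; the class name, the property key, and the default of `true` are assumptions for illustration, not Hudi's actual API or config key.

```java
import java.util.Map;

// Hypothetical sketch: validates a writer's meta-column option against the value persisted with the table.
// The property key below is a placeholder, not the key defined by HoodieTableConfig.
final class MetaColumnsValidator {
  static final String POPULATE_META_COLUMNS_KEY = "example.populate.meta.columns";
  static final boolean DEFAULT_POPULATE_META_COLUMNS = true; // assumed default

  /** Reads the flag from a property map, falling back to the assumed default when it is absent. */
  static boolean populateMetaColumns(Map<String, String> props) {
    String value = props.get(POPULATE_META_COLUMNS_KEY);
    return value == null ? DEFAULT_POPULATE_META_COLUMNS : Boolean.parseBoolean(value);
  }

  /**
   * Throws if the writer explicitly requests a meta-column setting that differs from the
   * one the table was created with. An absent write option simply inherits the table's setting.
   */
  static void validate(Map<String, String> tableProps, Map<String, String> writeOptions) {
    if (!writeOptions.containsKey(POPULATE_META_COLUMNS_KEY)) {
      return;
    }
    boolean tableValue = populateMetaColumns(tableProps);
    boolean writeValue = populateMetaColumns(writeOptions);
    if (tableValue != writeValue) {
      throw new IllegalArgumentException("Table was created with " + POPULATE_META_COLUMNS_KEY + "="
          + tableValue + " but the write requested " + writeValue);
    }
  }
}
```

Treating an absent option as "inherit the table setting" is one possible policy; a stricter variant would also reject writes whose implicit default disagrees with the table.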
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382417#comment-17382417 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976) * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382405#comment-17382405 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976) * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382404#comment-17382404 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894) * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976) * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382402#comment-17382402 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894) * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382401#comment-17382401 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894) * e32037e79596e3bf19415d1af107850643ee9ee5 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382192#comment-17382192 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671383496 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/SimpleKeyGenerator.java ## @@ -72,4 +75,16 @@ public String getPartitionPath(Row row) { return RowKeyGeneratorHelper.getPartitionPathFromRow(row, getPartitionPathFields(), hiveStylePartitioning, partitionPathPositions); } + + @Override + public String getPartitionPath(InternalRow row, StructType structType) { +buildFieldDataTypesMapIfNeeded(structType); Review comment: When a key generator is instantiated, we never know whether these methods will ever be invoked. As you might be aware, they are invoked only in the row writer path, and only if meta columns are disabled, so as of now our key generator constructors are designed that way. Maybe we can add public methods to set the structType and expect callers to set it appropriately before calling these methods. I followed the precedent of what we did when we added support for getPartitionPath(Row). Let me know how you want to go about it.
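The comment above explains why the schema-derived lookup is built lazily, on the first call that supplies a `StructType`: the key generator does not know its schema at construction time. A minimal sketch of that build-once pattern, assuming a schema is just an ordered list of field names and a row is an `Object[]` (both simplifications of Spark's types); `LazySchemaKeyGen` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified key generator: a schema is an ordered list of field names, a row is an Object[].
final class LazySchemaKeyGen {
  private final String partitionPathField;
  // Built on the first call that supplies a schema, since the schema is unknown at construction time.
  private Map<String, Integer> fieldPositions;

  LazySchemaKeyGen(String partitionPathField) {
    this.partitionPathField = partitionPathField;
  }

  private void buildFieldPositionMapIfNeeded(List<String> schema) {
    if (fieldPositions == null) {
      Map<String, Integer> positions = new HashMap<>();
      for (int i = 0; i < schema.size(); i++) {
        positions.put(schema.get(i), i);
      }
      fieldPositions = positions;
    }
  }

  String getPartitionPath(Object[] row, List<String> schema) {
    buildFieldPositionMapIfNeeded(schema);
    Integer position = fieldPositions.get(partitionPathField);
    if (position == null) {
      throw new IllegalArgumentException("Partition path field not in schema: " + partitionPathField);
    }
    return String.valueOf(row[position]);
  }
}
```

Caching the map assumes every row handed to one instance shares a single schema, which is what a build-once `...IfNeeded` method implies.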
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382190#comment-17382190 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -81,7 +105,52 @@ public String getPartitionPath(Row row) { return getKey(genericRecord).getPartitionPath(); } - void buildFieldPositionMapIfNeeded(StructType structType) { + /** + * Fetch partition path from {@link InternalRow}. + * + * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from. + * @param structType schema of the internalRow. + * @return the partition path. + */ + public String getPartitionPath(InternalRow internalRow, StructType structType) { +try { + Row row = deserializeRow(getEncoder(structType), internalRow); + return getPartitionPath(row); +} catch (Exception e) { + throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e); +} + } + + private ExpressionEncoder getEncoder(StructType structType) { +if (encoder == null) { + encoder = getRowEncoder(structType); +} +return encoder; + } + + private static ExpressionEncoder getRowEncoder(StructType schema) { +List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream() +.map(Attribute::toAttribute).collect(Collectors.toList()); +return RowEncoder.apply(schema) + .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(), +SimpleAnalyzer$.MODULE$); + } + + private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row) + throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException { +// TODO remove reflection if Spark 2.x support is dropped +if 
(package$.MODULE$.SPARK_VERSION().startsWith("2.")) { Review comment: I could not find any.
```
grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
```
In most of these places we do have a static getEncoder() method, but deserializeRow() is the first of its kind. KeyGeneratorTestUtilities does have serializeRow(), but it converts Row -> InternalRow and is part of the test code.
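The `deserializeRow` helper in the diff uses reflection because the `ExpressionEncoder` API differs across Spark versions: Spark 2.x exposes `fromRow(InternalRow)`, while Spark 3.x replaced it with `createDeserializer()`, whose result is then applied to the row. Below is a dependency-free sketch of that version dispatch; `VersionedDeserializer` is a hypothetical name, and the `Fake*` classes are stand-ins that only mimic the two API shapes so the sketch runs without Spark. It is not Hudi's implementation.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of version-dependent encoder dispatch via reflection.
// Spark 2.x shape: encoder.fromRow(row); Spark 3.x shape: encoder.createDeserializer().apply(row).
final class VersionedDeserializer {
  static Object deserialize(Object encoder, Object internalRow, String sparkVersion) {
    try {
      if (sparkVersion.startsWith("2.")) {
        return findSingleArgMethod(encoder.getClass(), "fromRow").invoke(encoder, new Object[] {internalRow});
      }
      Object deserializer = encoder.getClass().getMethod("createDeserializer").invoke(encoder);
      return findSingleArgMethod(deserializer.getClass(), "apply").invoke(deserializer, new Object[] {internalRow});
    } catch (ReflectiveOperationException e) {
      // Wrap checked reflection failures, similar in spirit to the diff's HoodieIOException wrapping.
      throw new IllegalStateException("Conversion of InternalRow to Row failed", e);
    }
  }

  // Looks a method up by name and arity so this sketch needs no compile-time Spark classes.
  private static Method findSingleArgMethod(Class<?> cls, String name) throws NoSuchMethodException {
    for (Method m : cls.getMethods()) {
      if (m.getName().equals(name) && m.getParameterCount() == 1) {
        return m;
      }
    }
    throw new NoSuchMethodException(cls.getName() + "." + name + "(<1 arg>)");
  }
}

// Stand-ins that mimic the two encoder API shapes; they are not Spark classes.
class FakeSpark2Encoder {
  public String fromRow(Object row) { return "row:" + row; }
}

class FakeSpark3Encoder {
  public FakeDeserializer createDeserializer() { return new FakeDeserializer(); }
}

class FakeDeserializer {
  public String apply(Object row) { return "row:" + row; }
}
```

In real code the version string would come from Spark itself (the diff reads `package$.MODULE$.SPARK_VERSION()`), and the method could be looked up with the concrete `InternalRow` parameter type instead of by arity.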
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382191#comment-17382191 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -81,7 +105,52 @@ public String getPartitionPath(Row row) { return getKey(genericRecord).getPartitionPath(); } - void buildFieldPositionMapIfNeeded(StructType structType) { + /** + * Fetch partition path from {@link InternalRow}. + * + * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from. + * @param structType schema of the internalRow. + * @return the partition path. + */ + public String getPartitionPath(InternalRow internalRow, StructType structType) { +try { + Row row = deserializeRow(getEncoder(structType), internalRow); + return getPartitionPath(row); +} catch (Exception e) { + throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e); +} + } + + private ExpressionEncoder getEncoder(StructType structType) { +if (encoder == null) { + encoder = getRowEncoder(structType); +} +return encoder; + } + + private static ExpressionEncoder getRowEncoder(StructType schema) { +List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream() +.map(Attribute::toAttribute).collect(Collectors.toList()); +return RowEncoder.apply(schema) + .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(), +SimpleAnalyzer$.MODULE$); + } + + private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row) + throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException { +// TODO remove reflection if Spark 2.x support is dropped +if 
(package$.MODULE$.SPARK_VERSION().startsWith("2.")) { Review comment: I could not find any.
```
grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
```
In most of these places we do have a static getEncoder() method, but deserializeRow() is the first of its kind. KeyGeneratorTestUtilities does have serializeRow(), but it converts Row -> InternalRow and is part of the test code. Here we needed to convert InternalRow -> Row.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382189#comment-17382189 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -81,7 +105,52 @@ public String getPartitionPath(Row row) { return getKey(genericRecord).getPartitionPath(); } - void buildFieldPositionMapIfNeeded(StructType structType) { + /** + * Fetch partition path from {@link InternalRow}. + * + * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from. + * @param structType schema of the internalRow. + * @return the partition path. + */ + public String getPartitionPath(InternalRow internalRow, StructType structType) { +try { + Row row = deserializeRow(getEncoder(structType), internalRow); + return getPartitionPath(row); +} catch (Exception e) { + throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e); +} + } + + private ExpressionEncoder getEncoder(StructType structType) { +if (encoder == null) { + encoder = getRowEncoder(structType); +} +return encoder; + } + + private static ExpressionEncoder getRowEncoder(StructType schema) { +List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream() +.map(Attribute::toAttribute).collect(Collectors.toList()); +return RowEncoder.apply(schema) + .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(), +SimpleAnalyzer$.MODULE$); + } + + private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row) + throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException { +// TODO remove reflection if Spark 2.x support is dropped +if 
(package$.MODULE$.SPARK_VERSION().startsWith("2.")) { Review comment: I could not find any.
```
grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
```
In most of these places we do have a static getEncoder() method, but deserializeRow() is the first of its kind. KeyGeneratorTestUtilities does have serializeRow(), but it converts Row -> InternalRow and is part of the test code.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382186#comment-17382186 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671378212 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -81,7 +105,52 @@ public String getPartitionPath(Row row) { return getKey(genericRecord).getPartitionPath(); } - void buildFieldPositionMapIfNeeded(StructType structType) { + /** + * Fetch partition path from {@link InternalRow}. + * + * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from. + * @param structType schema of the internalRow. + * @return the partition path. + */ + public String getPartitionPath(InternalRow internalRow, StructType structType) { +try { + Row row = deserializeRow(getEncoder(structType), internalRow); + return getPartitionPath(row); +} catch (Exception e) { + throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e); +} + } + + private ExpressionEncoder getEncoder(StructType structType) { +if (encoder == null) { Review comment: deserializeRow() is a static method and hence.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382183#comment-17382183 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r671374028 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieAppendOnlyRowParquetWriteSupport.java ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.io.storage; + +import org.apache.hadoop.conf.Configuration; +import org.apache.parquet.hadoop.api.WriteSupport; +import org.apache.spark.sql.types.StructType; + +import java.util.Collections; + +/** + * Hoodie Write Support for directly writing Row to Parquet. + */ +public class HoodieAppendOnlyRowParquetWriteSupport extends HoodieRowParquetWriteSupport { Review comment: Actually, the existing ParquetWriteSupport already handles a null bloom filter, so I am removing this class.
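The reply above notes that the existing write support already copes with a null bloom filter, which is why the dedicated append-only subclass can be dropped. A sketch of that null-tolerant shape, assuming the filter is the only key-dependent part of the footer metadata; `KeyFilter`, `OptionalFilterWriteSupport`, and the metadata key are hypothetical names. For reference, Parquet's Java `WriteSupport.finalizeWrite()` returns a `FinalizedWriteContext` that carries this kind of extra key/value footer metadata.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal key filter; a real bloom filter would be probabilistic.
interface KeyFilter {
  void add(String recordKey);
  String serialize();
}

// Hypothetical simplification of a row write support whose key filter is optional.
final class OptionalFilterWriteSupport {
  // Placeholder footer key, not the one Hudi actually writes.
  static final String FILTER_METADATA_KEY = "example.key.filter";

  private final KeyFilter filter; // null when meta columns (and thus record keys) are disabled

  OptionalFilterWriteSupport(KeyFilter filter) {
    this.filter = filter;
  }

  void addKey(String recordKey) {
    if (filter != null) {
      filter.add(recordKey);
    }
  }

  /** Extra footer metadata; empty when there is no filter to serialize. */
  Map<String, String> finalizeWrite() {
    if (filter == null) {
      return Collections.emptyMap();
    }
    Map<String, String> extra = new HashMap<>();
    extra.put(FILTER_METADATA_KEY, filter.serialize());
    return extra;
  }
}
```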
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382172#comment-17382172 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671365168

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
## @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriter;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriterFactory;
+import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.StructType;
+
+import java.io.IOException;
+
+/**
+ * RowCreateHandle to be used when meta columns are disabled.
+ */
+public class HoodieAppendOnlyRowCreateHandle extends HoodieRowCreateHandle {
+
+  public HoodieAppendOnlyRowCreateHandle(HoodieTable table, HoodieWriteConfig writeConfig, String partitionPath, String fileId, String instantTime,
+      int taskPartitionId, long taskId, long taskEpochId, StructType structType) {
+    super(table, writeConfig, partitionPath, fileId, instantTime, taskPartitionId, taskId, taskEpochId, structType);
+  }
+
+  /**
+   * Write the incoming InternalRow as is.
+   *
+   * @param record instance of {@link InternalRow} that needs to be written to the fileWriter.
+   * @throws IOException
+   */
+  @Override
+  public void write(InternalRow record) throws IOException {
+    try {
+      fileWriter.writeRow("", record);

Review comment:
Here is why I did not fix it. As of now we have HoodieInternalRowFileWriter, which has writeRow(recordKey, InternalRow). I did not want to introduce another interface, because then I can't extend this new RowCreateHandle from the existing RowCreateHandle, as the two would implement different interfaces. But I guess I can add another overloaded write method to the same interface and have each class call into the appropriate one. Will fix it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add support to disable meta column to BulkInsert Row Writer path
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
>
> Objective here is to disable all meta columns so as to avoid storage cost.
> Also, some benefits could be seen in write latency with row writer path as no
> special handling is required at RowCreateHandle layer.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
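The overloaded write method proposed in the reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the change that was merged: `SimpleRow` is a hypothetical stand-in for Spark's `InternalRow`, and `RowFileWriter` / `InMemoryRowFileWriter` are hypothetical simplifications of `HoodieInternalRowFileWriter`. The idea is that one interface carries both `writeRow(recordKey, row)` and a key-less `writeRow(row)`, so the no-meta handle never needs to pass a placeholder `""`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spark's InternalRow: an ordered array of values.
final class SimpleRow {
  final Object[] values;

  SimpleRow(Object... values) {
    this.values = values;
  }
}

// Hypothetical, simplified version of the row file-writer interface from the review.
interface RowFileWriter {
  // Meta-column path: the record key is passed along with the row.
  void writeRow(String recordKey, SimpleRow row) throws IOException;

  // No-meta-column path: the row is written as is, with no placeholder "" key.
  void writeRow(SimpleRow row) throws IOException;
}

// In-memory implementation that records what it was asked to write.
final class InMemoryRowFileWriter implements RowFileWriter {
  final List<String> recordKeys = new ArrayList<>();
  final List<SimpleRow> rows = new ArrayList<>();

  @Override
  public void writeRow(String recordKey, SimpleRow row) {
    recordKeys.add(recordKey);
    rows.add(row);
  }

  @Override
  public void writeRow(SimpleRow row) {
    rows.add(row);
  }
}
```

Under this assumption, the existing handle keeps calling the two-argument method and the meta-column-free handle calls the one-argument overload, so both handles can still share a single writer interface and class hierarchy.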
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382169#comment-17382169 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671364040

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
## @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;

Review comment:
Sure.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382161#comment-17382161 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671357904

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
## @@ -675,6 +676,11 @@ public PropertyBuilder setBootstrapBasePath(String bootstrapBasePath) {
       return this;
     }
+    public PropertyBuilder setPopulateMetaColumns(boolean populateMetaColumns) {

Review comment:
Sure. Will fix everywhere.
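`setPopulateMetaColumns` appears to follow the fluent-setter pattern of the surrounding `PropertyBuilder` methods: store the flag, then `return this`. The sketch below shows how such a flag might end up as a table property. `TablePropertyBuilder` is hypothetical, and the property key is an assumption for illustration only; the key Hudi actually uses may differ.

```java
import java.util.Properties;

// Hypothetical, simplified fluent builder modelled on the PropertyBuilder in the diff.
final class TablePropertyBuilder {
  // Assumed property key, for illustration only; check the Hudi config docs for the real one.
  static final String POPULATE_META_COLUMNS_KEY = "hoodie.populate.meta.fields";

  // null means "not set", so whatever default the table uses still applies.
  private Boolean populateMetaColumns;

  TablePropertyBuilder setPopulateMetaColumns(boolean populateMetaColumns) {
    this.populateMetaColumns = populateMetaColumns;
    return this; // returning this keeps the setter chainable, like the other setters
  }

  Properties build() {
    Properties props = new Properties();
    if (populateMetaColumns != null) {
      props.setProperty(POPULATE_META_COLUMNS_KEY, String.valueOf(populateMetaColumns));
    }
    return props;
  }
}
```

Keeping the field as a nullable `Boolean` lets the builder distinguish "explicitly disabled" from "never set", so only an explicit choice is written into the properties.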
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382160#comment-17382160 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671357284

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/SimpleKeyGenerator.java
## @@ -72,4 +75,16 @@ public String getPartitionPath(Row row) {
     return RowKeyGeneratorHelper.getPartitionPathFromRow(row, getPartitionPathFields(), hiveStylePartitioning, partitionPathPositions);
   }
+
+  @Override
+  public String getPartitionPath(InternalRow row, StructType structType) {
+    buildFieldDataTypesMapIfNeeded(structType);

Review comment:
Yes, InternalRow does not come with a schema, unfortunately.
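Because `InternalRow` is purely positional, the key generator has to be handed the schema separately and resolve field positions itself. A minimal sketch of that idea, assuming a hypothetical `SimpleSchema` (field names only) in place of `StructType` and a plain `Object[]` in place of `InternalRow`; the `buildFieldPositionMapIfNeeded` name mirrors the diff, but the class and its body are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a StructType: only the ordered field names.
final class SimpleSchema {
  final List<String> fieldNames;

  SimpleSchema(String... fieldNames) {
    this.fieldNames = Arrays.asList(fieldNames);
  }
}

// Sketch: a positional row carries no schema, so the schema is passed alongside it and
// field positions are resolved once, then reused for every subsequent row.
final class PartitionPathExtractor {
  private final String partitionPathField;
  private Map<String, Integer> fieldPositions; // built lazily on first use

  PartitionPathExtractor(String partitionPathField) {
    this.partitionPathField = partitionPathField;
  }

  private void buildFieldPositionMapIfNeeded(SimpleSchema schema) {
    if (fieldPositions != null) {
      return; // assumes the schema does not change between calls
    }
    Map<String, Integer> positions = new HashMap<>();
    for (int i = 0; i < schema.fieldNames.size(); i++) {
      positions.put(schema.fieldNames.get(i), i);
    }
    fieldPositions = positions;
  }

  String getPartitionPath(Object[] row, SimpleSchema schema) {
    buildFieldPositionMapIfNeeded(schema);
    Integer position = fieldPositions.get(partitionPathField);
    if (position == null) {
      throw new IllegalArgumentException("Field not found in schema: " + partitionPathField);
    }
    Object value = row[position];
    // Assumed fallback for null values; the real key generators have their own default.
    return value == null ? "default" : value.toString();
  }
}
```

Building the map once and reusing it assumes the schema stays the same for the lifetime of the key generator, which fits a bulk insert where every row of a Spark Dataset shares one schema.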
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382159#comment-17382159 ] ASF GitHub Bot commented on HUDI-2161:
--
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671356705

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
## @@ -81,7 +105,52 @@ public String getPartitionPath(Row row) {
     return getKey(genericRecord).getPartitionPath();
   }
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  /**
+   * Fetch partition path from {@link InternalRow}.
+   *
+   * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+   * @param structType schema of the internalRow.
+   * @return the partition path.
+   */
+  public String getPartitionPath(InternalRow internalRow, StructType structType) {
+    try {
+      Row row = deserializeRow(getEncoder(structType), internalRow);
+      return getPartitionPath(row);
+    } catch (Exception e) {
+      throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e);
+    }
+  }
+
+  private ExpressionEncoder getEncoder(StructType structType) {
+    if (encoder == null) {
+      encoder = getRowEncoder(structType);
+    }
+    return encoder;
+  }
+
+  private static ExpressionEncoder getRowEncoder(StructType schema) {
+    List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream()
+        .map(Attribute::toAttribute).collect(Collectors.toList());
+    return RowEncoder.apply(schema)

Review comment:
Yes.
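`getEncoder` in this hunk creates the encoder lazily on first use and then reuses it. The sketch below isolates that pattern, with a generic converter factory standing in for Spark's `RowEncoder`/`ExpressionEncoder` (a hypothetical simplification; `LazyRowConverter` is not a Hudi or Spark class). It also passes the caught exception as the cause instead of concatenating it into the message as the diff does, which keeps the original stack trace; that is a suggested variation, not part of the PR.

```java
import java.util.function.Function;

// Sketch of the "create once, then reuse" pattern behind getEncoder(...) in the diff.
// A generic converter factory stands in for Spark's encoder (hypothetical simplification).
final class LazyRowConverter<S, I, R> {
  private final Function<S, Function<I, R>> converterFactory;
  private Function<I, R> converter; // cached after the first call
  private int creations = 0;        // number of times the factory ran

  LazyRowConverter(Function<S, Function<I, R>> converterFactory) {
    this.converterFactory = converterFactory;
  }

  private Function<I, R> getConverter(S schema) {
    if (converter == null) {
      // Like the diff, later calls reuse this converter even if a different schema is passed.
      converter = converterFactory.apply(schema);
      creations++;
    }
    return converter;
  }

  R convert(I input, S schema) {
    try {
      return getConverter(schema).apply(input);
    } catch (RuntimeException e) {
      // Passing e as the cause keeps the original stack trace, unlike "... " + e.
      throw new IllegalStateException("Conversion of row failed", e);
    }
  }

  int creations() {
    return creations;
  }
}
```

As in the diff, the cached converter ignores the `schema` argument after the first call, so one instance should only ever convert rows that share a schema.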
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381629#comment-17381629 ] ASF GitHub Bot commented on HUDI-2161:
--
vinothchandar commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r670175121

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
## @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriter;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriterFactory;
+import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.StructType;
+
+import java.io.IOException;
+
+/**
+ * RowCreateHandle to be used when meta columns are disabled.
+ */
+public class HoodieAppendOnlyRowCreateHandle extends HoodieRowCreateHandle {

Review comment:
Rename to `HoodieNoMetaRowCreateHandle` or something similar? Let's not leak higher-level use cases into class names low down the stack. Doesn't a row create handle already imply that we are appending data?

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
## @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriter;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriterFactory;
+import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.StructType;
+
+import java.io.IOException;
+
+/**
+ * RowCreateHandle to be used when meta columns are disabled.
+ */
+public class HoodieAppendOnlyRowCreateHandle extends HoodieRowCreateHandle {
+
+  public HoodieAppendOnlyRowCreateHandle(HoodieTable table, HoodieWriteConfig writeConfig, String partitionPath, String fileId, String instantTime,
+      int taskPartitionId, long taskId, long taskEpochId, StructType structType) {
+    super(table, writeConfig, partitionPath, fileId, instantTime, taskPartitionId, taskId, taskEpochId, structType);
+  }
+
+  /**
+   * Write the incoming InternalRow as is.
+   *
+   * @param record instance of {@link InternalRow} that needs to be written to the fileWriter.
+   * @throws IOException
+   */
+  @Override
+  public void write(InternalRow record) throws IOException {
+    try {
+      fileWriter.writeRow("", record);

Review comment:
What are all the empty strings? Can we avoid such calls?

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieAppendOnlyInternalRowParquetWriter.java
## @@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file
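For context on the renaming suggestion, the difference between the two handles is whether meta-column values are written in front of each row. A minimal sketch, assuming rows are plain `Object[]` arrays and that Hudi's five `_hoodie_*` meta columns precede the data columns; `MetaColumnsSketch` and its methods are hypothetical and only illustrate why the no-meta path needs no per-row copy or extra storage.

```java
import java.util.Arrays;

// Hypothetical sketch contrasting the two write paths discussed in this review.
final class MetaColumnsSketch {
  // Hudi's five meta columns, assumed here to precede the data columns in each written row.
  static final String[] META_COLUMNS = {
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name"};

  // Meta-column path: copy the row and prepend one value per meta column.
  static Object[] withMetaColumns(Object[] metaValues, Object[] dataRow) {
    if (metaValues.length != META_COLUMNS.length) {
      throw new IllegalArgumentException("Expected " + META_COLUMNS.length + " meta values");
    }
    Object[] out = Arrays.copyOf(metaValues, metaValues.length + dataRow.length);
    System.arraycopy(dataRow, 0, out, metaValues.length, dataRow.length);
    return out;
  }

  // No-meta path: the incoming row is written as is; no copy and no extra columns.
  static Object[] withoutMetaColumns(Object[] dataRow) {
    return dataRow;
  }
}
```

Naming the class after what it does (skipping meta columns) rather than after the append-only use case keeps this distinction visible at the handle layer.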
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380180#comment-17380180 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380166#comment-17380166 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
* 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380165#comment-17380165 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
* 0a2849c7b940ad671b152fabc2d0bb3d66c93070 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379639#comment-17379639 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379630#comment-17379630 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
* caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379600#comment-17379600 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
* caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379599#comment-17379599 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
* e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
* caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379596#comment-17379596 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
* e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379595#comment-17379595 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
* e56bac615f087cec7817b846809c9f8fd0cc20a5 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379580#comment-17379580 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379567#comment-17379567 ] ASF GitHub Bot commented on HUDI-2161:
--
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
* 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379566#comment-17379566 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868) * 8a212fd77769cbf7e248e971f66109381ba80f71 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379436#comment-17379436 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379421#comment-17379421 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851) * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379419#comment-17379419 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851) * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378810#comment-17378810 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378796#comment-17378796 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844) * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378795#comment-17378795 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844) * f0dd67bb360fe3fd275264127d50a9feb881479a UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378565#comment-17378565 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378564#comment-17378564 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * a4799add9402d6a963689bc75078e46431fbd941 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843) * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378563#comment-17378563 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * a4799add9402d6a963689bc75078e46431fbd941 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843) * c377b8f48a7826d5eadce80849669ae47ab9aace UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378562#comment-17378562 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r667410011 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -81,7 +99,18 @@ public String getPartitionPath(Row row) { return getKey(genericRecord).getPartitionPath(); } - void buildFieldPositionMapIfNeeded(StructType structType) { + /** + * Fetch partition path from {@link InternalRow}. + * + * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from. + * @param structType schema of the internalRow. + * @return the partition path. + */ + public String getPartitionPath(InternalRow internalRow, StructType structType) { +throw new UnsupportedOperationException("Operation not supported. Please override if required."); Review comment: I have yet to figure out a way to fix this and will update the patch once I have a solution. At least all the built-in key generators have concrete implementations.
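The quoted default `getPartitionPath(InternalRow, StructType)` throws `UnsupportedOperationException` unless a key generator overrides it. A minimal plain-Java sketch of that fallback-plus-override pattern, assuming a `Map<String, Object>` in place of Spark's `InternalRow` so it runs without Spark; the classes `SketchKeyGenerator`, `SketchSimpleKeyGenerator` and `SketchCustomKeyGenerator` are hypothetical and are not Hudi's API:

```java
import java.util.Map;

// Hypothetical base class; Map<String, Object> stands in for Spark's InternalRow.
abstract class SketchKeyGenerator {
  // Default mirrors the quoted hunk: unsupported unless a subclass overrides it.
  String getPartitionPath(Map<String, Object> row) {
    throw new UnsupportedOperationException("Operation not supported. Please override if required.");
  }
}

// Hypothetical "built-in" generator that reads a single partition field.
class SketchSimpleKeyGenerator extends SketchKeyGenerator {
  private final String partitionField;

  SketchSimpleKeyGenerator(String partitionField) {
    this.partitionField = partitionField;
  }

  @Override
  String getPartitionPath(Map<String, Object> row) {
    Object value = row.get(partitionField);
    // The "default" fallback for a missing value is an assumption of this sketch.
    return value == null ? "default" : value.toString();
  }
}

// Hypothetical user-supplied generator that does not override the row-level method.
class SketchCustomKeyGenerator extends SketchKeyGenerator { }
```

Only generators that override the method can serve the row-level call. Any other generator fails fast with the exception, which matches the note that so far only the built-in key generators have concrete implementations.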
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378557#comment-17378557 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * a4799add9402d6a963689bc75078e46431fbd941 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378556#comment-17378556 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 76409a475b27bbf5d08212b7c00ba56fb42c8d01 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=842) * a4799add9402d6a963689bc75078e46431fbd941 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843)
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378555#comment-17378555 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r667397508 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -377,6 +398,60 @@ object HoodieSparkSqlWriter { (syncHiveSuccess, common.util.Option.ofNullable(instantTime)) } + def bulkInsertAsRowNoMetaColumns(sqlContext: SQLContext, Review comment: I meant the changes in HoodieSparkSqlWriter, but both use the same custom datasource.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378553#comment-17378553 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 76409a475b27bbf5d08212b7c00ba56fb42c8d01 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=842) * a4799add9402d6a963689bc75078e46431fbd941 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378552#comment-17378552 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r667395541 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -81,7 +99,55 @@ public String getPartitionPath(Row row) { return getKey(genericRecord).getPartitionPath(); } - void buildFieldPositionMapIfNeeded(StructType structType) { + /** + * Fetch partition path from {@link InternalRow}. + * + * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from. + * @param structType schema of the internalRow. + * @return the partition path. + */ + public String getPartitionPath(InternalRow internalRow, StructType structType) { +Row row = null; +try { + row = deserializeRow(getEncoder(structType), internalRow); +} catch (Exception e) { + throw new IllegalStateException("Convertion of InternalRow to Row failed with exception " + e); +} +return getPartitionPath(row); + } + + private ExpressionEncoder<Row> getEncoder(StructType structType) { +if (encoder == null) { + synchronized (this) { +encoder = getRowEncoder(structType); + } +} +return encoder; + } + + private static ExpressionEncoder<Row> getRowEncoder(StructType schema) { +List<Attribute> attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream() +.map(Attribute::toAttribute).collect(Collectors.toList()); +return RowEncoder.apply(schema) + .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(), +SimpleAnalyzer$.MODULE$); + } + + private static Row deserializeRow(ExpressionEncoder<Row> encoder, InternalRow row) Review comment: I have yet to test this method; I found a similar method for serializeFromRow and came up with this. I have fixed all the other built-in key generators (simple, complex, timestamp, custom) and will update the patch once I have the fix. I wanted to open it up for reviews while I work on them.
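In the quoted `getEncoder`, the `null` check happens only outside the `synchronized` block. Two threads can therefore both see `null` and both build an encoder. The cached encoder is also not keyed by schema, so a later call with a different `StructType` reuses the first one. A common remedy is double-checked locking: a `volatile` field plus a second check under the lock. The following is generic Java rather than Hudi code. `LazyEncoderHolder` is a hypothetical name, and the `Supplier` stands in for `getRowEncoder(structType)`:

```java
import java.util.function.Supplier;

// Hypothetical lazy holder; the Supplier stands in for building a Spark encoder.
final class LazyEncoderHolder<E> {
  private final Supplier<E> factory;
  // volatile: once assigned, other threads see a fully constructed value.
  private volatile E encoder;

  LazyEncoderHolder(Supplier<E> factory) {
    this.factory = factory;
  }

  E get() {
    E local = encoder;              // first check, without locking
    if (local == null) {
      synchronized (this) {
        local = encoder;            // second check, under the lock
        if (local == null) {
          local = factory.get();    // runs only while nothing is cached yet
          encoder = local;
        }
      }
    }
    return local;
  }
}
```

Reading the field into a local variable avoids a second volatile read on the fast path. If encoders must differ per schema, the holder would need to be keyed by `StructType` (for example with a concurrent map); that is outside what the quoted hunk does.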
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378551#comment-17378551 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 452676092193ccbbcbc9034c893010ea2ec45da7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=811) * 76409a475b27bbf5d08212b7c00ba56fb42c8d01 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=842) * a4799add9402d6a963689bc75078e46431fbd941 UNKNOWN
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378550#comment-17378550 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r667394408 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -377,6 +398,60 @@ object HoodieSparkSqlWriter { (syncHiveSuccess, common.util.Option.ofNullable(instantTime)) } + def bulkInsertAsRowNoMetaColumns(sqlContext: SQLContext, Review comment: This has a lot in common with the existing bulk_insert row writer path. Once I get the first set of reviews, I will unify them.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378548#comment-17378548 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r81940 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -377,6 +397,63 @@ object HoodieSparkSqlWriter { (syncHiveSuccess, common.util.Option.ofNullable(instantTime)) } + def bulkInsertAppendOnlyAsRow(sqlContext: SQLContext, Review comment: bulkInsertAsRow and bulkInsertAppendOnlyAsRow have a lot of code in common. I have yet to unify them; I wanted to keep them separate for the first set of reviews.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378547#comment-17378547 ] ASF GitHub Bot commented on HUDI-2161: -- nsivabalan commented on a change in pull request #3247: URL: https://github.com/apache/hudi/pull/3247#discussion_r81609 ## File path: hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java ## @@ -110,6 +151,55 @@ Dataset<Row> colOrderedDataset = dedupedDf.select( JavaConverters.collectionAsScalaIterableConverter(orderedFields).asScala().toSeq()); -return bulkInsertPartitionerRows.repartitionRecords(colOrderedDataset, config.getBulkInsertShuffleParallelism()); +return Pair.of(populateMetaCols ? bulkInsertPartitionerRows.repartitionRecords(colOrderedDataset, config.getBulkInsertShuffleParallelism()) : +new NonSortPartitionerWithRows().repartitionRecords(colOrderedDataset, config.getBulkInsertShuffleParallelism()), nonPartitionedDataset); } + + /** + * Add empty meta columns and reorder such that meta columns are at the beginning. + * + * @param rows + * @return + */ + public static Dataset<Row> prepareHoodieDatasetForBulkInsertAppendOnly(Dataset<Row> rows, HoodieWriteConfig config, boolean isGlobalIndex) { Review comment: This method is not used. Both flows use the previous method.
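The hunk in HoodieDatasetBulkInsertHelper.java combines two ideas. The Javadoc says the helper adds meta columns and moves them to the front of the schema. The `populateMetaCols` ternary falls back to `NonSortPartitionerWithRows` when meta columns are not populated. The sketch below models both ideas in plain Java and makes two assumptions. First, a schema is treated as a list of column names rather than a Spark `Dataset<Row>`. Second, each partitioner is represented only by its name. The class and both helper methods are hypothetical and do not reproduce Hudi's code. The only names taken from Hudi itself are the five standard `_hoodie_*` meta column names and `NonSortPartitionerWithRows`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: models column ordering and partitioner choice with plain lists and strings.
class MetaColumnLayoutSketch {

  // The five standard Hudi meta column names, in the order they lead a Hudi record.
  static final List<String> META_COLUMNS = Arrays.asList(
      "_hoodie_commit_time",
      "_hoodie_commit_seqno",
      "_hoodie_record_key",
      "_hoodie_partition_path",
      "_hoodie_file_name");

  // Hypothetical helper: meta columns first, followed by the data columns in their original order.
  static List<String> prependMetaColumns(List<String> dataColumns) {
    List<String> ordered = new ArrayList<>(META_COLUMNS);
    ordered.addAll(dataColumns);
    return ordered;
  }

  // Hypothetical stand-in for the populateMetaCols ternary: keep the configured
  // partitioner when meta columns are populated, otherwise use the non-sorting one.
  static String choosePartitioner(boolean populateMetaCols, String configuredPartitioner) {
    return populateMetaCols ? configuredPartitioner : "NonSortPartitionerWithRows";
  }

  public static void main(String[] args) {
    List<String> data = Arrays.asList("id", "name", "ts");
    System.out.println(prependMetaColumns(data));
    System.out.println(choosePartitioner(false, "configuredPartitioner"));
  }
}
```

Placing the meta columns first means every record carries the same leading layout regardless of the user's schema. Choosing a non-sorting partitioner when meta columns are off is one plausible reading of the ternary: the row writer then has no record-key or partition-path meta fields to populate.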