[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384171#comment-17384171
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
>
> The objective here is to disable all meta columns to avoid their storage cost.
> Write latency on the row writer path may also improve, since no special
> handling is then required at the RowCreateHandle layer.
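The per-row difference behind both points can be sketched in Java. This is a minimal sketch, not Hudi's implementation: `MetaColumnSketch` and `rowToWrite` are hypothetical names, the five `_hoodie_*` names are the meta fields Hudi tables normally carry, and writing them ahead of the data columns is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the per-row choice HUDI-2161 is about.
 * Not Hudi's HoodieRowCreateHandle; names and signatures are illustrative.
 */
final class MetaColumnSketch {
    // The five meta columns a Hudi table carries when they are populated.
    static final List<String> META_COLUMNS = List.of(
            "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
            "_hoodie_partition_path", "_hoodie_file_name");

    private MetaColumnSketch() {}

    /**
     * Returns the values written for one input row. metaValues must hold one value
     * per entry in META_COLUMNS, in the same order; it is ignored when
     * populateMetaColumns is false.
     */
    static List<Object> rowToWrite(List<Object> dataColumns, boolean populateMetaColumns,
                                   List<String> metaValues) {
        if (!populateMetaColumns) {
            // Meta columns disabled: the incoming row is returned as-is and
            // nothing is prepended, so the written file has no meta columns.
            return dataColumns;
        }
        if (metaValues.size() != META_COLUMNS.size()) {
            throw new IllegalArgumentException("expected " + META_COLUMNS.size() + " meta values");
        }
        // Meta values first (assumed layout), then the user's data columns.
        List<Object> out = new ArrayList<>(META_COLUMNS.size() + dataColumns.size());
        out.addAll(metaValues);
        out.addAll(dataColumns);
        return out;
    }
}
```

With `populateMetaColumns` set to false the row passes through unchanged, which is the case the issue wants the row writer to take; with it set to true, every row carries five extra values that must be built and stored.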



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384131#comment-17384131
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086








[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384132#comment-17384132
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
   .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
   .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
   .setPartitionColumns(partitionColumns)
+  .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
   .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+  } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   I understand that we need one place to do the validation.
   As of now, any caller that uses WriteClient directly has to make another call to do the validation.
   
   I could think of two other options.
   1. Fix all startCommit() methods in WriteClient to take in the operationType and do the validation. But there are quite a few callers (200 places) that I would need to fix. If we go ahead with this approach, the validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path).
   Once we have consensus, I can make the changes. Once fixed, callers don't need to make any additional calls for validation.
   
   2. I also thought we could add the validation to MetaClient and call it from within [getTableAndInitCtx()](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L405) in SparkRDDWriteClient. But we need the raw properties from the user. If we call it from within getTableAndInitCtx(), we might fetch properties from writeConfig, which would already have read the table props. So I'm not sure we can go with this approach. But if we can get this in neatly, no changes are required for those using writeClient directly. For the row writer path, we need to make one additional call from within [DataSourceInternalWriterHelper](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java#L68) to explicitly validateProps using metaClient.
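The two branches in the HoodieSparkSqlWriter hunk (record the flag when a table is created, validate it when the table already exists), and the point in option 2 about needing the raw user properties, can be sketched as follows. This is a minimal Java sketch under stated assumptions: `PopulateMetaColumnsSketch` and its methods are hypothetical, the key string is a placeholder for `HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key()`, and the default of `true` is assumed. Under option 1, `validateExistingTable` is the kind of check `startCommit()` would run; under option 2 it would sit on the meta client.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of the populate-meta-columns handling discussed above.
 * KEY and DEFAULT are placeholders, not Hudi's actual config key or default.
 */
final class PopulateMetaColumnsSketch {
    static final String KEY = "example.populate.meta.columns"; // hypothetical key
    static final boolean DEFAULT = true;                        // assumed default

    private PopulateMetaColumnsSketch() {}

    /** Strict parse like Scala's String.toBoolean: only "true"/"false", any case. */
    static boolean parseStrict(String value) {
        String v = value.toLowerCase(Locale.ROOT);
        if (v.equals("true")) return true;
        if (v.equals("false")) return false;
        throw new IllegalArgumentException("not a boolean: " + value);
    }

    /** New-table branch: the user's value if given, else the default. */
    static boolean resolveForNewTable(Map<String, String> rawUserProps) {
        String v = rawUserProps.get(KEY);
        return v == null ? DEFAULT : parseStrict(v);
    }

    /**
     * Existing-table branch. Only an explicit user setting is checked. Once the raw
     * options are merged with the table's own properties, the key is always present,
     * so an explicit request can no longer be told apart from the stored value.
     */
    static void validateExistingTable(Map<String, String> rawUserProps,
                                      Map<String, String> tableProps) {
        String requested = rawUserProps.get(KEY);
        if (requested == null) {
            return; // nothing requested explicitly; the table's stored setting applies
        }
        String stored = tableProps.get(KEY);
        boolean persisted = stored == null ? DEFAULT : parseStrict(stored);
        if (parseStrict(requested) != persisted) {
            throw new IllegalArgumentException(
                    KEY + "=" + requested + " conflicts with the table's value " + persisted);
        }
    }
}
```

Checking only explicitly supplied keys is one reading of why the raw properties matter; a check against an already-merged config would either always pass or reject callers who never set the option.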






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384118#comment-17384118
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan merged pull request #3247:
URL: https://github.com/apache/hudi/pull/3247


   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384012#comment-17384012
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931








[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383971#comment-17383971
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060





[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383970#comment-17383970
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086








[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383958#comment-17383958
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan merged pull request #3247:
URL: https://github.com/apache/hudi/pull/3247


   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383675#comment-17383675
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan merged pull request #3247:
URL: https://github.com/apache/hudi/pull/3247


   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383672#comment-17383672
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (129b2e6) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `0.03%`.
   > The diff coverage is `54.30%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
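   ```diff
   @@             Coverage Diff              @@
   ##             master    #3247      +/-   ##
   ============================================
   - Coverage     47.78%   47.74%   -0.04%
   - Complexity     5557     5591      +34
   ============================================
     Files           936      938       +2
     Lines         41596    41815     +219
     Branches       4185     4213      +28
   ============================================
   + Hits          19877    19965      +88
   - Misses        19949    20063     +114
   - Partials       1770     1787      +17
   ```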
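Each Coverage figure can be checked against the Hits, Misses and Partials rows of the same block. A minimal Java sketch, assuming Codecov computes hits / (hits + misses + partials) and rounds down to two decimal places (its default); `CoverageMath` is a hypothetical helper, not part of Hudi or Codecov.

```java
import java.util.Locale;

/**
 * Hypothetical helper: recomputes a Codecov "Coverage" percentage.
 * Assumes partial lines count as not covered and the result is rounded down.
 */
final class CoverageMath {
    private CoverageMath() {}

    static String percent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;    // equals the "Lines" row
        long basisPoints = hits * 10_000 / lines; // integer division rounds down
        return String.format(Locale.ROOT, "%d.%02d", basisPoints / 100, basisPoints % 100);
    }
}
```

For this report, `percent(19877, 19949, 1770)` gives 47.78 for master (41596 lines) and `percent(19965, 20063, 1787)` gives 47.74 for #3247 (41815 lines), matching the Coverage row; the 43.99, 17.55 and 2.82 figures in the later reports reproduce the same way from their own rows.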
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.55% <47.22%> (+0.03%)` | :arrow_up: |
   | hudicommon | `48.65% <23.52%> (-0.04%)` | :arrow_down: |
   | hudiflink | `59.44% <ø> (+0.08%)` | :arrow_up: |
   | hudihadoopmr | `52.02% <ø> (ø)` | |
   | hudisparkdatasource | `67.18% <75.40%> (-0.19%)` | :arrow_down: |
   | hudisync | `55.97% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `59.77% <ø> (+0.50%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
   | [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
   | [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
   | 
[.

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383666#comment-17383666
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (129b2e6) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `3.79%`.
   > The diff coverage is `54.30%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
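   ```diff
   @@             Coverage Diff              @@
   ##             master    #3247      +/-   ##
   ============================================
   - Coverage     47.78%   43.99%   -3.80%
   + Complexity     5557     5173     -384
   ============================================
     Files           936      938       +2
     Lines         41596    41815     +219
     Branches       4185     4213      +28
   ============================================
   - Hits          19877    18396    -1481
   - Misses        19949    21770    +1821
   + Partials       1770     1649     -121
   ```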
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.55% <47.22%> (+0.03%)` | :arrow_up: |
   | hudicommon | `48.65% <23.52%> (-0.04%)` | :arrow_down: |
   | hudiflink | `59.44% <ø> (+0.08%)` | :arrow_up: |
   | hudihadoopmr | `52.02% <ø> (ø)` | |
   | hudisparkdatasource | `67.18% <75.40%> (-0.19%)` | :arrow_down: |
   | hudisync | `55.97% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `8.99% <ø> (-50.27%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
   | [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
   | [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
   | 

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383661#comment-17383661
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (129b2e6) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `30.23%`.
   > The diff coverage is `47.22%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
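   ```diff
   @@              Coverage Diff              @@
   ##             master    #3247       +/-   ##
   =============================================
   - Coverage     47.78%   17.55%  -30.24%
   + Complexity     5557      905    -4652
   =============================================
     Files           936      390     -546
     Lines         41596    15579   -26017
     Branches       4185     1381    -2804
   =============================================
   - Hits          19877     2735   -17142
   + Misses        19949    12658    -7291
   + Partials       1770      186    -1584
   ```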
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `21.19% <47.22%> (-13.33%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `8.99% <ø> (-50.27%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
   | [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
   | [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
   | 
[...che/hudi/io/storage/row/HoodieRowCreateHandle.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383656#comment-17383656
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (129b2e6) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `44.96%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
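   ```diff
   @@              Coverage Diff              @@
   ##             master    #3247       +/-   ##
   =============================================
   - Coverage     47.78%    2.82%    -44.97%
   + Complexity     5557       85      -5472
   =============================================
     Files           936      284       -652
     Lines         41596    11873     -29723
     Branches       4185      986      -3199
   =============================================
   - Hits          19877      335     -19542
   + Misses        19949    11512      -8437
   + Partials       1770       26      -1744
   ```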
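   To make the numbers above concrete: Codecov's coverage figure is hits divided by tracked lines, where tracked lines = hits + misses + partials (41596 = 19877 + 19949 + 1770 on master). The small Java sketch below recomputes both percentages. It assumes Codecov's default settings (two decimal places, rounded down); the class and method names are hypothetical, not Codecov's code.

```java
public class CoverageCheck {

    // Coverage in hundredths of a percent, rounded down.
    // Assumes Codecov's default precision (2) and rounding mode (down).
    public static long coverageBasisPoints(long hits, long misses, long partials) {
        long trackedLines = hits + misses + partials;
        return Math.floorDiv(hits * 10_000L, trackedLines);
    }

    public static void main(String[] args) {
        // master (50c2b76): 19877 hits, 19949 misses, 1770 partials
        System.out.println(coverageBasisPoints(19877, 19949, 1770) / 100.0); // 47.78
        // PR head (129b2e6): 335 hits, 11512 misses, 26 partials
        System.out.println(coverageBasisPoints(335, 11512, 26) / 100.0);     // 2.82
    }
}
```

   The drop to 2.82% comes from far fewer tracked lines being reported (11873 vs 41596), which usually means some modules' coverage uploads are missing for this run rather than tests losing coverage.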
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `8.99% <ø> (-50.27%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comme

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383565#comment-17383565
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 129b2e6b3c374ca973ea7d9f1ef3d33dbc884aa9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1024)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383543#comment-17383543
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)
   * 129b2e6b3c374ca973ea7d9f1ef3d33dbc884aa9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1024)
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383541#comment-17383541
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)
   * 129b2e6b3c374ca973ea7d9f1ef3d33dbc884aa9 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383408#comment-17383408
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
           .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
           .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
           .setPartitionColumns(partitionColumns)
+          .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
           .initTable(sparkContext.hadoopConfiguration, path.get)
         tableConfig = tableMetaClient.getTableConfig
+      } else {
+        // validate table properties
+        val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   I understand that we need one place to do the validation.
   As of now, any caller that uses the WriteClient directly has to make an additional call to do the validation.
   
   I can think of two other options.
   1. Change all startCommit() methods in WriteClient to take in the operationType and do the validation. But there are quite a few callers (200 places) that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path).
   Once we have consensus, I can make the changes. Once fixed, callers won't need to make any additional calls to do the validation.
   
   2. I also thought we could add the validation to MetaClient and call it from within [getTableAndInitCtx()](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L405) in SparkRDDWriteClient. But we need the raw properties from the user. If we call it from within getTableAndInitCtx(), we might fetch properties from writeConfig, which would already have read the table props. So I'm not sure we can go with this approach. But if we can get this in neatly, no changes are required for those using writeClient directly. For the row writer path, we would need to make one additional call from within [DataSourceInternalWriterHelper](https://github.com/apache/hudi/blob/2099bf41db76e9a6e946aa41c318b7c0e18be04d/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java#L68) to explicitly validateProps using metaClient.
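   A minimal Java sketch of what option 2's check could look like, under stated assumptions: the class, method and property key string below are hypothetical (the PR refers to the setting via `HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS`), the default of `true` is assumed, and the rule "the write must not request a different value than the table was created with" is an illustrative choice. It shows why the raw user properties matter: a config already merged with the table props would always match.

```java
import java.util.Properties;

// Hypothetical helper; names and key string are illustrative, not Hudi's API.
public class TablePropsValidator {

    // Assumed key string for the populate-meta-columns setting.
    public static final String POPULATE_META_COLUMNS_KEY = "hoodie.populate.meta.columns";
    // Assumed default: meta columns are populated unless disabled.
    public static final String POPULATE_META_COLUMNS_DEFAULT = "true";

    // tableProps: what was persisted when the table was created.
    // rawUserProps: only what the user passed for this write, before merging
    // with the table config (a merged config would trivially "match").
    public static void validate(Properties tableProps, Properties rawUserProps) {
        boolean persisted = Boolean.parseBoolean(
            tableProps.getProperty(POPULATE_META_COLUMNS_KEY, POPULATE_META_COLUMNS_DEFAULT));
        String requested = rawUserProps.getProperty(POPULATE_META_COLUMNS_KEY);
        // Design assumption: an absent key means "no preference", so writers
        // need not repeat the option on every write.
        if (requested != null && Boolean.parseBoolean(requested) != persisted) {
            throw new IllegalArgumentException(POPULATE_META_COLUMNS_KEY + " is " + persisted
                + " for this table but the write requested " + requested);
        }
    }

    public static void main(String[] args) {
        Properties table = new Properties();
        table.setProperty(POPULATE_META_COLUMNS_KEY, "false");
        Properties user = new Properties();
        user.setProperty(POPULATE_META_COLUMNS_KEY, "true");
        try {
            validate(table, user);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```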






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382928#comment-17382928
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671605915



##
File path: hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
##
@@ -21,21 +21,24 @@ package org.apache.hudi
 import org.apache.avro.Schema
 import org.apache.avro.generic.GenericRecord
 import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hudi.client.utils.SparkRowSerDe

Review comment:
   Moved HoodieSparkUtils, SparkAdaptorSupport and SparkAdaptor from the hudi-spark module to the hudi-spark-client module, since we wanted to access SparkAdaptor from within BuiltInKeygen.
   If we don't want to move the entire HoodieSparkUtils to a different module, I can move just SparkAdaptor and SparkAdaptorSupport to the hudi-spark-client module, create another class in this module similar to HoodieSparkUtils, and expose createSparkRowSerDe there.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382916#comment-17382916
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
           .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
           .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
           .setPartitionColumns(partitionColumns)
+          .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
           .initTable(sparkContext.hadoopConfiguration, path.get)
         tableConfig = tableMetaClient.getTableConfig
+      } else {
+        // validate table properties
+        val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   I understand that we need one place to do the validation.
   As of now, any caller that uses the WriteClient directly has to make an additional call to do the validation.
   
   I can think of another option: change all startCommit() methods in WriteClient to take in the operationType and do the validation. But there are quite a few callers (200 places) that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path).
   Once we have consensus, I can make the changes. I feel this is the right approach. Once fixed, callers won't need to make any additional calls to do the validation.
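   A minimal Java sketch of the startCommit(operationType) option, with hypothetical names throughout (this is not Hudi's SparkRDDWriteClient API). Because every commit passes through startCommit, callers that use the client directly cannot skip the check. The validation rule is an assumption for illustration only: a table with meta columns disabled accepts only bulk insert, the row writer path this issue targets.

```java
// Hypothetical names throughout; not Hudi's actual write client API.
public class ValidatingWriteClient {

    public enum OperationType { INSERT, UPSERT, BULK_INSERT }

    private final boolean populateMetaColumns; // value persisted in the table config
    private long lastInstant = 0;              // stand-in for a real instant time

    public ValidatingWriteClient(boolean populateMetaColumns) {
        this.populateMetaColumns = populateMetaColumns;
    }

    // Single mandatory entry point: validation runs before any instant is created.
    public String startCommit(OperationType operationType) {
        validateTableProperties(operationType);
        lastInstant++;
        return String.valueOf(lastInstant);
    }

    // Assumed rule, for illustration only: without meta columns,
    // only bulk insert is accepted.
    private void validateTableProperties(OperationType operationType) {
        if (!populateMetaColumns && operationType != OperationType.BULK_INSERT) {
            throw new IllegalStateException(
                operationType + " is not supported on a table with meta columns disabled");
        }
    }

    public static void main(String[] args) {
        ValidatingWriteClient client = new ValidatingWriteClient(false);
        System.out.println(client.startCommit(OperationType.BULK_INSERT)); // 1
        try {
            client.startCommit(OperationType.UPSERT);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

   The trade-off discussed above shows up directly: the signature change forces every existing startCommit() caller to pass an operation type.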






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382915#comment-17382915
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
           .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
           .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
           .setPartitionColumns(partitionColumns)
+          .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
           .initTable(sparkContext.hadoopConfiguration, path.get)
         tableConfig = tableMetaClient.getTableConfig
+      } else {
+        // validate table properties
+        val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   I understand that we need one place to do the validation.
   As of now, any caller that uses the WriteClient directly has to make an additional call to do the validation.
   
   I can think of another option: change all startCommit() methods in WriteClient to take in the operationType and do the validation. But there are quite a few callers (30 places) that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path).
   Once we have consensus, I can make the changes. I feel this is the right approach. Once fixed, callers won't need to make any additional calls to do the validation.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382914#comment-17382914
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
           .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
           .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
           .setPartitionColumns(partitionColumns)
+          .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
           .initTable(sparkContext.hadoopConfiguration, path.get)
         tableConfig = tableMetaClient.getTableConfig
+      } else {
+        // validate table properties
+        val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   I understand that we need one place to do the validation.
   As of now, any caller that uses the WriteClient directly has to make an additional call to do the validation.
   
   I can think of another option: change all startCommit() methods in WriteClient to take in the operationType and do the validation. But there are quite a few callers that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path).
   Once we have consensus, I can make the changes. I feel this is the right approach. Once fixed, callers won't need to make any additional calls to do the validation.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382912#comment-17382912
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671900060



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
           .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
           .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
           .setPartitionColumns(partitionColumns)
+          .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
           .initTable(sparkContext.hadoopConfiguration, path.get)
         tableConfig = tableMetaClient.getTableConfig
+      } else {
+        // validate table properties
+        val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   I understand that we need one place to do the validation.
   As of now, any caller that uses the WriteClient directly has to make an additional call to do the validation.
   
   I can think of another option: change all startCommit() methods in WriteClient to take in the operationType and do the validation. But there are quite a few callers that I would need to fix. If we go ahead with this approach, validation happens within our custom data source, where we instantiate the writeClient and start the commit (in the row writer path).
   Once we have consensus, I can make the changes. I feel this is the right approach.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382487#comment-17382487
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5b5c7c2) into [master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (50c2b76) will **decrease** coverage by `0.10%`.
   > The diff coverage is `54.73%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
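   ```diff
   @@             Coverage Diff              @@
   ##             master    #3247      +/-   ##
   ============================================
   - Coverage     47.78%   47.68%   -0.11%
   - Complexity     5557     5583      +26
   ============================================
     Files           936      938       +2
     Lines         41596    41763     +167
     Branches       4185     4204      +19
   ============================================
   + Hits          19877    19913      +36
   - Misses        19949    20066     +117
   - Partials       1770     1784      +14
   ```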
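   The `+/-` column above is simply the PR-head value minus the master value, and Lines should always equal Hits + Misses + Partials. A throwaway Java sketch (hypothetical class name) that recomputes those cells from the numbers in this report:

```java
public class CoverageDiffCheck {

    // Every line Codecov tracks is either a hit, a miss or a partial.
    public static long lines(long hits, long misses, long partials) {
        return hits + misses + partials;
    }

    // A "+/-" cell is the PR-head value minus the master value.
    public static long delta(long master, long head) {
        return head - master;
    }

    public static void main(String[] args) {
        String[] rows = {"Files", "Lines", "Branches", "Hits", "Misses", "Partials"};
        long[] master = {936, 41596, 4185, 19877, 19949, 1770};
        long[] head = {938, 41763, 4204, 19913, 20066, 1784};
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-9s %+d%n", rows[i], delta(master[i], head[i]));
        }
        // Lines must equal Hits + Misses + Partials on both sides.
        System.out.println(lines(19877, 19949, 1770) == 41596); // true
        System.out.println(lines(19913, 20066, 1784) == 41763); // true
    }
}
```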
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `34.55% <46.72%> (+0.02%)` | :arrow_up: |
   | hudicommon | `48.67% <26.66%> (-0.03%)` | :arrow_down: |
   | hudiflink | `59.36% <ø> (ø)` | |
   | hudihadoopmr | `52.02% <ø> (ø)` | |
   | hudisparkdatasource | `67.12% <73.52%> (-0.26%)` | :arrow_down: |
   | hudisync | `55.97% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
   | [...torage/row/HoodieInternalRowFileWriterFactory.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | `51.61% <0.00%> (ø)` | |
   | [...io/storage/row/HoodieInternalRowParquetWriter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9yb3cvSG9vZGllSW50ZXJuYWxSb3dQYXJxdWV0V3JpdGVyLmphdmE=) | `84.21% <0.00%> (ø)` | |
   | 
[...che/hudi/io/storage/row/Hoodie

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382478#comment-17382478
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> The objective is to disable all meta columns to avoid their storage cost. 
> This may also reduce write latency on the row writer path, since no special 
> handling is required at the RowCreateHandle layer. 
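A minimal Java sketch of the idea behind this issue, under stated assumptions. It is not Hudi's RowCreateHandle code: the class and method names are hypothetical, and it models only the column layout and row assembly. The five `_hoodie_*` names are Hudi's default meta columns. With the flag off, the data row passes through unchanged, which corresponds to the "no special handling" case the issue describes, and each written row carries five fewer fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not Hudi's actual RowCreateHandle): a row writer either
 * prepends the five Hudi meta columns to each record or, when meta columns
 * are disabled, passes the data columns through untouched.
 */
public final class MetaColumnsSketch {

  /** The five meta columns Hudi adds to every record by default. */
  public static final List<String> META_COLUMNS = List.of(
      "_hoodie_commit_time",
      "_hoodie_commit_seqno",
      "_hoodie_record_key",
      "_hoodie_partition_path",
      "_hoodie_file_name");

  private MetaColumnsSketch() {}

  /** Column layout of the written file for the given setting. */
  public static List<String> outputColumns(List<String> dataColumns, boolean populateMetaColumns) {
    if (!populateMetaColumns) {
      // No meta columns: the incoming schema is written as-is.
      return Collections.unmodifiableList(new ArrayList<>(dataColumns));
    }
    List<String> out = new ArrayList<>(META_COLUMNS);
    out.addAll(dataColumns);
    return Collections.unmodifiableList(out);
  }

  /**
   * Builds the row to write. metaValues must hold one value per meta column
   * and is ignored when meta columns are disabled.
   */
  public static List<Object> outputRow(List<Object> dataRow, List<Object> metaValues,
                                       boolean populateMetaColumns) {
    if (!populateMetaColumns) {
      return dataRow; // passthrough: no per-row copy and no extra fields
    }
    if (metaValues.size() != META_COLUMNS.size()) {
      throw new IllegalArgumentException("expected " + META_COLUMNS.size() + " meta values");
    }
    List<Object> out = new ArrayList<>(metaValues);
    out.addAll(dataRow);
    return out;
  }

  public static void main(String[] args) {
    List<String> data = List.of("id", "ts", "value");
    System.out.println(outputColumns(data, true).size());  // 8
    System.out.println(outputColumns(data, false).size()); // 3
  }
}
```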



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382468#comment-17382468
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5b5c7c2) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `1.76%`.
   > The diff coverage is `62.06%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
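   ```diff
   @@            Coverage Diff             @@
   ##            master    #3247      +/-   ##
   ===========================================
   - Coverage    47.78%   46.01%   -1.77%
   + Complexity    5557     4763     -794
   ===========================================
     Files          936      832     -104
     Lines        41596    38058    -3538
     Branches      4185     3809     -376
   ===========================================
   - Hits         19877    17514    -2363
   + Misses       19949    18920    -1029
   + Partials      1770     1624     -146
   ```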
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.97% <ø> (ø)` | |
   | hudiclient | `23.00% <0.00%> (-11.53%)` | :arrow_down: |
   | hudicommon | `48.67% <26.66%> (-0.03%)` | :arrow_down: |
   | hudiflink | `59.36% <ø> (ø)` | |
   | hudihadoopmr | `52.02% <ø> (ø)` | |
   | hudisparkdatasource | `67.12% <73.52%> (-0.26%)` | :arrow_down: |
   | hudisync | `55.97% <ø> (ø)` | |
   | huditimelineservice | `64.07% <ø> (ø)` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh)
 | `43.37% <0.00%> (-0.15%)` | :arrow_down: |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `62.36% <0.00%> (-2.39%)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/internal/DefaultSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2ludGVybmFsL0RlZmF1bHRTb3VyY2UuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...org/apache/hudi/spark3/internal/DefaultSource.java](https://codecov.io/gh/apach

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382464#comment-17382464
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086








[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382463#comment-17382463
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
 
   * 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=983)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382455#comment-17382455
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5e7e02e) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `32.03%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head 5e7e02e differs from pull request most recent 
head 5b5c7c2. Consider uploading reports for the commit 5b5c7c2 to get more 
accurate results
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
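   ```diff
   @@            Coverage Diff             @@
   ##            master    #3247      +/-   ##
   ===========================================
   - Coverage    47.78%   15.75%  -32.04%
   + Complexity    5557      493    -5064
   ===========================================
     Files          936      284     -652
     Lines        41596    11832   -29764
     Branches      4185      981    -3204
   ===========================================
   - Hits         19877     1864   -18013
   + Misses       19949     9805   -10144
   + Partials      1770      163    -1607
   ```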
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `59.26% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh)
 | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
   | 
[...main/java/org/apache/hudi/metrics/HoodieGauge.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL21ldHJpY3MvSG9vZGllR2F1Z2UuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | 

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382447#comment-17382447
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5b5c7c2) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `44.95%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
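   ```diff
   @@            Coverage Diff             @@
   ##            master    #3247      +/-   ##
   ===========================================
   - Coverage    47.78%    2.83%  -44.96%
   + Complexity    5557       85    -5472
   ===========================================
     Files          936      284     -652
     Lines        41596    11832   -29764
     Branches      4185      981    -3204
   ===========================================
   - Hits         19877      335   -19542
   + Misses       19949    11471    -8478
   + Partials      1770       26    -1744
   ```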
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh)
 | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comme

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382445#comment-17382445
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
 
   * 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382444#comment-17382444
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816784


   Sorry, I squashed all commits into one so that it's easier for me to rebase. 
I had 10+ files in conflict when I rebased with master. Also, I renamed quite 
a few files and moved some of them (to reuse the Spark SerDe class), so I went 
ahead and squashed. Sorry if you were planning to review just the latest commit.  
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382443#comment-17382443
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
 
   * 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
 
   * 5b5c7c2fcdff2a5f5e0737da8c9d03da83c4a65c UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382438#comment-17382438
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671606366



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
   .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
   .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
   .setPartitionColumns(partitionColumns)
+  .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
   .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+  } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   Added a private method here in HoodieSparkSqlWriter for the write params, and 
added a method in HoodieTableMetaClient to validate table properties. 
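A minimal Java sketch of the kind of check this review comment describes, assuming a plain string-map view of both the write options and the persisted table properties. The class name, the option key string, and the default of `true` are illustrative assumptions, not Hudi's actual `HoodieTableMetaClient` or `HoodieTableConfig` API. It shows only the rule that a write requesting a different meta-column setting than the existing table should fail fast.

```java
import java.util.Map;

/**
 * Hypothetical sketch: a write may not change the populate-meta-columns
 * setting of an existing table. Key, default, and class name are assumptions.
 */
public final class TablePropertyValidationSketch {

  /** Illustrative option key; not necessarily the key Hudi uses. */
  public static final String POPULATE_META_COLUMNS_KEY = "example.populate.meta.columns";

  /** Assumed default: meta columns are populated unless disabled. */
  public static final boolean DEFAULT_POPULATE_META_COLUMNS = true;

  private TablePropertyValidationSketch() {}

  /** Reads the flag from a property map, falling back to the default when absent. */
  public static boolean populateMetaColumns(Map<String, String> props) {
    String raw = props.get(POPULATE_META_COLUMNS_KEY);
    // Unlike Scala's String.toBoolean, Boolean.parseBoolean never throws:
    // anything other than "true" (ignoring case) is read as false.
    return raw == null ? DEFAULT_POPULATE_META_COLUMNS : Boolean.parseBoolean(raw);
  }

  /** Rejects a write whose setting disagrees with the one stored for the table. */
  public static void validate(Map<String, String> writeParams, Map<String, String> tableProps) {
    boolean requested = populateMetaColumns(writeParams);
    boolean persisted = populateMetaColumns(tableProps);
    if (requested != persisted) {
      throw new IllegalArgumentException("Config conflict for " + POPULATE_META_COLUMNS_KEY
          + ": table has " + persisted + ", write requested " + requested);
    }
  }

  public static void main(String[] args) {
    Map<String, String> table = Map.of(POPULATE_META_COLUMNS_KEY, "false");
    validate(Map.of(POPULATE_META_COLUMNS_KEY, "false"), table); // consistent: passes
    try {
      validate(Map.of(), table); // falls back to true, which conflicts with the table
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast here presumably avoids an existing table ending up with some files that carry meta columns and some that do not.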

##
File path: 
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
##
@@ -21,21 +21,24 @@ package org.apache.hudi
 import org.apache.avro.Schema
 import org.apache.avro.generic.GenericRecord
 import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hudi.client.utils.SparkRowSerDe

Review comment:
   Moved HoodieSparkUtils, SparkAdaptorSupport and SparkAdaptor from the 
hudi-spark module to the hudi-spark-client module, since we wanted to access 
SparkAdaptor from within BuiltInKeygen. 






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382439#comment-17382439
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

codecov-commenter commented on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-881816086


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3247](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (5e7e02e) into 
[master](https://codecov.io/gh/apache/hudi/commit/50c2b76d725a71608a38217370b1ac45cedae405?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (50c2b76) will **decrease** coverage by `44.95%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3247/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
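   ```diff
   @@            Coverage Diff             @@
   ##            master    #3247      +/-   ##
   ===========================================
   - Coverage    47.78%    2.83%  -44.96%
   + Complexity    5557       85    -5472
   ===========================================
     Files          936      284     -652
     Lines        41596    11832   -29764
     Branches      4185      981    -3204
   ===========================================
   - Hits         19877      335   -19542
   + Misses       19949    11471    -8478
   + Partials      1770       26    -1744
   ```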
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `0.00% <0.00%> (-34.53%)` | :arrow_down: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `4.88% <ø> (-51.10%)` | :arrow_down: |
   | huditimelineservice | `?` | |
   | hudiutilities | `9.11% <ø> (-50.15%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3247?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../apache/hudi/client/HoodieInternalWriteStatus.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NsaWVudC9Ib29kaWVJbnRlcm5hbFdyaXRlU3RhdHVzLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh)
 | `0.00% <0.00%> (-43.52%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/3247/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382436#comment-17382436
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
   * 5e7e02ec3da3137c31e2124c88cff815bc299875 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=980)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382435#comment-17382435
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
   * 5e7e02ec3da3137c31e2124c88cff815bc299875 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382433#comment-17382433
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382426#comment-17382426
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671600987



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -128,14 +128,35 @@ object HoodieSparkSqlWriter {
   .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
   .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
   .setPartitionColumns(partitionColumns)
+  .setPopulateMetaColumns(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.key(), HoodieTableConfig.HOODIE_POPULATE_META_COLUMNS.defaultValue()).toBoolean)
   .initTable(sparkContext.hadoopConfiguration, path.get)
 tableConfig = tableMetaClient.getTableConfig
+  } else {
+// validate table properties
+val tableMetaClient = HoodieTableMetaClient.builder().setBasePath(path.get).setConf(sparkContext.hadoopConfiguration).build()

Review comment:
   I am thinking of adding the validation within HoodieTableMetaClient, 
because a) we don't instantiate WriteClient at all in the row writer path as of 
now, and b) the table properties are available when HoodieTableMetaClient is 
instantiated. Even if we were to place it within WriteClient, we would have to 
instantiate the meta client to read the table props, so we might as well place 
the validation method in HoodieTableMetaClient. 
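   The validation idea above (compare what the writer asks for against the 
table properties already loaded by the meta client) could look roughly like the 
hypothetical sketch below. TablePropsValidator, the property key, and the 
default value are placeholders chosen for illustration, not Hudi's actual API; 
the real key lives in HoodieTableConfig. readBoolean mirrors the 
parameters.getOrElse(key, default).toBoolean call from the Scala diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch: validate writer options against table properties that
 * were persisted when the table was first initialized. The class name and the
 * property key are placeholders, not Hudi's actual API.
 */
class TablePropsValidator {

  // Placeholder key; the real key is defined in HoodieTableConfig and may differ.
  static final String POPULATE_META_COLUMNS_KEY = "example.populate.meta.columns";
  static final boolean POPULATE_META_COLUMNS_DEFAULT = true;

  // Table properties, assumed to be loaded already when the meta client is built.
  private final Properties tableProps;

  TablePropsValidator(Properties tableProps) {
    this.tableProps = tableProps;
  }

  // Java analogue of parameters.getOrElse(key, default).toBoolean from the Scala diff.
  // Note: Boolean.parseBoolean returns false for anything other than "true"
  // (case-insensitive), whereas Scala's toBoolean throws on invalid input.
  static boolean readBoolean(Map<String, String> options, String key, boolean defaultValue) {
    return Boolean.parseBoolean(options.getOrDefault(key, String.valueOf(defaultValue)));
  }

  // Rejects a write whose setting contradicts the value the table was created with.
  void validate(Map<String, String> writerOptions) {
    boolean requested =
        readBoolean(writerOptions, POPULATE_META_COLUMNS_KEY, POPULATE_META_COLUMNS_DEFAULT);
    boolean persisted = Boolean.parseBoolean(tableProps.getProperty(
        POPULATE_META_COLUMNS_KEY, String.valueOf(POPULATE_META_COLUMNS_DEFAULT)));
    if (requested != persisted) {
      throw new IllegalArgumentException(POPULATE_META_COLUMNS_KEY + " is " + persisted
          + " for this table, but the write requested " + requested);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(POPULATE_META_COLUMNS_KEY, "false");
    TablePropsValidator validator = new TablePropsValidator(props);

    Map<String, String> matching = new HashMap<>();
    matching.put(POPULATE_META_COLUMNS_KEY, "false");
    validator.validate(matching); // same value as the table: accepted

    try {
      validator.validate(new HashMap<>()); // default (true) differs from the table (false)
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

   Keeping the check next to the loaded table properties means both the row 
writer path and the WriteClient path could call it without a second read of 
the table config, which is the argument made in the comment.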






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382417#comment-17382417
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976)
   * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=978)
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382405#comment-17382405
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976)
   * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382404#comment-17382404
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894)
   * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976)
   * 0f4199f559dc2aa205ea7109a5b6c0c7e4d34271 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382402#comment-17382402
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894)
   * e32037e79596e3bf19415d1af107850643ee9ee5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=976)
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382401#comment-17382401
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894)
   * e32037e79596e3bf19415d1af107850643ee9ee5 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382192#comment-17382192
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671383496



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/SimpleKeyGenerator.java
##
@@ -72,4 +75,16 @@ public String getPartitionPath(Row row) {
 return RowKeyGeneratorHelper.getPartitionPathFromRow(row, getPartitionPathFields(),
 hiveStylePartitioning, partitionPathPositions);
   }
+
+  @Override
+  public String getPartitionPath(InternalRow row, StructType structType) {
+buildFieldDataTypesMapIfNeeded(structType);

Review comment:
   During instantiation of the KeyGen, we never know whether these methods 
will ever be invoked. As you might be aware, they are invoked only in the row 
writer path, and only if meta cols are disabled. Our keyGen constructors are 
designed that way as of now. So maybe we can add public methods to set the 
structType and expect callers to set it appropriately before calling these 
methods. 
   I took precedent from what we did when we added support for 
getPartitionPath(Row). 
   Let me know how you want to go about it. 
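   The option floated above (a public setter that callers must invoke before 
the row-based methods) can be sketched as follows. This is a hypothetical, 
Spark-free illustration: SchemaAwareKeyGenSketch, setSchema, and the "default" 
placeholder for null partition values are assumptions, and rows are modelled as 
Object[] with the schema as a list of field names instead of InternalRow and 
StructType.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: the key generator is built without a schema, and callers
 * on the row-writer path must call setSchema(...) before asking for a partition
 * path. Rows are Object[] and the schema is a list of field names (no Spark types).
 */
class SchemaAwareKeyGenSketch {
  private final String partitionPathField;
  // Field name -> position, built once when the schema is supplied.
  private Map<String, Integer> fieldPositions;

  SchemaAwareKeyGenSketch(String partitionPathField) {
    this.partitionPathField = partitionPathField;
  }

  // Public setter the caller is expected to invoke before the row-based methods.
  void setSchema(List<String> fieldNames) {
    Map<String, Integer> positions = new HashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      positions.put(fieldNames.get(i), i);
    }
    this.fieldPositions = positions;
  }

  String getPartitionPath(Object[] row) {
    if (fieldPositions == null) {
      // Fail fast: the contract is that setSchema ran first.
      throw new IllegalStateException("setSchema(...) must be called before getPartitionPath(row)");
    }
    Integer pos = fieldPositions.get(partitionPathField);
    if (pos == null) {
      throw new IllegalArgumentException("Unknown partition path field: " + partitionPathField);
    }
    Object value = row[pos];
    // "default" is a placeholder for null partition values in this sketch.
    return value == null ? "default" : value.toString();
  }

  public static void main(String[] args) {
    SchemaAwareKeyGenSketch keyGen = new SchemaAwareKeyGenSketch("region");
    keyGen.setSchema(Arrays.asList("id", "region", "ts"));
    System.out.println(keyGen.getPartitionPath(new Object[] {"k1", "us-east", 100L})); // us-east
  }
}
```

   The trade-off is the one raised in the thread: the constructors stay 
unchanged, at the cost of an ordering contract that callers must respect; the 
sketch makes a violation fail fast rather than silently.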






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382190#comment-17382190
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##
@@ -81,7 +105,52 @@ public String getPartitionPath(Row row) {
 return getKey(genericRecord).getPartitionPath();
   }
 
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  /**
+   * Fetch partition path from {@link InternalRow}.
+   *
+   * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+   * @param structType  schema of the internalRow.
+   * @return the partition path.
+   */
+  public String getPartitionPath(InternalRow internalRow, StructType structType) {
+try {
+  Row row = deserializeRow(getEncoder(structType), internalRow);
+  return getPartitionPath(row);
+} catch (Exception e) {
+  throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e);
+}
+  }
+
+  private ExpressionEncoder getEncoder(StructType structType) {
+if (encoder == null) {
+  encoder = getRowEncoder(structType);
+}
+return encoder;
+  }
+
+  private static ExpressionEncoder getRowEncoder(StructType schema) {
+List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream()
+.map(Attribute::toAttribute).collect(Collectors.toList());
+return RowEncoder.apply(schema)
+.resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(),
+SimpleAnalyzer$.MODULE$);
+  }
+
+  private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row)
+  throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException {
+// TODO remove reflection if Spark 2.x support is dropped
+if (package$.MODULE$.SPARK_VERSION().startsWith("2.")) {

Review comment:
   I could not find any. 
   ```
   grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
   hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
   hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
   hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
   hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
   hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
   ```
   In most of these places we do have a static method to get the encoder 
(getEncoder()), but deserializeRow is the first of its kind. We do have 
serializeRow in KeyGeneratorTestUtilities, but it converts in the other 
direction (Row -> InternalRow) and it's part of test code. 
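   The reflection in deserializeRow() exists because the InternalRow -> Row 
call differs between Spark lines: to our understanding, Spark 2.x exposes 
ExpressionEncoder.fromRow(InternalRow), while 3.x replaced it with 
createDeserializer(). The hypothetical sketch below shows only that 
version-dispatch pattern; OldEncoder and NewEncoder are stand-ins named after 
those methods, and no Spark classes are used.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.function.Function;

/**
 * Hypothetical sketch of the version-dependent reflection pattern behind
 * deserializeRow(): the call that converts an internal row into an external one
 * differs between runtime versions, so it is resolved reflectively.
 * OldEncoder and NewEncoder are stand-ins, not Spark classes.
 */
class VersionedDeserializer {

  // Stand-in for the older API: the encoder converts a row directly.
  static class OldEncoder {
    public String fromRow(Object[] internalRow) {
      return "Row" + Arrays.toString(internalRow);
    }
  }

  // Stand-in for the newer API: the encoder hands out a deserializer function.
  static class NewEncoder {
    public Function<Object[], String> createDeserializer() {
      return internalRow -> "Row" + Arrays.toString(internalRow);
    }
  }

  // Picks the code path from the version string, mirroring SPARK_VERSION().startsWith("2.").
  @SuppressWarnings("unchecked")
  static String deserialize(String runtimeVersion, Object encoder, Object[] internalRow) {
    try {
      if (runtimeVersion.startsWith("2.")) {
        Method fromRow = encoder.getClass().getMethod("fromRow", Object[].class);
        // Cast to Object so the array is passed as one argument, not spread as varargs.
        return (String) fromRow.invoke(encoder, (Object) internalRow);
      }
      Method create = encoder.getClass().getMethod("createDeserializer");
      Function<Object[], String> deserializer =
          (Function<Object[], String>) create.invoke(encoder);
      return deserializer.apply(internalRow);
    } catch (ReflectiveOperationException e) {
      // Keep the cause attached so the original failure stays visible.
      throw new IllegalStateException("Conversion of internal row failed", e);
    }
  }

  public static void main(String[] args) {
    Object[] internalRow = {"k1", 7};
    System.out.println(deserialize("2.4.8", new OldEncoder(), internalRow)); // Row[k1, 7]
    System.out.println(deserialize("3.1.2", new NewEncoder(), internalRow)); // Row[k1, 7]
  }
}
```

   Once Spark 2.x support is dropped (the TODO in the diff), the reflective 
branch can go away and the 3.x call can be made directly.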






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382191#comment-17382191
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130



Review comment:
   I could not find any. 
   ```
   grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
   hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
   hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
   hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
   hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
   hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
   ```
   In most of these places we do have a static method to get the encoder 
(getEncoder()), but deserializeRow is the first of its kind. We do have 
serializeRow in KeyGeneratorTestUtilities, but it converts in the other 
direction (Row -> InternalRow) and it's part of test code. 
   Here we need to convert InternalRow -> Row. 






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382189#comment-17382189
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130



Review comment:
   I could not find any. 
   ```
   grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
   hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
   hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
   hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
   hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
   hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
   ```
   In most of these places we do have a static method to get the encoder 
(getEncoder()), but deserializeRow is the first of its kind. We do have 
serializeRow in KeyGeneratorTestUtilities, but it converts in the other 
direction (Row -> InternalRow) and it's part of test code. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> The objective here is to disable all meta columns in order to avoid their storage cost. 
> Some write-latency benefit may also be seen on the row writer path, since no 
> special handling is then required at the RowCreateHandle layer. 
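For a sense of where the storage cost comes from: with meta columns populated, every written row carries Hudi's record-level meta fields (`_hoodie_commit_time`, `_hoodie_commit_seqno`, `_hoodie_record_key`, `_hoodie_partition_path`, `_hoodie_file_name`) ahead of the user's columns, whereas with them disabled the row can be handed to the writer unchanged. The sketch below models a row as a plain `List<Object>`; `MetaColumnsSketch` and `toWrittenRow` are hypothetical names, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the two write paths; not Hudi's actual implementation.
final class MetaColumnsSketch {
  // Hudi's record-level meta fields, written ahead of the user's columns.
  static final List<String> META_FIELDS = Collections.unmodifiableList(Arrays.asList(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name"));

  private MetaColumnsSketch() {}

  /** Returns the row as it would be handed to the file writer. */
  static List<Object> toWrittenRow(List<Object> userRow, boolean populateMetaColumns,
                                   List<Object> metaValues) {
    if (!populateMetaColumns) {
      return userRow; // no extra columns and no per-row copy: the row is written as is
    }
    if (metaValues.size() != META_FIELDS.size()) {
      throw new IllegalArgumentException("Expected " + META_FIELDS.size() + " meta values");
    }
    List<Object> out = new ArrayList<>(metaValues.size() + userRow.size());
    out.addAll(metaValues); // meta columns first
    out.addAll(userRow);    // then the user's columns, in their original order
    return out;
  }
}
```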



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382186#comment-17382186
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671378212



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##
@@ -81,7 +105,52 @@ public String getPartitionPath(Row row) {
 return getKey(genericRecord).getPartitionPath();
   }
 
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  /**
+   * Fetch partition path from {@link InternalRow}.
+   *
+   * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+   * @param structType  schema of the internalRow.
+   * @return the partition path.
+   */
+  public String getPartitionPath(InternalRow internalRow, StructType structType) {
+try {
+  Row row = deserializeRow(getEncoder(structType), internalRow);
+  return getPartitionPath(row);
+} catch (Exception e) {
+  throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e);
+}
+  }
+
+  private ExpressionEncoder getEncoder(StructType structType) {
+if (encoder == null) {

Review comment:
   deserializeRow() is a static method, hence the encoder is built here and passed into it.
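Based on the hunk above, the pattern is: an instance method lazily builds the encoder once and caches it in a field, and because `deserializeRow()` is static it cannot see that field, so the encoder is passed to it as an argument. A Spark-free sketch of the same shape, with a string splitter standing in for the encoder; all names here are hypothetical.

```java
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical stand-in for a key generator that converts a raw row before reading a field.
class CachingConverterSketch {
  // Built on first use and reused afterwards; like the diff, this assumes one schema per instance.
  private Function<String, String[]> converter;
  int buildCount = 0; // exposed only to make the caching observable

  private Function<String, String[]> getConverter(String delimiter) {
    if (converter == null) {
      buildCount++;
      // Stands in for an expensive build step such as creating an encoder from a schema.
      converter = row -> row.split(Pattern.quote(delimiter));
    }
    return converter;
  }

  // Static, so it cannot read the cached field: the caller has to pass the converter in.
  private static String[] deserialize(Function<String, String[]> converter, String row) {
    return converter.apply(row);
  }

  String getPartitionPath(String row, String delimiter, int partitionFieldPos) {
    return deserialize(getConverter(delimiter), row)[partitionFieldPos];
  }
}
```

One consequence worth checking in the real code: a cached converter silently ignores any later call made with a different schema argument.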






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382183#comment-17382183
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671374028



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieAppendOnlyRowParquetWriteSupport.java
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.hadoop.api.WriteSupport;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.Collections;
+
+/**
+ * Hoodie Write Support for directly writing Row to Parquet.
+ */
+public class HoodieAppendOnlyRowParquetWriteSupport extends HoodieRowParquetWriteSupport {

Review comment:
   Actually, the existing ParquetWriteSupport already handles a null bloom filter,
so I am removing this class. 
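The point of the reply is that an optional component can be handled with a null check inside one class rather than with a dedicated subclass. A minimal sketch under that assumption; `OptionalBloomWriteSupport` is hypothetical, a `Set<String>` stands in for a real bloom filter, and the metadata key `bloom.filter.entries` is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a single write-support class whose bloom filter is optional.
class OptionalBloomWriteSupport {
  private final Set<String> bloomFilter; // stand-in for a real bloom filter; null means "disabled"

  OptionalBloomWriteSupport(Set<String> bloomFilter) {
    this.bloomFilter = bloomFilter;
  }

  /** Records a key for the filter, if there is one. */
  void add(String recordKey) {
    if (bloomFilter != null) {
      bloomFilter.add(recordKey);
    }
  }

  /** Footer metadata produced when the file is finalized. */
  Map<String, String> finalizeMetadata() {
    Map<String, String> meta = new HashMap<>();
    if (bloomFilter != null) {
      meta.put("bloom.filter.entries", String.valueOf(bloomFilter.size()));
    }
    return meta;
  }
}
```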






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382172#comment-17382172
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671365168



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriter;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriterFactory;
+import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.StructType;
+
+import java.io.IOException;
+
+/**
+ * RowCreateHandle to be used when meta columns are disabled.
+ */
+public class HoodieAppendOnlyRowCreateHandle extends HoodieRowCreateHandle {
+
+  public HoodieAppendOnlyRowCreateHandle(HoodieTable table, HoodieWriteConfig writeConfig, String partitionPath, String fileId, String instantTime,
+ int taskPartitionId, long taskId, long taskEpochId, StructType structType) {
+super(table, writeConfig, partitionPath, fileId, instantTime, taskPartitionId, taskId, taskEpochId, structType);
+  }
+
+  /**
+   * Write the incoming InternalRow as is.
+   *
+   * @param record instance of {@link InternalRow} that needs to be written to the fileWriter.
+   * @throws IOException
+   */
+  @Override
+  public void write(InternalRow record) throws IOException {
+try {
+  fileWriter.writeRow("", record);

Review comment:
   Here is why I did not fix it. As of now we have HoodieInternalRowFileWriter, which
has writeRow(recordKey, InternalRow). I did not want to introduce another interface,
because then I could not extend this new RowCreateHandle from the existing
RowCreateHandle, as the two would implement different interfaces.
   But I guess I can add another overloaded write method to the same interface, and
each class can call into one of them.
   Will fix it.
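The overload idea from the reply can be sketched as follows, assuming a simplified writer interface: the handle that populates meta columns keeps calling the keyed variant, while the no-meta handle calls the single-argument one and no longer needs a placeholder such as `""`. `RowFileWriter` and `InMemoryRowFileWriter` are hypothetical, not the actual `HoodieInternalRowFileWriter`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified shape of the idea: one writer interface, two overloads.
interface RowFileWriter {
  /** Used when meta columns are populated: the record key is needed, e.g. for a bloom filter. */
  void writeRow(String recordKey, Object row);

  /** Used when meta columns are disabled: the row is written as is, with no placeholder key. */
  void writeRow(Object row);
}

// In-memory implementation, only to make the two call paths observable.
class InMemoryRowFileWriter implements RowFileWriter {
  final List<Object> rows = new ArrayList<>();
  final List<String> keys = new ArrayList<>();

  @Override
  public void writeRow(String recordKey, Object row) {
    keys.add(recordKey);
    rows.add(row);
  }

  @Override
  public void writeRow(Object row) {
    rows.add(row);
  }
}
```

Because the overloads differ in arity, call sites resolve unambiguously, and both handles can keep sharing one interface and one base class.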






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382169#comment-17382169
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671364040



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;

Review comment:
   sure.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382161#comment-17382161
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671357904



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##
@@ -675,6 +676,11 @@ public PropertyBuilder setBootstrapBasePath(String bootstrapBasePath) {
   return this;
 }
 
+public PropertyBuilder setPopulateMetaColumns(boolean populateMetaColumns) {

Review comment:
   sure. will fix everywhere. 






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382160#comment-17382160
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671357284



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/SimpleKeyGenerator.java
##
@@ -72,4 +75,16 @@ public String getPartitionPath(Row row) {
 return RowKeyGeneratorHelper.getPartitionPathFromRow(row, getPartitionPathFields(),
 hiveStylePartitioning, partitionPathPositions);
   }
+
+  @Override
+  public String getPartitionPath(InternalRow row, StructType structType) {
+buildFieldDataTypesMapIfNeeded(structType);

Review comment:
   Yes, unfortunately InternalRow does not come with a schema.
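Because an `InternalRow` exposes values only by ordinal, the schema (`StructType`) has to travel with the row, and field names are resolved to positions once and then cached, which appears to be the role of the `buildField*MapIfNeeded` helpers in the hunk above. A simplified sketch of that idea, with an `Object[]` standing in for the row and a list of field names for the schema; the class is hypothetical and ignores Hive-style partitioning and nested fields.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolve field names to positions once, then read rows positionally.
class PositionalPartitionPathSketch {
  private final List<String> partitionFields;
  private Map<String, Integer> fieldPositions; // built lazily from the first schema seen

  PositionalPartitionPathSketch(List<String> partitionFields) {
    this.partitionFields = partitionFields;
  }

  private void buildFieldPositionMapIfNeeded(List<String> schemaFieldNames) {
    if (fieldPositions == null) {
      Map<String, Integer> positions = new HashMap<>();
      for (int i = 0; i < schemaFieldNames.size(); i++) {
        positions.put(schemaFieldNames.get(i), i);
      }
      fieldPositions = positions;
    }
  }

  /** The row carries only values, so the schema must be passed alongside it. */
  String getPartitionPath(Object[] row, List<String> schemaFieldNames) {
    buildFieldPositionMapIfNeeded(schemaFieldNames);
    StringBuilder path = new StringBuilder();
    for (String field : partitionFields) {
      Integer pos = fieldPositions.get(field);
      if (pos == null) {
        throw new IllegalArgumentException("Partition field not in schema: " + field);
      }
      if (path.length() > 0) {
        path.append('/');
      }
      path.append(row[pos]);
    }
    return path.toString();
  }
}
```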






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382159#comment-17382159
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671356705



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##
@@ -81,7 +105,52 @@ public String getPartitionPath(Row row) {
 return getKey(genericRecord).getPartitionPath();
   }
 
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  /**
+   * Fetch partition path from {@link InternalRow}.
+   *
+   * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+   * @param structType  schema of the internalRow.
+   * @return the partition path.
+   */
+  public String getPartitionPath(InternalRow internalRow, StructType structType) {
+try {
+  Row row = deserializeRow(getEncoder(structType), internalRow);
+  return getPartitionPath(row);
+} catch (Exception e) {
+  throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e);
+}
+  }
+
+  private ExpressionEncoder getEncoder(StructType structType) {
+if (encoder == null) {
+  encoder = getRowEncoder(structType);
+}
+return encoder;
+  }
+
+  private static ExpressionEncoder getRowEncoder(StructType schema) {
+List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream()
+.map(Attribute::toAttribute).collect(Collectors.toList());
+return RowEncoder.apply(schema)

Review comment:
   yes.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381629#comment-17381629
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

vinothchandar commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r670175121



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriter;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriterFactory;
+import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.StructType;
+
+import java.io.IOException;
+
+/**
+ * RowCreateHandle to be used when meta columns are disabled.
+ */
+public class HoodieAppendOnlyRowCreateHandle extends HoodieRowCreateHandle {

Review comment:
   Rename to `HoodieNoMetaRowCreateHandle` or something similar? Let's not leak
higher-level use cases into class names low down the stack. Doesn't a row create
handle already imply that we are appending data?

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/HoodieAppendOnlyRowCreateHandle.java
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io;
+
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriter;
+import org.apache.hudi.io.storage.HoodieInternalRowFileWriterFactory;
+import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.StructType;
+
+import java.io.IOException;
+
+/**
+ * RowCreateHandle to be used when meta columns are disabled.
+ */
+public class HoodieAppendOnlyRowCreateHandle extends HoodieRowCreateHandle {
+
+  public HoodieAppendOnlyRowCreateHandle(HoodieTable table, HoodieWriteConfig writeConfig, String partitionPath, String fileId, String instantTime,
+ int taskPartitionId, long taskId, long taskEpochId, StructType structType) {
+super(table, writeConfig, partitionPath, fileId, instantTime, taskPartitionId, taskId, taskEpochId, structType);
+  }
+
+  /**
+   * Write the incoming InternalRow as is.
+   *
+   * @param record instance of {@link InternalRow} that needs to be written to the fileWriter.
+   * @throws IOException
+   */
+  @Override
+  public void write(InternalRow record) throws IOException {
+try {
+  fileWriter.writeRow("", record);

Review comment:
   What are all these empty strings? Can we avoid such calls?

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieAppendOnlyInternalRowParquetWriter.java
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file 

[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380180#comment-17380180
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380166#comment-17380166
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=894)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380165#comment-17380165
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   * 0a2849c7b940ad671b152fabc2d0bb3d66c93070 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379639#comment-17379639
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379630#comment-17379630
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379600#comment-17379600
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379599#comment-17379599
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379596#comment-17379596
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379595#comment-17379595
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379580#comment-17379580
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379567#comment-17379567
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379566#comment-17379566
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   * 8a212fd77769cbf7e248e971f66109381ba80f71 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379436#comment-17379436
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379421#comment-17379421
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851)
 
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379419#comment-17379419
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851)
 
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378810#comment-17378810
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378796#comment-17378796
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844)
 
   * f0dd67bb360fe3fd275264127d50a9feb881479a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=851)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378795#comment-17378795
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844)
 
   * f0dd67bb360fe3fd275264127d50a9feb881479a UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378565#comment-17378565
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378564#comment-17378564
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * a4799add9402d6a963689bc75078e46431fbd941 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843)
 
   * c377b8f48a7826d5eadce80849669ae47ab9aace Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=844)
 
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378563#comment-17378563
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * a4799add9402d6a963689bc75078e46431fbd941 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843)
 
   * c377b8f48a7826d5eadce80849669ae47ab9aace UNKNOWN
   
   


[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378562#comment-17378562
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r667410011



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##
@@ -81,7 +99,18 @@ public String getPartitionPath(Row row) {
     return getKey(genericRecord).getPartitionPath();
   }
 
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  /**
+   * Fetch partition path from {@link InternalRow}.
+   *
+   * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+   * @param structType  schema of the internalRow.
+   * @return the partition path.
+   */
+  public String getPartitionPath(InternalRow internalRow, StructType structType) {
+    throw new UnsupportedOperationException("Operation not supported. Please override if required.");

Review comment:
   Yet to figure out a way to fix this. Will update the patch once I have the solution. But at least all built-in key generators have concrete implementations.
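A minimal sketch of the pattern the review comment describes: the base key generator ships a default row-level `getPartitionPath` that throws `UnsupportedOperationException`, and a built-in generator overrides it with a concrete implementation. To keep the sketch self-contained it does not use Spark or Hudi. `Object[]` stands in for `InternalRow`, and a `List<String>` of field names stands in for `StructType`. All class names here (`KeyGenSketch`, `SimpleKeyGenSketch`, `CustomKeyGenSketch`) are hypothetical, and the `"default"` placeholder for null partition values is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT the Hudi or Spark API.
// Object[] stands in for InternalRow; List<String> of field names stands in for StructType.
public class KeyGenSketch {

  static abstract class BaseKeyGenSketch {
    // Default mirrors the diff: a custom subclass must override this to support the row-level path.
    public String getPartitionPath(Object[] row, List<String> fieldNames) {
      throw new UnsupportedOperationException(
          "Operation not supported. Please override if required.");
    }
  }

  // Hypothetical "simple" key generator: the partition path is the value of one field.
  static class SimpleKeyGenSketch extends BaseKeyGenSketch {
    private final String partitionField;
    private int partitionFieldPos = -1; // resolved from the schema on first use, then cached

    SimpleKeyGenSketch(String partitionField) {
      this.partitionField = partitionField;
    }

    @Override
    public String getPartitionPath(Object[] row, List<String> fieldNames) {
      if (partitionFieldPos < 0) {
        partitionFieldPos = fieldNames.indexOf(partitionField);
        if (partitionFieldPos < 0) {
          throw new IllegalArgumentException("Partition field not found: " + partitionField);
        }
      }
      Object value = row[partitionFieldPos];
      // "default" is an assumed placeholder for null partition values.
      return value == null ? "default" : value.toString();
    }
  }

  // A user-defined generator that does not override the row-level method.
  static class CustomKeyGenSketch extends BaseKeyGenSketch { }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "region", "ts");
    Object[] row = {"k1", "us-east", 1L};

    BaseKeyGenSketch simple = new SimpleKeyGenSketch("region");
    System.out.println(simple.getPartitionPath(row, schema)); // prints "us-east"

    try {
      new CustomKeyGenSketch().getPartitionPath(row, schema);
    } catch (UnsupportedOperationException e) {
      System.out.println("custom: " + e.getMessage());
    }
  }
}
```

Caching the field position assumes that the schema does not change between calls on the same generator instance. This is similar in spirit to the field-position map referenced by `buildFieldPositionMapIfNeeded` in the diff, but it is only an illustration.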






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378557#comment-17378557
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * a4799add9402d6a963689bc75078e46431fbd941 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378556#comment-17378556
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 76409a475b27bbf5d08212b7c00ba56fb42c8d01 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=842)
 
   * a4799add9402d6a963689bc75078e46431fbd941 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=843)
 
   
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378555#comment-17378555
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r667397508



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -377,6 +398,60 @@ object HoodieSparkSqlWriter {
 (syncHiveSuccess, common.util.Option.ofNullable(instantTime))
   }
 
+  def bulkInsertAsRowNoMetaColumns(sqlContext: SQLContext,

Review comment:
   I meant the changes in HoodieSparkSqlWriter, but both use the same custom datasource.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378553#comment-17378553
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 76409a475b27bbf5d08212b7c00ba56fb42c8d01 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=842)
 
   * a4799add9402d6a963689bc75078e46431fbd941 UNKNOWN
   
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378552#comment-17378552
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r667395541



##
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##
@@ -81,7 +99,55 @@ public String getPartitionPath(Row row) {
 return getKey(genericRecord).getPartitionPath();
   }
 
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  /**
+   * Fetch partition path from {@link InternalRow}.
+   *
+   * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+   * @param structType  schema of the internalRow.
+   * @return the partition path.
+   */
+  public String getPartitionPath(InternalRow internalRow, StructType structType) {
+    Row row = null;
+    try {
+      row = deserializeRow(getEncoder(structType), internalRow);
+    } catch (Exception e) {
+      throw new IllegalStateException("Convertion of InternalRow to Row failed with exception " + e);
+    }
+    return getPartitionPath(row);
+  }
+
+  private ExpressionEncoder getEncoder(StructType structType) {
+    if (encoder == null) {
+      synchronized (this) {
+        encoder = getRowEncoder(structType);
+      }
+    }
+    return encoder;
+  }
+
+  private static ExpressionEncoder getRowEncoder(StructType schema) {
+    List attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream()
+        .map(Attribute::toAttribute).collect(Collectors.toList());
+    return RowEncoder.apply(schema)
+        .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(),
+            SimpleAnalyzer$.MODULE$);
+  }
+
+  private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row)

Review comment:
   Yet to test this method. I found a similar method for serializeFromRow and adapted it. All the other built-in key generators (simple, complex, timestamp, custom) are already fixed. I will update the patch once I have the fix; I wanted to open this up for review while I work on it. 
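A note on `getEncoder` in the diff above: the `encoder == null` check runs outside the lock and is not repeated inside `synchronized (this)`, so two threads can both build an encoder, and unless the field is declared `volatile` (its declaration is not shown) another thread is not guaranteed to see a fully initialized encoder. A minimal sketch of the usual double-checked locking fix, with a generic factory standing in for `getRowEncoder`; the class name and type parameters are hypothetical.

```java
import java.util.function.Function;

// Hypothetical holder: S is the schema type, E the encoder type, and the factory
// plays the role of getRowEncoder in the diff.
class LazyEncoderHolder<S, E> {
  private final Function<S, E> factory;
  // volatile ensures other threads see a fully constructed encoder once it is published.
  private volatile E encoder;

  LazyEncoderHolder(Function<S, E> factory) {
    this.factory = factory;
  }

  E get(S schema) {
    E result = encoder;          // first check, without locking
    if (result == null) {
      synchronized (this) {
        result = encoder;        // second check, under the lock
        if (result == null) {
          result = factory.apply(schema);
          encoder = result;
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[] builds = {0};
    LazyEncoderHolder<String, String> holder =
        new LazyEncoderHolder<>(schema -> { builds[0]++; return "encoder-for-" + schema; });
    holder.get("schemaA");
    holder.get("schemaA");
    System.out.println(builds[0]); // prints 1
  }
}
```

Like the original, this caches the encoder built for the first schema it sees, so it assumes a key generator instance only ever handles a single schema.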






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378551#comment-17378551
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 452676092193ccbbcbc9034c893010ea2ec45da7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=811)
 
   * 76409a475b27bbf5d08212b7c00ba56fb42c8d01 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=842)
 
   * a4799add9402d6a963689bc75078e46431fbd941 UNKNOWN
   
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378550#comment-17378550
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r667394408



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -377,6 +398,60 @@ object HoodieSparkSqlWriter {
 (syncHiveSuccess, common.util.Option.ofNullable(instantTime))
   }
 
+  def bulkInsertAsRowNoMetaColumns(sqlContext: SQLContext,

Review comment:
   This has a lot in common with the existing bulk_insert row writer path. Once I get the first set of reviews, I will unify them. 






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378548#comment-17378548
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r81940



##
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -377,6 +397,63 @@ object HoodieSparkSqlWriter {
 (syncHiveSuccess, common.util.Option.ofNullable(instantTime))
   }
 
+  def bulkInsertAppendOnlyAsRow(sqlContext: SQLContext,

Review comment:
   bulkInsertAsRow and bulkInsertAppendOnlyAsRow have a lot of common code. I have yet to unify them; I wanted to keep them separate for the first set of reviews. 






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378547#comment-17378547
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r81609



##
File path: hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java
##
@@ -110,6 +151,55 @@
     Dataset colOrderedDataset = dedupedDf.select(
         JavaConverters.collectionAsScalaIterableConverter(orderedFields).asScala().toSeq());
 
-    return bulkInsertPartitionerRows.repartitionRecords(colOrderedDataset, config.getBulkInsertShuffleParallelism());
+    return Pair.of(populateMetaCols ? bulkInsertPartitionerRows.repartitionRecords(colOrderedDataset, config.getBulkInsertShuffleParallelism()) :
+        new NonSortPartitionerWithRows().repartitionRecords(colOrderedDataset, config.getBulkInsertShuffleParallelism()), nonPartitionedDataset);
   }
+
+  /**
+   * Add empty meta columns and reorder such that meta columns are at the beginning.
+   *
+   * @param rows
+   * @return
+   */
+  public static Dataset prepareHoodieDatasetForBulkInsertAppendOnly(Dataset rows, HoodieWriteConfig config, boolean isGlobalIndex) {

Review comment:
   This method is not used; both flows use the previous method. 
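For reference, the javadoc in this diff describes the intended layout: empty meta columns are added and moved to the front of the dataset. Below is a sketch of that ordering on column names only, with a `populateMetaCols` switch mirroring the flag used in the diff. The five names are Hudi's standard meta columns; the class and method are hypothetical, and column types and row values are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch working on column names only; types and row values are omitted.
class MetaColumnOrderSketch {
  // Hudi's five meta columns, in the order Hudi places them at the start of a table's schema.
  static final List<String> HOODIE_META_COLUMNS = List.of(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name");

  // Meta columns already present in the input are dropped from their original position;
  // they are put back at the front only when populateMetaCols is true.
  static List<String> orderedColumns(List<String> dataColumns, boolean populateMetaCols) {
    List<String> result = new ArrayList<>();
    if (populateMetaCols) {
      result.addAll(HOODIE_META_COLUMNS);
    }
    for (String column : dataColumns) {
      if (!HOODIE_META_COLUMNS.contains(column)) {
        result.add(column);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(orderedColumns(List.of("id", "dt"), true));
    System.out.println(orderedColumns(List.of("id", "dt"), false)); // prints [id, dt]
  }
}
```

With the switch off, the written schema carries only the data columns, which is where the storage saving described in the issue would come from.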



