[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-12 Thread GitBox


leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());

Review comment:
   > Do you need to create resolver again? Does config.getLastSchema() work 
here?
   
   I tested it E2E and found that would not get the lastSchema from config 
since the config object are different

##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());
+schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+  } catch (Exception e) {
+// ignore exception.
+schema = config.getSchema();

Review comment:
   > We are potentially reducing schema here, so I think this can lead to 
issues. Can we throw error? At the least, can you add a LOG here to make sure 
this gets noticed?
   
   it handles the case that users config update partial fields in the first 
time, my original idea is not to throw error in this case, and LOG here sounds 
reasonable to me.

##
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = 
"hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = 
BulkInsertSortMode.GLOBAL_SORT
   .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = 
"hoodie.write.delete.marker.field";

Review comment:
   good catch, not needed. will revert

##
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String 
instantTime, String
* @param config Write Config
* @return
*/
-  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, 
HoodieTable hoodieTable) {
 Schema originalSchema = new Schema.Parser().parse(config.getSchema());
 Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+boolean updatePartialFields = config.updatePartialFields();
+if (updatePartialFields) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
   sounds reasonable

##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());

Review comment:
   > Do you need to create resolver again? Does config.getLastSchema() work 
here?
   
   I tested it E2E and found that would not get the lastSchema from config 
since the config object are different

##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());
+schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+  } catch (Exception e) {
+// ignore exception.
+schema = config.getSchema();

Review comment:
   > We are potentially reducing 

[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-12 Thread GitBox


leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());

Review comment:
   > Do you need to create resolver again? Does config.getLastSchema() work 
here?
   
   I tested it E2E and found that would not get the lastSchema from config 
since the config object are different

##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());
+schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+  } catch (Exception e) {
+// ignore exception.
+schema = config.getSchema();

Review comment:
   > We are potentially reducing schema here, so I think this can lead to 
issues. Can we throw error? At the least, can you add a LOG here to make sure 
this gets noticed?
   
   it handles the case that users config update partial fields in the first 
time, my original idea is not to throw error in this case, and LOG here sounds 
reasonable to me.

##
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = 
"hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = 
BulkInsertSortMode.GLOBAL_SORT
   .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = 
"hoodie.write.delete.marker.field";

Review comment:
   good catch, not needed. will revert

##
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String 
instantTime, String
* @param config Write Config
* @return
*/
-  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, 
HoodieTable hoodieTable) {
 Schema originalSchema = new Schema.Parser().parse(config.getSchema());
 Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+boolean updatePartialFields = config.updatePartialFields();
+if (updatePartialFields) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
   sounds reasonable

##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());

Review comment:
   > Do you need to create resolver again? Does config.getLastSchema() work 
here?
   
   I tested it E2E and found that would not get the lastSchema from config 
since the config object are different

##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());
+schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+  } catch (Exception e) {
+// ignore exception.
+schema = config.getSchema();

Review comment:
   > We are potentially reducing 

[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-12 Thread GitBox


leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());

Review comment:
   > Do you need to create resolver again? Does config.getLastSchema() work 
here?
   
   I tested it E2E and found that would not get the lastSchema from config 
since the config object are different

##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());
+schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+  } catch (Exception e) {
+// ignore exception.
+schema = config.getSchema();

Review comment:
   > We are potentially reducing schema here, so I think this can lead to 
issues. Can we throw error? At the least, can you add a LOG here to make sure 
this gets noticed?
   
   it handles the case that users config update partial fields in the first 
time, my original idea is not to throw error in this case, and LOG here sounds 
reasonable to me.

##
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = 
"hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = 
BulkInsertSortMode.GLOBAL_SORT
   .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = 
"hoodie.write.delete.marker.field";

Review comment:
   good catch, not needed. will revert

##
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String 
instantTime, String
* @param config Write Config
* @return
*/
-  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, 
HoodieTable hoodieTable) {
 Schema originalSchema = new Schema.Parser().parse(config.getSchema());
 Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+boolean updatePartialFields = config.updatePartialFields();
+if (updatePartialFields) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
   sounds reasonable





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-11 Thread GitBox


leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487137203



##
File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##
@@ -90,9 +92,19 @@ protected HoodieWriteHandle(HoodieWriteConfig config, String 
instantTime, String
* @param config Write Config
* @return
*/
-  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config) {
+  protected static Pair 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, 
HoodieTable hoodieTable) {
 Schema originalSchema = new Schema.Parser().parse(config.getSchema());
 Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+boolean updatePartialFields = config.updatePartialFields();
+if (updatePartialFields) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(hoodieTable.getMetaClient());

Review comment:
   sounds reasonable





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-11 Thread GitBox


leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487128690



##
File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -94,6 +95,11 @@
   public static final String BULKINSERT_SORT_MODE = 
"hoodie.bulkinsert.sort.mode";
   public static final String DEFAULT_BULKINSERT_SORT_MODE = 
BulkInsertSortMode.GLOBAL_SORT
   .toString();
+  public static final String DELETE_MARKER_FIELD_PROP = 
"hoodie.write.delete.marker.field";

Review comment:
   good catch, not needed. will revert





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-11 Thread GitBox


leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487128319



##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());
+schema = resolver.getTableAvroSchemaWithoutMetadataFields().toString();
+  } catch (Exception e) {
+// ignore exception.
+schema = config.getSchema();

Review comment:
   > We are potentially reducing schema here, so I think this can lead to 
issues. Can we throw error? At the least, can you add a LOG here to make sure 
this gets noticed?
   
   it handles the case that users config update partial fields in the first 
time, my original idea is not to throw error in this case, and LOG here sounds 
reasonable to me.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf commented on a change in pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-11 Thread GitBox


leesf commented on a change in pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#discussion_r487126922



##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -117,7 +118,17 @@ public boolean commitStats(String instantTime, 
List stats, Opti
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(metadata::addMetadata);
 }
-metadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, config.getSchema());
+String schema = config.getSchema();
+if (config.updatePartialFields()) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(table.getMetaClient());

Review comment:
   > Do you need to create resolver again? Does config.getLastSchema() work 
here?
   
   I tested it E2E and found that would not get the lastSchema from config 
since the config object are different





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org