Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


yihua merged PR #11265:
URL: https://github.com/apache/hudi/pull/11265


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2249013995

   
   ## CI report:
   
   * c0f09441fc00593d5e6c5f99402247a1c1dbda7a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24981)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2249006561

   
   ## CI report:
   
   * c0f09441fc00593d5e6c5f99402247a1c1dbda7a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2248860491

   
   ## CI report:
   
   * c0f09441fc00593d5e6c5f99402247a1c1dbda7a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24981)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2248779321

   
   ## CI report:
   
   * 5cdb05c7d909edebbc74ccf75e26c07820272090 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24774)
 
   * c0f09441fc00593d5e6c5f99402247a1c1dbda7a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24981)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2248766058

   
   ## CI report:
   
   * 5cdb05c7d909edebbc74ccf75e26c07820272090 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24774)
 
   * c0f09441fc00593d5e6c5f99402247a1c1dbda7a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1690309755


##
hudi-common/src/test/java/org/apache/hudi/avro/TestMercifulJsonConverter.java:
##
@@ -55,6 +70,649 @@ public void basicConversion() throws IOException {
 Assertions.assertEquals(rec, CONVERTER.convert(json, simpleSchema));
   }
 
+  private static final String DECIMAL_AVRO_FILE_INVALID_PATH = 
"/decimal-logical-type-invalid.avsc";
+  private static final String DECIMAL_AVRO_FILE_PATH = 
"/decimal-logical-type.avsc";
+  private static final String DECIMAL_FIXED_AVRO_FILE_PATH = 
"/decimal-logical-type-fixed-type.avsc";
+  /**
+   * Covered case:
+   * Avro Logical Type: Decimal
+   * Exhaustive unsupported input coverage.
+   */
+  @ParameterizedTest
+  @MethodSource("decimalBadCases")
+  void decimalLogicalTypeInvalidCaseTest(String avroFile, String strInput, 
Double numInput,
+ boolean testFixedByteArray) throws 
IOException {
+Schema schema = SchemaTestUtil.getSchemaFromResourceFilePath(avroFile);
+
+Map data = new HashMap<>();
+if (strInput != null) {
+  data.put("decimalField", strInput);
+} else if (numInput != null) {
+  data.put("decimalField", numInput);
+} else if (testFixedByteArray) {
+  // Convert the fixed value to int array, which is used as json value 
literals.
+  int[] intArray = {0, 0, 48, 57};
+  data.put("decimalField", intArray);
+}
+String json = MAPPER.writeValueAsString(data);
+
+// Schedule with timestamp same as that of committed instant
+
assertThrows(MercifulJsonConverter.HoodieJsonToAvroConversionException.class, 
() -> {
+  CONVERTER.convert(json, schema);
+});
+  }
+
+  static Stream decimalBadCases() {
+return Stream.of(
+// Invalid schema definition.
+Arguments.of(DECIMAL_AVRO_FILE_INVALID_PATH, "123.45", null, false),
+// Schema set precision as 5, input overwhelmed the precision.
+Arguments.of(DECIMAL_AVRO_FILE_PATH, "12.45", null, false),
+Arguments.of(DECIMAL_AVRO_FILE_PATH, null, 12.45, false),
+// Schema precision set to 5, scale set to 2, so there is only 3 digit 
to accommodate integer part.
+// As we do not do rounding, any input with more than 3 digit integer 
would fail.
+Arguments.of(DECIMAL_AVRO_FILE_PATH, "1233", null, false),
+Arguments.of(DECIMAL_AVRO_FILE_PATH, null, 1233D, false),
+// Schema set scale as 2, input overwhelmed the scale.
+Arguments.of(DECIMAL_AVRO_FILE_PATH, "0.222", null, false),
+Arguments.of(DECIMAL_AVRO_FILE_PATH, null, 0.222, false),
+// Invalid string which cannot be parsed as number.
+Arguments.of(DECIMAL_AVRO_FILE_PATH, "", null, false),
+Arguments.of(DECIMAL_AVRO_FILE_PATH, "NotAValidString", null, false),
+Arguments.of(DECIMAL_AVRO_FILE_PATH, "-", null, false),
+// Schema requires byte type while input is fixed type raw data.
+Arguments.of(DECIMAL_AVRO_FILE_PATH, null, null, true)
+);
+  }
+
+  /**
+   * Covered case:
+   * Avro Logical Type: Decimal
+   * Avro type: bytes, fixed
+   * Input: Check test parameter
+   * Output: Object using Byte data type as the schema specified.
+   * */
+  @ParameterizedTest
+  @MethodSource("decimalGoodCases")
+  void decimalLogicalTypeTest(String avroFilePath, String groundTruth, String 
strInput,
+  Double numInput, boolean testFixedByteArray) 
throws IOException {
+BigDecimal bigDecimal = new BigDecimal(groundTruth);
+Map data = new HashMap<>();
+
+Schema schema = SchemaTestUtil.getSchemaFromResourceFilePath(avroFilePath);
+GenericRecord record = new GenericData.Record(schema);
+Conversions.DecimalConversion conv = new Conversions.DecimalConversion();
+Schema decimalFieldSchema = schema.getField("decimalField").schema();
+
+// Decide the decimal field input according to the test dimension.
+if (strInput != null) {
+  data.put("decimalField", strInput); // String number input
+} else if (numInput != null) {
+  data.put("decimalField", numInput); // Number input
+} else if (testFixedByteArray) {
+  // Fixed byte array input.
+  // Example: 123.45 - byte array [0, 0, 48, 57].
+  Schema fieldSchema = schema.getField("decimalField").schema();
+  GenericFixed fixedValue = new Conversions.DecimalConversion().toFixed(
+  bigDecimal, fieldSchema, fieldSchema.getLogicalType());
+  // Convert the fixed value to int array, which is used as json value 
literals.
+  byte[] byteArray = fixedValue.bytes();
+  int[] intArray = new int[byteArray.length];
+  for (int i = 0; i < byteArray.length; i++) {
+// Byte is signed in Java, int is 32-bit. Convert by & 0xFF to handle 
negative values correctly.
+intArray[i] = byteArray[i] & 0xFF;
+  }
+  data.put("decimalField", i

Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


yihua commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1690295012


##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +180,772 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
-  return processor.convertToAvro(value, name, schema, shouldSanitize, 
invalidCharMask);
-}
-throw new IllegalArgumentException("JsonConverter cannot handle type: " + 
schema.getType());
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
   }
 
-  /**
-   * Base Class for converting json to avro fields.
-   */
-  private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+  private static class JsonToAvroFieldProcessorUtil {
+/**
+ * Base Class for converting json to avro fields.
+ */
+private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
 
-public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
-  Pair res = convert(value, name, schema, shouldSanitize, 
invalidCharMask);
-  if (!res.getLeft()) {
-throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+  public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
+Pair res = convert(value, name, schema, 
shouldSanitize, invalidCharMask);
+if (!res.getLeft()) {
+  throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+}
+return res.getRight();
   }
-  return res.getRight();
+
+  protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
 }
 
-protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
-  }
+public static Object convertToAvro(Object value, String name, Schema 
schema, boolean shouldSanitize, String invalidCharMask) {
+  JsonToAvroFieldProcessor processor = getProcessorForSchema(schema);
+  return processor.convertToAvro(value, name, schema, shouldSanitize, 
invalidCharMask);
+}
+
+private static JsonToAvroFieldProcessor getProcessorForSchema(Schema 
schema) {
+  JsonToAvroFieldProcessor processor = null;
+
+  // 3 cases to consider: customized logicalType, logicalType, and type.
+  String customizedLogicalType = schema.getProp("logicalType");
+  LogicalType logicalType = schema.getLogicalType();
+  Type type = schema.getType();
+  if (customizedLogicalType != null && !customizedLogicalType.isEmpty()) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(customizedLogicalType);
+  } else if (logicalType != null) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(logicalType.getName());
+  } else {
+processor = AVRO_TYPE_FIELD_TYPE_PROCESSORS.get(type);
+  }
+
+  ValidationUtils.checkArgument(
+  processor != null, String.format("JsonConverter cannot handle type: 
%s", type));
+  return processor;
+}
+
+// Avro primitive and complex type processors.
+private static final Map 
AVRO_TYPE_FIELD_TYPE_PROCESSORS = getFieldTypeProcessors();
+// Avro logical type processors.
+private static final Map 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS = getLogicalFieldTypeProcessors();
+
+/**
+ * Build type processor map for each avro type.
+ */
+private static Map 
getFieldTypeProcessors() {
+  Map fieldTypeProcessors = new 
EnumMap<>(Schema.Type.class);
+  fieldTypeProcessors.put(Type.STRING, generateStringTypeHandler());
+  fieldTypeProcessors.put(Type.BOOLEAN, generateBooleanTypeHandler());
+  fieldTypeProcessors.put(Type.DOUBLE, generateDoubleTypeHandler());
+  fieldTypeProcessors.put(Type.FLOAT, generateFloatTypeHandler());
+  fieldTypeProcessors.put(Type.INT, generateIntTypeHandler());
+  fieldTypeProcessors.put(Type.LONG, generateLongTypeHandler());
+  fieldTypeProcessors.put(Type.ARRAY, generateArrayTypeHandler());
+  fieldTypeProcessors.put(Type.RECORD, generateRecordTypeHandler());
+  fieldTypeProcessors.put(Type.ENUM, generateEnumTypeHandler());
+  fieldTypeProcessors.put(Type.MAP, generateMapTypeHandler());
+  fieldTypeProcessors.put(Type.BYTES, generateBytesTypeHandler());
+  fieldTypeProcessors.put(Type.FIXED, generateFixedTypeHandler());
+  return Collections.unmodifiableMap(fieldTypeProcessors);
+}
 
-  private static JsonToAvroFieldProcessor generateBo

Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-24 Thread via GitHub


yihua commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1690288658


##
hudi-common/src/main/java/org/apache/hudi/avro/AvroLogicalTypeEnum.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+public enum AvroLogicalTypeEnum {

Review Comment:
   Sg



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-11 Thread via GitHub


Davis-Zhang-Onehouse commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2223310964

   @codope is onboarded with the change. @yihua your review is required. Please 
take a look when you get a chance (after you got bandwidth from your current 
priority)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-10 Thread via GitHub


codope commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1673287677


##
hudi-common/src/test/resources/date-type-invalid.avsc:
##


Review Comment:
   ok with separate resource files, but we should be conscious as too many 
files like these will bloat the bundle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-10 Thread via GitHub


codope commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1673285947


##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {
+/**
+ * Base Class for converting json to avro fields.
+ */
+private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+
+  public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
+Pair res = convert(value, name, schema, 
shouldSanitize, invalidCharMask);
+if (!res.getLeft()) {
+  throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+}
+return res.getRight();
+  }
+
+  protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
+}
+
+public static Object convertToAvro(Object value, String name, Schema 
schema, boolean shouldSanitize, String invalidCharMask) {
+  JsonToAvroFieldProcessor processor = getProcessorForSchema(schema);
   return processor.convertToAvro(value, name, schema, shouldSanitize, 
invalidCharMask);
 }
-throw new IllegalArgumentException("JsonConverter cannot handle type: " + 
schema.getType());
-  }
 
-  /**
-   * Base Class for converting json to avro fields.
-   */
-  private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+private static JsonToAvroFieldProcessor getProcessorForSchema(Schema 
schema) {
+  JsonToAvroFieldProcessor processor = null;
+
+  // 3 cases to consider: customized logicalType, logicalType, and type.
+  String customizedLogicalType = schema.getProp("logicalType");
+  LogicalType logicalType = schema.getLogicalType();
+  Type type = schema.getType();
+  if (customizedLogicalType != null && !customizedLogicalType.isEmpty()) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(customizedLogicalType);
+  } else if (logicalType != null) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(logicalType.getName());
+  } else {
+processor = AVRO_TYPE_FIELD_TYPE_PROCESSORS.get(type);
+  }
 
-public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
-  Pair res = convert(value, name, schema, shouldSanitize, 
invalidCharMask);
-  if (!res.getLeft()) {
-throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+  if (processor == null) {
+throw new IllegalArgumentException(String.format("JsonConverter cannot 
handle type: %s", type));
   }
-  return res.getRight();
+  return processor;
 }
 
-protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
-  }
+// Avro primitive and complex type processors.
+private static final Map 
AVRO_TYPE_FIELD_TYPE_PROCESSORS = getFieldTypeProcessors();
+// Avro logical type processors.
+private static final Map 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS = getLogicalFieldTypeProcessors();
+
+/**
+ * Build type processor map for each avro type.
+ */
+private static Map 
getFieldTypeProcessors() {
+  Map fieldTypeProcessors = new 
EnumMap<>(Schema.Type.class);
+  fieldTypeProcessors.put(Type.STRING, generateStringTypeHandler());
+  fieldTypeProcessors.put(Type.BOOLEAN, generateBooleanTypeHandler());
+  fieldTypeProcessors.put(Type.DOUBLE, generateDoubleTypeHandler());
+  fieldTypeProcessors.put(Type.FLOAT, generateFloatTypeHandler());
+  fieldTypeProcessors.put(Type.INT, generateIntTypeHandler());
+  fieldTypeProcessors.put(Type.LONG, generateLongTypeHandler());
+  fieldTypeProcessors.put(Type.ARRAY, generateArrayTypeHandler());
+  fieldTypeProcessors.put(Type.RECORD, generateRecordTypeHandler());
+  fieldTypeProcessors.put(Type.ENUM, generateEnumTypeHandler());
+  fieldTypeProcessors.put(Type.MAP, generateMapTypeHandler());
+  fieldTypeProcessors.put(Type.BYTES, generateBytesTypeHandler());
+  fieldTypeProcessors.put(Type.FIXED, generateFixedTypeHandler());
+  return Collections.unmodifiableMap(fieldTypeProcessors);
+}
 
-  private static JsonToAvroFieldProcessor generateBooleanTypeHandler() {
-return new JsonToAvroFieldProcessor() {
+private

Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-08 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2214921106

   
   ## CI report:
   
   * 5cdb05c7d909edebbc74ccf75e26c07820272090 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24774)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-08 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2214800910

   
   ## CI report:
   
   * 5699a637e30e3fafd056ab25d668640c5c8700c0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24101)
 
   * 5cdb05c7d909edebbc74ccf75e26c07820272090 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24774)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-08 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2214788106

   
   ## CI report:
   
   * 5699a637e30e3fafd056ab25d668640c5c8700c0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24101)
 
   * 5cdb05c7d909edebbc74ccf75e26c07820272090 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-08 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1668961091


##
hudi-common/src/test/resources/date-type-invalid.avsc:
##


Review Comment:
   Thanks Sagar for the comments, it might not be a good idea given:
   - Each small is small, all they added up is non trivial amount of code. If 
we have all those raw file content in the test code, it affects the readability.
   - We can add them into a separate test util, yet it is still in the form of 
"external dependency". The only benefits is reducing test runtime as we don't 
need to read from files. This is negligible benefits which I'm not sure it 
really worth extra dev efforts given other high priority task we are on.
   - In file format it's more readable. Java just does not tackle will to make 
huge raw string easy for human read.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-08 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1668966899


##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {
+/**
+ * Base Class for converting json to avro fields.
+ */
+private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+
+  public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
+Pair res = convert(value, name, schema, 
shouldSanitize, invalidCharMask);
+if (!res.getLeft()) {
+  throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+}
+return res.getRight();
+  }
+
+  protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
+}
+
+public static Object convertToAvro(Object value, String name, Schema 
schema, boolean shouldSanitize, String invalidCharMask) {
+  JsonToAvroFieldProcessor processor = getProcessorForSchema(schema);
   return processor.convertToAvro(value, name, schema, shouldSanitize, 
invalidCharMask);
 }
-throw new IllegalArgumentException("JsonConverter cannot handle type: " + 
schema.getType());
-  }
 
-  /**
-   * Base Class for converting json to avro fields.
-   */
-  private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+private static JsonToAvroFieldProcessor getProcessorForSchema(Schema 
schema) {
+  JsonToAvroFieldProcessor processor = null;
+
+  // 3 cases to consider: customized logicalType, logicalType, and type.
+  String customizedLogicalType = schema.getProp("logicalType");
+  LogicalType logicalType = schema.getLogicalType();
+  Type type = schema.getType();
+  if (customizedLogicalType != null && !customizedLogicalType.isEmpty()) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(customizedLogicalType);
+  } else if (logicalType != null) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(logicalType.getName());
+  } else {
+processor = AVRO_TYPE_FIELD_TYPE_PROCESSORS.get(type);
+  }
 
-public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
-  Pair res = convert(value, name, schema, shouldSanitize, 
invalidCharMask);
-  if (!res.getLeft()) {
-throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+  if (processor == null) {
+throw new IllegalArgumentException(String.format("JsonConverter cannot 
handle type: %s", type));
   }
-  return res.getRight();
+  return processor;
 }
 
-protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
-  }
+// Avro primitive and complex type processors.
+private static final Map 
AVRO_TYPE_FIELD_TYPE_PROCESSORS = getFieldTypeProcessors();
+// Avro logical type processors.
+private static final Map 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS = getLogicalFieldTypeProcessors();
+
+/**
+ * Build type processor map for each avro type.
+ */
+private static Map 
getFieldTypeProcessors() {
+  Map fieldTypeProcessors = new 
EnumMap<>(Schema.Type.class);
+  fieldTypeProcessors.put(Type.STRING, generateStringTypeHandler());
+  fieldTypeProcessors.put(Type.BOOLEAN, generateBooleanTypeHandler());
+  fieldTypeProcessors.put(Type.DOUBLE, generateDoubleTypeHandler());
+  fieldTypeProcessors.put(Type.FLOAT, generateFloatTypeHandler());
+  fieldTypeProcessors.put(Type.INT, generateIntTypeHandler());
+  fieldTypeProcessors.put(Type.LONG, generateLongTypeHandler());
+  fieldTypeProcessors.put(Type.ARRAY, generateArrayTypeHandler());
+  fieldTypeProcessors.put(Type.RECORD, generateRecordTypeHandler());
+  fieldTypeProcessors.put(Type.ENUM, generateEnumTypeHandler());
+  fieldTypeProcessors.put(Type.MAP, generateMapTypeHandler());
+  fieldTypeProcessors.put(Type.BYTES, generateBytesTypeHandler());
+  fieldTypeProcessors.put(Type.FIXED, generateFixedTypeHandler());
+  return Collections.unmodifiableMap(fieldTypeProcessors);
+}
 
-  private static JsonToAvroFieldProcessor generateBooleanTypeHandler() {
-return new JsonToAvroFieldProcessor() 

Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-07-08 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1668961091


##
hudi-common/src/test/resources/date-type-invalid.avsc:
##


Review Comment:
   Thanks Sagar for the comments, Given the code is in a ready to ship status, 
can I create a follow up task for such refactor? Mostly for prioritization 
purpose.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-06-16 Thread via GitHub


codope commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1641999059


##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {
+/**
+ * Base Class for converting json to avro fields.
+ */
+private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+
+  public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
+Pair res = convert(value, name, schema, 
shouldSanitize, invalidCharMask);
+if (!res.getLeft()) {
+  throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+}
+return res.getRight();
+  }
+
+  protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
+}
+
+public static Object convertToAvro(Object value, String name, Schema 
schema, boolean shouldSanitize, String invalidCharMask) {
+  JsonToAvroFieldProcessor processor = getProcessorForSchema(schema);
   return processor.convertToAvro(value, name, schema, shouldSanitize, 
invalidCharMask);
 }
-throw new IllegalArgumentException("JsonConverter cannot handle type: " + 
schema.getType());
-  }
 
-  /**
-   * Base Class for converting json to avro fields.
-   */
-  private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+private static JsonToAvroFieldProcessor getProcessorForSchema(Schema 
schema) {
+  JsonToAvroFieldProcessor processor = null;
+
+  // 3 cases to consider: customized logicalType, logicalType, and type.
+  String customizedLogicalType = schema.getProp("logicalType");
+  LogicalType logicalType = schema.getLogicalType();
+  Type type = schema.getType();
+  if (customizedLogicalType != null && !customizedLogicalType.isEmpty()) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(customizedLogicalType);
+  } else if (logicalType != null) {
+processor = 
AVRO_LOGICAL_TYPE_FIELD_PROCESSORS.get(logicalType.getName());
+  } else {
+processor = AVRO_TYPE_FIELD_TYPE_PROCESSORS.get(type);
+  }
 
-public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
-  Pair res = convert(value, name, schema, shouldSanitize, 
invalidCharMask);
-  if (!res.getLeft()) {
-throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+  if (processor == null) {
+throw new IllegalArgumentException(String.format("JsonConverter cannot 
handle type: %s", type));

Review Comment:
   Use `ValidationUtils.checkArgument` or `checkState`



##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {
+/**
+ * Base Class for converting json to avro fields.
+ */
+private abstract static class JsonToAvroFieldProcessor implements 
Serializable {
+
+  public Object convertToAvro(Object value, String name, Schema schema, 
boolean shouldSanitize, String invalidCharMask) {
+Pair res = convert(value, name, schema, 
shouldSanitize, invalidCharMask);
+if (!res.getLeft()) {
+  throw new HoodieJsonToAvroConversionException(value, name, schema, 
shouldSanitize, invalidCharMask);
+}
+return res.getRight();
+  }
+
+  protected abstract Pair convert(Object value, String 
name, Schema schema, boolean shouldSanitize, String invalidCharMask);
+}
+
+public static Object convertToAvro(Object value, String name, Schema 
schema, boolean shouldSanitize, String invalidCharMask) {
+  JsonToAvroFieldProcessor processor = getProcessorForSchema(schema);
   return processor.convertToAvro(value, name, schema, shouldSanitize, 
invalidCharMask);
 }
-throw new IllegalArgumentException("

Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-28 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2136320701

   
   ## CI report:
   
   * 5699a637e30e3fafd056ab25d668640c5c8700c0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24101)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-28 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2136127128

   
   ## CI report:
   
   * 134dda251bbfac3cc092ec3f64b3cf3c8b17e2e1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24017)
 
   * 5699a637e30e3fafd056ab25d668640c5c8700c0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24101)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-28 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2136102419

   
   ## CI report:
   
   * 134dda251bbfac3cc092ec3f64b3cf3c8b17e2e1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24017)
 
   * 5699a637e30e3fafd056ab25d668640c5c8700c0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-28 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2136082128

   
   ## CI report:
   
   * 134dda251bbfac3cc092ec3f64b3cf3c8b17e2e1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24017)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-24 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1614082202


##
hudi-common/src/main/java/org/apache/hudi/avro/AvroLogicalTypeEnum.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+public enum AvroLogicalTypeEnum {

Review Comment:
   addressed offline, Ethan is on the same page that we create our own Enum as 
Avro does not have such thing published for consumption.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608950828


##
hudi-common/src/main/java/org/apache/hudi/avro/AvroLogicalTypeEnum.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+public enum AvroLogicalTypeEnum {

Review Comment:
   > It's still good to use contants from LogicalTypes here.
   
   The only available constant is the java class itself from Avro side. If no 
other alternative is suggested here,
   would you please expand on the benefit of using java class as parameter, 
which is not suggested given all the concerns listed here 
https://www.perplexity.ai/search/whats-the-caveats-E1jM2RV9Rj.KlAcZjQ0ZPQ#0
   
   Java class as variables are very inconvenient to handle, expensive and 
dangerous. 
   
   > Could you add some docs and link
   Done
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


yihua commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608917941


##
hudi-common/src/main/java/org/apache/hudi/avro/AvroLogicalTypeEnum.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+public enum AvroLogicalTypeEnum {

Review Comment:
   It's still good to use contants from `LogicalTypes` here.
   
   > There are some logical types that the avro official doc claims have 
support (for example the uuid), while the avro lib of the same version does not 
have such logical type defined.
   
   Could you add some docs and link to the spec on what logical types are 
supported in which versions?  Hudi uses different versions of Avro depending on 
Spark and Flink versions.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608799784


##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {

Review Comment:
   
   All code is written by myself, except some part is code refactoring of some 
existing class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


hudi-bot commented on PR #11265:
URL: https://github.com/apache/hudi/pull/11265#issuecomment-2123301071

   
   ## CI report:
   
   * c284b2fce8c40e3c5b7448a6cf2f47f3b5ae50c4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24007)
 
   * 134dda251bbfac3cc092ec3f64b3cf3c8b17e2e1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608801275


##
hudi-common/src/test/java/org/apache/hudi/avro/TestMercifulJsonConverter.java:
##
@@ -55,6 +70,649 @@ public void basicConversion() throws IOException {
 Assertions.assertEquals(rec, CONVERTER.convert(json, simpleSchema));
   }
 
+  private static final String DECIMAL_AVRO_FILE_INVALID_PATH = 
"/decimal-logical-type-invalid.avsc";
+  private static final String DECIMAL_AVRO_FILE_PATH = 
"/decimal-logical-type.avsc";
+  private static final String DECIMAL_FIXED_AVRO_FILE_PATH = 
"/decimal-logical-type-fixed-type.avsc";
+  /**
+   * Covered case:
+   * Avro Logical Type: Decimal
+   * Exhaustive unsupported input coverage.
+   */
+  @ParameterizedTest
+  @MethodSource("decimalBadCases")
+  void decimalLogicalTypeInvalidCaseTest(String avroFile, String strInput, 
Double numInput,

Review Comment:
   Yes, @the-other-tim-brown has confirmed on the type of logical types that we 
support. Test achieves at least 95% of branch coverage
   
   For the code porting, it is replied in the previous thread.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608799784


##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {

Review Comment:
   All the code is from the same code change that are pending rolling out to 
hudi internal
   
   hudi internal PR 
https://github.com/onehouseinc/hudi-internal/pull/697/commits/115dcef8d681bdfd4cc1f343619669600e912623
   
   All code is written by myself, except some part is code refactoring of some 
existing class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608797800


##
hudi-common/src/main/java/org/apache/hudi/avro/AvroLogicalTypeEnum.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+public enum AvroLogicalTypeEnum {

Review Comment:
   added comments
   
   It is not a good idea to use the LogicalTypes because:
   1. There is no enum/const string from avro side that give us an overview of 
all logical types that avro support by default. The LogicalTypes are java 
class, which are not a good choice to use class as parameters and pass it 
everywhere.
   2. There are some logical types that the avro official doc claims have 
support (for example the uuid), while the avro lib of the same version does not 
have such logical type defined.
   3. Logical types are intended for extension from client side. In the future 
if new customized logical type to be added, the enum approach is more friendly 
to be extended.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


yihua commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608752328


##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {

Review Comment:
   What's the reason of porting the code?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7774] Add Avro Logical type support for Merciful Java convertor [hudi]

2024-05-21 Thread via GitHub


yihua commented on code in PR #11265:
URL: https://github.com/apache/hudi/pull/11265#discussion_r1608735139


##
hudi-common/src/main/java/org/apache/hudi/avro/AvroLogicalTypeEnum.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+public enum AvroLogicalTypeEnum {

Review Comment:
   nit: Javadocs.  Does this work with all Avro spec versions?



##
hudi-common/src/main/java/org/apache/hudi/avro/AvroLogicalTypeEnum.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.avro;
+
+public enum AvroLogicalTypeEnum {

Review Comment:
   Also, Avro library defines the `LogicalTypes`.  Can we use that instead of 
creating own enums?
   
https://github.com/apache/avro/blob/main/lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java



##
hudi-common/src/main/java/org/apache/hudi/avro/MercifulJsonConverter.java:
##
@@ -187,196 +178,774 @@ private static Object convertJsonToAvroField(Object 
value, String name, Schema s
   throw new HoodieJsonToAvroConversionException(null, name, schema, 
shouldSanitize, invalidCharMask);
 }
 
-JsonToAvroFieldProcessor processor = 
FIELD_TYPE_PROCESSORS.get(schema.getType());
-if (null != processor) {
+return JsonToAvroFieldProcessorUtil.convertToAvro(value, name, schema, 
shouldSanitize, invalidCharMask);
+  }
+
+  private static class JsonToAvroFieldProcessorUtil {

Review Comment:
   Are you porting the logic from another library?  If so, could you reference 
the source?



##
hudi-common/src/test/java/org/apache/hudi/avro/TestMercifulJsonConverter.java:
##
@@ -55,6 +70,649 @@ public void basicConversion() throws IOException {
 Assertions.assertEquals(rec, CONVERTER.convert(json, simpleSchema));
   }
 
+  private static final String DECIMAL_AVRO_FILE_INVALID_PATH = 
"/decimal-logical-type-invalid.avsc";
+  private static final String DECIMAL_AVRO_FILE_PATH = 
"/decimal-logical-type.avsc";
+  private static final String DECIMAL_FIXED_AVRO_FILE_PATH = 
"/decimal-logical-type-fixed-type.avsc";
+  /**
+   * Covered case:
+   * Avro Logical Type: Decimal
+   * Exhaustive unsupported input coverage.
+   */
+  @ParameterizedTest
+  @MethodSource("decimalBadCases")
+  void decimalLogicalTypeInvalidCaseTest(String avroFile, String strInput, 
Double numInput,

Review Comment:
   Similar for the tests on code porting.  And do these cover all cases of 
logical types?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org