[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r645985325 ## File path: hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java ## @@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception { @Test public void testInsertWithMiniBatches() throws Exception { // reset the config option -conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch size +conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch size Review comment: Also, nishith left a comment else where about backwards compatability. Can you check that out as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r645984981 ## File path: hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java ## @@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception { @Test public void testInsertWithMiniBatches() throws Exception { // reset the config option -conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch size +conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch size Review comment: @danny0405 @leesf @yanghua : hey folks. Can one of you check why these tests are failing for this patch and why the fix is required. I also tried locally and it is failing w/ this patch, but don't have lot of context around tests. I expected these tests should not be affected by the changes in this patch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r642433833 ## File path: hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java ## @@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception { @Test public void testInsertWithMiniBatches() throws Exception { // reset the config option -conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch size +conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch size Review comment: hmmm, strange. I see quite a few PRs shows green. so unlikely this fails consistently. also, if there are any such unrelated failures, lets isolate them to separate PR. I will try to reproduce the issue locally in master as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r642192631 ## File path: hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java ## @@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception { @Test public void testInsertWithMiniBatches() throws Exception { // reset the config option -conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch size +conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch size Review comment: are these the tests you were mentioning were failing in CI? ideally these tests should not be impacted by this patch right? ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java ## @@ -62,7 +62,7 @@ } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) { return ""; } else { - return kvArray[1]; + return String.join(":", Arrays.copyOfRange(kvArray, 1, kvArray.length)); Review comment: sounds good. CC @n3nash -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r638867446 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java ## @@ -62,7 +62,7 @@ } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) { return ""; } else { - return kvArray[1]; + return String.join(":", Arrays.copyOfRange(kvArray, 1, kvArray.length)); Review comment: can you help me explain why we need this fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r637669259 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -485,6 +488,10 @@ private static Object convertValueForAvroLogicalTypes(Schema fieldSchema, Object return decimalConversion.fromBytes((ByteBuffer) fieldValue, fieldSchema, LogicalTypes.decimal(dc.getPrecision(), dc.getScale())); } +} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMicros()) { + return Instant.EPOCH.plus(Long.parseLong(fieldValue.toString()), ChronoUnit.MICROS).atZone(ZoneId.systemDefault()).toLocalDateTime(); +} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMillis()) { Review comment: thanks. looks like we don't have coverage for LogicalTypes.timestampMicros(). have left a comment below. ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestTimestampBasedKeyGenerator.java ## @@ -378,4 +382,46 @@ public void test_ExpectsMatch_MultipleInputFormats_ShortDate_OutputCustomDate() baseRow = genericRecordToRow(baseRecord); assertEquals("04/01/2020", keyGen.getPartitionPath(baseRow)); } + + @Test + public void test_ExpectsMatch_LogicalType_Date() throws IOException { +LocalDate today = LocalDate.now(); +baseRecord.put("dob", (int) today.toEpochDay()); +properties = this.getBaseKeyConfig( +"DATE", +"-MM-dd'T'HH:mm:ssZ,-MM-dd'T'HH:mm:ss.SSSZ,MMdd", +"", +null, +"-MM-dd", +DateTimeZone.getDefault().getID()); + +properties.setProperty(KeyGeneratorOptions.PARTITIONPATH_FIELD_OPT_KEY, "dob"); +BuiltinKeyGenerator keyGen = new TimestampBasedKeyGenerator(properties); +HoodieKey hk1 = keyGen.getKey(baseRecord); +Assertions.assertEquals(today.toString(), hk1.getPartitionPath()); + +baseRow = genericRecordToRow(baseRecord); +assertEquals(today.toString(), keyGen.getPartitionPath(baseRow)); + } + + @Test + public void test_ExpectsMatch_LogicalType_Timestamp() throws IOException { Review comment: thanks for clarifying ## File path: hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java ## @@ -79,6 +82,18 @@ + "{\"name\": \"nullable_field\",\"type\": [\"null\" ,\"string\"],\"default\": null}," + "{\"name\": \"non_nullable_field_with_default\",\"type\": \"string\", \"default\": \"dummy\"}]}"; + private static String SCHEMA_WITH_LOGICAL_TYPES = "{\n" + + " \"namespace\": \"example.avro\",\n" + + " \"type\": \"record\",\n" + + " \"name\": \"User\",\n" + + " \"fields\": [\n" + + "{\"name\": \"field1\", \"type\": [\"null\", \"string\"], \"default\": null},\n" + + "{\"name\": \"createTime\", \"type\": [\"null\", \"long\"], \"default\": null},\n" + + "{\"name\": \"dob\", \"type\": {\"type\": \"int\" ,\"logicalType\": \"date\"}},\n" + + "{\"name\": \"updatedAt\", \"type\": {\"type\": \"long\", \"logicalType\": \"timestamp-millis\"}}\n" Review comment: Can you add a field for timestamp-micros as well and enhance testGetNestedFieldValForLogicalTypes for the same. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator
nsivabalan commented on a change in pull request #2923: URL: https://github.com/apache/hudi/pull/2923#discussion_r636207743 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -485,6 +488,10 @@ private static Object convertValueForAvroLogicalTypes(Schema fieldSchema, Object return decimalConversion.fromBytes((ByteBuffer) fieldValue, fieldSchema, LogicalTypes.decimal(dc.getPrecision(), dc.getScale())); } +} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMicros()) { + return Instant.EPOCH.plus(Long.parseLong(fieldValue.toString()), ChronoUnit.MICROS).atZone(ZoneId.systemDefault()).toLocalDateTime(); +} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMillis()) { Review comment: can you please point me to testcase which is covering this. ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestTimestampBasedKeyGenerator.java ## @@ -378,4 +382,46 @@ public void test_ExpectsMatch_MultipleInputFormats_ShortDate_OutputCustomDate() baseRow = genericRecordToRow(baseRecord); assertEquals("04/01/2020", keyGen.getPartitionPath(baseRow)); } + + @Test + public void test_ExpectsMatch_LogicalType_Date() throws IOException { +LocalDate today = LocalDate.now(); +baseRecord.put("dob", (int) today.toEpochDay()); +properties = this.getBaseKeyConfig( +"DATE", +"-MM-dd'T'HH:mm:ssZ,-MM-dd'T'HH:mm:ss.SSSZ,MMdd", +"", +null, +"-MM-dd", +DateTimeZone.getDefault().getID()); + +properties.setProperty(KeyGeneratorOptions.PARTITIONPATH_FIELD_OPT_KEY, "dob"); +BuiltinKeyGenerator keyGen = new TimestampBasedKeyGenerator(properties); +HoodieKey hk1 = keyGen.getKey(baseRecord); +Assertions.assertEquals(today.toString(), hk1.getPartitionPath()); + +baseRow = genericRecordToRow(baseRecord); +assertEquals(today.toString(), keyGen.getPartitionPath(baseRow)); + } + + @Test + public void test_ExpectsMatch_LogicalType_Timestamp() throws IOException { Review comment: I see we have added LocalDate, LocalDatetime, Date and Timestamp support to source code, but tests are added only for 2 of these. can we please add for rest as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org