[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-06-05 Thread GitBox


nsivabalan commented on a change in pull request #2923:
URL: https://github.com/apache/hudi/pull/2923#discussion_r645985325



##
File path: 
hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java
##
@@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception {
   @Test
   public void testInsertWithMiniBatches() throws Exception {
 // reset the config option
-conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch 
size
+conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch 
size

Review comment:
   Also, nishith left a comment else where about backwards compatability. 
Can you check that out as well. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-06-05 Thread GitBox


nsivabalan commented on a change in pull request #2923:
URL: https://github.com/apache/hudi/pull/2923#discussion_r645984981



##
File path: 
hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java
##
@@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception {
   @Test
   public void testInsertWithMiniBatches() throws Exception {
 // reset the config option
-conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch 
size
+conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch 
size

Review comment:
   @danny0405 @leesf @yanghua : hey folks. Can one of you check why these 
tests are failing for this patch and why the fix is required. I also tried 
locally and it is failing w/ this patch, but don't have lot of context around 
tests. I expected these tests should not be affected by the changes in this 
patch. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-31 Thread GitBox


nsivabalan commented on a change in pull request #2923:
URL: https://github.com/apache/hudi/pull/2923#discussion_r642433833



##
File path: 
hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java
##
@@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception {
   @Test
   public void testInsertWithMiniBatches() throws Exception {
 // reset the config option
-conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch 
size
+conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch 
size

Review comment:
   hmmm, strange. I see quite a few PRs shows green. so unlikely this fails 
consistently. also, if there are any such unrelated failures, lets isolate them 
to separate PR. I will try to reproduce the issue locally in master as well. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-30 Thread GitBox


nsivabalan commented on a change in pull request #2923:
URL: https://github.com/apache/hudi/pull/2923#discussion_r642192631



##
File path: 
hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java
##
@@ -380,12 +380,12 @@ public void testUpsertWithDelete() throws Exception {
   @Test
   public void testInsertWithMiniBatches() throws Exception {
 // reset the config option
-conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.0006); // 630 bytes batch 
size
+conf.setDouble(FlinkOptions.WRITE_BATCH_SIZE, 0.00075); // 786 bytes batch 
size

Review comment:
   are these the tests you were mentioning were failing in CI? ideally 
these tests should not be impacted by this patch right? 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java
##
@@ -62,7 +62,7 @@
 } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
   return "";
 } else {
-  return kvArray[1];
+  return String.join(":", Arrays.copyOfRange(kvArray, 1, 
kvArray.length));

Review comment:
   sounds good. 
   CC @n3nash




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-25 Thread GitBox


nsivabalan commented on a change in pull request #2923:
URL: https://github.com/apache/hudi/pull/2923#discussion_r638867446



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java
##
@@ -62,7 +62,7 @@
 } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
   return "";
 } else {
-  return kvArray[1];
+  return String.join(":", Arrays.copyOfRange(kvArray, 1, 
kvArray.length));

Review comment:
   can you help me explain why we need this fix




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-23 Thread GitBox


nsivabalan commented on a change in pull request #2923:
URL: https://github.com/apache/hudi/pull/2923#discussion_r637669259



##
File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
##
@@ -485,6 +488,10 @@ private static Object 
convertValueForAvroLogicalTypes(Schema fieldSchema, Object
 return decimalConversion.fromBytes((ByteBuffer) fieldValue, 
fieldSchema,
 LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
   }
+} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMicros()) 
{
+  return Instant.EPOCH.plus(Long.parseLong(fieldValue.toString()), 
ChronoUnit.MICROS).atZone(ZoneId.systemDefault()).toLocalDateTime();
+} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMillis()) 
{

Review comment:
   thanks. looks like we don't have coverage for 
LogicalTypes.timestampMicros(). have left a comment below. 

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestTimestampBasedKeyGenerator.java
##
@@ -378,4 +382,46 @@ public void 
test_ExpectsMatch_MultipleInputFormats_ShortDate_OutputCustomDate()
 baseRow = genericRecordToRow(baseRecord);
 assertEquals("04/01/2020", keyGen.getPartitionPath(baseRow));
   }
+
+  @Test
+  public void test_ExpectsMatch_LogicalType_Date() throws IOException {
+LocalDate today = LocalDate.now();
+baseRecord.put("dob", (int) today.toEpochDay());
+properties = this.getBaseKeyConfig(
+"DATE",
+"-MM-dd'T'HH:mm:ssZ,-MM-dd'T'HH:mm:ss.SSSZ,MMdd",
+"",
+null,
+"-MM-dd",
+DateTimeZone.getDefault().getID());
+
+properties.setProperty(KeyGeneratorOptions.PARTITIONPATH_FIELD_OPT_KEY, 
"dob");
+BuiltinKeyGenerator keyGen = new TimestampBasedKeyGenerator(properties);
+HoodieKey hk1 = keyGen.getKey(baseRecord);
+Assertions.assertEquals(today.toString(), hk1.getPartitionPath());
+
+baseRow = genericRecordToRow(baseRecord);
+assertEquals(today.toString(), keyGen.getPartitionPath(baseRow));
+  }
+
+  @Test
+  public void test_ExpectsMatch_LogicalType_Timestamp() throws IOException {

Review comment:
   thanks for clarifying

##
File path: 
hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java
##
@@ -79,6 +82,18 @@
   + "{\"name\": \"nullable_field\",\"type\": [\"null\" 
,\"string\"],\"default\": null},"
   + "{\"name\": \"non_nullable_field_with_default\",\"type\": \"string\", 
\"default\": \"dummy\"}]}";
 
+  private static String SCHEMA_WITH_LOGICAL_TYPES = "{\n"
+  + "  \"namespace\": \"example.avro\",\n"
+  + "  \"type\": \"record\",\n"
+  + "  \"name\": \"User\",\n"
+  + "  \"fields\": [\n"
+  + "{\"name\": \"field1\", \"type\": [\"null\", \"string\"], 
\"default\": null},\n"
+  + "{\"name\": \"createTime\", \"type\": [\"null\", \"long\"], 
\"default\": null},\n"
+  + "{\"name\": \"dob\", \"type\": {\"type\": \"int\" 
,\"logicalType\": \"date\"}},\n"
+  + "{\"name\": \"updatedAt\", \"type\": {\"type\": \"long\", 
\"logicalType\": \"timestamp-millis\"}}\n"

Review comment:
   Can you add a field for timestamp-micros as well and enhance 
testGetNestedFieldValForLogicalTypes for the same. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2923: [HUDI-1864] Added support for Date, Timestamp, LocalDate and LocalDateTime in TimestampBasedAvroKeyGenerator

2021-05-20 Thread GitBox


nsivabalan commented on a change in pull request #2923:
URL: https://github.com/apache/hudi/pull/2923#discussion_r636207743



##
File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
##
@@ -485,6 +488,10 @@ private static Object 
convertValueForAvroLogicalTypes(Schema fieldSchema, Object
 return decimalConversion.fromBytes((ByteBuffer) fieldValue, 
fieldSchema,
 LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
   }
+} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMicros()) 
{
+  return Instant.EPOCH.plus(Long.parseLong(fieldValue.toString()), 
ChronoUnit.MICROS).atZone(ZoneId.systemDefault()).toLocalDateTime();
+} else if (fieldSchema.getLogicalType() == LogicalTypes.timestampMillis()) 
{

Review comment:
   can you please point me to testcase which is covering this.

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestTimestampBasedKeyGenerator.java
##
@@ -378,4 +382,46 @@ public void 
test_ExpectsMatch_MultipleInputFormats_ShortDate_OutputCustomDate()
 baseRow = genericRecordToRow(baseRecord);
 assertEquals("04/01/2020", keyGen.getPartitionPath(baseRow));
   }
+
+  @Test
+  public void test_ExpectsMatch_LogicalType_Date() throws IOException {
+LocalDate today = LocalDate.now();
+baseRecord.put("dob", (int) today.toEpochDay());
+properties = this.getBaseKeyConfig(
+"DATE",
+"-MM-dd'T'HH:mm:ssZ,-MM-dd'T'HH:mm:ss.SSSZ,MMdd",
+"",
+null,
+"-MM-dd",
+DateTimeZone.getDefault().getID());
+
+properties.setProperty(KeyGeneratorOptions.PARTITIONPATH_FIELD_OPT_KEY, 
"dob");
+BuiltinKeyGenerator keyGen = new TimestampBasedKeyGenerator(properties);
+HoodieKey hk1 = keyGen.getKey(baseRecord);
+Assertions.assertEquals(today.toString(), hk1.getPartitionPath());
+
+baseRow = genericRecordToRow(baseRecord);
+assertEquals(today.toString(), keyGen.getPartitionPath(baseRow));
+  }
+
+  @Test
+  public void test_ExpectsMatch_LogicalType_Timestamp() throws IOException {

Review comment:
   I see we have added LocalDate, LocalDatetime, Date and Timestamp support 
to source code, but tests are added only for 2 of these. can we please add for 
rest as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org