[GitHub] [hudi] slfan1989 commented on pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.

2023-04-11 Thread via GitHub


slfan1989 commented on PR #8435:
URL: https://github.com/apache/hudi/pull/8435#issuecomment-1504700781

   @danny0405 Can you help review this PR? Thank you very much!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8299: [HUDI-5990]Avoid missing data during incremental queries

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8299:
URL: https://github.com/apache/hudi/pull/8299#issuecomment-1504676930

   
   ## CI report:
   
   * 71963bcf055f63179dfdcc235478aff8487bd328 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15955)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15988)
 
   * cbb28bbdfc0434564d7ddd363bb42405b9771ed1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16278)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8299: [HUDI-5990]Avoid missing data during incremental queries

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8299:
URL: https://github.com/apache/hudi/pull/8299#issuecomment-1504669117

   
   ## CI report:
   
   * 71963bcf055f63179dfdcc235478aff8487bd328 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15955)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15988)
 
   * cbb28bbdfc0434564d7ddd363bb42405b9771ed1 UNKNOWN
   
   



[GitHub] [hudi] Mulavar commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-11 Thread via GitHub


Mulavar commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504603719

   > @Mulavar : This also requires a table version change, as we need to create 
.aux files when we downgrade to an older version. Can you add the relevant 
UpgradeDowngrade handlers for this?
   
   @bvaradar thanks, done.





[GitHub] [hudi] hudi-bot commented on pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8335:
URL: https://github.com/apache/hudi/pull/8335#issuecomment-1504598719

   
   ## CI report:
   
   * 9bcbb85e4b2bb803e03900b8f01c938833bb1185 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16108)
 
   * 919882e2014728df9d3299fd239c250bf166608c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16277)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8335:
URL: https://github.com/apache/hudi/pull/8335#issuecomment-1504590323

   
   ## CI report:
   
   * 9bcbb85e4b2bb803e03900b8f01c938833bb1185 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16108)
 
   * 919882e2014728df9d3299fd239c250bf166608c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504582669

   
   ## CI report:
   
   * 19ff36f7635f289b752b46cf692014b22f4b9ab8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16254)
 
   * 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
   * 768ffaabf5934199e1afa1c0b6b37f9bb665b989 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16276)
 
   
   



[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

2023-04-11 Thread via GitHub


danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1163570948


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##
@@ -61,13 +61,8 @@ public JavaRDD> 
repartitionRecords(JavaRDD> reco
 final boolean consistentLogicalTimestampEnabled = 
this.consistentLogicalTimestampEnabled;
 return records.sortBy(
 record -> {
-  Object recordValue = record.getColumnValues(schema.get(), 
sortColumns, consistentLogicalTimestampEnabled);
-  // null values are replaced with empty string for null_first order
-  if (recordValue == null) {
-return StringUtils.EMPTY_STRING;
-  } else {
-return StringUtils.objToString(recordValue);
-  }
+  Object[] columnValues = record.getColumnValues(schema.get(), 
sortColumns, consistentLogicalTimestampEnabled);
+  return FlatLists.ofComparableArray(columnValues);

Review Comment:
   We should fix `JavaCustomColumnsSortPartitioner` too.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

2023-04-11 Thread via GitHub


danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1163569337


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##
@@ -61,13 +61,8 @@ public JavaRDD> 
repartitionRecords(JavaRDD> reco
 final boolean consistentLogicalTimestampEnabled = 
this.consistentLogicalTimestampEnabled;
 return records.sortBy(
 record -> {
-  Object recordValue = record.getColumnValues(schema.get(), 
sortColumns, consistentLogicalTimestampEnabled);
-  // null values are replaced with empty string for null_first order
-  if (recordValue == null) {
-return StringUtils.EMPTY_STRING;
-  } else {
-return StringUtils.objToString(recordValue);
-  }
+  Object[] columnValues = record.getColumnValues(schema.get(), 
sortColumns, consistentLogicalTimestampEnabled);
+  return FlatLists.ofComparableArray(columnValues);

Review Comment:
   The default behavior is null_last. The original comment is wrong: the code 
returned an empty string for nulls, and an empty string should always be smaller 
than non-empty strings.
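
   To make the ordering point concrete, here is a minimal, self-contained sketch 
in plain Java. It is not Hudi code: `NullOrderingSketch` and its methods are 
hypothetical names. It mimics the removed branch by mapping a null sort key to an 
empty string and then sorting in ascending order, so one can see where such 
records land.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hudi.
class NullOrderingSketch {

    // Mimics the removed branch: null becomes "", anything else its string form.
    static String toSortKey(Object value) {
        return value == null ? "" : String.valueOf(value);
    }

    // Returns a copy of the input sorted ascending by the derived string key.
    static List<Object> sortByStringKey(List<Object> values) {
        List<Object> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(NullOrderingSketch::toSortKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.asList("b", null, "a");
        System.out.println(sortByStringKey(input)); // prints [null, a, b]
    }
}
```

   In this sketch the null entry sorts ahead of every non-empty key, because 
`""` compares lower than any non-empty string under `String.compareTo`.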






[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504537773

   
   ## CI report:
   
   * 19ff36f7635f289b752b46cf692014b22f4b9ab8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16254)
 
   * 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
   * 768ffaabf5934199e1afa1c0b6b37f9bb665b989 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8435:
URL: https://github.com/apache/hudi/pull/8435#issuecomment-1504529199

   
   ## CI report:
   
   * fda3847c439a2d889bc29c6511ce26bdd922d13c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16274)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8429:
URL: https://github.com/apache/hudi/pull/8429#issuecomment-1504529115

   
   ## CI report:
   
   * 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262)
 
   * acd347c5e6cd019ee98b7f1fa435b95153e71238 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16273)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504528832

   
   ## CI report:
   
   * 19ff36f7635f289b752b46cf692014b22f4b9ab8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16254)
 
   * 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8338: [HUDI-5996] Verify the consistency of bucket num at job sta…

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8338:
URL: https://github.com/apache/hudi/pull/8338#issuecomment-1504528612

   
   ## CI report:
   
   * fccdb147c249b08d856819e028986d76603828e9 UNKNOWN
   * 7abfb144f1c76d65cf00115d5bfa9a82b1a28846 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16253)
 
   * 86a201c099bbc2016d53d987aa481b779464c9c2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16272)
 
   
   



[GitHub] [hudi] danny0405 commented on pull request #2701: [HUDI 1623] New Hoodie Instant on disk format with end time and milliseconds granularity

2023-04-11 Thread via GitHub


danny0405 commented on PR #2701:
URL: https://github.com/apache/hudi/pull/2701#issuecomment-1504527704

   A valuable PR, especially for incremental-style use cases: incremental 
streaming read, incremental cleaning, incremental meta sync, etc.





[GitHub] [hudi] ad1happy2go commented on issue #7827: [SUPPORT]Errors are thrown when querying rt table of no deltalogs count(1)/count(*) by presto

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #7827:
URL: https://github.com/apache/hudi/issues/7827#issuecomment-1504522724

   @silencily Can you please confirm whether upgrading to a newer Presto version 
fixed your issue? 





[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8429:
URL: https://github.com/apache/hudi/pull/8429#issuecomment-1504521437

   
   ## CI report:
   
   * 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262)
 
   * acd347c5e6cd019ee98b7f1fa435b95153e71238 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8435:
URL: https://github.com/apache/hudi/pull/8435#issuecomment-1504521522

   
   ## CI report:
   
   * fda3847c439a2d889bc29c6511ce26bdd922d13c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8338: [HUDI-5996] Verify the consistency of bucket num at job sta…

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8338:
URL: https://github.com/apache/hudi/pull/8338#issuecomment-1504521063

   
   ## CI report:
   
   * fccdb147c249b08d856819e028986d76603828e9 UNKNOWN
   * 7abfb144f1c76d65cf00115d5bfa9a82b1a28846 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16253)
 
   * 86a201c099bbc2016d53d987aa481b779464c9c2 UNKNOWN
   
   



[GitHub] [hudi] ad1happy2go commented on issue #8144: [SUPPORT]Unable to connect to an s3 hudi table

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #8144:
URL: https://github.com/apache/hudi/issues/8144#issuecomment-1504517165

   @peter-mccabe 
   Are you still facing this issue? If yes, can you share the complete stack 
trace of the error? 
   Are you setting up the S3 keys properly in the Hadoop fs configuration to 
connect to S3 (props like fs.s3a.access.key)?





[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8434:
URL: https://github.com/apache/hudi/pull/8434#issuecomment-1504513257

   
   ## CI report:
   
   * 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269)
 
   
   



[GitHub] [hudi] zhangyue19921010 commented on pull request #7143: [HUDI-5175] Improving FileIndex load performance in PARALLELISM mode

2023-04-11 Thread via GitHub


zhangyue19921010 commented on PR #7143:
URL: https://github.com/apache/hudi/pull/7143#issuecomment-1504508775

   Hey! @bvaradar Sorry for missing this PR, and I appreciate your attention 
and review. 
   Sure, I will address this PR later this week. Thanks!





[jira] [Updated] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName

2023-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6064:
-
Labels: pull-request-available  (was: )

> Improve JDBCExecutor#getTableSchema Use ColName
> ---
>
> Key: HUDI-6064
> URL: https://issues.apache.org/jira/browse/HUDI-6064
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> JDBCExecutor#getTableSchema uses ColIndex, which makes the code harder to 
> read; use ColName instead of ColIndex.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] LinMingQiang commented on pull request #8338: [HUDI-5996] Verify the consistency of bucket num at job sta…

2023-04-11 Thread via GitHub


LinMingQiang commented on PR #8338:
URL: https://github.com/apache/hudi/pull/8338#issuecomment-1504502481

   add test cases.





[GitHub] [hudi] slfan1989 opened a new pull request, #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.

2023-04-11 Thread via GitHub


slfan1989 opened a new pull request, #8435:
URL: https://github.com/apache/hudi/pull/8435

   ### Change Logs
   
   JDBCExecutor#getTableSchema uses ColIndex, which makes the code harder to 
read; use ColName instead of ColIndex.
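
   As an illustration of the readability argument, the sketch below reads one 
metadata row by column label rather than by position. It is not the code in this 
PR: `ColumnLabelSketch`, `describeColumn`, and the proxy-based `stubRow` helper 
are hypothetical. The labels `COLUMN_NAME` and `TYPE_NAME` are the ones defined 
by the JDBC `DatabaseMetaData.getColumns` contract; that 
`JDBCExecutor#getTableSchema` reads exactly these two columns is an assumption.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Hudi implementation.
class ColumnLabelSketch {

    // Label-based access documents intent; getString(4) / getString(6) would not.
    static String describeColumn(ResultSet row) {
        try {
            return row.getString("COLUMN_NAME") + ":" + row.getString("TYPE_NAME");
        } catch (SQLException e) {
            throw new IllegalStateException("Could not read column metadata", e);
        }
    }

    // Test double: a ResultSet that only answers getString(String) from a map.
    static ResultSet stubRow(Map<String, String> values) {
        return (ResultSet) Proxy.newProxyInstance(
            ColumnLabelSketch.class.getClassLoader(),
            new Class<?>[] {ResultSet.class},
            (proxy, method, methodArgs) -> {
                boolean byLabel = method.getName().equals("getString")
                    && methodArgs != null
                    && methodArgs.length == 1
                    && methodArgs[0] instanceof String;
                if (byLabel) {
                    return values.get(methodArgs[0]);
                }
                throw new UnsupportedOperationException(method.getName());
            });
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("COLUMN_NAME", "id");
        values.put("TYPE_NAME", "bigint");
        System.out.println(describeColumn(stubRow(values))); // prints id:bigint
    }
}
```

   The stub exists only so the sketch runs without a database; with a real 
driver the same `getString(String)` calls apply to the metadata result set.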
   
   ### Impact
   
   none.
   
   ### Risk level (write none, low medium or high below)
   
   none.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   none.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName

2023-04-11 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-6064:


 Summary: Improve JDBCExecutor#getTableSchema Use ColName
 Key: HUDI-6064
 URL: https://issues.apache.org/jira/browse/HUDI-6064
 Project: Apache Hudi
  Issue Type: Improvement
  Components: hive
Reporter: Shilun Fan
Assignee: Shilun Fan


JDBCExecutor#getTableSchema uses ColIndex, which makes the code harder to 
read; use ColName instead of ColIndex.





[jira] [Updated] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName

2023-04-11 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-6064:
-
Status: In Progress  (was: Open)

> Improve JDBCExecutor#getTableSchema Use ColName
> ---
>
> Key: HUDI-6064
> URL: https://issues.apache.org/jira/browse/HUDI-6064
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> JDBCExecutor#getTableSchema uses ColIndex, which makes the code harder to 
> read; use ColName instead of ColIndex.





[GitHub] [hudi] xccui commented on issue #8325: [SUPPORT] spark read hudi error: Unable to instantiate HFileBootstrapIndex

2023-04-11 Thread via GitHub


xccui commented on issue #8325:
URL: https://github.com/apache/hudi/issues/8325#issuecomment-1504472901

   I got some time today to take a closer look at the errors. 
`HFileBootstrapIndex` needs to access some remote data during initialization. 
There are probably connection issues (e.g. the file system was closed or the 
connection was interrupted for some reason) causing the initialization to fail. 
It shouldn't be a compatibility problem.
   
   Maybe we could move some logic out of the constructor.
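
   One way to read "move some logic out of the constructor" is lazy 
initialization: construct the object cheaply and defer the remote access until 
first use, so that a transient connection failure surfaces at the call site 
instead of as an instantiation error. The sketch below is a generic illustration 
under that assumption; `LazyIndexSketch` and its `Supplier`-based loader are 
hypothetical and do not reflect the internals of `HFileBootstrapIndex`.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the Hudi implementation.
class LazyIndexSketch {
    private final Supplier<String> remoteLoader;
    private String loaded; // stays null until the first successful load

    LazyIndexSketch(Supplier<String> remoteLoader) {
        // The constructor only stores the loader; it performs no I/O.
        this.remoteLoader = remoteLoader;
    }

    // Loads on first use. If the loader throws, nothing is cached,
    // so a later call can try again.
    synchronized String data() {
        if (loaded == null) {
            loaded = remoteLoader.get();
        }
        return loaded;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        LazyIndexSketch index = new LazyIndexSketch(() -> {
            calls[0]++;
            return "index-bytes";
        });
        System.out.println(calls[0]);     // prints 0: nothing loaded yet
        System.out.println(index.data()); // prints index-bytes
        System.out.println(calls[0]);     // prints 1
    }
}
```

   With this shape, constructing the object via reflection cannot fail because 
of a closed file system; only the first `data()` call can.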





[GitHub] [hudi] hudi-bot commented on pull request #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8433:
URL: https://github.com/apache/hudi/pull/8433#issuecomment-1504463584

   
   ## CI report:
   
   * 106eefb312139bfec944a5693e2e3608f0a11bd1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16268)
 
   
   



[GitHub] [hudi] bvaradar commented on a diff in pull request #8378: [HUDI-6031] fix bug: checkpoint lost after changing cow to mor

2023-04-11 Thread via GitHub


bvaradar commented on code in PR #8378:
URL: https://github.com/apache/hudi/pull/8378#discussion_r1163506777


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##
@@ -650,16 +648,19 @@ private JavaRDD 
getTransformedRDD(Dataset rowDataset, boolea
 
   /**
* Process previous commit metadata and checkpoint configs set by user to 
determine the checkpoint to resume from.
-   * @param commitTimelineOpt commit timeline of interest.
+   *
+   * @param commitsTimelineOpt commits timeline of interest, including .commit 
and .deltacommit.
* @return the checkpoint to resume from if applicable.
* @throws IOException
*/
-  private Option getCheckpointToResume(Option 
commitTimelineOpt) throws IOException {
+  private Option getCheckpointToResume(Option 
commitsTimelineOpt) throws IOException {
 Option resumeCheckpointStr = Option.empty();
-Option lastCommit = commitTimelineOpt.get().lastInstant();
+// try get checkpoint from commits(including commit and deltacommit)
+// in COW migrating to MOR case, the first batch of the deltastreamer will 
lost the checkpoint from COW table, cause the dataloss
+Option lastCommit = commitsTimelineOpt.get().lastInstant();

Review Comment:
   For a MOR table, we need to read only .deltacommit files if there is at least 
one .deltacommit in the timeline. Otherwise, pick the latest .commit file. 
   This is the safe approach. 
   If there is no .deltacommit, then the table is either empty or just being 
converted from COW to MOR. In this case, pick the latest .commit and read the 
checkpoint from there. 
   So, the pseudo-code is something like: 
   
   ```
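   // Pseudo-code: true when the timeline contains no .deltacommit instant
   boolean hasNoDeltaCommit = commitsTimelineOpt.filter(instant ->
       instant.action.equals(HoodieTimeline.DELTA_COMMIT_ACTION)).empty();
   if (isMOR && hasNoDeltaCommit) {
     // Keep only non-deltacommit instants (empty table, or COW just converted to MOR)
     commitsTimelineOpt = commitsTimelineOpt.filter(instant ->
         !instant.action.equals(HoodieTimeline.DELTA_COMMIT_ACTION));
   }
   // Rest of the code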
   ```
   
   Let me know if you have questions. 
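
   The selection rule described above can also be sketched as standalone Java, 
without any Hudi classes. Everything here is a hypothetical stand-in: 
`CheckpointSourceSketch`, `TimelineEntry`, and `pickInstantForCheckpoint` are 
illustrative names, the timeline is modeled as a time-ordered list, and the two 
action strings are assumed to mirror the commit and deltacommit actions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not the DeltaSync implementation.
class CheckpointSourceSketch {
    static final String COMMIT = "commit";
    static final String DELTA_COMMIT = "deltacommit";

    // Minimal stand-in for a timeline instant.
    static final class TimelineEntry {
        final String timestamp;
        final String action;

        TimelineEntry(String timestamp, String action) {
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    // MOR with at least one deltacommit: use the latest deltacommit.
    // Otherwise (COW, empty MOR, or MOR just converted from COW): use the latest commit.
    static Optional<TimelineEntry> pickInstantForCheckpoint(List<TimelineEntry> timeline, boolean isMor) {
        boolean hasDeltaCommit = timeline.stream().anyMatch(e -> DELTA_COMMIT.equals(e.action));
        String wanted = (isMor && hasDeltaCommit) ? DELTA_COMMIT : COMMIT;
        return timeline.stream()
            .filter(e -> wanted.equals(e.action))
            .reduce((earlier, later) -> later); // last match in a time-ordered list
    }

    public static void main(String[] args) {
        List<TimelineEntry> justConverted = Arrays.asList(
            new TimelineEntry("001", COMMIT),
            new TimelineEntry("002", COMMIT));
        // MOR table without deltacommits falls back to the latest commit.
        System.out.println(pickInstantForCheckpoint(justConverted, true).get().timestamp); // prints 002
    }
}
```

   The fallback branch is the one that matters for the COW-to-MOR migration 
case: the first MOR batch still finds the checkpoint stored in the last commit.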
   
   






[GitHub] [hudi] wuwenchi commented on pull request #7834: [HUDI-5690] Add simpleBucketPartitioner to support using the simple bucket index under bulkinsert

2023-04-11 Thread via GitHub


wuwenchi commented on PR #7834:
URL: https://github.com/apache/hudi/pull/7834#issuecomment-1504421161

   > @wuwenchi : Please ping me in this PR once you have addressed all comments 
and it is ready for review.
   
   @bvaradar All comments have now been addressed; it's ready for review. 
Thanks!





[GitHub] [hudi] codope commented on pull request #8424: [HUDI-6057] Fix deltastreamer shutdown when post write termination strategy enabled

2023-04-11 Thread via GitHub


codope commented on PR #8424:
URL: https://github.com/apache/hudi/pull/8424#issuecomment-1504419143

   @LiJie20190102 Have you checked if this fix solves your problem? 
https://github.com/apache/hudi/pull/8173





[GitHub] [hudi] bvaradar commented on pull request #7834: [HUDI-5690] Add simpleBucketPartitioner to support using the simple bucket index under bulkinsert

2023-04-11 Thread via GitHub


bvaradar commented on PR #7834:
URL: https://github.com/apache/hudi/pull/7834#issuecomment-1504414285

   @wuwenchi : Please ping me in this PR once you have addressed all comments 
and it is ready for review. 





[GitHub] [hudi] bvaradar commented on pull request #7143: [HUDI-5175] Improving FileIndex load performance in PARALLELISM mode

2023-04-11 Thread via GitHub


bvaradar commented on PR #7143:
URL: https://github.com/apache/hudi/pull/7143#issuecomment-1504411183

   @zhangyue19921010 : Pinging to see if you can address the review comments?





[GitHub] [hudi] bvaradar commented on pull request #7913: Adding support for EPOCHMICROSECONDS in TimestampBasedAvroKeyGenerator

2023-04-11 Thread via GitHub


bvaradar commented on PR #7913:
URL: https://github.com/apache/hudi/pull/7913#issuecomment-1504409552

   @sydneybeal : Pinging to see if you can address the comments?





[hudi] branch master updated (8f014c033c3 -> 10040de05ad)

2023-04-11 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 8f014c033c3 [HUDI-6014] Remove unused import in hudi-spark (#8350)
 add 10040de05ad [HUDI-5389] Remove Hudi Cli Duplicates Code. (#8360)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/RepairsCommand.java   |   2 +-
 .../org/apache/hudi/cli/commands/SparkMain.java|   4 +-
 .../scala/org/apache/hudi/cli/DeDupeType.scala |  28 ---
 .../scala/org/apache/hudi/cli/DedupeSparkJob.scala | 248 -
 .../scala/org/apache/hudi/cli/SparkHelpers.scala   | 147 
 .../org/apache/spark/sql/hudi/DedupeSparkJob.scala |   4 +-
 6 files changed, 5 insertions(+), 428 deletions(-)
 delete mode 100644 hudi-cli/src/main/scala/org/apache/hudi/cli/DeDupeType.scala
 delete mode 100644 hudi-cli/src/main/scala/org/apache/hudi/cli/DedupeSparkJob.scala
 delete mode 100644 hudi-cli/src/main/scala/org/apache/hudi/cli/SparkHelpers.scala



[GitHub] [hudi] bvaradar merged pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

2023-04-11 Thread via GitHub


bvaradar merged PR #8360:
URL: https://github.com/apache/hudi/pull/8360





[GitHub] [hudi] bvaradar commented on pull request #7680: [HUDI-5548] spark sql show | update hudi's table properties

2023-04-11 Thread via GitHub


bvaradar commented on PR #7680:
URL: https://github.com/apache/hudi/pull/7680#issuecomment-1504392751

   @XuQianJin-Stars : Can you let us know whether you will be able to look at the failing test and rebase, please?





[GitHub] [hudi] bvaradar commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder

2023-04-11 Thread via GitHub


bvaradar commented on PR #8385:
URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504389808

   @Mulavar : We need to create a higher version (6) and write the upgrade/downgrade logic to handle the transition from the current version (5). You can look at https://github.com/apache/hudi/pull/6248 as an example of how to do this.





[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1504379935

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * a2a75f077cf831e05b5659eaf0990ebc4865622e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16267)
   
   



[hudi] branch master updated: [HUDI-6014] Remove unused import in hudi-spark (#8350)

2023-04-11 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8f014c033c3 [HUDI-6014] Remove unused import in hudi-spark (#8350)
8f014c033c3 is described below

commit 8f014c033c3b7332d91fa09fce0631e5a59600d7
Author: huangxiaoping <1754789...@qq.com>
AuthorDate: Wed Apr 12 09:16:38 2023 +0800

[HUDI-6014] Remove unused import in hudi-spark (#8350)
---
 .../spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala| 2 --
 .../apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala   | 1 -
 .../sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala | 1 -
 .../src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala   | 3 ---
 4 files changed, 7 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala
index c6c39b73989..97930432e4e 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala
@@ -32,8 +32,6 @@ import org.apache.hudi.common.table.timeline.{HoodieInstant, HoodieTimeline, Tim
 import org.apache.hudi.exception.HoodieException
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.Row
-import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
 import java.io.File
 import java.util
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala
index ca8b3fc95bc..43d636b65ec 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala
@@ -22,7 +22,6 @@ import org.apache.hudi.client.SparkRDDWriteClient
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline
 import org.apache.hudi.common.util.JsonUtils
 import org.apache.hudi.config.HoodieCleanConfig
-import org.apache.hudi.table.action.clean.CleaningTriggerStrategy
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala
index d75df07fc9d..e245159c849 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala
@@ -17,7 +17,6 @@
 
 package org.apache.spark.sql.hudi.command.procedures
 
-import org.apache.hudi.HoodieCLIUtils
 import org.apache.hudi.common.table.HoodieTableMetaClient
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
diff --git a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala
index 9759356b720..c99f2b197a1 100644
--- a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala
+++ b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala
@@ -18,12 +18,9 @@
 
 package org.apache.hudi
 
-import org.apache.hudi.HoodieUnsafeRDD
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
 import org.apache.spark.sql.execution.datasources.{FilePartition, FileScanRDD, PartitionedFile}
-import org.apache.spark.sql.types.StructType
 
 class Spark2HoodieFileScanRDD(@transient private val sparkSession: SparkSession,
    read: PartitionedFile => Iterator[InternalRow],



[GitHub] [hudi] bvaradar merged pull request #8350: [HUDI-6014] Remove unused import in hudi-spark

2023-04-11 Thread via GitHub


bvaradar merged PR #8350:
URL: https://github.com/apache/hudi/pull/8350





[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8434:
URL: https://github.com/apache/hudi/pull/8434#issuecomment-1504336034

   
   ## CI report:
   
   * 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8433:
URL: https://github.com/apache/hudi/pull/8433#issuecomment-1504335986

   
   ## CI report:
   
   * 106eefb312139bfec944a5693e2e3608f0a11bd1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16268)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8434:
URL: https://github.com/apache/hudi/pull/8434#issuecomment-1504328434

   
   ## CI report:
   
   * 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8433:
URL: https://github.com/apache/hudi/pull/8433#issuecomment-1504328379

   
   ## CI report:
   
   * 106eefb312139bfec944a5693e2e3608f0a11bd1 UNKNOWN
   
   



[jira] [Updated] (HUDI-6063) Modify logging errors In JDBCExecutor

2023-04-11 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6063:
-
Labels: pull-request-available  (was: )

> Modify logging errors In JDBCExecutor
> -
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> There is a logging error in JDBCExecutor: while dropping partitions, the log message says it is adding partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] slfan1989 opened a new pull request, #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.

2023-04-11 Thread via GitHub


slfan1989 opened a new pull request, #8434:
URL: https://github.com/apache/hudi/pull/8434

   ### Change Logs
   
   There is a logging error in JDBCExecutor: while dropping partitions, the log message says it is adding partitions.
   
   ### Impact
   
   none.
   
   ### Risk level (write none, low medium or high below)
   
   none.
   
   ### Documentation Update
   
   none.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Assigned] (HUDI-6063) Modify logging errors In JDBCExecutor

2023-04-11 Thread Shilun Fan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan reassigned HUDI-6063:


Assignee: Shilun Fan

> Modify logging errors In JDBCExecutor
> -
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> There is a logging error in JDBCExecutor: while dropping partitions, the log message says it is adding partitions.





[jira] [Updated] (HUDI-6063) Modify logging errors In JDBCExecutor

2023-04-11 Thread Shilun Fan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HUDI-6063:
-
Status: In Progress  (was: Open)

> Modify logging errors In JDBCExecutor
> -
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> There is a logging error in JDBCExecutor: while dropping partitions, the log message says it is adding partitions.





[jira] [Created] (HUDI-6063) Modify logging errors In JDBCExecutor

2023-04-11 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-6063:


 Summary: Modify logging errors In JDBCExecutor
 Key: HUDI-6063
 URL: https://issues.apache.org/jira/browse/HUDI-6063
 Project: Apache Hudi
  Issue Type: Bug
  Components: hive
Reporter: Shilun Fan


There is a logging error in JDBCExecutor: while dropping partitions, the log message says it is adding partitions.





[GitHub] [hudi] slfan1989 commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-04-11 Thread via GitHub


slfan1989 commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1163442102


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
   lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
 }
 LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
 
-// Sync the partitions if needed
-// find dropped partitions, if any, in the latest commit
-Set droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+boolean partitionsChanged;
+if (!lastCommitTimeSynced.isPresent()
+|| syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+  // If the last commit time synced is before the start of the active timeline,
+  // the Hive sync falls back to list all partitions on storage, instead of
+  // reading active and archived timelines for written partitions.
+  LOG.info("Sync all partitions given the last commit time synced is empty or "
+  + "before the start of the active timeline. Listing all partitions in "
+  + config.getString(META_SYNC_BASE_PATH)
+  + ", file system: " + config.getHadoopFileSystem());
+  partitionsChanged = syncAllPartitions(tableName);
+} else {
+  List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+  LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());

Review Comment:
   LOG.info("Storage partitions scan complete.  Found {}.", 
writtenPartitionsSince.size());
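
The fallback condition in the hunk quoted above can be sketched in isolation. This is a minimal illustration, not the Hudi implementation: `SyncFallback`, `shouldSyncAllPartitions`, and the `firstActiveInstant` parameter are hypothetical names, and the sketch assumes instant times are fixed-width timestamp strings, so that plain string comparison can stand in for `isBeforeTimelineStarts`.

```java
import java.util.Optional;

// Hypothetical helper illustrating the decision made in the quoted hunk.
final class SyncFallback {
  /**
   * Returns true when incremental sync cannot be trusted and every partition
   * on storage should be listed instead.
   * Assumption: instants are fixed-width timestamp strings, so lexicographic
   * order matches chronological order.
   */
  static boolean shouldSyncAllPartitions(Optional<String> lastCommitTimeSynced,
                                         String firstActiveInstant) {
    return !lastCommitTimeSynced.isPresent()
        || lastCommitTimeSynced.get().compareTo(firstActiveInstant) < 0;
  }
}
```

With this shape, a table that has never been synced and a table whose last synced commit has already left the active timeline both take the full-listing path.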






[GitHub] [hudi] slfan1989 commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-04-11 Thread via GitHub


slfan1989 commented on code in PR #8388:
URL: https://github.com/apache/hudi/pull/8388#discussion_r1163441854


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
   lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
 }
 LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
 
-// Sync the partitions if needed
-// find dropped partitions, if any, in the latest commit
-Set droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+boolean partitionsChanged;
+if (!lastCommitTimeSynced.isPresent()
+|| syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+  // If the last commit time synced is before the start of the active timeline,
+  // the Hive sync falls back to list all partitions on storage, instead of
+  // reading active and archived timelines for written partitions.
+  LOG.info("Sync all partitions given the last commit time synced is empty or "
+  + "before the start of the active timeline. Listing all partitions in "
+  + config.getString(META_SYNC_BASE_PATH)
+  + ", file system: " + config.getHadoopFileSystem());
+  partitionsChanged = syncAllPartitions(tableName);
+} else {
+  List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+  LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());

Review Comment:
   Our logging has moved to slf4j; can we use `{}` placeholders here?
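
For readers unfamiliar with the `{}` style, the idea is that the message template and its arguments are passed separately, so the final string is only built when the log level is enabled. The sketch below is a self-contained stand-in written for illustration; it is not slf4j code, and `PlaceholderDemo`, `format`, and `infoIfEnabled` are hypothetical names. With slf4j itself the call would simply be `LOG.info("Storage partitions scan complete. Found {}.", writtenPartitionsSince.size())`.

```java
// Stand-in that mimics "{}" substitution; not the slf4j implementation.
final class PlaceholderDemo {
  /** Replaces each "{}" in the template with the next argument, left to right. */
  static String format(String template, Object... args) {
    StringBuilder out = new StringBuilder();
    int from = 0;
    for (Object arg : args) {
      int at = template.indexOf("{}", from);
      if (at < 0) {
        break; // more arguments than placeholders: ignore the rest
      }
      out.append(template, from, at).append(arg);
      from = at + 2;
    }
    return out.append(template.substring(from)).toString();
  }

  /** Builds the message only when the level is enabled; returns null otherwise. */
  static String infoIfEnabled(boolean infoEnabled, String template, Object... args) {
    return infoEnabled ? format(template, args) : null;
  }
}
```

The concatenation form in the quoted hunk builds the string before the logger can check its level, which is the cost the placeholder form avoids.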






[GitHub] [hudi] the-other-tim-brown opened a new pull request, #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache

2023-04-11 Thread via GitHub


the-other-tim-brown opened a new pull request, #8433:
URL: https://github.com/apache/hudi/pull/8433

   ### Change Logs
   
   Avoids acquiring a lock to check whether a value is present in a cache to 
allow better performance when the value is already in the cache.
   
   ### Impact
   
   This method is invoked on all rows in the DeltaStreamer when building the 
payload class. This should provide a minor improvement in execution time.
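
The pattern described above can be sketched as follows. This is a minimal illustration under assumptions, not the code in the PR: `ClassCache` and `getClass(String)` are hypothetical names, and a `ConcurrentHashMap` is assumed so that the first read is safe without holding the lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache illustrating "check first, lock only on a miss".
final class ClassCache {
  private static final Map<String, Class<?>> CACHE = new ConcurrentHashMap<>();

  static Class<?> getClass(String className) {
    // Fast path: a hit never touches the lock.
    Class<?> cached = CACHE.get(className);
    if (cached != null) {
      return cached;
    }
    // Slow path: take the lock and re-check, since another thread may have
    // loaded the class between the first read and acquiring the lock.
    synchronized (CACHE) {
      cached = CACHE.get(className);
      if (cached == null) {
        try {
          cached = Class.forName(className);
        } catch (ClassNotFoundException e) {
          throw new IllegalArgumentException("Unable to load class " + className, e);
        }
        CACHE.put(className, cached);
      }
      return cached;
    }
  }
}
```

Because the lookup runs once per row, skipping the lock on the common hit path is where the saving would come from.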
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1504268255

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * 8fd9b3a58eb63e330b306ed70843e677dfbc4a2d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16223)
   * a2a75f077cf831e05b5659eaf0990ebc4865622e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16267)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1504260452

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * 8fd9b3a58eb63e330b306ed70843e677dfbc4a2d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16223)
   * a2a75f077cf831e05b5659eaf0990ebc4865622e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8432: Fix NPE when upsert merger and null map or array

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1504253664

   
   ## CI report:
   
   * f4502dad350e0dc84299dc0bd5889506420b0f49 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16266)
   
   



[jira] [Updated] (HUDI-5997) Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource

2023-04-11 Thread Jira


 [ https://issues.apache.org/jira/browse/HUDI-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Léo Biscassi updated HUDI-5997:
---
Status: In Progress  (was: Open)

> Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource
> --
>
> Key: HUDI-5997
> URL: https://issues.apache.org/jira/browse/HUDI-5997
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Sagar Sumit
>Assignee: Léo Biscassi
>Priority: Major
> Fix For: 0.14.0
>
>
> See for more details





[jira] [Assigned] (HUDI-5997) Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource

2023-04-11 Thread Jira


 [ https://issues.apache.org/jira/browse/HUDI-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Léo Biscassi reassigned HUDI-5997:
--

Assignee: Léo Biscassi

> Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource
> --
>
> Key: HUDI-5997
> URL: https://issues.apache.org/jira/browse/HUDI-5997
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Sagar Sumit
>Assignee: Léo Biscassi
>Priority: Major
> Fix For: 0.14.0
>
>
> See for more details





[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


jonvex commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1163391536


##
hudi-common/src/test/java/org/apache/hudi/common/config/TestConfigProperty.java:
##
@@ -171,4 +171,28 @@ public void testAdvancedValue() {
 assertTrue(FAKE_BOOLEAN_CONFIG.markAdvanced().isAdvanced());
 assertTrue(FAKE_BOOLEAN_CONFIG_NO_DEFAULT.markAdvanced().isAdvanced());
   }
+
+  @EnumDescription("Test enum description.")
+  public enum TestEnum {

Review Comment:
   That has to happen in the getter and/or the setter; it won't happen in ConfigProperty.






[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8082:
URL: https://github.com/apache/hudi/pull/8082#issuecomment-1504203496

   
   ## CI report:
   
   * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN
   * 58edb8dbad6d6e4dd7455bcabc5e5f70369493ab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16265)
   
   



[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


jonvex commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1163375572


##
hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java:
##
@@ -139,6 +144,49 @@ public ConfigProperty withDocumentation(String doc) {
 return new ConfigProperty<>(key, defaultValue, docOnDefaultValue, doc, sinceVersion, deprecatedVersion, inferFunction, validValues, advanced, alternatives);
   }
 
+  public > ConfigProperty withDocumentation(Class e) {
+return withDocumentation(e,"");
+  }
+
+  private > boolean isDefaultField(Class e, Field f) {
+if (!hasDefaultValue()) {
+  return false;
+}
+if (defaultValue() instanceof String) {
+  return f.getName().equals(defaultValue());
+}
+return Enum.valueOf(e, f.getName()).equals(defaultValue());
+  }
+
+  public > ConfigProperty withDocumentation(Class e, String doc) {

Review Comment:
   Why? Sometimes the config needs some extra explanation that the enum doesn't provide.
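
As a rough illustration of the idea under discussion (documentation assembled from an enum, with room for extra config-specific text), here is a self-contained sketch. It is not the code in this PR: `EnumDocs`, `Description`, `ExecutorKind`, and `describe` are hypothetical names, and the output layout is an assumption.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;

// Hypothetical sketch: build a config doc string from annotations on an enum.
final class EnumDocs {
  @Retention(RetentionPolicy.RUNTIME)
  @interface Description {
    String value();
  }

  @Description("How records are handed from producer to consumer.")
  enum ExecutorKind {
    @Description("No inner queue.") SIMPLE,
    @Description("Bounded in-memory queue.") BOUNDED
  }

  /** Joins the extra text, the enum-level description, and one line per constant. */
  static <E extends Enum<E>> String describe(Class<E> e, String extraDoc, String defaultName) {
    StringBuilder sb = new StringBuilder(extraDoc);
    Description top = e.getAnnotation(Description.class);
    if (top != null) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(top.value());
    }
    for (Field f : e.getDeclaredFields()) {
      if (!f.isEnumConstant()) {
        continue; // skip synthetic fields such as the values array
      }
      Description d = f.getAnnotation(Description.class);
      sb.append("\n  ").append(f.getName());
      if (f.getName().equals(defaultName)) {
        sb.append(" (default)");
      }
      sb.append(": ").append(d == null ? "" : d.value());
    }
    return sb.toString();
  }
}
```

Passing an empty `extraDoc` gives the enum-only form, while a non-empty one covers the case where the config needs explanation that the enum alone does not carry.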






[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


jonvex commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1163352137


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -168,23 +168,16 @@ public class HoodieWriteConfig extends HoodieConfig {
 
   public static final ConfigProperty WRITE_EXECUTOR_TYPE = 
ConfigProperty
   .key("hoodie.write.executor.type")
-  .defaultValue(SIMPLE.name())
-  
.withValidValues(Arrays.stream(ExecutorType.values()).map(Enum::name).toArray(String[]::new))
-  .sinceVersion("0.13.0")
-  .withDocumentation("Set executor which orchestrates concurrent producers 
and consumers communicating through a message queue."
-  + "BOUNDED_IN_MEMORY: Use LinkedBlockingQueue as a bounded in-memory 
queue, this queue will use extra lock to balance producers and consumer"
-  + "DISRUPTOR: Use disruptor which a lock free message queue as inner 
message, this queue may gain better writing performance if lock was the 
bottleneck. "
-  + "SIMPLE(default): Executor with no inner message queue and no 
inner lock. Consuming and writing records from iterator directly. Compared with 
BIM and DISRUPTOR, "
-  + "this queue has no need for additional memory and cpu resources 
due to lock or multithreading, but also lost some benefits such as speed limit. 
"
-  + "Although DISRUPTOR is still experimental.");
+  .defaultValue(ExecutorType.SIMPLE.name())
+  .withDocumentation(ExecutorType.class)
+  .sinceVersion("0.13.0");
 
   public static final ConfigProperty KEYGENERATOR_TYPE = ConfigProperty
   .key("hoodie.datasource.write.keygenerator.type")
   .defaultValue(KeyGeneratorType.SIMPLE.name())
-  .withDocumentation("Easily configure one the built-in key generators, 
instead of specifying the key generator class."
-  + "Currently supports SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, 
NON_PARTITION, GLOBAL_DELETE. "
-  + "**Note** This is being actively worked on. Please use "
-  + "`hoodie.datasource.write.keygenerator.class` instead.");
+  .withDocumentation(KeyGeneratorType.class,

Review Comment:
   This seems correct to me






[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


jonvex commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1163346107


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##
@@ -732,40 +712,17 @@ public String getValue() {
 }
   }
 
+  @EnumDescription("Clustering mode to use.")
   public enum ClusteringOperator {
 
-/**
- * only schedule the clustering plan
- */
-SCHEDULE("schedule"),
-
-/**
- * only execute then pending clustering plans
- */
-EXECUTE("execute"),
-
-/**
- * schedule cluster first, and execute all pending clustering plans
- */
-SCHEDULE_AND_EXECUTE("scheduleandexecute");
+@EnumFieldDescription("Only schedule the clustering plan.")
+SCHEDULE,
 
-private static final Map VALUE_TO_ENUM_MAP =
-TypeUtils.getValueToEnumMap(ClusteringOperator.class, e -> 
e.value);
+@EnumFieldDescription("Only execute pending clustering plans.")
+EXECUTE,
 
-private final String value;
-
-ClusteringOperator(String value) {
-  this.value = value;
-}
-
-@Nonnull
-public static ClusteringOperator fromValue(String value) {
-  ClusteringOperator enumValue = VALUE_TO_ENUM_MAP.get(value);
-  if (enumValue == null) {
-throw new HoodieException(String.format("Invalid value (%s)", value));
-  }
-  return enumValue;
-}
+@EnumFieldDescription("Schedule cluster first, and execute all pending 
clustering plans.")
+SCHEDULE_AND_EXECUTE;

Review Comment:
   Yeah, I reverted it. I mentioned it in the jira issue I created






[GitHub] [hudi] hudi-bot commented on pull request #8432: Fix NPE when upsert merger and null map or array

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-1504104875

   
   ## CI report:
   
   * f4502dad350e0dc84299dc0bd5889506420b0f49 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16266)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8430:
URL: https://github.com/apache/hudi/pull/8430#issuecomment-1504104816

   
   ## CI report:
   
   * d357330a200b9c5ad7f719d9985d40ef2e604d51 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16263)
 
   
   



[jira] [Created] (HUDI-6062) Update LayoutOptimizationStrategy and ClusteringOperator to use standard enum notation

2023-04-11 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-6062:
-

 Summary: Update LayoutOptimizationStrategy and ClusteringOperator 
to use standard enum notation
 Key: HUDI-6062
 URL: https://issues.apache.org/jira/browse/HUDI-6062
 Project: Apache Hudi
  Issue Type: Improvement
  Components: clustering, code-quality, configs
Reporter: Jonathan Vexler


ClusteringOperator and LayoutOptimizationStrategy have enums with values that 
are not capitalized snake case like every other config. We need to maintain 
backwards compatibility so we can't just change this. To make this change, we 
need to have the old values be translated to the updated values so that if a 
user uses the old values, it will still work. For example if the 
hoodie.layout.optimize.strategy config is set to "z-order" we need to translate 
it to "ZORDER" and then use "ZORDER" internally. But the user could also set 
the config to "ZORDER" of course.
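
A minimal sketch of the translation described above, in Python for brevity. The alias table and the function name `normalize_enum_value` are hypothetical, not part of Hudi; the legacy spellings are the ones quoted in this issue and in the `ClusteringOperator` diff.

```python
# Hypothetical alias table: legacy spelling -> standardized enum name.
LEGACY_ALIASES = {
    "z-order": "ZORDER",
    "schedule": "SCHEDULE",
    "execute": "EXECUTE",
    "scheduleandexecute": "SCHEDULE_AND_EXECUTE",
}


def normalize_enum_value(raw, valid_names):
    """Return the standardized name for raw, accepting old and new spellings."""
    # Old values are translated; values that are already standard pass through.
    candidate = LEGACY_ALIASES.get(raw, raw)
    if candidate not in valid_names:
        raise ValueError(f"Invalid value ({raw})")
    return candidate
```

With this shape, both `"z-order"` and `"ZORDER"` resolve to `"ZORDER"`, so only the standardized name is used internally.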



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6061) NPE with nullable MapType and new hudi merger

2023-04-11 Thread nicolas paris (Jira)
nicolas paris created HUDI-6061:
---

 Summary: NPE with nullable MapType and new hudi merger
 Key: HUDI-6061
 URL: https://issues.apache.org/jira/browse/HUDI-6061
 Project: Apache Hudi
  Issue Type: Bug
  Components: core
Reporter: nicolas paris
 Fix For: 0.13.1


In 0.13.0, an upsert through the new Hudi merger API raises a 
NullPointerException when it encounters null map values. AFAIK, it happens when 
the two MapTypes declare nullability in different ways.

 

See [issue](https://github.com/apache/hudi/issues/8431) for details

See [PR](https://github.com/apache/hudi/pull/8432) for details





[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


jonvex commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1163302008


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/BootstrapMode.java:
##
@@ -18,18 +18,28 @@
 
 package org.apache.hudi.client.bootstrap;
 
+import org.apache.hudi.common.config.EnumDescription;
+import org.apache.hudi.common.config.EnumFieldDescription;
+
 /**
  * Identifies different types of bootstrap.
  */
+@EnumDescription("Bootstrap mode to apply for partition paths that match the 
regex set in `hoodie.bootstrap.mode.selector.regex`.")

Review Comment:
   You don't need to use the regex selector. It's also used in the uniform selector.






[GitHub] [hudi] hudi-bot commented on pull request #8432: Fix NPE when upsert merger and null map or array

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8432:
URL: https://github.com/apache/hudi/pull/8432#issuecomment-150405

   
   ## CI report:
   
   * f4502dad350e0dc84299dc0bd5889506420b0f49 UNKNOWN
   
   



[GitHub] [hudi] parisni opened a new pull request, #8432: Fix NPE when upsert merger and null map or array

2023-04-11 Thread via GitHub


parisni opened a new pull request, #8432:
URL: https://github.com/apache/hudi/pull/8432

   ### Change Logs
   
   Fixes #8431
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8429:
URL: https://github.com/apache/hudi/pull/8429#issuecomment-1504039432

   
   ## CI report:
   
   * 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262)
 
   
   



[GitHub] [hudi] parisni opened a new issue, #8431: [SUPPORT] NPE with MapType and new hudi merger

2023-04-11 Thread via GitHub


parisni opened a new issue, #8431:
URL: https://github.com/apache/hudi/issues/8431

   **Describe the problem you faced**
   
   An upsert through the new Hudi merger API raises a NullPointerException when 
it encounters null map values. AFAIK, it happens when the two MapTypes declare 
nullability in different ways.
   
   **To Reproduce**
   
   ```python
   from pyspark.sql.types import StructType, StructField, IntegerType, StringType, MapType, ArrayType, TimestampType
   
   tableName = 'test_hudi'
   basePath = "/tmp/{tableName}".format(tableName=tableName)
   
   data = [ ("a", None, 1, 'b'),
   ]
   schema = StructType( [
   
   StructField("event_id", StringType(), True),
   StructField(
   "mp",
   MapType(StringType(), ArrayType(TimestampType(), False), False)
   ),
   StructField("version", IntegerType(), True),
   StructField("event_date", StringType(), True),
   ]
   )
   df = (
   spark.createDataFrame(data=data, schema=schema)
   )
   #
   # INIT THE TABLE WITH INSERT
   #
   hudi_options = {
   "hoodie.table.name": tableName,
   "hoodie.datasource.write.recordkey.field": "event_id",
   "hoodie.datasource.write.partitionpath.field": "event_date",
   "hoodie.datasource.write.table.name": tableName,
   "hoodie.datasource.write.operation": "insert",
   "hoodie.datasource.write.precombine.field": "version",
   "hoodie.upsert.shuffle.parallelism": 1,
   "hoodie.insert.shuffle.parallelism": 1,
   "hoodie.delete.shuffle.parallelism": 1,
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.datasource.write.hive_style_partitioning": "true",
   "hoodie.datasource.hive_sync.database": "default",
   "hoodie.datasource.hive_sync.table": tableName,
   "hoodie.datasource.hive_sync.mode": "jdbc",
   "hoodie.combine.before.insert":"true",
   "hoodie.datasource.hive_sync.enable": "false",
   "hoodie.datasource.hive_sync.partition_fields": "event_date",
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.datasource.hive_sync.partition_extractor_class": 
"org.apache.hudi.hive.MultiPartKeysValueExtractor",
   'hoodie.datasource.hive_sync.use_jdbc': False,
   "hoodie.merge.allow.duplicate.on.inserts":"true",
   "hoodie.metadata.enable": "true",
   #"hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
   "hoodie.payload.ordering.field": "version",
   "hoodie.payload.event.time.field": "version",
   "hoodie.datasource.write.record.merger.impls": 
"org.apache.hudi.HoodieSparkRecordMerger"
   }
   
   (df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath))
   spark.read.format("hudi").load(basePath).printSchema()
   
   
   
   
   data = [ ("a", None, 1, 'b'),
   ]
   schema = StructType( [
   
   StructField("event_id", StringType(), True),
   StructField(
   "mp",
   MapType(StringType(), ArrayType(TimestampType(), True), False)
   ),
   StructField("version", IntegerType(), True),
   StructField("event_date", StringType(), True),
   ]
   )
   df = (
   spark.createDataFrame(data=data, schema=schema)
   )
   
   #
   # NOW UPSERT DATA WITH A DIFFERENT SCHEMA
   # 
   hudi_options = {
   "hoodie.table.name": tableName,
   "hoodie.datasource.write.recordkey.field": "event_id",
   "hoodie.datasource.write.partitionpath.field": "event_date",
   "hoodie.datasource.write.table.name": tableName,
   "hoodie.datasource.write.operation": "upsert",
   "hoodie.datasource.write.precombine.field": "version",
   "hoodie.upsert.shuffle.parallelism": 1,
   "hoodie.insert.shuffle.parallelism": 1,
   "hoodie.delete.shuffle.parallelism": 1,
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.datasource.write.hive_style_partitioning": "true",
   "hoodie.datasource.hive_sync.database": "default",
   "hoodie.datasource.hive_sync.table": tableName,
   "hoodie.datasource.hive_sync.mode": "jdbc",
   "hoodie.combine.before.insert":"true",
   "hoodie.datasource.hive_sync.enable": "false",
   "hoodie.datasource.hive_sync.partition_fields": "event_date",
   "hoodie.datasource.write.keygenerator.class": 
"org.apache.hudi.keygen.ComplexKeyGenerator",
   "hoodie.datasource.hive_sync.partition_extractor_class": 
"org.apache.hudi.hive.MultiPartKeysValueExtractor",
   'hoodie.datasource.hive_sync.use_jdbc': False,
   "hoodie.merge.allow.duplicate.on.inserts":"true",
   "hoodie.metadata.enable": "true",
   "hoodie.payload.ordering.field": "version",
   "hoodie.payload.event.time.field": "version",
   "hoodie.datasource.write.record.merger.impls": 

[GitHub] [hudi] CTTY commented on a diff in pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-04-11 Thread via GitHub


CTTY commented on code in PR #8082:
URL: https://github.com/apache/hudi/pull/8082#discussion_r1163228965


##
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark332PlusHoodieParquetFileFormat.scala:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources.PartitionedFile
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.types.StructType
+
+class Spark332PlusHoodieParquetFileFormat(override protected val 
shouldAppendPartitionValues: Boolean) extends 
Spark32PlusHoodieParquetFileFormat(shouldAppendPartitionValues) {

Review Comment:
   With this class under `hudi-spark3.3.x`, Hudi won't be able to compile with 
Spark 3.3.1 anymore



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkFileReaderFactory.java:
##
@@ -33,6 +33,7 @@ protected HoodieFileReader newParquetFileReader(Configuration 
conf, Path path) {
 conf.setIfUnset(SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(),
 SQLConf.PARQUET_INT96_AS_TIMESTAMP().defaultValueString());
 conf.setIfUnset(SQLConf.CASE_SENSITIVE().key(), 
SQLConf.CASE_SENSITIVE().defaultValueString());
+conf.setIfUnset("spark.sql.legacy.parquet.nanosAsLong", "false");

Review Comment:
   nit: Can we add a comment to explain why we put a plain string here?



##
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark332PlusHoodieParquetFileFormat.scala:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources.PartitionedFile
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.types.StructType
+
+class Spark332PlusHoodieParquetFileFormat(override protected val 
shouldAppendPartitionValues: Boolean) extends 
Spark32PlusHoodieParquetFileFormat(shouldAppendPartitionValues) {
+
+  override def buildReaderWithPartitionValues(sparkSession: SparkSession,
+  dataSchema: StructType,
+  partitionSchema: StructType,
+  requiredSchema: StructType,
+  filters: Seq[Filter],
+  options: Map[String, String],
+  hadoopConf: Configuration): 
PartitionedFile => Iterator[InternalRow] = {
+// Sets flags for `ParquetToSparkSchemaConverter`
+hadoopConf.setBoolean(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.key, 
sparkSession.sessionState.conf.legacyParquetNanosAsLong)

Review Comment:
   Maybe using a plain string for the property name here would avoid build 
issues with Spark 3.3.1




[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503970614

   
   ## CI report:
   
   * b7ab237090a715521e580113486849489d1bf00c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16260)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8082:
URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503894762

   
   ## CI report:
   
   * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN
   * 46a2c22795b5e1be2bc74f92090cd1a496ea9a39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15754)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16264)
 
   * 58edb8dbad6d6e4dd7455bcabc5e5f70369493ab Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16265)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8082:
URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503886479

   
   ## CI report:
   
   * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN
   * 46a2c22795b5e1be2bc74f92090cd1a496ea9a39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15754)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16264)
 
   * 58edb8dbad6d6e4dd7455bcabc5e5f70369493ab UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8082:
URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503877790

   
   ## CI report:
   
   * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN
   * 46a2c22795b5e1be2bc74f92090cd1a496ea9a39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15754)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16264)
 
   
   


[GitHub] [hudi] hudi-bot commented on pull request #8358: [HUDI-6017] Sort the results of Call help Procedure with no params

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8358:
URL: https://github.com/apache/hudi/pull/8358#issuecomment-1503878485

   
   ## CI report:
   
   * 2c0e780e2dce3717fc3586417b5110dea2ca028c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16259)
 
   
   



[GitHub] [hudi] nsivabalan commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11

2023-04-11 Thread via GitHub


nsivabalan commented on PR #8429:
URL: https://github.com/apache/hudi/pull/8429#issuecomment-1503874676

   @hudi-bot run azure





[GitHub] [hudi] ad1happy2go commented on issue #8016: Inline Clustering : Clustering failed to write to files

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #8016:
URL: https://github.com/apache/hudi/issues/8016#issuecomment-1503856835

   @raghavant-git Did you get a chance to test with those parameters? Are you 
still facing this issue?





[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-11 Thread via GitHub


jonvex commented on code in PR #8303:
URL: https://github.com/apache/hudi/pull/8303#discussion_r1163162168


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##
@@ -270,6 +271,21 @@ object DefaultSource {
 }
   }
 
+  private def resolveHoodieBootstrapRelation(sqlContext: SQLContext,
+ globPaths: Seq[Path],
+ userSchema: Option[StructType],
+ metaClient: HoodieTableMetaClient,
+ parameters: Map[String, String]): 
BaseRelation = {
+val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, 
sqlContext.sparkSession.sessionState.conf,
+  ENABLE_HOODIE_FILE_INDEX.key, 
ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
+if (!enableFileIndex || globPaths.nonEmpty || 
parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != 
"true") {

Review Comment:
   When I set a breakpoint here, `userSchema` was null






[GitHub] [hudi] Madan16 commented on issue #8428: [SUPPORT]: When trying to UPSERT, Getting issues like : An error occurred while calling o168.save. org/apache/spark/sql/avro/SchemaConverters$ AND

2023-04-11 Thread via GitHub


Madan16 commented on issue #8428:
URL: https://github.com/apache/hudi/issues/8428#issuecomment-1503853073

   > @Madan16 Were you using AWS Glue 3.0 from the start, including when the 
job was succeeding?
   > 
   > My guess is that a version mismatch is somehow happening, which results in 
a class-not-found error for SchemaConverters, which is not present in older 
avro versions.
   
   @ad1happy2go: Yes, Glue 3.0 since the beginning.





[GitHub] [hudi] ad1happy2go commented on issue #8017: [SUPPORT] Parquet file size is small after running deltastreamer in BULK_INSERT which results in large number of files under same partitioning

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #8017:
URL: https://github.com/apache/hudi/issues/8017#issuecomment-1503852321

   @ROOBALJINDAL Are you still facing this issue? If so, can you provide a 
reproducible script if possible?





[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-04-11 Thread via GitHub


jonvex commented on code in PR #7881:
URL: https://github.com/apache/hudi/pull/7881#discussion_r1163153414


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -194,16 +187,18 @@ public class HoodieWriteConfig extends HoodieConfig {
 
   public static final ConfigProperty TIMELINE_LAYOUT_VERSION_NUM = ConfigProperty
   .key("hoodie.timeline.layout.version")
-  .defaultValue(Integer.toString(TimelineLayoutVersion.VERSION_1))
+  .defaultValue(Integer.toString(TimelineLayoutVersion.CURR_VERSION))
+  .withValidValues(Integer.toString(TimelineLayoutVersion.VERSION_0),Integer.toString(TimelineLayoutVersion.VERSION_1))
   .sinceVersion("0.5.1")
   .withDocumentation("Controls the layout of the timeline. Version 0 relied on renames, Version 1 (default) models "
   + "the timeline as an immutable log relying only on atomic writes for object storage.");
 
   public static final ConfigProperty BASE_FILE_FORMAT = ConfigProperty
   .key("hoodie.table.base.file.format")
   .defaultValue(HoodieFileFormat.PARQUET)
-  .withAlternatives("hoodie.table.ro.file.format")
-  .withDocumentation("Base file format to store all the base file data.");
+  .withValidValues(HoodieFileFormat.PARQUET.name(), HoodieFileFormat.ORC.name(), HoodieFileFormat.HFILE.name())
+  .withDocumentation(HoodieFileFormat.class, "File format to store all the base file data.")

Review Comment:
   That one doesn't work because the enum has HOODIE_LOG as a value
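
   For illustration only: when an enum carries a member that must not be 
offered as a config value, the valid-value list can be derived by filtering 
that member out. This is a minimal sketch, not the code in the PR. The 
`FileFormat` enum and the `validBaseFormats` helper are hypothetical stand-ins 
for `HoodieFileFormat` and the arguments passed to `withValidValues`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ValidValuesSketch {
    // Hypothetical stand-in for HoodieFileFormat; only the member names come from the thread.
    enum FileFormat { PARQUET, HOODIE_LOG, HFILE, ORC }

    // Returns the name of every enum member except HOODIE_LOG,
    // which is not a base file format and so must not be accepted.
    static List<String> validBaseFormats() {
        return Arrays.stream(FileFormat.values())
                .filter(f -> f != FileFormat.HOODIE_LOG)
                .map(Enum::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(validBaseFormats());
    }
}
```

   Listing the names by hand, as the diff does, is the more explicit choice; 
filtering only pays off if the enum is expected to grow.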






[GitHub] [hudi] ad1happy2go commented on issue #8428: [SUPPORT]: When trying to UPSERT, Getting issues like : An error occurred while calling o168.save. org/apache/spark/sql/avro/SchemaConverters$

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #8428:
URL: https://github.com/apache/hudi/issues/8428#issuecomment-1503843944

   @Madan16 Were you using AWS Glue 3.0 from the start, including when the 
job was succeeding?
   
   My guess is that a version mismatch is causing the ClassNotFound error 
for SchemaConverters, which is not present in older avro versions.
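
   One way to test a suspected classpath mismatch is to ask the JVM where a 
class was loaded from. The sketch below is an assumption-laden illustration, 
not something from this thread: `ClasspathProbe` and `locate` are hypothetical 
names, and the probe is only meaningful when run on the same classpath as the 
failing job.

```java
import java.security.CodeSource;

public class ClasspathProbe {
    // Returns the location (usually a jar URL) a class was loaded from,
    // or a "not found" note when the class cannot be resolved.
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK core classes have no code source.
            return src == null ? "bootstrap/platform class loader" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the error in the issue title and from Avro itself.
        System.out.println(locate("org.apache.spark.sql.avro.SchemaConverters"));
        System.out.println(locate("org.apache.avro.Schema"));
    }
}
```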





[GitHub] [hudi] rahil-c commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-04-11 Thread via GitHub


rahil-c commented on PR #8082:
URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503840585

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8430:
URL: https://github.com/apache/hudi/pull/8430#issuecomment-1503828676

   
   ## CI report:
   
   * d357330a200b9c5ad7f719d9985d40ef2e604d51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16263)
   
   
   





[GitHub] [hudi] zyclove commented on issue #8244: [SUPPORT] Is there any plan to support metadata management, index and table optimization services?

2023-04-11 Thread via GitHub


zyclove commented on issue #8244:
URL: https://github.com/apache/hudi/issues/8244#issuecomment-1503828199

   Are you in contact with the Arctic project team? 
   The Arctic project very much hopes to support Hudi metadata management 
and data optimization.
   
   > @zyclove Do you need any other help as part of this ticket or can we close 
the same?
   
   





[GitHub] [hudi] ad1happy2go commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #8130:
URL: https://github.com/apache/hudi/issues/8130#issuecomment-1503826578

   @18511327133 
   
   I wasn't able to reproduce the issue. Can you provide an exact reproducible 
script with your datasets?
   





[GitHub] [hudi] hudi-bot commented on pull request #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8430:
URL: https://github.com/apache/hudi/pull/8430#issuecomment-1503818107

   
   ## CI report:
   
   * d357330a200b9c5ad7f719d9985d40ef2e604d51 UNKNOWN
   
   
   





[GitHub] [hudi] ad1happy2go commented on issue #8244: [SUPPORT] Is there any plan to support metadata management, index and table optimization services?

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #8244:
URL: https://github.com/apache/hudi/issues/8244#issuecomment-1503814032

   @zyclove Do you need any other help as part of this ticket or can we close 
the same?





[GitHub] [hudi] ad1happy2go commented on issue #8236: [SUPPORT]Duplicate data in MOR table Hudi

2023-04-11 Thread via GitHub


ad1happy2go commented on issue #8236:
URL: https://github.com/apache/hudi/issues/8236#issuecomment-1503813179

   @xiagupqin Can you please let us know whether you hit this issue again with 
the fix applied or with metadata disabled?





[jira] [Updated] (HUDI-6060) Add config to backup instants before deletion during rollbacks

2023-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6060:
-
Labels: pull-request-available  (was: )

> Add config to backup instants before deletion during rollbacks
> --
>
> Key: HUDI-6060
> URL: https://issues.apache.org/jira/browse/HUDI-6060
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>
> When rollbacks / restores are performed, instants are deleted from the 
> .hoodie folder. Keeping a copy of such instants is useful for debugging 
> issues like the following:
>  # Files left over without any commits
>  # Bugs which leave files behind during a failed commit that is rolled back
> The implementation provides a config (off by default) which, when enabled, 
> backs up the instants to a backup directory before deletion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] prashantwason opened a new pull request, #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.

2023-04-11 Thread via GitHub


prashantwason opened a new pull request, #8430:
URL: https://github.com/apache/hudi/pull/8430

   [HUDI-6060] Added a config to backup instants before deletion during 
rollbacks and restores.
   
   ### Change Logs
   
   1. Added a config to enable backing up instants
   2. Added a config for the location of the backup directory
   3. Added code to perform the backup
   
   
   ### Impact
   
   None. New feature is off by default.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   Config has the necessary docstring.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8355: [HUDI-6016] HoodieCLIUtils supports creating HoodieClient with non-default database

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8355:
URL: https://github.com/apache/hudi/pull/8355#issuecomment-1503798063

   
   ## CI report:
   
   * 61a2efa806a9004721966f885f84d95f1b882dbd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16258)
   
   
   





[jira] [Created] (HUDI-6060) Add config to backup instants before deletion during rollbacks

2023-04-11 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-6060:


 Summary: Add config to backup instants before deletion during 
rollbacks
 Key: HUDI-6060
 URL: https://issues.apache.org/jira/browse/HUDI-6060
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Prashant Wason
Assignee: Prashant Wason


When rollbacks / restores are performed, instants are deleted from the .hoodie 
folder. Keeping a copy of such instants is useful for debugging issues like the 
following:
 # Files left over without any commits
 # Bugs which leave files behind during a failed commit that is rolled back

The implementation provides a config (off by default) which, when enabled, 
backs up the instants to a backup directory before deletion.
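
The backup-before-delete behaviour described above can be sketched as follows. 
This is a minimal illustration under stated assumptions, not the implementation 
in the pull request: it uses java.nio on a local filesystem rather than Hudi's 
storage layer, and `InstantBackupSketch`, `deleteInstant`, `backupEnabled` and 
`backupDir` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class InstantBackupSketch {
    // Deletes an instant file. When backupEnabled is true, the file is first
    // copied into backupDir so it remains available for debugging.
    static void deleteInstant(Path instantFile, boolean backupEnabled, Path backupDir)
            throws IOException {
        if (backupEnabled) {
            Files.createDirectories(backupDir);
            Files.copy(instantFile, backupDir.resolve(instantFile.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(instantFile);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: a temp directory standing in for the .hoodie folder.
        Path timeline = Files.createTempDirectory("hoodie-sketch");
        Path instant = Files.createFile(timeline.resolve("001.commit"));
        Path backup = timeline.resolve("backup");

        deleteInstant(instant, true, backup);
        System.out.println("original exists: " + Files.exists(instant));
        System.out.println("backup exists:   " + Files.exists(backup.resolve("001.commit")));
    }
}
```

Copying before deleting means that a failure during the copy leaves the 
original instant untouched.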





[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8429:
URL: https://github.com/apache/hudi/pull/8429#issuecomment-1503736037

   
   ## CI report:
   
   * 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262)
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11

2023-04-11 Thread via GitHub


hudi-bot commented on PR #8429:
URL: https://github.com/apache/hudi/pull/8429#issuecomment-1503717961

   
   ## CI report:
   
   * 18f438577f444c75e8060a20b7fdf59e40e9ab7e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] Madan16 commented on issue #8428: [SUPPORT]: When trying to UPSERT, Getting issues like : An error occurred while calling o168.save. org/apache/spark/sql/avro/SchemaConverters$ AND

2023-04-11 Thread via GitHub


Madan16 commented on issue #8428:
URL: https://github.com/apache/hudi/issues/8428#issuecomment-1503690930

   > @Madan16 This looks like an Avro library mismatch. Since you say the job 
ran fine for 2 months, do you know whether any AWS library or other dependency 
was updated recently?
   
   @ad1happy2go: Sorry, but I could not understand your question. Can you 
please be more specific so that I can provide more details? Note: I am running 
this (PySpark) code in AWS Glue.




