[GitHub] [hudi] hudi-bot removed a comment on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4660:
URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018227679


   
   ## CI report:
   
   * 590944041ba967d5390e5cc3d9b937226b6705af Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5405)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4660:
URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018262142


   
   ## CI report:
   
   * 590944041ba967d5390e5cc3d9b937226b6705af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5405)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018227649


   
   ## CI report:
   
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
 
   * d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018258609


   
   ## CI report:
   
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
 
   * d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5410)
 
   
   




[GitHub] [hudi] Guanpx commented on issue #4658: [SUPPORT] Data lose with Flink write COW insert table, Flink web UI show Records Received was different with HIVE count(1)

2022-01-20 Thread GitBox


Guanpx commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018254469


   > So you use the `upsert` mode, right? And the hoodie table has a pk there?
   
   We use insert (append) mode and do not have a unique key. Will the data 
still be deduplicated?






[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot commented on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018248383


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * 6ded004f02b3a5ca4b8314f66df59a1abc9bf5a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5409)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018246488


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
 
   * 6ded004f02b3a5ca4b8314f66df59a1abc9bf5a3 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018216050


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot commented on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018246488


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
 
   * 6ded004f02b3a5ca4b8314f66df59a1abc9bf5a3 UNKNOWN
   
   




[GitHub] [hudi] watermelon12138 commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …

2022-01-20 Thread GitBox


watermelon12138 commented on pull request #4645:
URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018234112


   @nsivabalan
   OK, thank you very much. This is very good advice, and I will try to 
land it.






[GitHub] [hudi] hudi-bot removed a comment on pull request #4662: [HUDI-3293] Fixing default value for clustering small file config

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4662:
URL: https://github.com/apache/hudi/pull/4662#issuecomment-1018230462


   
   ## CI report:
   
   * 789ecb457d2f5424674d512dd62d64480edc8c36 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4662: [HUDI-3293] Fixing default value for clustering small file config

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4662:
URL: https://github.com/apache/hudi/pull/4662#issuecomment-1018232163


   
   ## CI report:
   
   * 789ecb457d2f5424674d512dd62d64480edc8c36 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5408)
 
   
   




[GitHub] [hudi] nsivabalan commented on pull request #2903: [HUDI-1850][HUDI-3234] Fixing read of a empty table but with failed write

2022-01-20 Thread GitBox


nsivabalan commented on pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#issuecomment-1018232403


   @YannByron: can you review the patch?






[GitHub] [hudi] hudi-bot commented on pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4661:
URL: https://github.com/apache/hudi/pull/4661#issuecomment-1018232140


   
   ## CI report:
   
   * aa1156a61a9a6f5559597eda6231567bf55fde42 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5407)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4661:
URL: https://github.com/apache/hudi/pull/4661#issuecomment-1018230440


   
   ## CI report:
   
   * aa1156a61a9a6f5559597eda6231567bf55fde42 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4659:
URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018230413


   
   ## CI report:
   
   * 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
 
   * cc6512086b494976e154cf2db10597953d3c71d4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5406)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4662: [HUDI-3293] Fixing default value for clustering small file config

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4662:
URL: https://github.com/apache/hudi/pull/4662#issuecomment-1018230462


   
   ## CI report:
   
   * 789ecb457d2f5424674d512dd62d64480edc8c36 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4661:
URL: https://github.com/apache/hudi/pull/4661#issuecomment-1018230440


   
   ## CI report:
   
   * aa1156a61a9a6f5559597eda6231567bf55fde42 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4659:
URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018229037


   
   ## CI report:
   
   * 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
 
   * cc6512086b494976e154cf2db10597953d3c71d4 UNKNOWN
   
   




[GitHub] [hudi] wxplovecc commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true

2022-01-20 Thread GitBox


wxplovecc commented on a change in pull request #4654:
URL: https://github.com/apache/hudi/pull/4654#discussion_r789394982



##
File path: hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
##
@@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
    */
   private void waitForBootstrapReady(int taskID) {
     int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+    int attemptNum = getRuntimeContext().getAttemptNumber();
     int readyTaskNum = 1;
     while (taskNum != readyTaskNum) {
       try {
-        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment:
   Yes, you are right: after a failover, the `updateGlobalAggregate` function 
returns the previous accumulator info.
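
To illustrate the point, here is a minimal, self-contained sketch that does not use Flink at all. `FakeAggregateManager`, the `"bootstrap"` aggregate name, and the helper methods are hypothetical stand-ins. The sketch assumes that the accumulator is a set of ready task IDs that outlives a task restart, which may not match the real `BootstrapAggFunction` exactly. It compares the count a restarted task sees under a shared aggregate name with the count it sees when the attempt number is appended to the name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AttemptKeyedAggregateSketch {

    /** Hypothetical stand-in for an aggregate registry whose state survives task restarts. */
    static final class FakeAggregateManager {
        private final Map<String, Set<Integer>> accumulators = new HashMap<>();

        /** Records taskId under the named aggregate and returns how many distinct tasks have reported. */
        int updateGlobalAggregate(String name, int taskId) {
            Set<Integer> ready = accumulators.computeIfAbsent(name, k -> new HashSet<>());
            ready.add(taskId);
            return ready.size();
        }
    }

    /** Builds the aggregate name; with perAttempt=true the attempt number is appended. */
    static String aggregateName(String base, int attemptNum, boolean perAttempt) {
        return perAttempt ? base + "_" + attemptNum : base;
    }

    /**
     * Lets every task report once in attempt 0, then returns the count that
     * task 0 sees on its first report in attempt 1 (that is, after a restart).
     */
    static int firstCountAfterRestart(int taskNum, boolean perAttempt) {
        FakeAggregateManager manager = new FakeAggregateManager();
        String base = "bootstrap"; // hypothetical aggregate name
        for (int taskId = 0; taskId < taskNum; taskId++) {
            manager.updateGlobalAggregate(aggregateName(base, 0, perAttempt), taskId);
        }
        return manager.updateGlobalAggregate(aggregateName(base, 1, perAttempt), 0);
    }

    public static void main(String[] args) {
        int taskNum = 3;
        System.out.println("shared name:      " + firstCountAfterRestart(taskNum, false));
        System.out.println("per-attempt name: " + firstCountAfterRestart(taskNum, true));
    }
}
```

With three tasks, the shared name already reports 3 on the first call after the restart, so a wait loop comparing against `taskNum` would exit before the other tasks have finished the new attempt. The per-attempt name starts again at 1.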








[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4659:
URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018229037


   
   ## CI report:
   
   * 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
 
   * cc6512086b494976e154cf2db10597953d3c71d4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4659:
URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018227672


   
   ## CI report:
   
   * 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
 
   
   




[jira] [Updated] (HUDI-3292) Enable lazy read of log blocks for compaction

2022-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3292:
-
Labels: pull-request-available  (was: )

> Enable lazy read of log blocks for compaction
> -
>
> Key: HUDI-3292
> URL: https://issues.apache.org/jira/browse/HUDI-3292
> Project: Apache Hudi
>  Issue Type: Task
>  Components: compaction
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] nsivabalan opened a new pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction

2022-01-20 Thread GitBox


nsivabalan opened a new pull request #4661:
URL: https://github.com/apache/hudi/pull/4661


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] nsivabalan opened a new pull request #4662: [HUDI-3293] Fixing default value for clustering small file config

2022-01-20 Thread GitBox


nsivabalan opened a new pull request #4662:
URL: https://github.com/apache/hudi/pull/4662


   






[jira] [Updated] (HUDI-3293) Fix default value for clustering small file size

2022-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3293:
-
Labels: pull-request-available  (was: )

> Fix default value for clustering small file size
> 
>
> Key: HUDI-3293
> URL: https://issues.apache.org/jira/browse/HUDI-3293
> Project: Apache Hudi
>  Issue Type: Task
>  Components: clustering
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>






[jira] [Updated] (HUDI-3293) Fix default value for clustering small file size

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3293:
--
Fix Version/s: 0.11.0

> Fix default value for clustering small file size
> 
>
> Key: HUDI-3293
> URL: https://issues.apache.org/jira/browse/HUDI-3293
> Project: Apache Hudi
>  Issue Type: Task
>  Components: clustering
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 0.11.0
>
>






[jira] [Created] (HUDI-3293) Fix default value for clustering small file size

2022-01-20 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3293:
-

 Summary: Fix default value for clustering small file size
 Key: HUDI-3293
 URL: https://issues.apache.org/jira/browse/HUDI-3293
 Project: Apache Hudi
  Issue Type: Task
  Components: clustering
Reporter: sivabalan narayanan








[GitHub] [hudi] hudi-bot removed a comment on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4660:
URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018226185


   
   ## CI report:
   
   * 590944041ba967d5390e5cc3d9b937226b6705af UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018227649


   
   ## CI report:
   
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
 
   * d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4660:
URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018227679


   
   ## CI report:
   
   * 590944041ba967d5390e5cc3d9b937226b6705af Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5405)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018226144


   
   ## CI report:
   
   * b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
 
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
 
   * d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4659:
URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018227672


   
   ## CI report:
   
   * 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4659:
URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018226166


   
   ## CI report:
   
   * 1ec3d9b036d2a743243dd75556f6eb3492e0126f UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4659:
URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018226166


   
   ## CI report:
   
   * 1ec3d9b036d2a743243dd75556f6eb3492e0126f UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4660:
URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018226185


   
   ## CI report:
   
   * 590944041ba967d5390e5cc3d9b937226b6705af UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018226144


   
   ## CI report:
   
   * b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
 
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
 
   * d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018224665


   
   ## CI report:
   
   * b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
 
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
 
   
   




[jira] [Updated] (HUDI-3292) Enable lazy read of log blocks for compaction

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3292:
--
Fix Version/s: 0.11.0

> Enable lazy read of log blocks for compaction
> -
>
> Key: HUDI-3292
> URL: https://issues.apache.org/jira/browse/HUDI-3292
> Project: Apache Hudi
>  Issue Type: Task
>  Components: compaction
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3292) Enable lazy read of log blocks for compaction

2022-01-20 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3292:
-

 Summary: Enable lazy read of log blocks for compaction
 Key: HUDI-3292
 URL: https://issues.apache.org/jira/browse/HUDI-3292
 Project: Apache Hudi
  Issue Type: Task
  Components: compaction
Reporter: sivabalan narayanan








[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018224665


   
   ## CI report:
   
   * b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
 
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018223472


   
   ## CI report:
   
   * b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
 
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 UNKNOWN
   
   




[jira] [Updated] (HUDI-3291) Flip Default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3291:
-
Labels: pull-request-available  (was: )

> Flip Default record payload to DefaultHoodieRecordPayload
> 
>
> Key: HUDI-3291
> URL: https://issues.apache.org/jira/browse/HUDI-3291
> Project: Apache Hudi
>  Issue Type: Task
>  Components: writer-core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>






[jira] [Updated] (HUDI-3091) Make simple index as the default hoodie.index.type

2022-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3091:
-
Labels: pull-request-available  (was: )

> Make simple index as the default hoodie.index.type
> --
>
> Key: HUDI-3091
> URL: https://issues.apache.org/jira/browse/HUDI-3091
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: index
>Reporter: Vinoth Govindarajan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> When performing upserts with derived datasets, we often run into an OOM issue 
> with the bloom filter, so we changed all the dataset index types to simple 
> to resolve the issue.
>  
> Some of the tables were non-partitioned tables, for which the bloom index is 
> not the right choice.
> I'm proposing to make the simple index the default value; on a case-by-case 
> basis, folks can choose the bloom index for the additional performance gains 
> offered by bloom filters.
>  
> I agree that performance will not be optimal, but for regular use cases the 
> simple index will not break any ingestion/derived jobs, even if read/write 
> performance is sub-optimal.
>  
>  
>  
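The case-by-case opt-in described above could look roughly like the following. This is a minimal sketch, not Hudi code: it only builds a plain `java.util.Properties` object, the class and method names are hypothetical, and the only things taken from the issue are the `hoodie.index.type` key and the simple/bloom choice (written here as `SIMPLE` and `BLOOM`).

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: pins the index type explicitly so a
// job does not depend on whichever default the release ships with.
class IndexTypeOverride {
    static final String INDEX_TYPE_KEY = "hoodie.index.type";

    // Returns writer properties with the index type set explicitly.
    // useBloom = true opts a specific table back into the bloom index.
    static Properties writerProps(boolean useBloom) {
        Properties props = new Properties();
        props.setProperty(INDEX_TYPE_KEY, useBloom ? "BLOOM" : "SIMPLE");
        return props;
    }

    public static void main(String[] args) {
        // A table whose workload benefits from bloom filters opts in explicitly.
        System.out.println(writerProps(true).getProperty(INDEX_TYPE_KEY));
        // Every other table stays on the simple index.
        System.out.println(writerProps(false).getProperty(INDEX_TYPE_KEY));
    }
}
```

Setting the key explicitly per table keeps existing jobs stable regardless of which default a given release uses.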





[GitHub] [hudi] nsivabalan opened a new pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread GitBox


nsivabalan opened a new pull request #4660:
URL: https://github.com/apache/hudi/pull/4660


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] nsivabalan opened a new pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type

2022-01-20 Thread GitBox


nsivabalan opened a new pull request #4659:
URL: https://github.com/apache/hudi/pull/4659


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-3291) Flip Default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3291:
--
Fix Version/s: 0.11.0

> Flip Default record payload to DefaultHoodieRecordPayload
> 
>
> Key: HUDI-3291
> URL: https://issues.apache.org/jira/browse/HUDI-3291
> Project: Apache Hudi
>  Issue Type: Task
>  Components: writer-core
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 0.11.0
>
>






[GitHub] [hudi] wxplovecc commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true

2022-01-20 Thread GitBox


wxplovecc commented on a change in pull request #4654:
URL: https://github.com/apache/hudi/pull/4654#discussion_r789394982



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
##
@@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
    */
   private void waitForBootstrapReady(int taskID) {
     int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+    int attemptNum = getRuntimeContext().getAttemptNumber();
     int readyTaskNum = 1;
     while (taskNum != readyTaskNum) {
       try {
-        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment:
   yes, you are right








[jira] [Created] (HUDI-3291) Flip Default record payload to DefaultHoodieRecordPayload

2022-01-20 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3291:
-

 Summary: Flip Default record payload to DefaultHoodieRecordPayload
 Key: HUDI-3291
 URL: https://issues.apache.org/jira/browse/HUDI-3291
 Project: Apache Hudi
  Issue Type: Task
  Components: writer-core
Reporter: sivabalan narayanan








[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018223472


   
   ## CI report:
   
   * b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
 
   * c44a34bc4a46da4918493ca95967cf0fbddbfe70 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1017460532


   
   ## CI report:
   
   * b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
 
   
   




[jira] [Created] (HUDI-3290) Make the .hoodie-partition-metadata as empty parquet file

2022-01-20 Thread Vinoth Govindarajan (Jira)
Vinoth Govindarajan created HUDI-3290:
-

 Summary: Make the .hoodie-partition-metadata as empty parquet file
 Key: HUDI-3290
 URL: https://issues.apache.org/jira/browse/HUDI-3290
 Project: Apache Hudi
  Issue Type: New Feature
  Components: metadata
Reporter: Vinoth Govindarajan
Assignee: Vinoth Govindarajan


For BigQuery and Snowflake integration, we are unable to create external tables 
when the partition folder contains a non-parquet file, `.hoodie-partition-metadata`.

I understand this file is important for finding the .hoodie folder from within 
the partition folder. The long-term solution is to get rid of this file. As a 
short-term solution, if we convert it to an empty parquet file and add the 
necessary depth information in the footer, it will pass the BigQuery/Snowflake 
external table validation and allow us to create an external parquet table on 
top of the hudi folder structure.





[jira] [Closed] (HUDI-3278) Make Simple Index the default index type

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-3278.
-
Resolution: Duplicate

> Make Simple Index the default index type
> 
>
> Key: HUDI-3278
> URL: https://issues.apache.org/jira/browse/HUDI-3278
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>






[jira] [Updated] (HUDI-3091) Make simple index as the default hoodie.index.type

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3091:
--
Priority: Blocker  (was: Major)

> Make simple index as the default hoodie.index.type
> --
>
> Key: HUDI-3091
> URL: https://issues.apache.org/jira/browse/HUDI-3091
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: index
>Reporter: Vinoth Govindarajan
>Assignee: sivabalan narayanan
>Priority: Blocker
>
> When performing upserts with derived datasets, we often run into an OOM issue 
> with the bloom filter, so we changed all the dataset index types to simple 
> to resolve the issue.
>  
> Some of the tables were non-partitioned tables, for which the bloom index is 
> not the right choice.
> I'm proposing to make the simple index the default value; on a case-by-case 
> basis, folks can choose the bloom index for the additional performance gains 
> offered by bloom filters.
>  
> I agree that performance will not be optimal, but for regular use cases the 
> simple index will not break any ingestion/derived jobs, even if read/write 
> performance is sub-optimal.
>  
>  
>  





[jira] [Assigned] (HUDI-3091) Make simple index as the default hoodie.index.type

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-3091:
-

Assignee: sivabalan narayanan

> Make simple index as the default hoodie.index.type
> --
>
> Key: HUDI-3091
> URL: https://issues.apache.org/jira/browse/HUDI-3091
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: index
>Reporter: Vinoth Govindarajan
>Assignee: sivabalan narayanan
>Priority: Major
>
> When performing upserts with derived datasets, we often run into an OOM issue 
> with the bloom filter, so we changed all the dataset index types to simple 
> to resolve the issue.
>  
> Some of the tables were non-partitioned tables, for which the bloom index is 
> not the right choice.
> I'm proposing to make the simple index the default value; on a case-by-case 
> basis, folks can choose the bloom index for the additional performance gains 
> offered by bloom filters.
>  
> I agree that performance will not be optimal, but for regular use cases the 
> simple index will not break any ingestion/derived jobs, even if read/write 
> performance is sub-optimal.
>  
>  
>  





[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018214437


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * c047f394e58415a14c5a4070627fd90a7d1106b6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2974)
 
   * 7e96f0f751a745f3a77bed4461099aee2c00f697 UNKNOWN
   
   




[jira] [Updated] (HUDI-3091) Make simple index as the default hoodie.index.type

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3091:
--
Fix Version/s: 0.11.0

> Make simple index as the default hoodie.index.type
> --
>
> Key: HUDI-3091
> URL: https://issues.apache.org/jira/browse/HUDI-3091
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: index
>Reporter: Vinoth Govindarajan
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>
> When performing upserts with derived datasets, we often run into an OOM issue 
> with the bloom filter, so we changed all the dataset index types to simple 
> to resolve the issue.
>  
> Some of the tables were non-partitioned tables, for which the bloom index is 
> not the right choice.
> I'm proposing to make the simple index the default value; on a case-by-case 
> basis, folks can choose the bloom index for the additional performance gains 
> offered by bloom filters.
>  
> I agree that performance will not be optimal, but for regular use cases the 
> simple index will not break any ingestion/derived jobs, even if read/write 
> performance is sub-optimal.
>  
>  
>  





[jira] [Closed] (HUDI-2978) Change default index type to Simple

2022-01-20 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-2978.
-
Resolution: Duplicate

> Change default index type to Simple
> ---
>
> Key: HUDI-2978
> URL: https://issues.apache.org/jira/browse/HUDI-2978
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>  Labels: release-notes, sev:high
> Fix For: 0.11.0
>
>
> Today the default index type is Bloom. For read-update-all workloads, the 
> simple index performs better than the Bloom index. It is better to have Simple 
> as the default index type and choose Bloom only based on workloads.





[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot commented on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018216050


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
 
   
   




[GitHub] [hudi] danny0405 commented on issue #4658: [SUPPORT] Data lose with Flink write COW insert table, Flink web UI show Records Received was different with HIVE count(1)

2022-01-20 Thread GitBox


danny0405 commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018215386


   So you use the `upsert` mode, right? And the hoodie table has a primary key?






[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-961589833


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * c047f394e58415a14c5a4070627fd90a7d1106b6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2974)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2022-01-20 Thread GitBox


hudi-bot commented on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018214437


   
   ## CI report:
   
   * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
   * 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
   * 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
   * c047f394e58415a14c5a4070627fd90a7d1106b6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2974)
 
   * 7e96f0f751a745f3a77bed4461099aee2c00f697 UNKNOWN
   
   




[GitHub] [hudi] danny0405 commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true

2022-01-20 Thread GitBox


danny0405 commented on a change in pull request #4654:
URL: https://github.com/apache/hudi/pull/4654#discussion_r789387075



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
##
@@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
    */
   private void waitForBootstrapReady(int taskID) {
     int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+    int attemptNum = getRuntimeContext().getAttemptNumber();
     int readyTaskNum = 1;
     while (taskNum != readyTaskNum) {
       try {
-        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment:
   `readyTaskNum` matches `taskNum` only once the accumulator has received the 
bootstrap info from every task. Does that work for your case? A failover retry 
does not increase `readyTaskNum`, right?








[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs

2022-01-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380198#comment-17380198
 ] 

sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:13 AM:
-

 -[High Priority] marker based rollback should be on-

 
{code:java}
public static final ConfigProperty ROLLBACK_USING_MARKERS = 
ConfigProperty
 .key("hoodie.rollback.using.markers")
 .defaultValue("false")
 .withDocumentation("Enables a more efficient mechanism for rollbacks based on 
the marker files generated "
 + "during the writes. Turned off by default.");{code}


was (Author: vc):
 [High Priority] marker based rollback should be on

 
{code:java}
public static final ConfigProperty ROLLBACK_USING_MARKERS = 
ConfigProperty
 .key("hoodie.rollback.using.markers")
 .defaultValue("false")
 .withDocumentation("Enables a more efficient mechanism for rollbacks based on 
the marker files generated "
 + "during the writes. Turned off by default.");{code}

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Code Cleanup, docs, writer-core
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> We have quite a few configs which deliver better performance or usability, 
> but guarded by flags. 
>  This is to identify them, change them, test (functionally, perf) and make 
> them default
>  
> Need to ensure we also capture all the backwards compatibility issues that 
> can arise



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs

2022-01-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380174#comment-17380174
 ] 

sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:11 AM:
-

 [High Priority]Need to ensure this is actually 1, going forward.

 
{code:java}
public static final ConfigProperty 
HOODIE_TABLE_VERSION_PROP = ConfigProperty
 .key("hoodie.table.version")
 .defaultValue(HoodieTableVersion.ZERO)
 .withDocumentation("");{code}
Update: the default value above is not used anywhere. The default is picked up 
from HoodieTableVersion.current()


was (Author: vc):
 [High Priority]Need to ensure this is actually 1, going forward.

 
{code:java}
public static final ConfigProperty 
HOODIE_TABLE_VERSION_PROP = ConfigProperty
 .key("hoodie.table.version")
 .defaultValue(HoodieTableVersion.ZERO)
 .withDocumentation("");{code}



[GitHub] [hudi] leesf commented on a change in pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


leesf commented on a change in pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#discussion_r789385383



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##
@@ -17,16 +17,15 @@
 
 package org.apache.hudi
 
+import org.apache.hadoop.fs.{GlobPattern, Path}

Review comment:
   please revert the import change








[GitHub] [hudi] leesf commented on a change in pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

2022-01-20 Thread GitBox


leesf commented on a change in pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#discussion_r789385270



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/TableSchemaResolverUtils.java
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+
+public final class TableSchemaResolverUtils {

Review comment:
   We should avoid introducing a new util class and instead put the util method 
into TableSchemaResolver.








[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs

2022-01-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377665#comment-17377665
 ] 

sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:05 AM:
-

 -[High Priority] Timeline layout version should now be 1-

 
{code:java}
public static final ConfigProperty TIMELINE_LAYOUT_VERSION = 
ConfigProperty
 .key("hoodie.timeline.layout.version"){code}


was (Author: vc):
 [High Priority] Timeline layout version should now be 1

 
{code:java}
public static final ConfigProperty TIMELINE_LAYOUT_VERSION = 
ConfigProperty
 .key("hoodie.timeline.layout.version"){code}



[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs

2022-01-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377617#comment-17377617
 ] 

sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:04 AM:
-

-Is file listing parallelism too high?- already set to 200
{code:java}
public static final ConfigProperty FILE_LISTING_PARALLELISM_PROP = 
ConfigProperty
 .key("hoodie.file.listing.parallelism")
 .defaultValue(1500){code}


was (Author: vc):
Is file listing parallelism too high?
{code:java}
public static final ConfigProperty FILE_LISTING_PARALLELISM_PROP = 
ConfigProperty
 .key("hoodie.file.listing.parallelism")
 .defaultValue(1500){code}



[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs

2022-01-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377663#comment-17377663
 ] 

sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:04 AM:
-

-[High Priority] Rollback using markers-
{code:java}
public static final ConfigProperty ROLLBACK_USING_MARKERS = 
ConfigProperty
 .key("hoodie.rollback.using.markers")
 .defaultValue("false"){code}


was (Author: vc):
[High Priority] Rollback using markers
{code:java}
public static final ConfigProperty ROLLBACK_USING_MARKERS = 
ConfigProperty
 .key("hoodie.rollback.using.markers")
 .defaultValue("false"){code}



[GitHub] [hudi] wxplovecc commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true

2022-01-20 Thread GitBox


wxplovecc commented on a change in pull request #4654:
URL: https://github.com/apache/hudi/pull/4654#discussion_r789373079



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
##
@@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
    */
   private void waitForBootstrapReady(int taskID) {
     int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+    int attemptNum = getRuntimeContext().getAttemptNumber();
     int readyTaskNum = 1;
     while (taskNum != readyTaskNum) {
       try {
-        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment:
   OK. Suppose a Flink job with index.bootstrap=true fails, for example because 
a TaskManager is lost.
   If the job restarts with the same GlobalAggregate name, it reuses the 
`accumulators` held in the JobMaster.
   Then any BootstrapOperator subtasks that are faster than the others will 
send records downstream
   without waiting for all bootstrap tasks to finish.
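
   The effect described above can be illustrated with a small standalone 
simulation. This is only a sketch and does not use Flink: `FakeAggregateManager` 
and `firstPollAfterRestart` are hypothetical names, and the assumption is that 
the aggregate counts the distinct task IDs reported under a given name and 
that the state survives a job restart.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BootstrapBarrierSketch {

  /** Hypothetical stand-in for accumulators that outlive a job restart, keyed by aggregate name. */
  static final class FakeAggregateManager {
    private final Map<String, Set<Integer>> accumulators = new HashMap<>();

    /** Records taskId under the given name and returns the number of distinct tasks seen so far. */
    int updateGlobalAggregate(String name, int taskId) {
      Set<Integer> ready = accumulators.computeIfAbsent(name, k -> new HashSet<>());
      ready.add(taskId);
      return ready.size();
    }
  }

  /** Returns the ready count that task 0 sees on its first poll after a restart. */
  static int firstPollAfterRestart(boolean suffixWithAttempt) {
    FakeAggregateManager manager = new FakeAggregateManager();
    int taskNum = 2;
    String baseName = "bootstrap";

    // Attempt 0: every task reports that its bootstrap is done.
    for (int taskId = 0; taskId < taskNum; taskId++) {
      manager.updateGlobalAggregate(suffixWithAttempt ? baseName + "_0" : baseName, taskId);
    }

    // Attempt 1: the job restarted and only the fast task 0 has reported so far.
    String name = suffixWithAttempt ? baseName + "_1" : baseName;
    return manager.updateGlobalAggregate(name, 0);
  }

  public static void main(String[] args) {
    // With a shared name the stale state already looks complete (2 of 2).
    System.out.println("same name: " + firstPollAfterRestart(false));
    // With a per-attempt name the count starts fresh (1 of 2), so task 0 keeps waiting.
    System.out.println("with attempt suffix: " + firstPollAfterRestart(true));
  }
}
```

   In this model the shared name lets task 0 leave the wait loop immediately, 
while the per-attempt name forces it to wait for the remaining tasks.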








[jira] [Commented] (HUDI-3242) Checkpoint 0 is ignored -Partial parquet file discovery after the first commit

2022-01-20 Thread Harsha Teja Kanna (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479835#comment-17479835
 ] 

Harsha Teja Kanna commented on HUDI-3242:
-

Input: monthly partitions

partition=2021/01

file1_1 - timestamp1
file2_1 - timestamp2
file3_1 - timestamp3

partition=2021/02

file1_2 - timestamp1
file2_2 - timestamp2
file3_2 - timestamp3

Now I want to run DeltaStreamer one partition at a time to create the Hudi table.

> Checkpoint 0 is ignored -Partial parquet file discovery after the first commit
> --
>
> Key: HUDI-3242
> URL: https://issues.apache.org/jira/browse/HUDI-3242
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, writer-core
>Affects Versions: 0.10.1
> Environment: AWS
> EMR 6.4.0
> Spark 3.1.2
> Hudi - 0.10.1-rc
>Reporter: Harsha Teja Kanna
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: hudi-on-call, sev:critical, user-support-issues
> Attachments: Screen Shot 2022-01-13 at 2.40.55 AM.png, Screen Shot 
> 2022-01-13 at 2.55.35 AM.png, Screen Shot 2022-01-20 at 1.36.48 PM.png
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Hi, I am testing release branch 0.10.1 because I need a few bug fixes from it.
> However, for a certain table, only some of the files are discovered 
> after the initial commit of the table.
> But if the second partition is given as input for the first commit, all the 
> files are discovered.
> First partition : 2021/01 has 744 files and all of them are discovered
> Second partition: 2021/02 has 762 files but only 72 are discovered.
> Checkpoint is set to 0. 
> No errors in the logs.
> {code:java}
> spark-submit \
> --master yarn \
> --deploy-mode cluster \
> --driver-cores 30 \
> --driver-memory 32g \
> --executor-cores 5 \
> --executor-memory 32g \
> --num-executors 120 \
> --jars 
> s3://bucket/apps/datalake/jars/unused-1.0.0.jar,s3://bucket/apps/datalake/jars/spark-avro_2.12-3.1.2.jar
>  \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer 
> s3://bucket/apps/datalake/jars/hudi-0.10.0/hudi-utilities-bundle_2.12-0.10.0.jar
>  \
> --table-type COPY_ON_WRITE \
> --source-ordering-field timestamp \
> --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
> --target-base-path s3a://datalake-hudi/datastream/v1/sessions_by_date \
> --target-table sessions_by_date \
> --transformer-class 
> org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
> --op INSERT \
> --checkpoint 0 \
> --hoodie-conf hoodie.clean.automatic=true \
> --hoodie-conf hoodie.cleaner.commits.retained=1 \
> --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \
> --hoodie-conf hoodie.clustering.inline=false \
> --hoodie-conf hoodie.clustering.inline.max.commits=1 \
> --hoodie-conf 
> hoodie.clustering.plan.strategy.class=org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy
>  \
> --hoodie-conf hoodie.clustering.plan.strategy.max.num.groups=100 \
> --hoodie-conf hoodie.clustering.plan.strategy.small.file.limit=25000 \
> --hoodie-conf hoodie.clustering.plan.strategy.sort.columns=sid,id \
> --hoodie-conf hoodie.clustering.plan.strategy.target.file.max.bytes=268435456 
> \
> --hoodie-conf hoodie.clustering.preserve.commit.metadata=true \
> --hoodie-conf hoodie.datasource.hive_sync.database=datalake-hudi \
> --hoodie-conf hoodie.datasource.hive_sync.enable=false \
> --hoodie-conf hoodie.datasource.hive_sync.ignore_exceptions=true \
> --hoodie-conf hoodie.datasource.hive_sync.mode=hms \
> --hoodie-conf 
> hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.HiveStylePartitionValueExtractor
>  \
> --hoodie-conf hoodie.datasource.hive_sync.table=sessions_by_date \
> --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
> --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
> --hoodie-conf 
> hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
>  \
> --hoodie-conf hoodie.datasource.write.operation=insert \
> --hoodie-conf hoodie.datasource.write.partitionpath.field=date:TIMESTAMP \
> --hoodie-conf hoodie.datasource.write.precombine.field=timestamp \
> --hoodie-conf hoodie.datasource.write.recordkey.field=id,qid,aid \
> --hoodie-conf 
> hoodie.deltastreamer.keygen.timebased.input.dateformat=/MM/dd \
> --hoodie-conf hoodie.deltastreamer.keygen.timebased.input.timezone=GMT \
> --hoodie-conf 
> hoodie.deltastreamer.keygen.timebased.output.dateformat=/MM/dd \
> --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.timezone=GMT \
> --hoodie-conf 
> hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING \
> --hoodie-con

[jira] [Commented] (HUDI-3242) Checkpoint 0 is ignored -Partial parquet file discovery after the first commit

2022-01-20 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479833#comment-17479833
 ] 

sivabalan narayanan commented on HUDI-3242:
---

Sorry, I don't understand this statement of yours: "Also few of my unloaded 
datasets have non linear timestamps across partitions and I create the hudi 
table partition after partition and set checkpoint to 0." Can you please 
clarify?

I am trying to understand why you explicitly set the checkpoint value to 0.


[GitHub] [hudi] YannByron commented on pull request #4644: [HUDI-3282] Fix delete exception for Spark SQL when sync Hive

2022-01-20 Thread GitBox


YannByron commented on pull request #4644:
URL: https://github.com/apache/hudi/pull/4644#issuecomment-1018191017


   LGTM






[GitHub] [hudi] nsivabalan commented on a change in pull request #3929: [HUDI-1881] Make multi table delta streamer to use thread pool for table sync asynchronously.

2022-01-20 Thread GitBox


nsivabalan commented on a change in pull request #3929:
URL: https://github.com/apache/hudi/pull/3929#discussion_r789367919



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java
##
@@ -378,16 +383,23 @@ private static String resetTarget(Config configuration, String database, String
   /**
    * Creates actual HoodieDeltaStreamer objects for every table/topic and does incremental sync.
    */
-  public void sync() {
-    for (TableExecutionContext context : tableExecutionContexts) {
-      try {
-        new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties())).sync();
-        successTables.add(Helpers.getTableWithDatabase(context));
-      } catch (Exception e) {
-        logger.error("error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e);
-        failedTables.add(Helpers.getTableWithDatabase(context));
-      }
-    }
+  public void sync() throws InterruptedException {
+    ExecutorService executorService = Executors.newFixedThreadPool(tableExecutionContexts.size());
+    tableExecutionContexts.forEach(context -> {
+      executorService.execute(new Runnable() {
+        @Override
+        public void run() {
+          try {
+            new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties())).sync();
+            successTables.add(Helpers.getTableWithDatabase(context));
+          } catch (Exception e) {
+            logger.error("error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e);
+            failedTables.add(Helpers.getTableWithDatabase(context));
+          }
+        }
+      });
+    });
+    executorService.shutdown();

Review comment:
   Should we add `awaitTermination` here? We can't proceed with the next batch 
until all tables in the current batch have completed, right?
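
   A minimal standalone sketch of what that could look like, using only 
`java.util.concurrent`. The class name, the `syncAll` helper, the table names 
and the one-hour timeout are illustrative assumptions, not taken from the PR; 
the per-table work is replaced by a trivial placeholder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitTerminationSketch {

  /** Runs one placeholder task per table and returns only after all of them have finished. */
  static List<String> syncAll(List<String> tables) {
    // Thread-safe list, because the worker threads append to it concurrently.
    List<String> successTables = new CopyOnWriteArrayList<>();
    // newFixedThreadPool rejects a size of 0, so guard against an empty table list.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tables.size()));
    for (String table : tables) {
      pool.execute(() -> successTables.add(table)); // placeholder for the per-table sync
    }
    pool.shutdown(); // stop accepting new tasks; already submitted tasks keep running
    try {
      // Block until every submitted task is done (timeout value is an arbitrary assumption).
      if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
        pool.shutdownNow();
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
    }
    return successTables;
  }

  public static void main(String[] args) {
    List<String> done = syncAll(Arrays.asList("db.t1", "db.t2", "db.t3"));
    System.out.println("completed tables: " + done.size());
  }
}
```

   Without the `awaitTermination` call, `shutdown()` returns immediately and 
the caller could read the result lists before the workers have finished.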








[GitHub] [hudi] nsivabalan commented on a change in pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …

2022-01-20 Thread GitBox


nsivabalan commented on a change in pull request #4645:
URL: https://github.com/apache/hudi/pull/4645#discussion_r789365060



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java
##
@@ -370,50 +441,124 @@ public static void main(String[] args) throws IOException {
   private static String resetTarget(Config configuration, String database, String tableName) {
     String basePathPrefix = configuration.basePathPrefix;
     basePathPrefix = basePathPrefix.charAt(basePathPrefix.length() - 1) == '/' ? basePathPrefix.substring(0, basePathPrefix.length() - 1) : basePathPrefix;
-    String targetBasePath = basePathPrefix + Constants.FILE_DELIMITER + database + Constants.FILE_DELIMITER + tableName;
-    configuration.targetTableName = database + Constants.DELIMITER + tableName;
+    String targetBasePath = basePathPrefix + Constants.PATH_SEPARATOR + database + Constants.PATH_SEPARATOR + tableName;
+    configuration.targetTableName = database + Constants.PATH_CUR_DIR + tableName;
     return targetBasePath;
   }
 
   /**
    * Creates actual HoodieDeltaStreamer objects for every table/topic and does incremental sync.
    */
   public void sync() {
+    List hdsObjectList = new ArrayList<>();
+
+    // The sync function is not executed when multiple sources update the same target.

Review comment:
   We could probably use one big if/else block for a single source versus 
multiple sources per Hudi table. That would be easier to reason about and 
maintain: the existing code goes into the if block, and the new code for 
multiple sources goes into the else block.

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java
##
@@ -370,50 +441,124 @@ public static void main(String[] args) throws IOException {
   private static String resetTarget(Config configuration, String database, String tableName) {
     String basePathPrefix = configuration.basePathPrefix;
     basePathPrefix = basePathPrefix.charAt(basePathPrefix.length() - 1) == '/' ? basePathPrefix.substring(0, basePathPrefix.length() - 1) : basePathPrefix;
-    String targetBasePath = basePathPrefix + Constants.FILE_DELIMITER + database + Constants.FILE_DELIMITER + tableName;
-    configuration.targetTableName = database + Constants.DELIMITER + tableName;
+    String targetBasePath = basePathPrefix + Constants.PATH_SEPARATOR + database + Constants.PATH_SEPARATOR + tableName;
+    configuration.targetTableName = database + Constants.PATH_CUR_DIR + tableName;
     return targetBasePath;
   }
 
   /**
    * Creates actual HoodieDeltaStreamer objects for every table/topic and does incremental sync.
    */
   public void sync() {
+    List hdsObjectList = new ArrayList<>();
+
+    // The sync function is not executed when multiple sources update the same target.
     for (TableExecutionContext context : tableExecutionContexts) {
       try {
-        new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties())).sync();
+        HoodieDeltaStreamer hds = new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties()));
+
+        // Add object of HoodieDeltaStreamer temporarily to hdsObjectList when multiple sources update the same target.
+        if (!StringUtils.isNullOrEmpty(context.getProperties().getProperty(Constants.SOURCES_TO_BE_BOUND))) {
+          hdsObjectList.add(hds);
+          continue;
+        }
+
+        hds.sync();
         successTables.add(Helpers.getTableWithDatabase(context));
       } catch (Exception e) {
-        logger.error("error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e);
+        logger.error("Error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e);
         failedTables.add(Helpers.getTableWithDatabase(context));
       }
     }
 
-    logger.info("Ingestion was successful for topics: " + successTables);
-    if (!failedTables.isEmpty()) {
-      logger.info("Ingestion failed for topics: " + failedTables);
+    // If hdsObjectList is empty, it indicates that all source sync operations have been completed. In this case, directly return.
+    if (hdsObjectList.isEmpty()) {
+      logger.info("Ingestion was successful for topics: " + successTables);
+      if (!failedTables.isEmpty()) {
+        logger.info("Ingestion failed for topics: " + failedTables);
+      }
+      return;
     }
+
+    // The sync function is executing here when multiple sources update the same target.
+    boolean isContinuousMode = hdsObjectList.get(0).cfg.continuousMode;

Review comment:
   I guess we need to move this to L488 as 
   ```
   boolean isContinuousMode = hdsObjectList.get(i).cfg.continuousMode;
   ```
   Essentially, we can't have continuous mode enabled for any of the tables, right?
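
   One way to express "no table may be in continuous mode" is to check every 
collected entry rather than only the first one. The sketch below is 
standalone and hypothetical: `TableConfig` and `anyContinuous` are made-up 
stand-ins for the per-table config objects, not classes from the PR.

```java
import java.util.Arrays;
import java.util.List;

public class ContinuousModeCheckSketch {

  /** Hypothetical stand-in for a per-table delta streamer config. */
  static final class TableConfig {
    final String tableName;
    final boolean continuousMode;

    TableConfig(String tableName, boolean continuousMode) {
      this.tableName = tableName;
      this.continuousMode = continuousMode;
    }
  }

  /** Returns true if at least one of the configs has continuous mode enabled. */
  static boolean anyContinuous(List<TableConfig> configs) {
    return configs.stream().anyMatch(c -> c.continuousMode);
  }

  public static void main(String[] args) {
    List<TableConfig> configs = Arrays.asList(
        new TableConfig("db.t1", false),
        new TableConfig("db.t2", true));
    // Looking only at index 0 would miss db.t2, so the check covers every entry.
    if (anyContinuous(configs)) {
      System.out.println("continuous mode is enabled for at least one table");
    }
  }
}
```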
   

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/ut

[GitHub] [hudi] Guanpx opened a new issue #4658: [SUPPORT] Data lose with Flink write COW insert table, Flink web UI show Records Received was different with HIVE count(1)

2022-01-20 Thread GitBox


Guanpx opened a new issue #4658:
URL: https://github.com/apache/hudi/issues/4658


   **Describe the problem you faced**
   
   Data loss when Flink writes to a COW table with the insert operation: the 
Records Received count shown in the Flink web UI differs from the Hive count(1).
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write with Flink and sync to Hive.
   2. Observe that the Records Received count in the Flink web UI differs from 
the Hive (Impala) count(1).
   
   **Expected behavior**
   
   
![image](https://user-images.githubusercontent.com/29246713/150461634-237e705c-1bff-4183-bf8a-be7222b7d917.png)
   
   **Environment Description**
   
   * Hudi version : 0.10.0
   
   * Flink version : 1.13.2
   
   * Hive version : 2.1.1-cdh6
   
   * Hadoop version : 3.0.0-cdh6
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Flink write config
   
   ```
   'connector' = 'hudi',
 'path' = 'hdfs://nameservice-ha/hudi/rds/event_log_origin',
 'table.type' = 'COPY_ON_WRITE',
   
 'hoodie.datasource.write.recordkey.field' = 'distinct_id',  
   
 'hive_sync.enable'='true', 
 'hive_sync.table'='hudi_event_log_origin',
 'hive_sync.db'='default', 
 'hive_sync.mode' = 'hms', 
 'hive_sync.metastore.uris' = '',   
 'hive_sync.skip_ro_suffix' = 'true',   
   
 'hoodie.datasource.write.operation' = 'insert',-- append mode
 'write.tasks' = '2',   
 'write.bucket_assign.tasks' = '2',  
 'write.insert.cluster' = 'true',  
 'write.ignore.failed' = 'false',   
 'clean.async.enabled' = 'true',  
 'clean.retain_commits' = '4',
 'archive.min_commits' = '6',   
 'archive.max_commits' = '12',  
 'hoodie.cleaner.commits.retained' = '4',
 'hoodie.keep.min.commits' = '5',
 'hoodie.keep.max.commits' = '10'
   ```
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-2563) Refactor CompactionTriggerStrategy.

2022-01-20 Thread RocMarshal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

RocMarshal closed HUDI-2563.

Resolution: Abandoned

> Refactor CompactionTriggerStrategy.
> ---
>
> Key: HUDI-2563
> URL: https://issues.apache.org/jira/browse/HUDI-2563
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli, compaction, writer-core
>Reporter: RocMarshal
>Assignee: RocMarshal
>Priority: Minor
>  Labels: pull-request-available
>
>  
>  # Replace conditional in ScheduleCompactionActionExecutor with polymorphism 
> of CompactionTriggerStrategy class.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4559:
URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018129948


   
   ## CI report:
   
   * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN
   * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350)
 
   * 0aa3cea08224b3a86843251ec43ffd5e22e086ed Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5401)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4559:
URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018160149


   
   ## CI report:
   
   * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN
   * 0aa3cea08224b3a86843251ec43ffd5e22e086ed Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5401)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception

2022-01-20 Thread GitBox


VIKASPATID commented on issue #4635:
URL: https://github.com/apache/hudi/issues/4635#issuecomment-1018159576


   Bulk write runs without any failures with a single writer, but we want to 
write a large number of files, so we need multiple writers to reduce the total 
write time. Is there anything we are missing for multi-writer, or any way to 
fix this for multi-writer?
   That's all we have in the stack trace.






[GitHub] [hudi] danny0405 commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true

2022-01-20 Thread GitBox


danny0405 commented on a change in pull request #4654:
URL: https://github.com/apache/hudi/pull/4654#discussion_r789339843



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
##
@@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
*/
   private void waitForBootstrapReady(int taskID) {
 int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+int attemptNum = getRuntimeContext().getAttemptNumber();
 int readyTaskNum = 1;
 while (taskNum != readyTaskNum) {
   try {
-readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment:
   Hello, can you explain why we need this change?
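   One possible reading of the diff, not confirmed anywhere in this thread: if the aggregate registered under a fixed name outlives a task restart, a restarted task could see the ready count left over from the previous attempt and stop waiting too early. The sketch below only illustrates that idea. `AttemptScopedAggregate`, its in-memory map, and the `"bootstrap"` names are hypothetical and are not Flink or Hudi APIs.

   ```java
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   public class AttemptScopedAggregate {
     // Hypothetical stand-in for an aggregate store that survives task restarts.
     private final Map<String, Set<Integer>> store = new HashMap<>();

     // Records taskId under the named aggregate and returns the number of distinct tasks seen.
     int update(String name, int taskId) {
       store.computeIfAbsent(name, k -> new HashSet<>()).add(taskId);
       return store.get(name).size();
     }

     public static void main(String[] args) {
       AttemptScopedAggregate agg = new AttemptScopedAggregate();
       // Attempt 0: both tasks report ready under the fixed name.
       agg.update("bootstrap", 0);
       agg.update("bootstrap", 1);
       // After a restart, task 0 reports again under the same name and sees the stale count.
       System.out.println(agg.update("bootstrap", 0));   // prints 2
       // With the attempt number appended to the name, the count starts fresh.
       System.out.println(agg.update("bootstrap_1", 0)); // prints 1
     }
   }
   ```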








[GitHub] [hudi] cdmikechen commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2022-01-20 Thread GitBox


cdmikechen commented on pull request #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1018140612


   @lucasmo  
   You can try this PR, but it looks like there are some conflicts after I 
pushed this commit. I will resolve the conflicts later.






[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4559:
URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018129948


   
   ## CI report:
   
   * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN
   * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350)
 
   * 0aa3cea08224b3a86843251ec43ffd5e22e086ed Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5401)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4559:
URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018121606


   
   ## CI report:
   
   * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN
   * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350)
 
   * 0aa3cea08224b3a86843251ec43ffd5e22e086ed UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4559:
URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018121606


   
   ## CI report:
   
   * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN
   * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350)
 
   * 0aa3cea08224b3a86843251ec43ffd5e22e086ed UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4559:
URL: https://github.com/apache/hudi/pull/4559#issuecomment-1016940234


   
   ## CI report:
   
   * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN
   * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-3221) Support querying a table as of a savepoint

2022-01-20 Thread Forward Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479792#comment-17479792
 ] 

Forward Xu commented on HUDI-3221:
--

hi [~fedsp]  Thanks

> Support querying a table as of a savepoint
> --
>
> Key: HUDI-3221
> URL: https://issues.apache.org/jira/browse/HUDI-3221
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: hive, reader-core, spark, writer-core
>Reporter: Ethan Guo
>Assignee: Forward Xu
>Priority: Blocker
>  Labels: user-support-issues
> Fix For: 0.11.0
>
>
> Right now, point-in-time queries are limited to what is retained by the 
> cleaner. If we fix this and expose it via SQL, we close that gap.
> The DataFrame read path supports this option, but the SQL read path does not.
> https://hudi.apache.org/docs/quick-start-guide/#time-travel-query





[hudi] branch asf-site updated: [MINOR] [DOCS] fix a typo in Spark quick start example (#4657)

2022-01-20 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 0368b03  [MINOR] [DOCS] fix a typo in Spark quick start example (#4657)
0368b03 is described below

commit 0368b038c34fac0e13a575c4ed3696914baaf6cc
Author: 董可伦 
AuthorDate: Fri Jan 21 10:52:03 2022 +0800

[MINOR] [DOCS] fix a typo in Spark quick start example (#4657)
---
 website/docs/quick-start-guide.md  | 2 +-
 website/versioned_docs/version-0.10.0/quick-start-guide.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index 619f773..685492e 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -258,7 +258,7 @@ create table hudi_mor_tbl (
   ts bigint
 ) using hudi
 tblproperties (
-  type = 'cow',
+  type = 'mor',
   primaryKey = 'id',
   preCombineField = 'ts'
 );
diff --git a/website/versioned_docs/version-0.10.0/quick-start-guide.md b/website/versioned_docs/version-0.10.0/quick-start-guide.md
index 7550712..e3f3844 100644
--- a/website/versioned_docs/version-0.10.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.10.0/quick-start-guide.md
@@ -258,7 +258,7 @@ create table hudi_mor_tbl (
   ts bigint
 ) using hudi
 tblproperties (
-  type = 'cow',
+  type = 'mor',
   primaryKey = 'id',
   preCombineField = 'ts'
 );


[GitHub] [hudi] xushiyan merged pull request #4657: [MINOR] [DOCS] fix a typo in Spark quick start example

2022-01-20 Thread GitBox


xushiyan merged pull request #4657:
URL: https://github.com/apache/hudi/pull/4657


   






[jira] [Updated] (HUDI-2941) Show _hoodie_operation in spark sql results

2022-01-20 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2941:
-
Status: In Progress  (was: Open)

> Show _hoodie_operation in spark sql results
> ---
>
> Key: HUDI-2941
> URL: https://issues.apache.org/jira/browse/HUDI-2941
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark-sql
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: hudi-on-call, pull-request-available, sev:critical, 
> user-support-issues
> Fix For: 0.11.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Details in
> [https://github.com/apache/hudi/issues/4160]
>  





[jira] [Updated] (HUDI-2941) Show _hoodie_operation in spark sql results

2022-01-20 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2941:
-
Epic Link: HUDI-1658

> Show _hoodie_operation in spark sql results
> ---
>
> Key: HUDI-2941
> URL: https://issues.apache.org/jira/browse/HUDI-2941
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark-sql
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: hudi-on-call, pull-request-available, sev:critical, 
> user-support-issues
> Fix For: 0.11.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Details in
> [https://github.com/apache/hudi/issues/4160]
>  





[jira] [Updated] (HUDI-2941) Show _hoodie_operation in spark sql results

2022-01-20 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2941:
-
Sprint: Cont' improve -  2021/01/10, Cont' improve -  2021/01/18  (was: 
Cont' improve -  2021/01/10, Cont' improve -  2021/01/24)

> Show _hoodie_operation in spark sql results
> ---
>
> Key: HUDI-2941
> URL: https://issues.apache.org/jira/browse/HUDI-2941
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark-sql
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: hudi-on-call, pull-request-available, sev:critical, 
> user-support-issues
> Fix For: 0.11.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Details in
> [https://github.com/apache/hudi/issues/4160]
>  





[GitHub] [hudi] hudi-bot removed a comment on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4645:
URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018114106


   
   ## CI report:
   
   * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4645:
URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018115438


   
   ## CI report:
   
   * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …

2022-01-20 Thread GitBox


hudi-bot commented on pull request #4645:
URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018114106


   
   ## CI report:
   
   * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] dongkelun opened a new pull request #4657: [MINOR] fix typos

2022-01-20 Thread GitBox


dongkelun opened a new pull request #4657:
URL: https://github.com/apache/hudi/pull/4657


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …

2022-01-20 Thread GitBox


hudi-bot removed a comment on pull request #4645:
URL: https://github.com/apache/hudi/pull/4645#issuecomment-1017556536


   
   ## CI report:
   
   * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] watermelon12138 commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …

2022-01-20 Thread GitBox


watermelon12138 commented on pull request #4645:
URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018113861


   @hudi-bot run azure re-run the last Azure build






[GitHub] [hudi] zhangyue19921010 commented on pull request #4643: [HUDI-3281][Performance]Tuning performance of getAllPartitionPaths API in FileSystemBackedTableMetadata

2022-01-20 Thread GitBox


zhangyue19921010 commented on pull request #4643:
URL: https://github.com/apache/hudi/pull/4643#issuecomment-1018100236


   @nsivabalan Thanks a lot for your help!





