[GitHub] [hudi] hudi-bot commented on pull request #4187: [HUDI-2912] Fix CompactionPlanOperator typo

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4187:
URL: https://github.com/apache/hudi/pull/4187#issuecomment-984366349


   
   ## CI report:
   
   * 548a418f486494091e1001aec0a733fd89ca8fbe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3937)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4187: [HUDI-2912] Fix CompactionPlanOperator typo

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4187:
URL: https://github.com/apache/hudi/pull/4187#issuecomment-984335994


   
   ## CI report:
   
   * 548a418f486494091e1001aec0a733fd89ca8fbe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3937)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4189: [HUDI-2913] Disable auto clean in writer task

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4189:
URL: https://github.com/apache/hudi/pull/4189#issuecomment-984348912


   
   ## CI report:
   
   * fa44ae507c0b013f0ebe69c58d06bc25f2bfe6ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3938)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4189: [HUDI-2913] Disable auto clean in writer task

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4189:
URL: https://github.com/apache/hudi/pull/4189#issuecomment-984347424


   
   ## CI report:
   
   * fa44ae507c0b013f0ebe69c58d06bc25f2bfe6ec UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4189: [HUDI-2913] Disable auto clean in writer task

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4189:
URL: https://github.com/apache/hudi/pull/4189#issuecomment-984347424


   
   ## CI report:
   
   * fa44ae507c0b013f0ebe69c58d06bc25f2bfe6ec UNKNOWN
   
   




[GitHub] [hudi] danny0405 commented on a change in pull request #4181: [HUDI-2900] Fix corrupt block end position

2021-12-01 Thread GitBox


danny0405 commented on a change in pull request #4181:
URL: https://github.com/apache/hudi/pull/4181#discussion_r760809572



##
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##
@@ -284,7 +284,7 @@ private HoodieLogBlock createCorruptBlock() throws IOException {
     long contentPosition = inputStream.getPos();
     byte[] corruptedBytes = HoodieLogBlock.readOrSkipContent(inputStream, corruptedBlockSize, readBlockLazily);
     return HoodieCorruptBlock.getBlock(logFile, inputStream, Option.ofNullable(corruptedBytes), readBlockLazily,
-        contentPosition, corruptedBlockSize, corruptedBlockSize, new HashMap<>(), new HashMap<>());
+        contentPosition, corruptedBlockSize, nextBlockOffset - 1, new HashMap<>(), new HashMap<>());
   }

Review comment:
   Should this be `nextBlockOffset` directly?
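
   The question above comes down to whether the block end position is meant to be inclusive (the last byte of the block) or exclusive (the first byte after it). The sketch below only illustrates the two conventions with plain arithmetic. The class `BlockEndPosition` and its methods are hypothetical and are not part of Hudi, and it makes no claim about which convention `HoodieCorruptBlock.getBlock` expects.

```java
/** Hypothetical helper (not a Hudi class) contrasting two end-offset conventions. */
final class BlockEndPosition {

    /** Exclusive convention: the end is the first byte after the block. */
    static long exclusiveEnd(long nextBlockOffset) {
        return nextBlockOffset;
    }

    /** Inclusive convention: the end is the last byte that belongs to the block. */
    static long inclusiveEnd(long nextBlockOffset) {
        return nextBlockOffset - 1;
    }

    /** Block length when the end offset is exclusive. */
    static long lengthFromExclusive(long start, long end) {
        return end - start;
    }

    /** Block length when the end offset is inclusive. */
    static long lengthFromInclusive(long start, long end) {
        return end - start + 1;
    }

    public static void main(String[] args) {
        long contentPosition = 100L;  // assumed start offset, for illustration only
        long nextBlockOffset = 150L;  // assumed offset of the following block

        long exclusive = exclusiveEnd(nextBlockOffset);
        long inclusive = inclusiveEnd(nextBlockOffset);

        // Both conventions describe the same 50-byte range, as long as the
        // reader and the writer agree on which one is in use.
        System.out.println(lengthFromExclusive(contentPosition, exclusive));
        System.out.println(lengthFromInclusive(contentPosition, inclusive));
    }
}
```

   Either convention is self-consistent. A mismatch between the code that sets the end position and the code that later reads it would shift the computed range by one byte, which is presumably why the choice between `nextBlockOffset - 1` and `nextBlockOffset` is worth confirming.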








[jira] [Updated] (HUDI-2913) Disable auto clean in writer task

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2913:
-
Labels: pull-request-available  (was: )

> Disable auto clean in writer task
> -
>
> Key: HUDI-2913
> URL: https://issues.apache.org/jira/browse/HUDI-2913
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] yuzhaojing opened a new pull request #4189: [HUDI-2913] Disable auto clean in writer task

2021-12-01 Thread GitBox


yuzhaojing opened a new pull request #4189:
URL: https://github.com/apache/hudi/pull/4189


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] h7kanna opened a new issue #4188: [SUPPORT] NullPointerException in HoodieROTablePathFilter Hudi 0.10.0

2021-12-01 Thread GitBox


h7kanna opened a new issue #4188:
URL: https://github.com/apache/hudi/issues/4188


   **Describe the problem you faced**
   
   A NullPointerException is thrown in HoodieROTablePathFilter when querying a Hudi table with 0.10.0; the same query works with 0.9.0.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a date-partitioned COW table
   2. Query two months' worth of partitions
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.10.0
   
   * Spark version : 3.1.2
   
   * Hive version :
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 34 in stage 0.0 failed 10 times, most recent failure: Lost task 34.9 in stage 0.0 (TID 336) ( executor 7): org.apache.hudi.exception.HoodieException: Error checking path :s3a:///completetime=2021/10/10/50c3f98a-bf59-45d1-a01e-602f42f13ed9-0_651-10-4279_20211202004113620.parquet, under folder: s3a:///completetime=2021/10/10
       at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:230)
       at org.apache.spark.sql.execution.datasources.PathFilterWrapper.accept(InMemoryFileIndex.scala:227)
       at org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8(HadoopFSUtils.scala:318)
       at org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8$adapted(HadoopFSUtils.scala:318)
       at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:256)
       at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
       at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
       at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
       at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255)
       at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249)
       at scala.collection.mutable.ArrayOps$ofRef.filterImpl(ArrayOps.scala:198)
       at scala.collection.TraversableLike.filter(TraversableLike.scala:347)
       at scala.collection.TraversableLike.filter$(TraversableLike.scala:347)
       at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:198)
       at org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:318)
       at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:138)
       at scala.collection.immutable.Stream.map(Stream.scala:418)
       at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:128)
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:131)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.NullPointerException
       at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:185)
       ... 33 more
   
   Driver stacktrace:
       at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2465)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2414)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2413)
       at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
       at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
       at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2413)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)

[GitHub] [hudi] hudi-bot commented on pull request #4179: Fix HoodieSqlUtils.formatQueryInstant timestamp variable bug

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4179:
URL: https://github.com/apache/hudi/pull/4179#issuecomment-984341438


   
   ## CI report:
   
   * ffdf5ee6c364d06cf3dd40f523b36c4eadad24eb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3919)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3936)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4179: Fix HoodieSqlUtils.formatQueryInstant timestamp variable bug

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4179:
URL: https://github.com/apache/hudi/pull/4179#issuecomment-984313491


   
   ## CI report:
   
   * ffdf5ee6c364d06cf3dd40f523b36c4eadad24eb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3919)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3936)
 
   
   




[jira] [Created] (HUDI-2913) Disable auto clean in writer task

2021-12-01 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2913:


 Summary: Disable auto clean in writer task
 Key: HUDI-2913
 URL: https://issues.apache.org/jira/browse/HUDI-2913
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[GitHub] [hudi] hudi-bot commented on pull request #4186: [HUDI-2904] WIP Fix metadata archive issues

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4186:
URL: https://github.com/apache/hudi/pull/4186#issuecomment-984337129


   
   ## CI report:
   
   * 8a185b85ce1f42b5bdec94ba676abc44ce0defa4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3935)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4186: [HUDI-2904] WIP Fix metadata archive issues

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4186:
URL: https://github.com/apache/hudi/pull/4186#issuecomment-984311498


   
   ## CI report:
   
   * 8a185b85ce1f42b5bdec94ba676abc44ce0defa4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3935)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4187: [HUDI-2912] Fix CompactionPlanOperator typo

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4187:
URL: https://github.com/apache/hudi/pull/4187#issuecomment-984334873


   
   ## CI report:
   
   * 548a418f486494091e1001aec0a733fd89ca8fbe UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4187: [HUDI-2912] Fix CompactionPlanOperator typo

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4187:
URL: https://github.com/apache/hudi/pull/4187#issuecomment-984335994


   
   ## CI report:
   
   * 548a418f486494091e1001aec0a733fd89ca8fbe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3937)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4187: [HUDI-2912] Fix CompactionPlanOperator typo

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4187:
URL: https://github.com/apache/hudi/pull/4187#issuecomment-984334873


   
   ## CI report:
   
   * 548a418f486494091e1001aec0a733fd89ca8fbe UNKNOWN
   
   




[jira] [Updated] (HUDI-2912) Fix CompactionPlanOperator typo

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2912:
-
Labels: pull-request-available  (was: )

> Fix CompactionPlanOperator typo
> ---
>
> Key: HUDI-2912
> URL: https://issues.apache.org/jira/browse/HUDI-2912
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] yuzhaojing opened a new pull request #4187: [HUDI-2912] Fix CompactionPlanOperator typo

2021-12-01 Thread GitBox


yuzhaojing opened a new pull request #4187:
URL: https://github.com/apache/hudi/pull/4187


   






[jira] [Created] (HUDI-2912) Fix CompactionPlanOperator typo

2021-12-01 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2912:


 Summary: Fix CompactionPlanOperator typo
 Key: HUDI-2912
 URL: https://issues.apache.org/jira/browse/HUDI-2912
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[GitHub] [hudi] Gatsby-Lee commented on issue #2544: [SUPPORT]failed to read timestamp column in version 0.7.0 even when HIVE_SUPPORT_TIMESTAMP is enabled

2021-12-01 Thread GitBox


Gatsby-Lee commented on issue #2544:
URL: https://github.com/apache/hudi/issues/2544#issuecomment-984330238


   @codope
   
   Can you tell me where I can find the commit for this fix?
   And, do you know if there is any downside of setting this config?
   "hoodie.datasource.hive_sync.support_timestamp": "true",
   






[GitHub] [hudi] Gatsby-Lee commented on issue #2509: [SUPPORT] Hudi Spark DataSource saves TimestampType as bigInt

2021-12-01 Thread GitBox


Gatsby-Lee commented on issue #2509:
URL: https://github.com/apache/hudi/issues/2509#issuecomment-984328231


   AWS Glue3 
   + Spark: 3.1.1-amzn-0
   + Hive: 2.3.7-amzn-4
   + Hudi: 0.9
   
   I had this issue.
   Although I can see a timestamp type, the type shown in AWS Athena was bigint.
   
   I was able to work around this by setting the following option when inserting data:
   "hoodie.datasource.hive_sync.support_timestamp": "true"
   
   But, I am not sure if there is any downside of setting this value to true.
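
   For readers who want to see where such an option would go, here is a minimal sketch that collects write options in a map. Only `hoodie.datasource.hive_sync.support_timestamp` comes from the comment above. The class name `HiveSyncTimestampOptions`, the table name `my_table`, and the choice to also set `hoodie.table.name` and `hoodie.datasource.hive_sync.enable` are assumptions made for illustration, and the sketch says nothing about possible downsides of the setting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Hudi class) that assembles Hudi write options. */
final class HiveSyncTimestampOptions {

    /** Builds an option map for the given (hypothetical) table name. */
    static Map<String, String> build(String tableName) {
        Map<String, String> opts = new LinkedHashMap<>();
        // Assumed companion settings, included only to make the example complete.
        opts.put("hoodie.table.name", tableName);
        opts.put("hoodie.datasource.hive_sync.enable", "true");
        // The option quoted in the comment above.
        opts.put("hoodie.datasource.hive_sync.support_timestamp", "true");
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = build("my_table");
        opts.forEach((key, value) -> System.out.println(key + "=" + value));

        // With Spark and Hudi on the classpath, the map would typically be
        // handed to the writer, along these lines (not executed here):
        //   df.write().format("hudi").options(opts).mode(SaveMode.Append).save(basePath);
    }
}
```

   Keeping the options in one map makes it easy to compare a run with the flag set against a run without it, which may help when checking how the column type shows up in the synced table.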






[GitHub] [hudi] yanghua commented on pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-12-01 Thread GitBox


yanghua commented on pull request #3671:
URL: https://github.com/apache/hudi/pull/3671#issuecomment-984318178


   Sorry for the late reply. I will do a final check soon.






[GitHub] [hudi] leesf commented on pull request #4179: Fix HoodieSqlUtils.formatQueryInstant timestamp variable bug

2021-12-01 Thread GitBox


leesf commented on pull request #4179:
URL: https://github.com/apache/hudi/pull/4179#issuecomment-984313456


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot removed a comment on pull request #4179: Fix HoodieSqlUtils.formatQueryInstant timestamp variable bug

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4179:
URL: https://github.com/apache/hudi/pull/4179#issuecomment-983721370


   
   ## CI report:
   
   * ffdf5ee6c364d06cf3dd40f523b36c4eadad24eb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3919)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4179: Fix HoodieSqlUtils.formatQueryInstant timestamp variable bug

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4179:
URL: https://github.com/apache/hudi/pull/4179#issuecomment-984313491


   
   ## CI report:
   
   * ffdf5ee6c364d06cf3dd40f523b36c4eadad24eb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3919)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3936)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4186: [HUDI-2904] WIP Fix metadata archive issues

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4186:
URL: https://github.com/apache/hudi/pull/4186#issuecomment-984310591


   
   ## CI report:
   
   * 8a185b85ce1f42b5bdec94ba676abc44ce0defa4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4186: [HUDI-2904] WIP Fix metadata archive issues

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4186:
URL: https://github.com/apache/hudi/pull/4186#issuecomment-984311498


   
   ## CI report:
   
   * 8a185b85ce1f42b5bdec94ba676abc44ce0defa4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3935)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4186: [HUDI-2904] WIP Fix metadata archive issues

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4186:
URL: https://github.com/apache/hudi/pull/4186#issuecomment-984310591


   
   ## CI report:
   
   * 8a185b85ce1f42b5bdec94ba676abc44ce0defa4 UNKNOWN
   
   




[jira] [Updated] (HUDI-2904) Failed to archive commits due to no such file in metadata

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2904:
-
Labels: pull-request-available  (was: )

> Failed to archive commits due to no such file in metadata
> -
>
> Key: HUDI-2904
> URL: https://issues.apache.org/jira/browse/HUDI-2904
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Rajesh Mahindra
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Hitting the following exception while running DeltaStreamer in continuous mode on a COW table on S3:
> {code:java}
> java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieClusteringException: unable to transition clustering inflight to complete: 20211201011347895
>   at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at org.apache.hudi.async.HoodieAsyncService.lambda$monitorThreads$1(HoodieAsyncService.java:158)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieClusteringException: unable to transition clustering inflight to complete: 20211201011347895
>   at org.apache.hudi.client.SparkRDDWriteClient.completeClustering(SparkRDDWriteClient.java:395)
>   at org.apache.hudi.client.SparkRDDWriteClient.completeTableService(SparkRDDWriteClient.java:470)
>   at org.apache.hudi.client.SparkRDDWriteClient.cluster(SparkRDDWriteClient.java:364)
>   at org.apache.hudi.client.HoodieSparkClusteringClient.cluster(HoodieSparkClusteringClient.java:54)
>   at org.apache.hudi.async.AsyncClusteringService.lambda$null$1(AsyncClusteringService.java:79)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>   ... 3 more
> Caused by: org.apache.hudi.exception.HoodieCommitException: Failed to archive commits
>   at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:334)
>   at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:130)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:454)
>   at org.apache.hudi.client.SparkRDDWriteClient.postWrite(SparkRDDWriteClient.java:280)
>   at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:173)
>   at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:146)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:590)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:602)
>   at org.apache.hudi.client.SparkRDDWriteClient.lambda$writeTableMetadataForTableServices$5(SparkRDDWriteClient.java:420)
>   at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
>   at org.apache.hudi.client.SparkRDDWriteClient.writeTableMetadataForTableServices(SparkRDDWriteClient.java:419)
>   at org.apache.hudi.client.SparkRDDWriteClient.completeClustering(SparkRDDWriteClient.java:384)
>   ... 8 more
> Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3a://hudi-testing/test_hoodie_table_2/.hoodie/metadata/.hoodie/20211201002149590.deltacommit.requested
>   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:634)
>   at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:250)
>   at 
> org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:72)
>   at 
> org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:358)
>   at 
> org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:321)
>   ... 19 more
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://hudi-testing/test_hoodie_table_2/.hoodie/metadata/.hoodie/20211201002149590.deltacommit.requested
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3356)
>   

[GitHub] [hudi] rmahindra123 opened a new pull request #4186: [HUDI-2904] WIP Fix metadata archive issues

2021-12-01 Thread GitBox


rmahindra123 opened a new pull request #4186:
URL: https://github.com/apache/hudi/pull/4186


   When the metadata table is enabled with a single writer, an async service 
such as clustering on the data table can cause archival to run at the same 
time as the regular writer triggers archival on the metadata table. To ensure 
that async services on the data table do not trigger archival, this change 
removes archival as a separate table service in the write client and 
explicitly triggers archival on the metadata table only when there is a write 
from the regular writer. 
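A minimal sketch of the gating idea described above, assuming a toy in-memory timeline. None of the names here (`ArchivalGateSketch`, `MetadataTimelineSketch`, `CommitSource`) are Hudi classes; they are hypothetical stand-ins used only to show archival being skipped for commits that originate from a table service and run only for commits from the regular writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not code from the Hudi write client.
class ArchivalGateSketch {
    // Commits an instant to the (toy) metadata timeline and archives only
    // when the commit comes from the regular writer.
    static int commitToMetadata(MetadataTimelineSketch timeline, String instant, CommitSource source) {
        timeline.commit(instant);
        if (source == CommitSource.REGULAR_WRITER) {
            return timeline.archiveIfRequired();
        }
        // Table services (for example clustering) never trigger archival here.
        return 0;
    }

    public static void main(String[] args) {
        MetadataTimelineSketch timeline = new MetadataTimelineSketch(2);
        commitToMetadata(timeline, "001", CommitSource.TABLE_SERVICE);
        commitToMetadata(timeline, "002", CommitSource.TABLE_SERVICE);
        commitToMetadata(timeline, "003", CommitSource.TABLE_SERVICE);
        int archived = commitToMetadata(timeline, "004", CommitSource.REGULAR_WRITER);
        System.out.println("archived by regular writer: " + archived);
        System.out.println("still active: " + timeline.activeCount());
    }
}

// Hypothetical origin of a metadata-table commit.
enum CommitSource { REGULAR_WRITER, TABLE_SERVICE }

// Toy timeline: a list of active instants plus a list of archived ones.
class MetadataTimelineSketch {
    private final List<String> active = new ArrayList<>();
    private final List<String> archived = new ArrayList<>();
    private final int maxActive;

    MetadataTimelineSketch(int maxActive) {
        this.maxActive = maxActive;
    }

    void commit(String instant) {
        active.add(instant);
    }

    // Moves the oldest instants to the archive until at most maxActive remain,
    // and returns how many instants were moved.
    int archiveIfRequired() {
        int moved = 0;
        while (active.size() > maxActive) {
            archived.add(active.remove(0));
            moved++;
        }
        return moved;
    }

    int activeCount() {
        return active.size();
    }

    int archivedCount() {
        return archived.size();
    }
}
```

With a single code path allowed to archive, two threads cannot both try to read and remove the same instant files, which is the kind of overlap the stack trace in HUDI-2904 suggests.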






[hudi] branch master updated (5284730 -> 772f5ca)

2021-12-01 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 5284730  [HUDI-2881] Compact the file group with larger log files to 
reduce write amplification (#4152)
 add 772f5ca  Fixed partitions produced by layout optimization in case 
order-by key is composed of a single column (#4183)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/spark/OrderingIndexHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[GitHub] [hudi] yihua merged pull request #4183: [HUDI-2908] Fixed partitions produced by layout optimization in case order-by key is composed of a single column

2021-12-01 Thread GitBox


yihua merged pull request #4183:
URL: https://github.com/apache/hudi/pull/4183


   






[GitHub] [hudi] Gatsby-Lee commented on issue #2544: [SUPPORT]failed to read timestamp column in version 0.7.0 even when HIVE_SUPPORT_TIMESTAMP is enabled

2021-12-01 Thread GitBox


Gatsby-Lee commented on issue #2544:
URL: https://github.com/apache/hudi/issues/2544#issuecomment-984291838


   Thank you!!






[GitHub] [hudi] hudi-bot commented on pull request #4185: [HUDI-2894][HUDI-2905] Metadata table - avoiding key lookup failures on base files over S3

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4185:
URL: https://github.com/apache/hudi/pull/4185#issuecomment-984289273


   
   ## CI report:
   
   * 5602e0b15b5d3ca9ddd30b3f091439a03d951568 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3933)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4185: [HUDI-2894][HUDI-2905] Metadata table - avoiding key lookup failures on base files over S3

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4185:
URL: https://github.com/apache/hudi/pull/4185#issuecomment-984269615


   
   ## CI report:
   
   * 5602e0b15b5d3ca9ddd30b3f091439a03d951568 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3933)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4173: [MINOR] Mitigate CI jobs timeout issues

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4173:
URL: https://github.com/apache/hudi/pull/4173#issuecomment-984238562


   
   ## CI report:
   
   * 66c6b0d67d07d6eed59b3653c91dbacd87c05501 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3915)
 
   * dfad8ecf4258b000562b3b188e774d926aea6a1e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3932)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4173: [MINOR] Mitigate CI jobs timeout issues

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4173:
URL: https://github.com/apache/hudi/pull/4173#issuecomment-984281684


   
   ## CI report:
   
   * dfad8ecf4258b000562b3b188e774d926aea6a1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3932)
 
   
   




[jira] [Updated] (HUDI-2904) Failed to archive commits due to no such file in metadata

2021-12-01 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra updated HUDI-2904:
--
Priority: Blocker  (was: Major)

> Failed to archive commits due to no such file in metadata
> -
>
> Key: HUDI-2904
> URL: https://issues.apache.org/jira/browse/HUDI-2904
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Blocker
> Fix For: 0.10.0
>
>

[jira] [Assigned] (HUDI-2904) Failed to archive commits due to no such file in metadata

2021-12-01 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra reassigned HUDI-2904:
-

Assignee: Rajesh Mahindra

> Failed to archive commits due to no such file in metadata
> -
>
> Key: HUDI-2904
> URL: https://issues.apache.org/jira/browse/HUDI-2904
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Rajesh Mahindra
>Priority: Blocker
> Fix For: 0.10.0
>
>

[GitHub] [hudi] hudi-bot removed a comment on pull request #4185: [HUDI-2894][HUDI-2905] Metadata table - avoiding key lookup failures on base files over S3

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4185:
URL: https://github.com/apache/hudi/pull/4185#issuecomment-984268716


   
   ## CI report:
   
   * 5602e0b15b5d3ca9ddd30b3f091439a03d951568 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4185: [HUDI-2894][HUDI-2905] Metadata table - avoiding key lookup failures on base files over S3

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4185:
URL: https://github.com/apache/hudi/pull/4185#issuecomment-984269615


   
   ## CI report:
   
   * 5602e0b15b5d3ca9ddd30b3f091439a03d951568 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3933)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4185: [HUDI-2894][HUDI-2905] Metadata table - avoiding key lookup failures on base files over S3

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4185:
URL: https://github.com/apache/hudi/pull/4185#issuecomment-984268716


   
   ## CI report:
   
   * 5602e0b15b5d3ca9ddd30b3f091439a03d951568 UNKNOWN
   
   




[jira] [Updated] (HUDI-2894) Metadata table read after compaction fails in S3

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2894:
-
Labels: pull-request-available  (was: )

> Metadata table read after compaction fails in S3
> 
>
> Key: HUDI-2894
> URL: https://issues.apache.org/jira/browse/HUDI-2894
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Once compaction on the metadata table kicks in, subsequent reads fail (the 
> hunch is that reading from the base HFile fails). 
>  
> {code:java}
> 21/11/30 15:35:20 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient 
> from 
> s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output
> 21/11/30 15:35:20 ERROR HoodieROTablePathFilter: Error checking path 
> :s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04/135ac18a-db3f-4bc1-b376-960fd85a44c1-0_0-326-3529_20211130153211490.parquet, 
> under folder: 
> s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04
> org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files 
> in partition 
> s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04
>  from metadata
>         at 
> org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:124)
>         at 
> org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
>         at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:290)
>         at 
> java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>         at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:281)
>         at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFiles(AbstractTableFileSystemView.java:449)
>         at 
> org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:194)
>         at 
> org.apache.spark.sql.execution.datasources.PathFilterWrapper.accept(InMemoryFileIndex.scala:165)
>         at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8(HadoopFSUtils.scala:285)
>         at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8$adapted(HadoopFSUtils.scala:285)
>         at 
> scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304)
>         at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>         at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>         at 
> scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
>         at 
> scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
>         at 
> scala.collection.mutable.ArrayOps$ofRef.filterImpl(ArrayOps.scala:198)
>         at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
>         at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
>         at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:198)
>         at 
> org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:285)
>         at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
>         at scala.collection.immutable.Stream.map(Stream.scala:418)
>         at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
>         at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
>         at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:131)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>         at org.apache.spar

[GitHub] [hudi] manojpec opened a new pull request #4185: [HUDI-2894] Metadata table - avoiding key lookup failures on base files over S3

2021-12-01 Thread GitBox


manojpec opened a new pull request #4185:
URL: https://github.com/apache/hudi/pull/4185


   ## What is the purpose of the pull request
   
- Fetching partition files or all partitions from the metadata table fails
  when run over S3. The metadata table uses the HFile format for its base
  files, and record lookup uses the HFile.Reader and HFileScanner interfaces
  to get records by partition key. When the backing storage is S3, this
  record lookup from HFiles fails with an IOException, which in turn fails
  the calling commit/update operations.
   
   ## Brief change log
   
- The metadata table looks up HFile records with positional read enabled,
  which performs better for random lookups. Over S3, however, these
  positional-read key lookups return partial read sizes, which causes the
  HFile scanner to throw an IOException. This does not happen over HDFS.
  Although the metadata table uses HFiles for random key lookups, positional
  read is not mandatory, because the keys are sorted before a lookup of
  multiple keys.
   
- The fix is to disable HFile positional read for all HFile scanner based
  key lookups.
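
The failure mode in the change log can be illustrated with plain `java.io` streams. This is a sketch under stated assumptions, not HBase or Hudi code: `PartialReadSketch`, `shortReads`, `readOnce`, and `readFully` are hypothetical names, and the stream below only imitates a store that returns fewer bytes than requested per call. It shows why a reader that expects an exact size from a single read call fails on a short read, while a loop tolerates it. The pull request itself takes a different route and disables positional read rather than looping.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the HFile reader implementation.
class PartialReadSketch {
    // Returns a stream that hands back at most `chunk` bytes per read call,
    // imitating a storage layer that returns short reads.
    static InputStream shortReads(byte[] data, int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    // Fragile: assumes one read call always fills the requested size.
    static byte[] readOnce(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        int n = in.read(buf, 0, size);
        if (n != size) {
            throw new IOException("Expected " + size + " bytes but read " + n);
        }
        return buf;
    }

    // Tolerant: keeps reading until the buffer is full or the stream ends.
    static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        int filled = 0;
        while (filled < size) {
            int n = in.read(buf, filled, size - filled);
            if (n < 0) {
                throw new IOException("Unexpected end of stream after " + filled + " bytes");
            }
            filled += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        try {
            readOnce(shortReads(data, 4), data.length);
        } catch (IOException e) {
            System.out.println("single read failed: " + e.getMessage());
        }
        byte[] all = readFully(shortReads(data, 4), data.length);
        System.out.println("looped read got: " + new String(all, StandardCharsets.US_ASCII));
    }
}
```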
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot commented on pull request #4182: [MINOR] use catalog schema if can not find table schema

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4182:
URL: https://github.com/apache/hudi/pull/4182#issuecomment-984266216


   
   ## CI report:
   
   * aef2c9c5d890b808384cfae906b4ea1f722659a0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3931)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4182: [MINOR] use catalog schema if can not find table schema

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4182:
URL: https://github.com/apache/hudi/pull/4182#issuecomment-984232128


   
   ## CI report:
   
   * aef2c9c5d890b808384cfae906b4ea1f722659a0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3931)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4181: [HUDI-2900] Fix corrupt block end position

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4181:
URL: https://github.com/apache/hudi/pull/4181#issuecomment-984263502


   
   ## CI report:
   
   * 9924dc7a8af334d3d641da49e045e0b105ddb2c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3926)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3929)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4181: [HUDI-2900] Fix corrupt block end position

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4181:
URL: https://github.com/apache/hudi/pull/4181#issuecomment-984226634


   
   ## CI report:
   
   * 9924dc7a8af334d3d641da49e045e0b105ddb2c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3926)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] wangzhongz opened a new issue #4184: [SUPPORT]parquet is not a Parquet file (too small length:4)

2021-12-01 Thread GitBox


wangzhongz opened a new issue #4184:
URL: https://github.com/apache/hudi/issues/4184


   
   MOR + Spark
   **Environment Description**
   
   * Hudi version :
   0.9
   * Spark version :
   spark2
   
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
![image](https://user-images.githubusercontent.com/51226982/144352136-57398c56-0e45-40d9-9a7b-ac764e3dc84c.png)
   
![image](https://user-images.githubusercontent.com/51226982/144352259-03edf2b5-6b92-4dc3-85df-8ba8555d0c1d.png)
   
   






[GitHub] [hudi] hudi-bot commented on pull request #4183: [HUDI-2908] Fixed partitions produced by layout optimization in case order-by key is composed of a single column

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4183:
URL: https://github.com/apache/hudi/pull/4183#issuecomment-984251732


   
   ## CI report:
   
   * e28f1f7cc461c327254bbf7c78e4e01985abcf11 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2908) Clustering w/ Layout Optimization enabled, produces incorrect number of partitions

2021-12-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2908:
-
Labels: pull-request-available  (was: )

> Clustering w/ Layout Optimization enabled, produces incorrect number of 
> partitions
> --
>
> Key: HUDI-2908
> URL: https://issues.apache.org/jira/browse/HUDI-2908
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, when clustering with Layout Optimization enabled (both Z-order and 
> Hilbert), an incorrect number of partitions is produced when the dataset is 
> specified to be ordered by a single column:
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/spark/OrderingIndexHelper.java#L103]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] alexeykudinkin opened a new pull request #4183: [HUDI-2908] Fixed partitions produced by layout optimization in case order-by key is composed of a single column

2021-12-01 Thread GitBox


alexeykudinkin opened a new pull request #4183:
URL: https://github.com/apache/hudi/pull/4183


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Fixed partitions produced by layout optimization in case order-by key is 
composed of a single column
   
   ## Brief change log
   
   Fixed partitions produced by layout optimization in case order-by key is 
composed of a single column
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2911) Writing non-partitioned table produces incorrect "hoodie.properties" file

2021-12-01 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-2911:
--
Fix Version/s: 0.11.0

> Writing non-partitioned table produces incorrect "hoodie.properties" file
> -
>
> Key: HUDI-2911
> URL: https://issues.apache.org/jira/browse/HUDI-2911
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Priority: Major
> Fix For: 0.11.0
>
>
> After ingesting a Hudi table with the following configuration, I'm still getting 
> "hoodie.table.partition.fields=partitionpath" in "hoodie.properties", 
> which blocks this table from being read.
>  
> Example table config: 
> {code:java}
> val commonOpts =
>   Map(
>     "hoodie.compact.inline" -> "false",
>     "hoodie.bulk_insert.shuffle.parallelism" -> "10"
>   )
> spark.sparkContext.setLogLevel("DEBUG")
> 
> // Writing to Hudi
> 
> val fs = FSUtils.getFs(outputPath, spark.sparkContext.hadoopConfiguration)
> if (!fs.exists(new Path(outputPath))) {
>   val df = spark.read.parquet(inputPath)
>   df.write.format("hudi")
>     .option(DataSourceWriteOptions.TABLE_TYPE.key(), COW_TABLE_TYPE_OPT_VAL)
>     .option("hoodie.table.name", tableName)
>     .option(PRECOMBINE_FIELD.key(), "review_id")
>     .option(RECORDKEY_FIELD.key(), "review_id")
>     //.option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key(), "product_category")
>     .option("hoodie.clustering.inline", "true")
>     .option("hoodie.clustering.inline.max.commits", "1")
>     // NOTE: Small file limit is intentionally kept _ABOVE_ target file-size max threshold for Clustering,
>     // to force re-clustering
>     .option("hoodie.clustering.plan.strategy.small.file.limit", String.valueOf(1024 * 1024 * 1024)) // 1Gb
>     .option("hoodie.clustering.plan.strategy.target.file.max.bytes", String.valueOf(128 * 1024 * 1024)) // 128Mb
>     .option("hoodie.clustering.plan.strategy.max.num.groups", String.valueOf(4096))
>     .option(HoodieClusteringConfig.LAYOUT_OPTIMIZE_ENABLE.key, "true")
>     .option(HoodieClusteringConfig.LAYOUT_OPTIMIZE_STRATEGY.key, layoutOptStrategy)
>     .option(HoodieClusteringConfig.PLAN_STRATEGY_SORT_COLUMNS.key, "product_id,customer_id")
>     .option(DataSourceWriteOptions.OPERATION.key(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
>     .option(BULK_INSERT_SORT_MODE.key(), "NONE")
>     .options(commonOpts)
>     .mode(ErrorIfExists)
>     .save(outputPath)
> } {code}





[jira] [Updated] (HUDI-2911) Writing non-partitioned table produces incorrect "hoodie.properties" file

2021-12-01 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-2911:
--
Priority: Blocker  (was: Major)

> Writing non-partitioned table produces incorrect "hoodie.properties" file
> -
>
> Key: HUDI-2911
> URL: https://issues.apache.org/jira/browse/HUDI-2911
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.11.0
>
>
> After ingesting a Hudi table with the following configuration, I'm still getting 
> "hoodie.table.partition.fields=partitionpath" in "hoodie.properties", 
> which blocks this table from being read.
>  
> Example table config: 
> {code:java}
> val commonOpts =
>   Map(
>     "hoodie.compact.inline" -> "false",
>     "hoodie.bulk_insert.shuffle.parallelism" -> "10"
>   )
> spark.sparkContext.setLogLevel("DEBUG")
> 
> // Writing to Hudi
> 
> val fs = FSUtils.getFs(outputPath, spark.sparkContext.hadoopConfiguration)
> if (!fs.exists(new Path(outputPath))) {
>   val df = spark.read.parquet(inputPath)
>   df.write.format("hudi")
>     .option(DataSourceWriteOptions.TABLE_TYPE.key(), COW_TABLE_TYPE_OPT_VAL)
>     .option("hoodie.table.name", tableName)
>     .option(PRECOMBINE_FIELD.key(), "review_id")
>     .option(RECORDKEY_FIELD.key(), "review_id")
>     //.option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key(), "product_category")
>     .option("hoodie.clustering.inline", "true")
>     .option("hoodie.clustering.inline.max.commits", "1")
>     // NOTE: Small file limit is intentionally kept _ABOVE_ target file-size max threshold for Clustering,
>     // to force re-clustering
>     .option("hoodie.clustering.plan.strategy.small.file.limit", String.valueOf(1024 * 1024 * 1024)) // 1Gb
>     .option("hoodie.clustering.plan.strategy.target.file.max.bytes", String.valueOf(128 * 1024 * 1024)) // 128Mb
>     .option("hoodie.clustering.plan.strategy.max.num.groups", String.valueOf(4096))
>     .option(HoodieClusteringConfig.LAYOUT_OPTIMIZE_ENABLE.key, "true")
>     .option(HoodieClusteringConfig.LAYOUT_OPTIMIZE_STRATEGY.key, layoutOptStrategy)
>     .option(HoodieClusteringConfig.PLAN_STRATEGY_SORT_COLUMNS.key, "product_id,customer_id")
>     .option(DataSourceWriteOptions.OPERATION.key(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
>     .option(BULK_INSERT_SORT_MODE.key(), "NONE")
>     .options(commonOpts)
>     .mode(ErrorIfExists)
>     .save(outputPath)
> } {code}





[jira] [Created] (HUDI-2911) Writing non-partitioned table produces incorrect "hoodie.properties" file

2021-12-01 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-2911:
-

 Summary: Writing non-partitioned table produces incorrect 
"hoodie.properties" file
 Key: HUDI-2911
 URL: https://issues.apache.org/jira/browse/HUDI-2911
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Alexey Kudinkin


After ingesting a Hudi table with the following configuration, I'm still getting 
"hoodie.table.partition.fields=partitionpath" in "hoodie.properties", which 
blocks this table from being read.

 

Example table config: 
{code:java}
val commonOpts =
  Map(
    "hoodie.compact.inline" -> "false",
    "hoodie.bulk_insert.shuffle.parallelism" -> "10"
  )

spark.sparkContext.setLogLevel("DEBUG")

// Writing to Hudi

val fs = FSUtils.getFs(outputPath, spark.sparkContext.hadoopConfiguration)

if (!fs.exists(new Path(outputPath))) {
  val df = spark.read.parquet(inputPath)

  df.write.format("hudi")
    .option(DataSourceWriteOptions.TABLE_TYPE.key(), COW_TABLE_TYPE_OPT_VAL)
    .option("hoodie.table.name", tableName)
    .option(PRECOMBINE_FIELD.key(), "review_id")
    .option(RECORDKEY_FIELD.key(), "review_id")
    //.option(DataSourceWriteOptions.PARTITIONPATH_FIELD.key(), "product_category")
    .option("hoodie.clustering.inline", "true")
    .option("hoodie.clustering.inline.max.commits", "1")
    // NOTE: Small file limit is intentionally kept _ABOVE_ target file-size max threshold for Clustering,
    // to force re-clustering
    .option("hoodie.clustering.plan.strategy.small.file.limit", String.valueOf(1024 * 1024 * 1024)) // 1Gb
    .option("hoodie.clustering.plan.strategy.target.file.max.bytes", String.valueOf(128 * 1024 * 1024)) // 128Mb
    .option("hoodie.clustering.plan.strategy.max.num.groups", String.valueOf(4096))
    .option(HoodieClusteringConfig.LAYOUT_OPTIMIZE_ENABLE.key, "true")
    .option(HoodieClusteringConfig.LAYOUT_OPTIMIZE_STRATEGY.key, layoutOptStrategy)
    .option(HoodieClusteringConfig.PLAN_STRATEGY_SORT_COLUMNS.key, "product_id,customer_id")
    .option(DataSourceWriteOptions.OPERATION.key(), DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
    .option(BULK_INSERT_SORT_MODE.key(), "NONE")
    .options(commonOpts)
    .mode(ErrorIfExists)
    .save(outputPath)
} {code}
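
For anyone trying to confirm the symptom, here is a minimal sketch of a check, not Hudi code. It assumes the contents of "hoodie.properties" have already been read into a string (for example after copying the file from the table's metadata directory). The class name, method name and sample table name are hypothetical; only the key "hoodie.table.partition.fields" comes from the report above, and the only API used is java.util.Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; it is not part of Hudi.
public class HoodiePropertiesCheck {

    // Key quoted in the report above.
    public static final String PARTITION_FIELDS_KEY = "hoodie.table.partition.fields";

    // Returns the configured partition fields, or an empty string when the
    // key is absent or blank.
    public static String partitionFields(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(PARTITION_FIELDS_KEY, "").trim();
    }

    public static void main(String[] args) {
        // Sample contents mimicking the reported state; the table name is made up.
        String reported = "hoodie.table.name=reviews\n"
            + "hoodie.table.partition.fields=partitionpath\n";

        String fields = partitionFields(reported);
        if (!fields.isEmpty()) {
            System.out.println(
                "Unexpected partition fields for a non-partitioned table: " + fields);
        }
    }
}
```

For a table written without a partition path field, one would expect this check to return an empty string; a non-empty value such as "partitionpath" would match the behaviour described in this issue.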





[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-984239555


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-984210665


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4173: [MINOR] Mitigate CI jobs timeout issues

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4173:
URL: https://github.com/apache/hudi/pull/4173#issuecomment-984238562


   
   ## CI report:
   
   * 66c6b0d67d07d6eed59b3653c91dbacd87c05501 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3915)
 
   * dfad8ecf4258b000562b3b188e774d926aea6a1e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3932)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] gudladona commented on issue #3834: [SUPPORT] - AWS Athena snapshot query fails if there are two or more record array fields in a MoR table

2021-12-01 Thread GitBox


gudladona commented on issue #3834:
URL: https://github.com/apache/hudi/issues/3834#issuecomment-984238436


   I think this is fixed in https://github.com/apache/parquet-mr/pull/560. 
Upgrading parquet-avro to >=1.11.0 should address this issue.






[GitHub] [hudi] hudi-bot removed a comment on pull request #4173: [MINOR] Mitigate CI jobs timeout issues

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4173:
URL: https://github.com/apache/hudi/pull/4173#issuecomment-984235437


   
   ## CI report:
   
   * 66c6b0d67d07d6eed59b3653c91dbacd87c05501 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3915)
 
   * dfad8ecf4258b000562b3b188e774d926aea6a1e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] xushiyan commented on pull request #4180: [HUDI-2903] get table schema from the last commit with data written

2021-12-01 Thread GitBox


xushiyan commented on pull request #4180:
URL: https://github.com/apache/hudi/pull/4180#issuecomment-984238456


   As discussed, let's hold off on this.






[GitHub] [hudi] hudi-bot commented on pull request #4173: [MINOR] [WIP] Check on TestHBaseIndex

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4173:
URL: https://github.com/apache/hudi/pull/4173#issuecomment-984235437


   
   ## CI report:
   
   * 66c6b0d67d07d6eed59b3653c91dbacd87c05501 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3915)
 
   * dfad8ecf4258b000562b3b188e774d926aea6a1e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4173: [MINOR] [WIP] Check on TestHBaseIndex

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4173:
URL: https://github.com/apache/hudi/pull/4173#issuecomment-983627070


   
   ## CI report:
   
   * 66c6b0d67d07d6eed59b3653c91dbacd87c05501 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3915)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4182: [MINOR] use catalog schema if can not find table schema

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4182:
URL: https://github.com/apache/hudi/pull/4182#issuecomment-984232128


   
   ## CI report:
   
   * aef2c9c5d890b808384cfae906b4ea1f722659a0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4182: [MINOR] use catalog schema if can not find table schema

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4182:
URL: https://github.com/apache/hudi/pull/4182#issuecomment-984231059


   
   ## CI report:
   
   * aef2c9c5d890b808384cfae906b4ea1f722659a0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4182: [MINOR] use catalog schema if can not find table schema

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4182:
URL: https://github.com/apache/hudi/pull/4182#issuecomment-984231059


   
   ## CI report:
   
   * aef2c9c5d890b808384cfae906b4ea1f722659a0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] YannByron opened a new pull request #4182: [MINOR] use catalog schema if can not find table schema

2021-12-01 Thread GitBox


YannByron opened a new pull request #4182:
URL: https://github.com/apache/hudi/pull/4182


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Assigned] (HUDI-2905) Insert crashes in MOR table with NullPointerException from HoodieMergeHandle

2021-12-01 Thread Manoj Govindassamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy reassigned HUDI-2905:


Assignee: Manoj Govindassamy

> Insert crashes in MOR table with NullPointerException from HoodieMergeHandle
> 
>
> Key: HUDI-2905
> URL: https://issues.apache.org/jira/browse/HUDI-2905
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Critical
> Fix For: 0.10.0
>
>
> Running Hoodie integration test suite with a MOR table type sometimes crashes 
> with the following stack trace
> {noformat}
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>     at org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:35)
>     at org.apache.spark.api.java.JavaDoubleRDD.sum(JavaDoubleRDD.scala:165)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:519)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:306)
>     at org.apache.hudi.integ.testsuite.HoodieDeltaStreamerWrapper.upsert(HoodieDeltaStreamerWrapper.java:44)
>     at org.apache.hudi.integ.testsuite.HoodieDeltaStreamerWrapper.insert(HoodieDeltaStreamerWrapper.java:48)
>     at org.apache.hudi.integ.testsuite.HoodieTestSuiteWriter.insert(HoodieTestSuiteWriter.java:166)
>     at org.apache.hudi.integ.testsuite.dag.nodes.InsertNode.ingest(InsertNode.java:70)
>     at org.apache.hudi.integ.testsuite.dag.nodes.InsertNode.execute(InsertNode.java:53)
>     at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
>     ... 6 more
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :33
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:320)
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleInsertPartition(BaseSparkCommitActionExecutor.java:326)
>     at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:174)
>     at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
>     at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>     at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386){noformat}
> Test properties
> {noformat}
> hoodie.insert.shuffle.parallelism=100
> hoodie.upsert.shuffle.parallelism=100
> hoodie.bulkinsert.shuffle.parallelism=100
> hoodie.deltastreamer.source.test.num_partitions=100
> hoodie.deltastreamer.source.test.datagen.use_rocksdb_for_storing_existing_keys=false
> hoodie.deltastreamer.source.test.max_unique_records=1
> hoodie.embed.timeline.server=false
> hoodie.deltastreamer.source.input.selector=org.apache.hudi.integ.testsuite.helpers.DFSTestSuitePathSelector
> hoodie.datasource.hive_sync.skip_ro_suffix=true
> hoodie.datasource.write.recordkey.field=_row_key
> hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
> hoodie.datasource.write.partitionpath.field=timestamp
> hoodie.clustering.plan.strategy.sort.columns=_row_key
> hoodie.clustering.plan.strategy.daybased.lookback.partitions=0
> hoodie.clustering.inline.max.commits=1
> hoodie.deltastreamer.source.dfs.root=s3a://dl-scale-test/manoj/010RC2/integration-test-large-scale/slong/mor/input
> hoodie.deltastreamer.schemaprovider.target.schema.file=file:/home/hadoop/staging/source.avsc
> hoodie.deltastreamer.schemaprovider.source.schema.file=file:/home/hadoop/staging/source.avsc
> hoodie.deltastreamer.keygen.timebased.timestamp.type=UNIX_TIMESTAMP
> hoodie.deltastreamer.keygen.timebased.output.dateformat=/MM/dd
> hoodie.datasource.hive_sync.database=testdb{noformat}
> {noformat}
> /home/hadoop/spark-3.2.0-bin-hadoop3.2/bin/spark-submit \
> --packages org.apache

[GitHub] [hudi] hudi-bot removed a comment on pull request #4181: [HUDI-2900] Fix corrupt block end position

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4181:
URL: https://github.com/apache/hudi/pull/4181#issuecomment-983907003


   
   ## CI report:
   
   * 9924dc7a8af334d3d641da49e045e0b105ddb2c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3926)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4181: [HUDI-2900] Fix corrupt block end position

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4181:
URL: https://github.com/apache/hudi/pull/4181#issuecomment-984226634


   
   ## CI report:
   
   * 9924dc7a8af334d3d641da49e045e0b105ddb2c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3926)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3929)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] danny0405 commented on pull request #4181: [HUDI-2900] Fix corrupt block end position

2021-12-01 Thread GitBox


danny0405 commented on pull request #4181:
URL: https://github.com/apache/hudi/pull/4181#issuecomment-984226564


   @hudi-bot run azure






[hudi] branch master updated (f4c25ba -> 5284730)

2021-12-01 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from f4c25ba  [HUDI-2880] Fixing loading of props from default dir (#4167)
 add 5284730  [HUDI-2881] Compact the file group with larger log files to 
reduce write amplification (#4152)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/config/HoodieCompactionConfig.java   | 11 +++
 .../main/java/org/apache/hudi/config/HoodieWriteConfig.java   |  4 
 .../compact/strategy/LogFileSizeBasedCompactionStrategy.java  |  9 +++--
 .../action/compact/strategy/TestHoodieCompactionStrategy.java |  9 +
 4 files changed, 27 insertions(+), 6 deletions(-)


[GitHub] [hudi] leesf merged pull request #4152: [HUDI-2881] Compact the file group with larger log files to reduce wr…

2021-12-01 Thread GitBox


leesf merged pull request #4152:
URL: https://github.com/apache/hudi/pull/4152


   






[GitHub] [hudi] xiarixiaoyao edited a comment on issue #4135: [SUPPORT] Zordering clustering on a moderate size dataset taking large amounts of time.

2021-12-01 Thread GitBox


xiarixiaoyao edited a comment on issue #4135:
URL: https://github.com/apache/hudi/issues/4135#issuecomment-983424937


   @vinothchandar   
   The current clustering mechanism does not support concurrency well. Even 
with ordinary sorting (not Z-order / Hilbert), this problem still exists. 
Let me think about how to modify this mechanism; this is the PR: 
https://github.com/apache/hudi/pull/4178






[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-984210665


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-983676154


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
   
   




[GitHub] [hudi] xiarixiaoyao commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-01 Thread GitBox


xiarixiaoyao commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-984209981


   @hudi-bot run azure






[jira] [Created] (HUDI-2910) Hudi CLI "commits showarchived" throws NPE

2021-12-01 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-2910:
---

 Summary: Hudi CLI "commits showarchived" throws NPE
 Key: HUDI-2910
 URL: https://issues.apache.org/jira/browse/HUDI-2910
 Project: Apache Hudi
  Issue Type: Bug
  Components: CLI
Reporter: Ethan Guo
 Fix For: 0.11.0


When trying to show archived commits through Hudi CLI command "commits 
showarchived", NullPointerException is thrown.  I'm using 0.10.0-rc2.
{code:java}
hudi:test_table->commits showarchived
Command failed java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.lambda$readCommit$2(HoodieArchivedTimeline.java:154)
    at org.apache.hudi.common.util.Option.map(Option.java:107)
    at org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.readCommit(HoodieArchivedTimeline.java:149)
    at org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.lambda$loadInstants$5(HoodieArchivedTimeline.java:228)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.loadInstants(HoodieArchivedTimeline.java:230)
    at org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.loadInstants(HoodieArchivedTimeline.java:193)
    at org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.loadInstants(HoodieArchivedTimeline.java:189)
    at org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.loadInstantDetailsInMemory(HoodieArchivedTimeline.java:112)
    at org.apache.hudi.cli.commands.CommitsCommand.showArchivedCommits(CommitsCommand.java:217)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:216)
    at org.springframework.shell.core.SimpleExecutionStrategy.invoke(SimpleExecutionStrategy.java:68)
    at org.springframework.shell.core.SimpleExecutionStrategy.execute(SimpleExecutionStrategy.java:59)
    at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:134)
    at org.springframework.shell.core.JLineShell.promptLoop(JLineShell.java:533)
    at org.springframework.shell.core.JLineShell.run(JLineShell.java:179)
    at java.lang.Thread.run(Thread.java:748)
{code}
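
The report does not state the root cause, only that the exception surfaces inside a lambda passed to a `map` call. As a rough illustration of how that can happen, the sketch below uses `java.util.Optional` as a stand-in for Hudi's own `Option` class; the record layout and the `commitTime` key are hypothetical and are not taken from `HoodieArchivedTimeline`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: java.util.Optional stands in for org.apache.hudi.common.util.Option,
// and the "commitTime" key is a made-up field name.
final class NullSafeMapSketch {

    // Dereferences the looked-up value inside the lambda, so a missing key
    // raises a NullPointerException from within map().
    static Optional<String> unsafeCommitTime(Optional<Map<String, String>> record) {
        return record.map(r -> r.get("commitTime").trim());
    }

    // Splits the lookup and the dereference into two map() steps. Optional.map
    // turns a null result into an empty Optional, so the second step is skipped.
    static Optional<String> safeCommitTime(Optional<Map<String, String>> record) {
        return record.map(r -> r.get("commitTime")).map(String::trim);
    }

    public static void main(String[] args) {
        Optional<Map<String, String>> incomplete = Optional.of(new HashMap<>());
        System.out.println("safe lookup present: " + safeCommitTime(incomplete).isPresent());
        try {
            unsafeCommitTime(incomplete);
        } catch (NullPointerException e) {
            System.out.println("unsafe lookup threw NullPointerException");
        }
    }
}
```

Whether the real fix belongs in the mapping lambda or in the code that writes the archived records cannot be determined from the stack trace alone.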



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] h7kanna commented on pull request #3944: [HUDI-2495] resolve inconsistent key generation for timestamp types b…

2021-12-01 Thread GitBox


h7kanna commented on pull request #3944:
URL: https://github.com/apache/hudi/pull/3944#issuecomment-984179033


   @YannByron @leesf 
   I think this broke the keygen. I do not know the context of this change.
   Can you please verify this https://issues.apache.org/jira/browse/HUDI-2909.
   
   Thanks






[GitHub] [hudi] hudi-bot removed a comment on pull request #4175: [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4175:
URL: https://github.com/apache/hudi/pull/4175#issuecomment-984079663


   
   ## CI report:
   
   * edc7087b751940625db4566dd3f4e9e77bf26aa7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3914)
   * f74bd3089f57aa8e229065139a8307bd2cf70892 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3927)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4175: [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4175:
URL: https://github.com/apache/hudi/pull/4175#issuecomment-984122742


   
   ## CI report:
   
   * f74bd3089f57aa8e229065139a8307bd2cf70892 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3927)
   
   




[jira] [Updated] (HUDI-2909) KeyGenerator is broken in 0.10.0

2021-12-01 Thread Harsha Teja Kanna (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsha Teja Kanna updated HUDI-2909:

Priority: Blocker  (was: Major)

> KeyGenerator is broken in 0.10.0
> 
>
> Key: HUDI-2909
> URL: https://issues.apache.org/jira/browse/HUDI-2909
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Harsha Teja Kanna
>Priority: Blocker
> Fix For: 0.10.0
>
>
> Existing table has timebased keygen config shown below
> hoodie.deltastreamer.keygen.timebased.timestamp.type=SCALAR
> hoodie.deltastreamer.keygen.timebased.output.timezone=GMT
> hoodie.deltastreamer.keygen.timebased.output.dateformat=/MM/dd
> hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit=MICROSECONDS
> hoodie.deltastreamer.keygen.timebased.input.timezone=GMT
> hoodie.datasource.write.partitionpath.field=lastdate:timestamp
> hoodie.datasource.write.operation=upsert
> hoodie.deltastreamer.transformer.sql=SELECT session.id, session.rid, 
> session.mid, to_timestamp(session.lastdate) as lastdate, 
> to_timestamp(session.updatedate) as updatedate FROM  a
>  
> Upgrading to 0.10.0 from 0.9.0 fails with exception 
> org.apache.hudi.exception.HoodieKeyGeneratorException: Unable to parse input partition field :2021-12-01 10:13:34.702
> Caused by: org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for partition field: java.sql.Timestamp
> at org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:211)
> at org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:133)
> *Workaround fix:*
> Reverting this 
> https://github.com/apache/hudi/pull/3944/files#diff-22fb52b5cf28727ba23cb8bd4be820432a4e396ce663ac472a4677e889b7491eR543
>  





[jira] [Created] (HUDI-2909) KeyGenerator is broken in 0.10.0

2021-12-01 Thread Harsha Teja Kanna (Jira)
Harsha Teja Kanna created HUDI-2909:
---

 Summary: KeyGenerator is broken in 0.10.0
 Key: HUDI-2909
 URL: https://issues.apache.org/jira/browse/HUDI-2909
 Project: Apache Hudi
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: Harsha Teja Kanna
 Fix For: 0.10.0


Existing table has timebased keygen config shown below

hoodie.deltastreamer.keygen.timebased.timestamp.type=SCALAR
hoodie.deltastreamer.keygen.timebased.output.timezone=GMT
hoodie.deltastreamer.keygen.timebased.output.dateformat=/MM/dd
hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit=MICROSECONDS
hoodie.deltastreamer.keygen.timebased.input.timezone=GMT
hoodie.datasource.write.partitionpath.field=lastdate:timestamp
hoodie.datasource.write.operation=upsert
hoodie.deltastreamer.transformer.sql=SELECT session.id, session.rid, 
session.mid, to_timestamp(session.lastdate) as lastdate, 
to_timestamp(session.updatedate) as updatedate FROM  a

 

Upgrading to 0.10.0 from 0.9.0 fails with exception 

org.apache.hudi.exception.HoodieKeyGeneratorException: Unable to parse input partition field :2021-12-01 10:13:34.702
Caused by: org.apache.hudi.exception.HoodieNotSupportedException: Unexpected type for partition field: java.sql.Timestamp
at org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:211)
at org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:133)

*Workaround fix:*

Reverting this 
https://github.com/apache/hudi/pull/3944/files#diff-22fb52b5cf28727ba23cb8bd4be820432a4e396ce663ac472a4677e889b7491eR543
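
For readers unfamiliar with the configuration above, the following is a minimal sketch of what a SCALAR timestamp with a MICROSECONDS unit and a GMT output timezone implies for the partition path. It is not Hudi's `TimestampBasedAvroKeyGenerator`; the class and method names are hypothetical, and the `yyyy/MM/dd` pattern is an assumption, since the date format in the report appears as `/MM/dd`.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.TimeUnit;

// Illustrative only: hypothetical helper, not part of Hudi.
final class ScalarTimestampPartitionSketch {

    // Converts a java.sql.Timestamp (the type rejected in the exception above)
    // into the epoch-microsecond scalar that the SCALAR/MICROSECONDS config describes.
    static long toEpochMicros(Timestamp ts) {
        return ChronoUnit.MICROS.between(Instant.EPOCH, ts.toInstant());
    }

    // Formats a scalar epoch value in the given unit as a partition path.
    static String partitionPath(long scalar, TimeUnit unit, String outputPattern, String outputZone) {
        Instant instant = Instant.EPOCH.plus(unit.toMicros(scalar), ChronoUnit.MICROS);
        return DateTimeFormatter.ofPattern(outputPattern)
                .withZone(ZoneId.of(outputZone))
                .format(instant);
    }

    public static void main(String[] args) {
        // Same wall-clock value as in the error message, interpreted as UTC.
        Timestamp lastdate = Timestamp.from(Instant.parse("2021-12-01T10:13:34.702Z"));
        long micros = toEpochMicros(lastdate);
        // "yyyy/MM/dd" is an assumed pattern; see the note above.
        System.out.println(partitionPath(micros, TimeUnit.MICROSECONDS, "yyyy/MM/dd", "GMT"));
    }
}
```

The sketch only shows the conversion the configuration implies; whether the key generator should accept `java.sql.Timestamp` directly is the question raised by this issue.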

 





[GitHub] [hudi] hudi-bot removed a comment on pull request #4175: [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4175:
URL: https://github.com/apache/hudi/pull/4175#issuecomment-984077767


   
   ## CI report:
   
   * edc7087b751940625db4566dd3f4e9e77bf26aa7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3914)
   * f74bd3089f57aa8e229065139a8307bd2cf70892 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4175: [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4175:
URL: https://github.com/apache/hudi/pull/4175#issuecomment-984079663


   
   ## CI report:
   
   * edc7087b751940625db4566dd3f4e9e77bf26aa7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3914)
   * f74bd3089f57aa8e229065139a8307bd2cf70892 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3927)
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4175: [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs

2021-12-01 Thread GitBox


hudi-bot removed a comment on pull request #4175:
URL: https://github.com/apache/hudi/pull/4175#issuecomment-983457360


   
   ## CI report:
   
   * edc7087b751940625db4566dd3f4e9e77bf26aa7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3914)
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4175: [HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs

2021-12-01 Thread GitBox


hudi-bot commented on pull request #4175:
URL: https://github.com/apache/hudi/pull/4175#issuecomment-984077767


   
   ## CI report:
   
   * edc7087b751940625db4566dd3f4e9e77bf26aa7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3914)
   * f74bd3089f57aa8e229065139a8307bd2cf70892 UNKNOWN
   
   




[GitHub] [hudi] nsivabalan commented on pull request #4034: [HUDI-2793] Fixing deltastreamer checkpoint fetch/copy over

2021-12-01 Thread GitBox


nsivabalan commented on pull request #4034:
URL: https://github.com/apache/hudi/pull/4034#issuecomment-983970030


   Yes, it's lazy evaluation. Once the first entry is found, we may not 
process the others.
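
   The short-circuiting described here can be sketched with a plain Java stream. This is not the code from the pull request; the class name, the list of commit checkpoints, and the counter are hypothetical and exist only to show that entries after the first match are not inspected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: hypothetical stand-in for walking commits to find a checkpoint.
final class LazyCheckpointLookupSketch {

    // Returns the first non-empty checkpoint, counting how many entries were inspected.
    static Optional<String> firstCheckpoint(List<String> checkpointsNewestFirst, AtomicInteger inspected) {
        return checkpointsNewestFirst.stream()
                .peek(c -> inspected.incrementAndGet()) // runs only for elements actually pulled
                .filter(c -> !c.isEmpty())
                .findFirst();                           // short-circuits on the first match
    }

    public static void main(String[] args) {
        AtomicInteger inspected = new AtomicInteger();
        List<String> checkpoints = Arrays.asList("", "ckpt-42", "ckpt-41", "ckpt-40");
        Optional<String> found = firstCheckpoint(checkpoints, inspected);
        System.out.println(found.orElse("none") + " after inspecting " + inspected.get() + " entries");
    }
}
```

   Because `findFirst` is a short-circuiting terminal operation, the sequential pipeline stops pulling elements once a match is found, which is why the later entries are never processed.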






[jira] [Updated] (HUDI-2908) Clustering w/ Layout Optimization enabled, produces incorrect number of partitions

2021-12-01 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-2908:
--
Priority: Major  (was: Blocker)

> Clustering w/ Layout Optimization enabled, produces incorrect number of 
> partitions
> --
>
> Key: HUDI-2908
> URL: https://issues.apache.org/jira/browse/HUDI-2908
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Major
> Fix For: 0.10.0
>
>
> Currently when clustering w/ Layout Optimization enabled (both 
> Z-order/Hilbert) incorrect number of partitions will be produced in cases 
> when dataset is specified to be ordered by single column:
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/spark/OrderingIndexHelper.java#L103]
>  
>  





[jira] [Updated] (HUDI-2908) Clustering w/ Layout Optimization enabled, produces incorrect number of partitions

2021-12-01 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-2908:
--
Description: 
Currently when clustering w/ Layout Optimization enabled (both Z-order/Hilbert) 
incorrect number of partitions will be produced in cases when dataset is 
specified to be ordered by single column:

[https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/spark/OrderingIndexHelper.java#L103]

 

 

  was:
Currently when clustering w/ Layout Optimization enabled (both Z-order/Hilbert) 
incorrect number of partitions will be produced, b/c of the typo:

 

 


> Clustering w/ Layout Optimization enabled, produces incorrect number of 
> partitions
> --
>
> Key: HUDI-2908
> URL: https://issues.apache.org/jira/browse/HUDI-2908
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.10.0
>
>
> Currently when clustering w/ Layout Optimization enabled (both 
> Z-order/Hilbert) incorrect number of partitions will be produced in cases 
> when dataset is specified to be ordered by single column:
> [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/spark/OrderingIndexHelper.java#L103]
>  
>  





[jira] [Updated] (HUDI-2908) Clustering w/ Layout Optimization enabled, produces incorrect number of partitions

2021-12-01 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-2908:
--
Priority: Blocker  (was: Major)

> Clustering w/ Layout Optimization enabled, produces incorrect number of 
> partitions
> --
>
> Key: HUDI-2908
> URL: https://issues.apache.org/jira/browse/HUDI-2908
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Priority: Blocker
>
> Currently when clustering w/ Layout Optimization enabled (both 
> Z-order/Hilbert) incorrect number of partitions will be produced, b/c of the 
> typo:
>  
>  





[jira] [Updated] (HUDI-2908) Clustering w/ Layout Optimization enabled, produces incorrect number of partitions

2021-12-01 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-2908:
--
Fix Version/s: 0.10.0

> Clustering w/ Layout Optimization enabled, produces incorrect number of 
> partitions
> --
>
> Key: HUDI-2908
> URL: https://issues.apache.org/jira/browse/HUDI-2908
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.10.0
>
>
> Currently when clustering w/ Layout Optimization enabled (both 
> Z-order/Hilbert) incorrect number of partitions will be produced, b/c of the 
> typo:
>  
>  





[jira] [Assigned] (HUDI-2908) Clustering w/ Layout Optimization enabled, produces incorrect number of partitions

2021-12-01 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin reassigned HUDI-2908:
-

Assignee: Alexey Kudinkin

> Clustering w/ Layout Optimization enabled, produces incorrect number of 
> partitions
> --
>
> Key: HUDI-2908
> URL: https://issues.apache.org/jira/browse/HUDI-2908
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.10.0
>
>
> Currently when clustering w/ Layout Optimization enabled (both 
> Z-order/Hilbert) incorrect number of partitions will be produced, b/c of the 
> typo:
>  
>  





[jira] [Created] (HUDI-2908) Clustering w/ Layout Optimization enabled, produces incorrect number of partitions

2021-12-01 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-2908:
-

 Summary: Clustering w/ Layout Optimization enabled, produces 
incorrect number of partitions
 Key: HUDI-2908
 URL: https://issues.apache.org/jira/browse/HUDI-2908
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Alexey Kudinkin


Currently when clustering w/ Layout Optimization enabled (both Z-order/Hilbert) 
incorrect number of partitions will be produced, b/c of the typo:

 

 





[jira] [Created] (HUDI-2907) Add a table service to validate states

2021-12-01 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-2907:
---

 Summary: Add a table service to validate states
 Key: HUDI-2907
 URL: https://issues.apache.org/jira/browse/HUDI-2907
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo








[jira] [Created] (HUDI-2906) Add a repair command to clean up duplicate/uncommitted data files in a table

2021-12-01 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-2906:
---

 Summary: Add a repair command to clean up duplicate/uncommitted 
data files in a table
 Key: HUDI-2906
 URL: https://issues.apache.org/jira/browse/HUDI-2906
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo








[jira] [Updated] (HUDI-2907) Add a table service to validate data files against timeline

2021-12-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-2907:

Summary: Add a table service to validate data files against timeline  (was: 
Add a table service to validate states)

> Add a table service to validate data files against timeline
> ---
>
> Key: HUDI-2907
> URL: https://issues.apache.org/jira/browse/HUDI-2907
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>






[jira] [Assigned] (HUDI-2906) Add a repair command to clean up duplicate/uncommitted data files in a table

2021-12-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-2906:
---

Assignee: Ethan Guo

> Add a repair command to clean up duplicate/uncommitted data files in a table
> 
>
> Key: HUDI-2906
> URL: https://issues.apache.org/jira/browse/HUDI-2906
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[GitHub] [hudi] yihua commented on a change in pull request #4166: [MINOR] Adding verbose output for metadata validate files command

2021-12-01 Thread GitBox


yihua commented on a change in pull request #4166:
URL: https://github.com/apache/hudi/pull/4166#discussion_r760447080



##
File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/MetadataCommand.java
##
@@ -297,12 +297,20 @@ public String validateFiles(
 row[0] = partition;
 FileStatus fsFileStatus = fileStatusMap.get(file);
 FileStatus metaFileStatus = metadataFileStatusMap.get(file);
+boolean isFsFileExists = fsFileStatus != null;
+boolean isMetadataFileExists = metaFileStatus != null;

Review comment:
   nit: `doesXXXExist` instead of `isXXXExists`?
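
   As a sketch of the suggested naming, assuming plain maps in place of the Hadoop `FileStatus` maps used in `MetadataCommand` (the class, method, and return strings below are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: Map<String, Long> (file name to length) stands in for the
// FileStatus maps in the diff above.
final class FileExistenceNamingSketch {

    static String describe(String file, Map<String, Long> fsFiles, Map<String, Long> metadataFiles) {
        // "does...Exist" reads as a grammatical question, unlike "is...Exists".
        boolean doesFsFileExist = fsFiles.get(file) != null;
        boolean doesMetadataFileExist = metadataFiles.get(file) != null;

        if (doesFsFileExist && doesMetadataFileExist) {
            return "present in both";
        } else if (doesFsFileExist) {
            return "missing from metadata";
        } else if (doesMetadataFileExist) {
            return "missing from file system";
        }
        return "missing from both";
    }

    public static void main(String[] args) {
        Map<String, Long> fsFiles = new HashMap<>();
        Map<String, Long> metadataFiles = new HashMap<>();
        fsFiles.put("a.parquet", 1024L);
        System.out.println(describe("a.parquet", fsFiles, metadataFiles));
    }
}
```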







