[GitHub] [hudi] hudi-bot commented on pull request #4548: [HUDI-3184] hudi-flink support timestamp-micros

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4548:
URL: https://github.com/apache/hudi/pull/4548#issuecomment-1008613232


   
   ## CI report:
   
   * afe7fac6c45a7ee1f0935e17896d0616f124fca3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5047)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4548: [HUDI-3184] hudi-flink support timestamp-micros

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4548:
URL: https://github.com/apache/hudi/pull/4548#issuecomment-1008586981


   
   ## CI report:
   
   * afe7fac6c45a7ee1f0935e17896d0616f124fca3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5047)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008603238


   
   ## CI report:
   
   * c3295aa79ecd15281ffc573c86e73a2637f3533f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5041)
 
   * 52cad3508ddf12c73f1c5c60180fe1137232192d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] guoch commented on issue #4545: [SUPPORT] Hudi(0.10.0) backward compatibility for Flink 1.11/1.12 version

2022-01-09 Thread GitBox


guoch commented on issue #4545:
URL: https://github.com/apache/hudi/issues/4545#issuecomment-1008604526


   > 
   
   Got it. Thanks for the info.






[GitHub] [hudi] hudi-bot commented on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008604559


   
   ## CI report:
   
   * c3295aa79ecd15281ffc573c86e73a2637f3533f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5041)
 
   * 52cad3508ddf12c73f1c5c60180fe1137232192d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5048)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] guoch closed issue #4545: [SUPPORT] Hudi(0.10.0) backward compatibility for Flink 1.11/1.12 version

2022-01-09 Thread GitBox


guoch closed issue #4545:
URL: https://github.com/apache/hudi/issues/4545


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008532182


   
   ## CI report:
   
   * c3295aa79ecd15281ffc573c86e73a2637f3533f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5041)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008603238


   
   ## CI report:
   
   * c3295aa79ecd15281ffc573c86e73a2637f3533f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5041)
 
   * 52cad3508ddf12c73f1c5c60180fe1137232192d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4548: [HUDI-3184] hudi-flink support timestamp-micros

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4548:
URL: https://github.com/apache/hudi/pull/4548#issuecomment-1008586981


   
   ## CI report:
   
   * afe7fac6c45a7ee1f0935e17896d0616f124fca3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5047)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4548: [HUDI-3184] hudi-flink support timestamp-micros

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4548:
URL: https://github.com/apache/hudi/pull/4548#issuecomment-1008585610


   
   ## CI report:
   
   * afe7fac6c45a7ee1f0935e17896d0616f124fca3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4548: [HUDI-3184] hudi-flink support timestamp-micros

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4548:
URL: https://github.com/apache/hudi/pull/4548#issuecomment-1008585610


   
   ## CI report:
   
   * afe7fac6c45a7ee1f0935e17896d0616f124fca3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] AirToSupply opened a new pull request #4548: [HUDI-3184] hudi-flink support timestamp-micros

2022-01-09 Thread GitBox


AirToSupply opened a new pull request #4548:
URL: https://github.com/apache/hudi/pull/4548


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Make the hudi-flink module support timestamp-micros. 
[(HUDI-3184)](https://issues.apache.org/jira/browse/HUDI-3184)
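
   Purely as an illustration of the scenario this targets, and not code from this PR: a Flink SQL table with a microsecond-precision `TIMESTAMP(6)` column (the Avro `timestamp-micros` logical type) written through the Hudi connector. The table name, path, and options below are invented.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object TimestampMicrosSketch {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.newInstance().inStreamingMode().build()
    val tEnv = TableEnvironment.create(settings)

    // A Hudi-backed table whose `ts` column keeps microsecond precision.
    tEnv.executeSql(
      """CREATE TABLE hudi_ts_demo (
        |  id BIGINT,
        |  ts TIMESTAMP(6),
        |  PRIMARY KEY (id) NOT ENFORCED
        |) WITH (
        |  'connector' = 'hudi',
        |  'path' = 'file:///tmp/hudi_ts_demo',
        |  'table.type' = 'MERGE_ON_READ'
        |)""".stripMargin)

    // Writing a microsecond timestamp exercises the timestamp-micros path.
    tEnv.executeSql(
      "INSERT INTO hudi_ts_demo VALUES (1, TIMESTAMP '2022-01-09 00:00:00.123456')")
  }
}
```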
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4546: [MINOR] Fix port number in setupKafka.sh

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4546:
URL: https://github.com/apache/hudi/pull/4546#issuecomment-1008560857


   
   ## CI report:
   
   * d494dc6ad14f71036c0d939f588313adc84dcf8f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5046)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4546: [MINOR] Fix port number in setupKafka.sh

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4546:
URL: https://github.com/apache/hudi/pull/4546#issuecomment-1008583528


   
   ## CI report:
   
   * d494dc6ad14f71036c0d939f588313adc84dcf8f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5046)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008559321


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * 450ccaa4c73197ad56f26c37260f66fc27873f36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5032)
 
   * a39a6cda867038f96d379ff17b7e1216fa2326fb UNKNOWN
   * f56b53b80f3cfc8949eb2f4d14ee2a8a762252da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5045)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008582494


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * a39a6cda867038f96d379ff17b7e1216fa2326fb UNKNOWN
   * f56b53b80f3cfc8949eb2f4d14ee2a8a762252da Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5045)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] arpanrkl7 commented on issue #2509: [SUPPORT] Hudi Spark DataSource saves TimestampType as bigInt

2022-01-09 Thread GitBox


arpanrkl7 commented on issue #2509:
URL: https://github.com/apache/hudi/issues/2509#issuecomment-1008567437


   When I try to read using spark-sql, I get the error below, which is the same one 
mentioned by @zuyanton:
   java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be 
cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
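
For context, a minimal sketch of the kind of job that hits this; the table name, record key, and path are illustrative and not taken from this report. A `TimestampType` column is written through the Hudi Spark datasource with Hive sync enabled, and the synced Hive table is then queried from spark-sql.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

val spark = SparkSession.builder().appName("hudi-timestamp-sketch").getOrCreate()

// `id` is LongType, `event_time` is TimestampType.
val df = spark.range(0, 10).withColumn("event_time", current_timestamp())

df.write.format("hudi")
  .option("hoodie.table.name", "ts_demo")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "event_time")
  .option("hoodie.datasource.hive_sync.enable", "true") // the synced Hive table is where the cast error surfaces
  .mode("append")
  .save("/tmp/hudi/ts_demo")
```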






[GitHub] [hudi] yihua commented on a change in pull request #3588: [MINOR] Fix wording and table in the marker blog

2022-01-09 Thread GitBox


yihua commented on a change in pull request #3588:
URL: https://github.com/apache/hudi/pull/3588#discussion_r780906284



##
File path: website/blog/2021-08-18-improving-marker-mechanism.md
##
@@ -47,26 +47,26 @@ Note that the worker thread always checks whether the 
marker has already been cr
 
 ## Marker-related write options
 
-We introduce the following new marker-related write options in `0.9.0` 
release, to configure the marker mechanism.
+We introduce the following new marker-related write options in `0.9.0` 
release, to configure the marker mechanism.  Note that the 
timeline-server-based marker mechanism is not yet supported for HDFS in `0.9.0` 
release, and we plan to support the timeline-server-based marker mechanism for 
HDFS in the future.
 
 | Property Name |   Default   | Meaning|
 | - | --- | :-:| 
-| `hoodie.write.markers.type` | direct | Marker type to use.  Two modes 
are supported: (1) `direct`: individual marker file corresponding to each data 
file is directly created by the writer; (2) `timeline_server_based`: marker 
operations are all handled at the timeline service which serves as a proxy.  
New marker entries are batch processed and stored in a limited number of 
underlying files for efficiency. |
+| `hoodie.write.markers.type` | direct | Marker type to use.  Two modes 
are supported: (1) `direct`: individual marker file corresponding to each data 
file is directly created by the executor; (2) `timeline_server_based`: marker 
operations are all handled at the timeline service which serves as a proxy.  
New marker entries are batch processed and stored in a limited number of 
underlying files for efficiency. |
 | `hoodie.markers.timeline_server_based.batch.num_threads` | 20 | Number 
of threads to use for batch processing marker creation requests at the timeline 
server. | 
 | `hoodie.markers.timeline_server_based.batch.interval_ms` | 50 | The batch 
interval in milliseconds for marker creation batch processing. |
 
 ## Performance
 
-We evaluate the write performance over both direct and timeline-server-based 
marker mechanisms by bulk-inserting a large dataset using Amazon EMR with Spark 
and S3. The input data is around 100GB.  We configure the write operation to 
generate a large number of data files concurrently by setting the max parquet 
file size to be 1MB and parallelism to be 240. As we noted before, while the 
latency of direct marker mechanism is acceptable for incremental writes with 
smaller number of data files written, it increases dramatically for large bulk 
inserts/writes which produce much more data files.
+We evaluate the write performance over both direct and timeline-server-based 
marker mechanisms by bulk-inserting a large dataset using Amazon EMR with Spark 
and S3. The input data is around 100GB.  We configure the write operation to 
generate a large number of data files concurrently by setting the max parquet 
file size to be 1MB and parallelism to be 240.  Note that it is unlikely to set 
max parquet file size to 1MB in production and such a setup is only to evaluate 
the performance regarding the marker mechanisms. As we noted before, while the 
latency of direct marker mechanism is acceptable for incremental writes with 
smaller number of data files written, it increases dramatically for large bulk 
inserts/writes which produce much more data files.
 
-As shown below, the timeline-server-based marker mechanism generates much 
fewer files storing markers because of the batch processing, leading to much 
less time on marker-related I/O operations, thus achieving 31% lower write 
completion time compared to the direct marker file mechanism.
+As shown below, direct marker mechanism works really well, when a part of the 
table is written, e.g., 1K out of 165K data files.  However, the time of direct 
marker operations is non-trivial when we need to write significant number of 
data files. Compared to the direct marker mechanism, the timeline-server-based 
marker mechanism generates much fewer files storing markers because of the 
batch processing, leading to much less time on marker-related I/O operations, 
thus achieving 31% lower write completion time compared to the direct marker 
file mechanism.
 
-| Marker Type |   Total Files   |  Num data files written | Files created for 
markers | Marker deletion time | Bulk Insert Time (including marker deletion) |
+| Marker Type |   Input data size   |  Num data files written | Files created 
for markers | Marker deletion time | Bulk Insert Time (including marker 
deletion) |
 | --- | - | :-: | :-: | :-: | 
:-: | 
-| Direct | 165K | 1k | 165k | 5.4secs | - |
-| Direct | 165K | 165k | 165k | 15min | 55min |
-| Timeline-server-based | 165K | 165k | 20 | ~3s | 38min |
+| Direct | 600MB | 1k | 1k | 5.4secs | - |

Review comment:
   Somehow missed the comment.  I put a PR to fix that: #454
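
For readers of the excerpt above: the marker options quoted in the diff are regular per-write configs. Below is a hedged example of switching a Spark write to the timeline-server-based mechanism; `df`, the table name, key fields, and path are assumptions, not content from the blog or the linked PR.

```scala
// Illustrative only; values mirror the defaults quoted in the blog table above.
df.write.format("hudi")
  .option("hoodie.table.name", "marker_demo")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "event_time")
  .option("hoodie.write.markers.type", "timeline_server_based")
  .option("hoodie.markers.timeline_server_based.batch.num_threads", "20")
  .option("hoodie.markers.timeline_server_based.batch.interval_ms", "50")
  .mode("append")
  .save("/tmp/hudi/marker_demo")
```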

[GitHub] [hudi] yihua opened a new pull request #4547: [MINOR] Fix performance table in marker blog

2022-01-09 Thread GitBox


yihua opened a new pull request #4547:
URL: https://github.com/apache/hudi/pull/4547


   ## What is the purpose of the pull request
   
   Fix performance table content in marker blog.
   
   ## Verify this pull request
   
   The site can build and launch.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4546: [MINOR] Fix port number in setupKafka.sh

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4546:
URL: https://github.com/apache/hudi/pull/4546#issuecomment-1008560091


   
   ## CI report:
   
   * d494dc6ad14f71036c0d939f588313adc84dcf8f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4546: [MINOR] Fix port number in setupKafka.sh

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4546:
URL: https://github.com/apache/hudi/pull/4546#issuecomment-1008560857


   
   ## CI report:
   
   * d494dc6ad14f71036c0d939f588313adc84dcf8f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5046)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4546: [MINOR] Fix port number in setupKafka.sh

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4546:
URL: https://github.com/apache/hudi/pull/4546#issuecomment-1008560091


   
   ## CI report:
   
   * d494dc6ad14f71036c0d939f588313adc84dcf8f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] danny0405 commented on issue #4545: [SUPPORT] Hudi(0.10.0) backward compatibility for Flink 1.11/1.12 version

2022-01-09 Thread GitBox


danny0405 commented on issue #4545:
URL: https://github.com/apache/hudi/issues/4545#issuecomment-1008559861


   I think we can, once the Flink version is stable, e.g. Flink 1.14.x.






[GitHub] [hudi] yihua opened a new pull request #4546: [MINOR] Fix port number in setupKafka.sh

2022-01-09 Thread GitBox


yihua opened a new pull request #4546:
URL: https://github.com/apache/hudi/pull/4546


   ## What is the purpose of the pull request
   
   This PR fixes port number in `setupKafka.sh`.
   
   ## Verify this pull request
   
   Run through the Quick Start Guide of Kafka Connect Sink for Hudi to make 
sure the script does not throw errors anymore.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot commented on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008559321


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * 450ccaa4c73197ad56f26c37260f66fc27873f36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5032)
 
   * a39a6cda867038f96d379ff17b7e1216fa2326fb UNKNOWN
   * f56b53b80f3cfc8949eb2f4d14ee2a8a762252da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5045)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008556417


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * 450ccaa4c73197ad56f26c37260f66fc27873f36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5032)
 
   * a39a6cda867038f96d379ff17b7e1216fa2326fb UNKNOWN
   * f56b53b80f3cfc8949eb2f4d14ee2a8a762252da UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008557130


   
   ## CI report:
   
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5040)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5044)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008529426


   
   ## CI report:
   
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5040)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5044)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008547036


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * 450ccaa4c73197ad56f26c37260f66fc27873f36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5032)
 
   * a39a6cda867038f96d379ff17b7e1216fa2326fb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008556417


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * 450ccaa4c73197ad56f26c37260f66fc27873f36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5032)
 
   * a39a6cda867038f96d379ff17b7e1216fa2326fb UNKNOWN
   * f56b53b80f3cfc8949eb2f4d14ee2a8a762252da UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] Gatsby-Lee commented on issue #2509: [SUPPORT] Hudi Spark DataSource saves TimestampType as bigInt

2022-01-09 Thread GitBox


Gatsby-Lee commented on issue #2509:
URL: https://github.com/apache/hudi/issues/2509#issuecomment-1008552876


   @nsivabalan  
   After I got your message, I queried the RT table. It still fails.
   I heard from AWS that the fix will be shipped out at the end of Jan 2022.






[GitHub] [hudi] waywtdcc closed issue #4305: [SUPPORT] Duplicate Flink write record

2022-01-09 Thread GitBox


waywtdcc closed issue #4305:
URL: https://github.com/apache/hudi/issues/4305


   






[GitHub] [hudi] waywtdcc closed issue #4508: [SUPPORT]Duplicate Flink Hudi data

2022-01-09 Thread GitBox


waywtdcc closed issue #4508:
URL: https://github.com/apache/hudi/issues/4508


   






[GitHub] [hudi] nsivabalan commented on issue #3533: [SUPPORT]How to use MOR Table to Merge small file?

2022-01-09 Thread GitBox


nsivabalan commented on issue #3533:
URL: https://github.com/apache/hudi/issues/3533#issuecomment-1008552108


   @aresa7796 : I will go ahead and close this due to inactivity. Feel free to reopen 
if need be; I will be happy to help. 






[GitHub] [hudi] nsivabalan closed issue #3533: [SUPPORT]How to use MOR Table to Merge small file?

2022-01-09 Thread GitBox


nsivabalan closed issue #3533:
URL: https://github.com/apache/hudi/issues/3533


   






[GitHub] [hudi] nsivabalan commented on issue #2509: [SUPPORT] Hudi Spark DataSource saves TimestampType as bigInt

2022-01-09 Thread GitBox


nsivabalan commented on issue #2509:
URL: https://github.com/apache/hudi/issues/2509#issuecomment-1008551882


   @umehrot2 @zhedoubushishi : Do you folks have any pointers on this?
   @Gatsby-Lee : I believe Athena added support for real-time queries in one of the 
latest versions. Did you try using the latest Athena? 






[GitHub] [hudi] nsivabalan commented on issue #2936: [SUPPORT] OverwriteNonDefaultsWithLatestAvroPayload not work in mor table

2022-01-09 Thread GitBox


nsivabalan commented on issue #2936:
URL: https://github.com/apache/hudi/issues/2936#issuecomment-1008551168


   @shenbinglife : let us know if you are looking for any more help, or feel 
free to close the issue if you have it resolved. 






[GitHub] [hudi] nsivabalan commented on issue #3478: [SUPPORT] Unexpected Hive behaviour

2022-01-09 Thread GitBox


nsivabalan commented on issue #3478:
URL: https://github.com/apache/hudi/issues/3478#issuecomment-1008550588


   @affei : hey, do you have any updates for us on this, please?






[GitHub] [hudi] nsivabalan commented on issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

2022-01-09 Thread GitBox


nsivabalan commented on issue #3713:
URL: https://github.com/apache/hudi/issues/3713#issuecomment-1008550188


   Closing this due to inactivity. Feel free to re-open if need be; we would be 
happy to help.






[GitHub] [hudi] nsivabalan closed issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

2022-01-09 Thread GitBox


nsivabalan closed issue #3713:
URL: https://github.com/apache/hudi/issues/3713


   






[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

2022-01-09 Thread GitBox


nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1008549934


   what kind of lock provider are you using? 
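
For reference, a hedged sketch of one lock-provider setup for OCC, using the ZooKeeper-based provider; `df`, the hosts, paths, and table name are placeholders, not the reporter's configuration.

```scala
df.write.format("hudi")
  .option("hoodie.table.name", "occ_demo")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
  .option("hoodie.cleaner.policy.failed.writes", "LAZY")
  .option("hoodie.write.lock.provider", "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
  .option("hoodie.write.lock.zookeeper.url", "zk-host")
  .option("hoodie.write.lock.zookeeper.port", "2181")
  .option("hoodie.write.lock.zookeeper.lock_key", "occ_demo")
  .option("hoodie.write.lock.zookeeper.base_path", "/hudi/locks")
  .mode("append")
  .save("/tmp/hudi/occ_demo")
```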






[GitHub] [hudi] nsivabalan commented on issue #4082: [SUPPORT] How to write multiple HUDi tables simultaneously in a Spark Streaming task?

2022-01-09 Thread GitBox


nsivabalan commented on issue #4082:
URL: https://github.com/apache/hudi/issues/4082#issuecomment-1008548662


   @xuranyang : are you referring to MultiTableDeltastreamer? I don't think we 
have any such functionality for now to stream from multiple sources and write to 
different hudi tables; it has to be done manually at the application layer by the user. 
   If you can build a simple framework for this, please consider 
upstreaming the functionality to benefit others in the community. 
   Thanks!
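
A rough, hypothetical shape of the "manually at the application layer" approach mentioned above, fanning one micro-batch out to two Hudi tables; the table names, key fields, and paths are invented.

```scala
import org.apache.spark.sql.DataFrame

// `streamingDf` (a streaming DataFrame with `id` and `ts` columns) is assumed.
def writeBatch(batch: DataFrame, batchId: Long): Unit = {
  Seq(("table_a", "/tmp/hudi/table_a"), ("table_b", "/tmp/hudi/table_b")).foreach {
    case (name, path) =>
      batch.write.format("hudi")
        .option("hoodie.table.name", name)
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        .mode("append")
        .save(path)
  }
}

val query = streamingDf.writeStream
  .foreachBatch(writeBatch _)
  .option("checkpointLocation", "/tmp/checkpoints/multi_table_demo")
  .start()
```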






[jira] [Updated] (HUDI-3163) Validate/certify hudi against diff spark 3 versions

2022-01-09 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3163:
-
Status: In Progress  (was: Open)

> Validate/certify hudi against diff spark 3 versions 
> 
>
> Key: HUDI-3163
> URL: https://issues.apache.org/jira/browse/HUDI-3163
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Raymond Xu
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.10.1
>
>
> We have different Spark 3 versions. Let's validate/certify the different Spark 3 
> versions against 0.10.0 and master.
>  
> I do see this in our GitHub README. If it's already certified, feel free to 
> close it out (and link to the original ticket where the verifications are documented).
> {code:java}
> # Build against Spark 3.2.0 (default build shipped with the public jars)
> mvn clean package -DskipTests -Dspark3
>
> # Build against Spark 3.1.2
> mvn clean package -DskipTests -Dspark3.1.x
>
> # Build against Spark 3.0.3
> mvn clean package -DskipTests -Dspark3.0.x
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008329300


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * 450ccaa4c73197ad56f26c37260f66fc27873f36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5032)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4535: [WIP][HUDI-3161] Add Call Produce Command for spark sql

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4535:
URL: https://github.com/apache/hudi/pull/4535#issuecomment-1008547036


   
   ## CI report:
   
   * 49b18f6d40a8b859927dcc9d606d40fd4162f0b1 UNKNOWN
   * 450ccaa4c73197ad56f26c37260f66fc27873f36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5032)
 
   * a39a6cda867038f96d379ff17b7e1216fa2326fb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4544: [HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4544:
URL: https://github.com/apache/hudi/pull/4544#issuecomment-1008543620


   
   ## CI report:
   
   * 8ca9f2823977584fb07efc737ccc175a6e33f115 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5043)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4544: [HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4544:
URL: https://github.com/apache/hudi/pull/4544#issuecomment-1008512050


   
   ## CI report:
   
   * 8ca9f2823977584fb07efc737ccc175a6e33f115 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5043)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan commented on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


nsivabalan commented on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008535743


   @xiarixiaoyao : hey, can you review this patch please? It touches part of the 
code authored by you. 






[GitHub] [hudi] hudi-bot removed a comment on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008501965


   
   ## CI report:
   
   * dc6e817b518774152944d658e4c239cfcce30c9f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5016)
 
   * c3295aa79ecd15281ffc573c86e73a2637f3533f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5041)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008532182


   
   ## CI report:
   
   * c3295aa79ecd15281ffc573c86e73a2637f3533f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5041)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] danny0405 commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

2022-01-09 Thread GitBox


danny0405 commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r780733934



##
File path: 
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/BaseJavaCommitActionExecutor.java
##
@@ -90,27 +90,29 @@ public BaseJavaCommitActionExecutor(HoodieEngineContext 
context,
   public HoodieWriteMetadata<List<WriteStatus>> execute(List<HoodieRecord<T>> inputRecords) {
 HoodieWriteMetadata<List<WriteStatus>> result = new HoodieWriteMetadata<>();
 
-WorkloadProfile profile = null;
+WorkloadProfile inputProfile = null;
 if (isWorkloadProfileNeeded()) {
-  profile = new WorkloadProfile(buildProfile(inputRecords));
-  LOG.info("Workload profile :" + profile);
+  inputProfile = new WorkloadProfile(buildProfile(inputRecords));
+  LOG.info("Input workload profile :" + inputProfile);
+}
+
+final Partitioner partitioner = getPartitioner(inputProfile);
+try {
+  WorkloadProfile executionProfile = 
partitioner.getExecutionWorkloadProfile();
+  LOG.info("Execution workload profile :" + inputProfile);
+  saveWorkloadProfileMetadataToInflight(executionProfile, instantTime);

Review comment:
   And why must we use the execution profile here? I know the original 
profile also works only for the bloom filter index, but we should fix the profile 
building instead of fetching it from the partitioner, if we have a way to 
distinguish between `INSERT`s and `UPDATE`s before the write.








[GitHub] [hudi] hudi-bot removed a comment on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008523582


   
   ## CI report:
   
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5040)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008529426


   
   ## CI report:
   
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5040)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5044)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] YuweiXiao commented on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


YuweiXiao commented on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008528378


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008523582


   
   ## CI report:
   
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5040)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008500469


   
   ## CI report:
   
   * 1277b45508e2b713a3c8416a87893b1d059c375a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5037)
 
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5040)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] guanziyue commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

2022-01-09 Thread GitBox


guanziyue commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r780881267



##
File path: 
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/BaseJavaCommitActionExecutor.java
##
@@ -90,27 +90,29 @@ public BaseJavaCommitActionExecutor(HoodieEngineContext 
context,
   public HoodieWriteMetadata<List<WriteStatus>> execute(List<HoodieRecord<T>> inputRecords) {
 HoodieWriteMetadata<List<WriteStatus>> result = new HoodieWriteMetadata<>();
 
-WorkloadProfile profile = null;
+WorkloadProfile inputProfile = null;
 if (isWorkloadProfileNeeded()) {
-  profile = new WorkloadProfile(buildProfile(inputRecords));
-  LOG.info("Workload profile :" + profile);
+  inputProfile = new WorkloadProfile(buildProfile(inputRecords));
+  LOG.info("Input workload profile :" + inputProfile);
+}
+
+final Partitioner partitioner = getPartitioner(inputProfile);
+try {
+  WorkloadProfile executionProfile = 
partitioner.getExecutionWorkloadProfile();
+  LOG.info("Execution workload profile :" + inputProfile);
+  saveWorkloadProfileMetadataToInflight(executionProfile, instantTime);

Review comment:
   > 
   I did this because the logic that assigns records to log files is covered by the 
partitioner, and it minimizes the change to the existing code. We could move all 
assignment logic for insert records from the partitioner to the profile generation. I 
will modify this part.
   








[GitHub] [hudi] boneanxs commented on a change in pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2022-01-09 Thread GitBox


boneanxs commented on a change in pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#discussion_r780880750



##
File path: 
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/SparkAdapter.scala
##
@@ -92,4 +95,31 @@ trait SparkAdapter extends Serializable {
* ParserInterface#parseMultipartIdentifier is supported since spark3, for 
spark2 this should not be called.
*/
   def parseMultipartIdentifier(parser: ParserInterface, sqlText: String): 
Seq[String]
+
+  def isHoodieTable(table: LogicalPlan, spark: SparkSession): Boolean = {

Review comment:
   Is there any difference from **hoodieSqlCommonUtils.isHoodieTable**? I see that 
we sometimes use **adapter.isHoodieTable** and sometimes 
**hoodieSqlCommonUtils.isHoodieTable**.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] guanziyue commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

2022-01-09 Thread GitBox


guanziyue commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r780880230



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##
@@ -182,14 +182,28 @@ public abstract void preCompact(
 .withOperationField(config.allowOperationMetadataField())
 .withPartition(operation.getPartitionPath())
 .build();
-if (!scanner.iterator().hasNext()) {
-  scanner.close();
-  return new ArrayList<>();
-}
 
 Option oldDataFileOpt =
 operation.getBaseFile(metaClient.getBasePath(), 
operation.getPartitionPath());
 
+// Considering following scenario: if all log blocks in this fileSlice is 
rollback, it returns an empty scanner.
+// But in this case, we need to give it a base file. Otherwise, it will 
lose base file in following fileSlice.
+if (!scanner.iterator().hasNext()) {
+  if (!oldDataFileOpt.isPresent()) {
+scanner.close();
+return new ArrayList<>();
+  } else {
+// TODO: we may directly rename original parquet file if there is not 
evolution/devolution of schema

Review comment:
   > If the file slice only has parquet files, why we still trigger 
compaction ?
   
   Before we actually run the compaction, it is quite difficult to know that the new 
fileSlice only has a parquet file. There may be one or more log files that exist but 
contain no valid log blocks in them. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] guanziyue commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

2022-01-09 Thread GitBox


guanziyue commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r780879568



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##
@@ -182,14 +182,28 @@ public abstract void preCompact(
 .withOperationField(config.allowOperationMetadataField())
 .withPartition(operation.getPartitionPath())
 .build();
-if (!scanner.iterator().hasNext()) {
-  scanner.close();
-  return new ArrayList<>();
-}
 
 Option oldDataFileOpt =
 operation.getBaseFile(metaClient.getBasePath(), 
operation.getPartitionPath());
 
+// Considering following scenario: if all log blocks in this fileSlice is 
rollback, it returns an empty scanner.
+// But in this case, we need to give it a base file. Otherwise, it will 
lose base file in following fileSlice.
+if (!scanner.iterator().hasNext()) {
+  if (!oldDataFileOpt.isPresent()) {
+scanner.close();
+return new ArrayList<>();
+  } else {
+// TODO: we may directly rename original parquet file if there is not 
evolution/devolution of schema

Review comment:
   Correct me if I misunderstand your question. The reason we try to generate a new 
base file here, rather than ending the compaction operation, is that any upsert that 
occurs after the compaction plan is generated will use the compaction commit time as 
the base commit time of its new log file. Such a fileSlice is composed of the new log 
file plus the base file generated by compaction. If HoodieCompactor did not generate a 
base file for this fileSlice, the file group would lose all of the base file's data in 
the new and following fileSlices.
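
   A minimal, self-contained sketch of this corner case (hypothetical names and 
simplified types; the real logic is the HoodieCompactor change in the diff above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch only: if the scanner yields no records but an old base file exists,
// the compaction still has to carry the base file's data into the new file
// slice; otherwise later file slices (whose log files already use the
// compaction instant as their base commit time) would lose that data.
public class EmptyScannerCompactionSketch {

  static List<String> compact(List<String> scannedLogRecords,
                              Optional<List<String>> oldBaseFileRecords) {
    if (scannedLogRecords.isEmpty()) {
      if (!oldBaseFileRecords.isPresent()) {
        // No log records and no base file: a genuine no-op.
        return new ArrayList<>();
      }
      // No log records, but a base file exists: rewrite it into the new slice.
      return new ArrayList<>(oldBaseFileRecords.get());
    }
    // Normal path: merge log records with the base file (merge details elided).
    List<String> merged = new ArrayList<>(scannedLogRecords);
    oldBaseFileRecords.ifPresent(merged::addAll);
    return merged;
  }

  public static void main(String[] args) {
    List<String> noLogRecords = new ArrayList<>();
    Optional<List<String>> baseFile = Optional.of(new ArrayList<>(Arrays.asList("r1", "r2")));
    System.out.println(compact(noLogRecords, baseFile));          // [r1, r2] -> base file carried forward
    System.out.println(compact(noLogRecords, Optional.empty()));  // []       -> genuine no-op
  }
}
```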




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yihua commented on pull request #3420: [HUDI-2283] Support Clustering Command For Spark Sql

2022-01-09 Thread GitBox


yihua commented on pull request #3420:
URL: https://github.com/apache/hudi/pull/3420#issuecomment-1008519568


   > @nsivabalan @yihua If no one take this up, i am glad to.
   
   @YannByron Feel free to take a stab at this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#issuecomment-1008497224


   
   ## CI report:
   
   * dc9fe1b878dc47eaed13911fc5ca7eaffb80fb2f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4753)
 
   * ce8a8d9547819b23368115ba640caed1cb385213 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5039)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#issuecomment-1008518397


   
   ## CI report:
   
   * ce8a8d9547819b23368115ba640caed1cb385213 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5039)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] guoch opened a new issue #4545: [SUPPORT] Hudi(0.10.0) backward compatibility for Flink 1.11/1.12 version

2022-01-09 Thread GitBox


guoch opened a new issue #4545:
URL: https://github.com/apache/hudi/issues/4545


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? Yes
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   The Hudi Flink bundle at version 0.10.0 and on the master branch cannot run on 
Flink 1.11. Flink 1.11 is still widely used in many situations, while Hudi keeps 
improving its support for newer Flink versions.
   
   According to the discussion in https://github.com/apache/hudi/pull/3291, the 
community is unlikely to provide backward support for old Flink versions (roughly: 
Hudi 0.8 - Flink 1.11, Hudi 0.9 - Flink 1.12, Hudi 0.10 - Flink 1.13; a newer Hudi 
cannot run on an older Flink).
   
   Since old Spark versions (2.4/3.1) are still supported via different Maven build 
profiles, is there any possibility of retaining support for old Flink versions using 
a similar trick?
   
   
   **Expected behavior**
   New Hudi versions could be made backward compatible with older Flink versions by 
using different build profiles.
   
   
   **Environment Description**
   
   * Hudi version : 0.10.0 and master branch
   
   * Spark version : 3.2.0
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.3.1
   * Flink version: 1.11.3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4544: [HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4544:
URL: https://github.com/apache/hudi/pull/4544#issuecomment-1008512050


   
   ## CI report:
   
   * 8ca9f2823977584fb07efc737ccc175a6e33f115 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5043)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4544: [HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4544:
URL: https://github.com/apache/hudi/pull/4544#issuecomment-1008510590


   
   ## CI report:
   
   * 8ca9f2823977584fb07efc737ccc175a6e33f115 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4544: [HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4544:
URL: https://github.com/apache/hudi/pull/4544#issuecomment-1008510590


   
   ## CI report:
   
   * 8ca9f2823977584fb07efc737ccc175a6e33f115 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yihua opened a new pull request #4544: [HUDI-2735] Allow empty commits in Kafka Connect Sink for Hudi

2022-01-09 Thread GitBox


yihua opened a new pull request #4544:
URL: https://github.com/apache/hudi/pull/4544


   ## What is the purpose of the pull request
   
   This PR makes the Kafka Connect Sink for Hudi write empty commits when there are 
no new messages from the Kafka topic.  This avoids constant rollbacks when the Kafka 
topic has no new messages.  Regardless of whether there are new messages, the write 
commit logic, including archival, is always executed, which also resolves the problem 
of rollbacks never being archived when there are no new messages.
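
   A minimal, self-contained sketch of the behavioral change (names here are 
illustrative, not the actual ConnectTransactionCoordinator API):

```java
import java.util.Collections;
import java.util.List;

// Sketch only: previously the coordinator skipped the commit when no participant
// produced write statuses, so the started instant kept getting rolled back on an
// idle topic. The change is to always commit, even with an empty status list, so
// that the commit path (including archival) still runs.
public class EmptyCommitSketch {

  static String decide(List<String> writeStatuses, boolean allowEmptyCommits) {
    if (writeStatuses.isEmpty() && !allowEmptyCommits) {
      return "rollback";      // old behavior on an idle topic
    }
    return "commit";          // new behavior: an empty commit is fine
  }

  public static void main(String[] args) {
    List<String> none = Collections.emptyList();
    System.out.println(decide(none, false)); // rollback (before this PR)
    System.out.println(decide(none, true));  // commit   (after this PR)
  }
}
```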
   
   ## Brief change log
   
 - Removes the check of the size of write status list from all participants 
in `ConnectTransactionCoordinator`.
 - Adds a new test for empty status list.
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   - Run Kafka Connect Sink for Hudi using Quick Start Guide
   - Publish some messages to the Kafka topic: `bash setupKafka.sh -n 100 -b 6`
   - Wait for some time so the Sink ingests all messages and writes empty 
commits
   - Publish more messages to the topic: `bash setupKafka.sh -n 100 -b 6 -o 600 
-t`
   - Verify the table timeline using hudi-cli:
   ```
   hudi:hudi-test-topic->commits show
   
   CommitTime        | Total Bytes Written | Total Files Added | Total Files Updated | Total Partitions Written | Total Records Written | Total Update Records Written | Total Errors
   20220109184255282 | 76.1 KB             | 0                 | 20                  | 5                        | 300                   | 300                          | 0
   20220109184129070 | 75.7 KB             | 0                 | 20                  | 5                        | 300                   | 300                          | 0
   20220109183955630 | 0.0 B               | 0                 | 0                   | 0                        | 0                     | 0                            | 0
   20220109183755160 | 0.0 B               | 0                 | 0                   | 0                        | 0                     | 0                            | 0
   20220109183554995 | 0.0 B               | 0                 | 0                   | 0                        | 0                     | 0                            | 0
   20220109183354904 | 0.0 B               | 0                 | 0                   | 0                        | 0                     | 0                            | 0
   20220109183225656 | 75.7 KB             | 0                 | 20                  | 5                        | 300                   | 300                          | 0
   20220109183055068 | 71.8 KB             | 0                 | 16                  | 5                        | 300                   | 300                          | 0
   ```
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] N

[hudi] branch master updated (56f93f4 -> 251d4eb)

2022-01-09 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 56f93f4  Removing rollbacks instants from timeline for restore 
operation (#4518)
 add 251d4eb  [HUDI-3030] InProcessLockPovider as default when any async 
servcies enabled with no lock provider override (#4406)

No new revisions were added by this update.

Summary of changes:
 .../hudi/client/AbstractHoodieWriteClient.java |   2 +-
 .../hudi/client/transaction/lock/LockManager.java  |  15 ++-
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  34 ++-
 .../apache/hudi/config/TestHoodieWriteConfig.java  | 102 +
 .../apache/hudi/common/config/HoodieConfig.java|   6 +-
 5 files changed, 149 insertions(+), 10 deletions(-)


[GitHub] [hudi] codope merged pull request #4406: [HUDI-3030] InProcessLockPovider as default when any async servcies enabled with no lock provider override

2022-01-09 Thread GitBox


codope merged pull request #4406:
URL: https://github.com/apache/hudi/pull/4406


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008501296


   
   ## CI report:
   
   * dc6e817b518774152944d658e4c239cfcce30c9f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5016)
 
   * c3295aa79ecd15281ffc573c86e73a2637f3533f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008501965


   
   ## CI report:
   
   * dc6e817b518774152944d658e4c239cfcce30c9f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5016)
 
   * c3295aa79ecd15281ffc573c86e73a2637f3533f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5041)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008006702


   
   ## CI report:
   
   * dc6e817b518774152944d658e4c239cfcce30c9f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5016)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4540: [HUDI-3194][WIP] fix MOR snapshot query (HIVE) during compaction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4540:
URL: https://github.com/apache/hudi/pull/4540#issuecomment-1008501296


   
   ## CI report:
   
   * dc6e817b518774152944d658e4c239cfcce30c9f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5016)
 
   * c3295aa79ecd15281ffc573c86e73a2637f3533f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008500469


   
   ## CI report:
   
   * 1277b45508e2b713a3c8416a87893b1d059c375a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5037)
 
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5040)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4441: [HUDI-3085] improve bulk insert partitioner abstraction

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4441:
URL: https://github.com/apache/hudi/pull/4441#issuecomment-1008483743


   
   ## CI report:
   
   * 1277b45508e2b713a3c8416a87893b1d059c375a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5037)
 
   * cdb9542f861b32af8fdedb3f5107b3a6d60b3d2d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-2779) Cache BaseDir if HudiTableNotFound Exception thrown

2022-01-09 Thread Hui An (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An closed HUDI-2779.


> Cache BaseDir if HudiTableNotFound Exception thrown
> ---
>
> Key: HUDI-2779
> URL: https://issues.apache.org/jira/browse/HUDI-2779
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


manojpec commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r780869585



##
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##
@@ -62,6 +64,7 @@
   // Scanner used to read individual keys. This is cached to prevent the 
overhead of opening the scanner for each
   // key retrieval.
   private HFileScanner keyScanner;
+  private final String keyField = HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY;

Review comment:
   Unlike the HFile writer, readers don't take an hfile config or any other writer 
config. Callers use the factory's static methods to construct the reader. The factory 
and the reader are in the hudi-common package, so they cannot use the hudi-client 
storage configs where the new hfile properties are available. The factory could pass 
in the key schema field as an extra arg, but that doesn't cover all cases: there are 
callers that instantiate the HFileReader from serialized contents, and they are also 
at the hudi-common package level with no access to the new storage configs. 
   
   In https://github.com/apache/hudi/pull/4447, I made all HFileReader callers pass 
in the key schema field, and that is what made the patch touch so many places.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


manojpec commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r780868566



##
File path: 
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##
@@ -151,15 +154,15 @@ public BloomFilter readBloomFilter() {
   }
 
   public List> readAllRecords(Schema writerSchema, Schema 
readerSchema) throws IOException {
+final Option keySchemaField = 
Option.ofNullable(readerSchema.getField(keyField));
 List> recordList = new LinkedList<>();
 try {
   final HFileScanner scanner = reader.getScanner(false, false);
   if (scanner.seekTo()) {
 do {
   Cell c = scanner.getKeyValue();
-  byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), 
c.getRowOffset(), c.getRowOffset() + c.getRowLength());
-  R record = getRecordFromCell(c, writerSchema, readerSchema);
-  recordList.add(new Pair<>(new String(keyBytes), record));
+  final Pair keyAndRecordPair = getRecordFromCell(c, 
writerSchema, readerSchema, keySchemaField);
+  recordList.add(new Pair<>(keyAndRecordPair.getFirst(), 
keyAndRecordPair.getSecond()));

Review comment:
   fixed it. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


manojpec commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r780868540



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##
@@ -507,6 +519,255 @@ public void 
testMetadataTableWithPendingCompaction(boolean simulateFailedCompact
 }
   }
 
+  /**
+   * Test arguments - Table type, populate meta fields, exclude key from 
payload.
+   */
+  public static List testMetadataRecordKeyExcludeFromPayloadArgs() {
+return asList(
+Arguments.of(COPY_ON_WRITE, true),
+Arguments.of(COPY_ON_WRITE, false),
+Arguments.of(MERGE_ON_READ, true),
+Arguments.of(MERGE_ON_READ, false)
+);
+  }
+
+  /**

Review comment:
   I initially had the testing at the HFile writer and reader level, but it did not 
cover the compaction use case for the metadata table. The test here combines 
everything and checks exactly what is needed for the metadata table records. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#issuecomment-1008497224


   
   ## CI report:
   
   * dc9fe1b878dc47eaed13911fc5ca7eaffb80fb2f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4753)
 
   * ce8a8d9547819b23368115ba640caed1cb385213 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5039)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#issuecomment-1008496412


   
   ## CI report:
   
   * dc9fe1b878dc47eaed13911fc5ca7eaffb80fb2f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4753)
 
   * ce8a8d9547819b23368115ba640caed1cb385213 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


manojpec commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r780868368



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java
##
@@ -122,7 +128,13 @@ public boolean canWrite() {
 
   @Override
   public void writeAvro(String recordKey, IndexedRecord object) throws 
IOException {
-byte[] value = HoodieAvroUtils.avroToBytes((GenericRecord)object);
+byte[] value = HoodieAvroUtils.avroToBytes((GenericRecord) object);

Review comment:
   We should not empty/change the 'key' field of the passed-in record object; 
otherwise the caller's in-memory copy of the record would be missing its key, which 
affects every user of that record. So I need a copy of the record object in which I 
can empty the key field before saving it to disk. That is why the second 
de-serialization back into a new record object, where I can change the field, is 
needed. If there is a better way of doing this, happy to change it. 
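
   For illustration, a minimal sketch of one way to avoid mutating the caller's record 
(this uses Avro's deepCopy instead of the serialize/deserialize round trip described 
above; HoodieAvroUtils.avroToBytes is the existing Hudi helper, the rest is assumed):

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hudi.avro.HoodieAvroUtils;

// Sketch only: blank the key field on a deep copy so the caller's in-memory
// record keeps its key, then serialize the copy as the HFile value.
public class KeyBlankingSketch {

  static byte[] serializeWithoutKey(GenericRecord record, String keyFieldName) throws IOException {
    // Deep-copy so the caller's record is left untouched.
    GenericRecord copy = GenericData.get().deepCopy(record.getSchema(), record);
    Schema.Field keyField = copy.getSchema().getField(keyFieldName);
    if (keyField != null) {
      copy.put(keyField.pos(), "");   // drop the duplicated key from the payload
    }
    return HoodieAvroUtils.avroToBytes(copy);
  }
}
```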




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


hudi-bot commented on pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#issuecomment-1008496412


   
   ## CI report:
   
   * dc9fe1b878dc47eaed13911fc5ca7eaffb80fb2f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4753)
 
   * ce8a8d9547819b23368115ba640caed1cb385213 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


hudi-bot removed a comment on pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#issuecomment-1001797582


   
   ## CI report:
   
   * dc9fe1b878dc47eaed13911fc5ca7eaffb80fb2f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4753)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


manojpec commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r780867818



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java
##
@@ -162,6 +158,20 @@ protected void createRecordsFromContentBytes() throws 
IOException {
 return records;
   }
 
+  /**
+   * Serialize the record to byte buffer.
+   *
+   * @param record - Record to serialize
+   * @param schemaKeyField - Key field in the schema
+   * @return Serialized byte buffer for the record
+   */
+  private byte[] serializeRecord(final IndexedRecord record, final 
Option schemaKeyField) {
+if (schemaKeyField.isPresent()) {
+  record.put(schemaKeyField.get().pos(), "");

Review comment:
   fixed. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] manojpec commented on a change in pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field

2022-01-09 Thread GitBox


manojpec commented on a change in pull request #4449:
URL: https://github.com/apache/hudi/pull/4449#discussion_r780867793



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java
##
@@ -77,6 +81,8 @@ public HoodieHFileWriter(String instantTime, Path file, 
HoodieHFileConfig hfileC
 this.file = HoodieWrapperFileSystem.convertToHoodiePath(file, conf);
 this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf);
 this.hfileConfig = hfileConfig;
+this.schema = schema;
+this.schemaRecordKeyField = 
Option.ofNullable(schema.getField(HoodieMetadataPayload.SCHEMA_FIELD_ID_KEY));

Review comment:
   Incorporated Vinoth's suggestion of using the storage config property, so the 
HFileWriter learns the key field from the config. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Guanpx commented on issue #4539: [SUPPORT] spark 2.4.0 write data to hudi ERROR (0.10.0)

2022-01-09 Thread GitBox


Guanpx commented on issue #4539:
URL: https://github.com/apache/hudi/issues/4539#issuecomment-1008493909


   > 2.4.0 is not supported. Can you try with 2.4.3 or higher spark versions.
   
   Our Spark cannot be upgraded. If we replace SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY 
in the Hudi source code with the string "__partition_columns", or delete that code, 
will it impact other functions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (e9a7f49 -> 56f93f4)

2022-01-09 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from e9a7f49  [HUDI-3112] Fix KafkaConnect cannot sync to Hive Problem 
(#4458)
 add 56f93f4  Removing rollbacks instants from timeline for restore 
operation (#4518)

No new revisions were added by this update.

Summary of changes:
 .../hudi/table/action/restore/BaseRestoreActionExecutor.java   | 10 ++
 .../functional/TestHoodieClientOnCopyOnWriteStorage.java   |  2 ++
 2 files changed, 12 insertions(+)


[GitHub] [hudi] codope merged pull request #4518: [HUDI-2477] Removing rollbacks instants from timeline for restore operation

2022-01-09 Thread GitBox


codope merged pull request #4518:
URL: https://github.com/apache/hudi/pull/4518


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-3065) spark auto partition discovery does not work from 0.9.0

2022-01-09 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3065.

 Reviewers: Forward Xu, Raymond Xu  (was: Raymond Xu)
Resolution: Won't Fix

> spark auto partition discovery does not work from 0.9.0
> ---
>
> Key: HUDI-3065
> URL: https://issues.apache.org/jira/browse/HUDI-3065
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Yann Byron
>Priority: Major
>  Labels: core-flow-ds, sev:critical, spark
> Fix For: 0.10.1
>
>
> with 0.8.0, if partition is of the format  "/partitionKey=partitionValue", 
> Spark auto partition discovery will kick in. we can see explicit fields in 
> hudi's table schema. 
> But with 0.9.0, it does not happen. 
> // launch spark shell with 0.8.0 
> {code:scala}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
> "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").
> options(getQuickstartWriteConfigs).
> option(PRECOMBINE_FIELD_OPT_KEY, "ts").
> option(RECORDKEY_FIELD_OPT_KEY, "uuid").
> option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
> option(TABLE_NAME, tableName).
> mode(Overwrite).save(basePath)
> val tripsSnapshotDF = spark.
>         read.
>         format("hudi").
>         load(basePath)
> tripsSnapshotDF.printSchema
> {code}
> // output : check for continent, country, city in the end.
> {code}
> |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- continent: string (nullable = true)
>  |-- country: string (nullable = true)
>  |-- city: string (nullable = true)
>  {code}
>  
> Lets run this with 0.9.0.
> {code:scala}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
> "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").  
> options(getQuickstartWriteConfigs).  
> option(PRECOMBINE_FIELD_OPT_KEY, "ts").  
> option(RECORDKEY_FIELD_OPT_KEY, "uuid").  
> option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").  
> option(TABLE_NAME, tableName).  
> mode(Overwrite).  save(basePath)
> val tripsSnapshotDF = spark.
>      |   read.
>      |   format("hudi").
>      |   load(basePath )
> tripsSnapshotDF.printSchema
> {code}
> //output: continent, country, city is missing. 
> {code}
> root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  {code}
> Ref issue: [https://github.com/apache/hudi/issues/3984]
>  
>  
>  
>  



--

[jira] [Commented] (HUDI-3065) spark auto partition discovery does not work from 0.9.0

2022-01-09 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471637#comment-17471637
 ] 

Raymond Xu commented on HUDI-3065:
--

After discussion with [~x1q1j1] [~biyan900...@gmail.com], we think that the auto 
partition discovery behavior should be addressed separately. In the end state, we 
should have a keygen or a flag to help users enable partition discovery. Without the 
keygen or the partition discovery flag, we respect the user's setting and take 
partition paths as is, i.e., no partition auto discovery. Will close this as won't 
fix; the next steps are recorded in the linked tickets. cc @

> spark auto partition discovery does not work from 0.9.0
> ---
>
> Key: HUDI-3065
> URL: https://issues.apache.org/jira/browse/HUDI-3065
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Yann Byron
>Priority: Major
>  Labels: core-flow-ds, sev:critical, spark
> Fix For: 0.10.1
>
>
> with 0.8.0, if partition is of the format  "/partitionKey=partitionValue", 
> Spark auto partition discovery will kick in. we can see explicit fields in 
> hudi's table schema. 
> But with 0.9.0, it does not happen. 
> // launch spark shell with 0.8.0 
> {code:scala}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
> "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").
> options(getQuickstartWriteConfigs).
> option(PRECOMBINE_FIELD_OPT_KEY, "ts").
> option(RECORDKEY_FIELD_OPT_KEY, "uuid").
> option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
> option(TABLE_NAME, tableName).
> mode(Overwrite).save(basePath)
> val tripsSnapshotDF = spark.
>         read.
>         format("hudi").
>         load(basePath)
> tripsSnapshotDF.printSchema
> {code}
> // output : check for continent, country, city in the end.
> {code}
> |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- continent: string (nullable = true)
>  |-- country: string (nullable = true)
>  |-- city: string (nullable = true)
>  {code}
>  
> Lets run this with 0.9.0.
> {code:scala}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
> "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").  
> options(getQuickstartWriteConfigs).  
> option(PRECOMBINE_FIELD_OPT_KEY, "ts").  
> option(RECORDKEY_FIELD_OPT_KEY, "uuid").  
> option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").  
> option(TABLE_NAME, tableName).  
> mode(Overwrite).  save(basePath)
> val tripsSnapshotDF = spark.
>      |   read.
>      |   format("hudi").
>      |   load(basePath )
> tripsSnapshotDF.printSchema
> {code}
> //output: continent, country, city is missing. 
> {code}
> root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin

[jira] [Comment Edited] (HUDI-3065) spark auto partition discovery does not work from 0.9.0

2022-01-09 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471637#comment-17471637
 ] 

Raymond Xu edited comment on HUDI-3065 at 1/10/22, 2:04 AM:


After discussion with [~x1q1j1] [~biyan900...@gmail.com], we think that the auto 
partition discovery behavior should be addressed separately. In the end state, we 
should have a keygen or a flag to help users enable partition discovery. Without the 
keygen or the partition discovery flag, we respect the user's setting and take 
partition paths as is, i.e., no partition auto discovery. Will close this as won't 
fix; the next steps are recorded in the linked tickets. cc [~shivnarayan]


was (Author: xushiyan):
After discussion with [~x1q1j1] [~biyan900...@gmail.com], we think that the auto 
partition discovery behavior should be addressed separately. In the end state, we 
should have a keygen or a flag to help users enable partition discovery. Without the 
keygen or the partition discovery flag, we respect the user's setting and take 
partition paths as is, i.e., no partition auto discovery. Will close this as won't 
fix; the next steps are recorded in the linked tickets. cc @

> spark auto partition discovery does not work from 0.9.0
> ---
>
> Key: HUDI-3065
> URL: https://issues.apache.org/jira/browse/HUDI-3065
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Yann Byron
>Priority: Major
>  Labels: core-flow-ds, sev:critical, spark
> Fix For: 0.10.1
>
>
> with 0.8.0, if partition is of the format  "/partitionKey=partitionValue", 
> Spark auto partition discovery will kick in. we can see explicit fields in 
> hudi's table schema. 
> But with 0.9.0, it does not happen. 
> // launch spark shell with 0.8.0 
> {code:scala}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
> "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").
> options(getQuickstartWriteConfigs).
> option(PRECOMBINE_FIELD_OPT_KEY, "ts").
> option(RECORDKEY_FIELD_OPT_KEY, "uuid").
> option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
> option(TABLE_NAME, tableName).
> mode(Overwrite).save(basePath)
> val tripsSnapshotDF = spark.
>         read.
>         format("hudi").
>         load(basePath)
> tripsSnapshotDF.printSchema
> {code}
> // output : check for continent, country, city in the end.
> {code}
> |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- continent: string (nullable = true)
>  |-- country: string (nullable = true)
>  |-- city: string (nullable = true)
>  {code}
>  
> Lets run this with 0.9.0.
> {code:scala}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
> "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").  
> options(getQuickstartWriteConfigs).  
> option(PRECOMBINE_FIELD_OPT_KEY, "ts").  
> option(RECORDKEY_FIELD_OPT_KEY, "uuid").  
> option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").  
> option(TABLE_NAME, tableName).  
> mode(Ov

[jira] [Updated] (HUDI-3200) File Index config affects partition fields shown in printSchema results

2022-01-09 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3200:
-
Description: 
Discovered in HUDI-3065: disabling the file index config should not affect the 
partition fields shown in printSchema. 

It looks like, since 0.9.0:

- file index = true: enables partition auto discovery
- file index = false: disables partition auto discovery

> File Index config affects partition fields shown in printSchema results
> ---
>
> Key: HUDI-3200
> URL: https://issues.apache.org/jira/browse/HUDI-3200
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.11.0
>
>
> Discovered in HUDI-3065, disabling file index config should not affect 
> partition fields shown in printSchema. 
> It looks like since 0.9.0
> - file index = true: it enables partition auto discovery
> - file index = false: it disables partition auto discovery



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3202) Add keygen to support partition discovery

2022-01-09 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3202:
-
Reviewers: Forward Xu, Raymond Xu, Yann Byron

> Add keygen to support partition discovery
> -
>
> Key: HUDI-3202
> URL: https://issues.apache.org/jira/browse/HUDI-3202
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3201) Make partition auto discovery configurable

2022-01-09 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3201:
-
Reviewers: Forward Xu, Raymond Xu, Yann Byron

> Make partition auto discovery configurable
> --
>
> Key: HUDI-3201
> URL: https://issues.apache.org/jira/browse/HUDI-3201
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3200) File Index config affects partition fields shown in printSchema results

2022-01-09 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3200:
-
Reviewers: Forward Xu, Raymond Xu, Yann Byron

> File Index config affects partition fields shown in printSchema results
> ---
>
> Key: HUDI-3200
> URL: https://issues.apache.org/jira/browse/HUDI-3200
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3202) Add keygen to support partition discovery

2022-01-09 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3202:


 Summary: Add keygen to support partition discovery
 Key: HUDI-3202
 URL: https://issues.apache.org/jira/browse/HUDI-3202
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Raymond Xu
 Fix For: 0.11.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3201) Make partition auto discovery configurable

2022-01-09 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3201:


 Summary: Make partition auto discovery configurable
 Key: HUDI-3201
 URL: https://issues.apache.org/jira/browse/HUDI-3201
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Raymond Xu
 Fix For: 0.11.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3200) File Index config affects partition fields shown in printSchema results

2022-01-09 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3200:


 Summary: File Index config affects partition fields shown in 
printSchema results
 Key: HUDI-3200
 URL: https://issues.apache.org/jira/browse/HUDI-3200
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Raymond Xu
 Fix For: 0.11.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

