[GitHub] [hudi] hudi-bot removed a comment on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4984:
URL: https://github.com/apache/hudi/pull/4984#issuecomment-1064863475


   
   ## CI report:
   
   * 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)
 
   * e30a63cc90f3afbea7ee36c37283f2f21ea7998f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4984:
URL: https://github.com/apache/hudi/pull/4984#issuecomment-1064865517


   
   ## CI report:
   
   * 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)
 
   * e30a63cc90f3afbea7ee36c37283f2f21ea7998f UNKNOWN
   * c0a0e141561d1d75150aab046090e1ccd1c9e2c2 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4984:
URL: https://github.com/apache/hudi/pull/4984#issuecomment-1061697101


   
   ## CI report:
   
   * 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4984:
URL: https://github.com/apache/hudi/pull/4984#issuecomment-1064863475


   
   ## CI report:
   
   * 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)
 
   * e30a63cc90f3afbea7ee36c37283f2f21ea7998f UNKNOWN
   
   




[GitHub] [hudi] danny0405 commented on a change in pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteH

2022-03-10 Thread GitBox


danny0405 commented on a change in pull request #5018:
URL: https://github.com/apache/hudi/pull/5018#discussion_r824461742



##
File path: 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkWriteHelper.java
##
@@ -113,5 +114,10 @@ public static FlinkWriteHelper newInstance() {
   hoodieRecord.setCurrentLocation(rec1.getCurrentLocation());
   return hoodieRecord;
 }).orElse(null)).filter(Objects::nonNull).collect(Collectors.toList());
+
+if (hasInsert) {
+  recordList.get(0).getCurrentLocation().setInstantTime("I");
+}
+return recordList;

Review comment:
   In line 114, we already reset the location, so every record list under 
the same key should keep the same instant time type after reduction as before. 
Why is this set needed?
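
To make the question concrete, here is a minimal, self-contained sketch of a reduce-by-key step that carries the first record's location forward. `Rec`, `orderingVal`, `instantTime`, and `deduplicate` are hypothetical names for illustration only, not the `FlinkWriteHelper` API, and the merge rule (larger ordering value wins) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Rec is a stand-in, not Hudi's HoodieRecord.
class DedupByKeySketch {

  static final class Rec {
    final String key;
    final long orderingVal;    // assumed ordering field; the larger value wins
    final String instantTime;  // stand-in for the location's instant time

    Rec(String key, long orderingVal, String instantTime) {
      this.key = key;
      this.orderingVal = orderingVal;
      this.instantTime = instantTime;
    }
  }

  // Keeps one record per key. The winner's payload is kept, but the
  // instant time of the first record seen for that key is carried over.
  static List<Rec> deduplicate(List<Rec> records) {
    Map<String, Rec> byKey = new LinkedHashMap<>();
    for (Rec r : records) {
      byKey.merge(r.key, r, (first, next) -> {
        Rec winner = next.orderingVal >= first.orderingVal ? next : first;
        return new Rec(winner.key, winner.orderingVal, first.instantTime);
      });
    }
    return new ArrayList<>(byKey.values());
  }

  public static void main(String[] args) {
    List<Rec> out = deduplicate(Arrays.asList(
        new Rec("k1", 1, "I"), new Rec("k1", 2, "U"), new Rec("k2", 5, "U")));
    for (Rec r : out) {
      System.out.println(r.key + " " + r.orderingVal + " " + r.instantTime);
    }
  }
}
```

If every record under a key already shares one instant time, the carried value is the same whichever record wins, which is the premise of the question above.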








[GitHub] [hudi] hudi-bot commented on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4872:
URL: https://github.com/apache/hudi/pull/4872#issuecomment-1064853531


   
   ## CI report:
   
   * 0fd561ae050f39c022862eae351c73b323a61e05 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)
 
   * c662e400cd71c1dbba9b4f37512ca5e748736f03 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6833)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4872:
URL: https://github.com/apache/hudi/pull/4872#issuecomment-1064851908


   
   ## CI report:
   
   * 0fd561ae050f39c022862eae351c73b323a61e05 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)
 
   * c662e400cd71c1dbba9b4f37512ca5e748736f03 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4872:
URL: https://github.com/apache/hudi/pull/4872#issuecomment-1064851908


   
   ## CI report:
   
   * 0fd561ae050f39c022862eae351c73b323a61e05 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)
 
   * c662e400cd71c1dbba9b4f37512ca5e748736f03 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4872:
URL: https://github.com/apache/hudi/pull/4872#issuecomment-1048383460


   
   ## CI report:
   
   * 0fd561ae050f39c022862eae351c73b323a61e05 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)
 
   
   




[GitHub] [hudi] wangxianghu edited a comment on pull request #4969: [HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at one time

2022-03-10 Thread GitBox


wangxianghu edited a comment on pull request #4969:
URL: https://github.com/apache/hudi/pull/4969#issuecomment-1064849682


   Hi @nsivabalan, can we add this processor? It is very useful in scenarios 
with diversified data requirements.
   In our company, we have used this feature to add multiple processors to our 
pipeline:
   1. Maxwell post processor: extracts data from the Maxwell JSON string. This is a 
standard processor.
   2. Encrypt post processor: encrypts some fields for safety purposes.
   3. Flag post processor: this is quite a business-related processor.
   
   With ChainedJsonKafkaSourcePostProcessor we can make data processing more 
flexible; it makes up for the limited expressive ability of `Transformer`.
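
   As a rough illustration of the chaining idea described above, the sketch below applies a list of processors in order, feeding each output into the next. The interface and class names (`JsonPostProcessor`, `Chained`) and the three stand-in lambdas are assumptions made for this example; they are not the classes from this pull request.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative, not Hudi's actual API.
class ChainedPostProcessorSketch {

  // Assumed shape of a post processor: one JSON string in, one out.
  interface JsonPostProcessor {
    String process(String json);
  }

  // Applies each processor in order, feeding each output into the next.
  static final class Chained implements JsonPostProcessor {
    private final List<JsonPostProcessor> processors;

    Chained(List<JsonPostProcessor> processors) {
      this.processors = processors;
    }

    @Override
    public String process(String json) {
      String current = json;
      for (JsonPostProcessor p : processors) {
        current = p.process(current);
      }
      return current;
    }
  }

  public static void main(String[] args) {
    // Stand-ins for the extract, encrypt, and flag steps.
    JsonPostProcessor extract = s -> s.trim();
    JsonPostProcessor mask = s -> s.replace("secret", "***");
    JsonPostProcessor flag = s -> s + "#flagged";

    JsonPostProcessor chain = new Chained(Arrays.asList(extract, mask, flag));
    System.out.println(chain.process("  {\"k\":\"secret\"}  "));
  }
}
```

   Because the chain itself implements the same interface, it can be used wherever a single processor is expected, and the order of the list is the order of execution.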






[GitHub] [hudi] wangxianghu edited a comment on pull request #4969: [HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at one time

2022-03-10 Thread GitBox


wangxianghu edited a comment on pull request #4969:
URL: https://github.com/apache/hudi/pull/4969#issuecomment-1064849682


   hi @nsivabalan  can we add this processor ? it is very useful in scenarios 
with diversified data requirements.
   In our company, we have used this feature to add multiple processors to our 
pipeline:
   1. maxwell post processor : extract data from maxwell json string. this is a 
standard processor
   2. Encrypt post processor : Encrypt some fields for safety purpose
   3. flag post processor : this is quite business related processor.
   
   with ChainedJsonKafkaSourcePostProcessor we can make data processing more 
flexible, it makes up for the lack of expression ability of `Transformer`






[GitHub] [hudi] wangxianghu commented on pull request #4969: [HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at one time

2022-03-10 Thread GitBox


wangxianghu commented on pull request #4969:
URL: https://github.com/apache/hudi/pull/4969#issuecomment-1064849682


   hi @nsivabalan  can we add this processor ? it is very useful in scenarios 
with diversified data requirements.
   In our company, we have used this feature to add multiple processors in our 
pipeline:
   1. maxwell post processor : extract data from maxwell json string. this is a 
standard processor
   2. Encrypt post processor : Encrypt some fields for safety purpose
   3. flag post processor : this is quite business related processor.
   
   with ChainedJsonKafkaSourcePostProcessor we can make data processing more 
flexible, it makes up for the lack of expression ability of `Transformer`






[GitHub] [hudi] prashantwason commented on a change in pull request #4640: [HUDI-3225] [RFC-45] for async metadata indexing

2022-03-10 Thread GitBox


prashantwason commented on a change in pull request #4640:
URL: https://github.com/apache/hudi/pull/4640#discussion_r824447974



##
File path: rfc/rfc-45/rfc-45.md
##
@@ -0,0 +1,264 @@
+
+
+# RFC-45: Asynchronous Metadata Indexing
+
+## Proposers
+
+- @codope
+- @manojpec
+
+## Approvers
+
+- @nsivabalan
+- @vinothchandar
+
+## Status
+
+JIRA: [HUDI-2488](https://issues.apache.org/jira/browse/HUDI-2488)
+
+## Abstract
+
+Metadata indexing (aka metadata bootstrapping) is the process of creating one
+or more metadata-based indexes, e.g. data partitions to files index, that is
+stored in Hudi metadata table. Currently, the metadata table (referred to as MDT
+hereafter) supports a single partition which is created synchronously with the
+corresponding data table, i.e. commits are first applied to metadata table
+followed by data table. Our goal for MDT is to support multiple partitions to
+boost the performance of existing index and records lookup. However, the
+synchronous manner of metadata indexing is not very scalable as we add more
+partitions to the MDT because the regular writers (writing to the data table)
+have to wait until the MDT commit completes. In this RFC, we propose a design to
+support asynchronous metadata indexing.
+
+## Background
+
+We can read more about the MDT design
+in [RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements).
+Here is a quick summary of the current state (Hudi v0.10.1). MDT is an
+internal Merge-on-Read (MOR) table that has a single partition called `files`
+which stores the data partitions to files index that is used in file listing.
+MDT is co-located with the data table (inside `.hoodie/metadata` directory under
+the basepath). To handle multi-writer scenarios, users configure a lock
+provider and only one writer can access MDT in read-write mode. Hence, any write
+to MDT is guarded by the data table lock. This ensures only one write is
+committed to MDT at any point in time and thus guarantees serializability.
+However, locking overhead adversely affects the write throughput and will reach
+its scalability limits as we add more partitions to the MDT.
+
+## Goals
+
+- Support indexing one or more partitions in MDT while regular writers and table
+  services (such as cleaning or compaction) are in progress.
+- Locking to be as lightweight as possible.
+- Keep required config changes to a minimum to simplify deployment / upgrade in
+  production.
+- Do not require specific ordering of how writers and table service pipelines
+  need to be upgraded / restarted.
+- If an external long-running process is being used to initialize the index, the
+  process should be made idempotent so it can handle errors from previous runs.
+- To re-initialize the index, make it as simple as running the external
+  initialization process again without having to change configs.
+
+## Implementation
+
+### A new Hudi action: INDEX
+
+We introduce a new action `index` which will denote the index building process,
+the mechanics of which are as follows:
+
+1. From an external process, users can issue a CREATE INDEX or similar statement
+   to trigger indexing for an existing table.
+    1. This will schedule INDEX action and add
+       a `.index.requested` to the timeline, which contains the
+       indexing plan. Index scheduling will also initialize the filegroup for
+       the partitions for which indexing is planned.
+    2. From here on, the index building process will continue to build an index
+       up to instant time `t`, where `t` is the latest completed instant time on
+       the timeline without any
+       "holes" i.e. no pending async operations prior to it.
+    3. The indexing process will write these out as base files within the
+       corresponding metadata partition. A metadata partition cannot be used if
+       there is any pending indexing action against it. As and when indexing is
+       completed for a partition, then table config (`hoodie.properties`) will
+       be updated to indicate that partition is available for reads or
+       synchronous updates. Hudi table config will be the source of truth for
+       the current state of metadata index.
+
+2. Any inflight writers (i.e. with instant time `t'` > `t`)  will check for any
+   new indexing request on the timeline prior to preparing to commit.
+    1. Such writers will proceed to additionally add log entries corresponding
+       to each such indexing request into the metadata partition.
+    2. There is always a TOCTOU issue here, where the inflight writer may not
+       see an indexing request that was just added and proceed to commit without
+       that. We will correct this during indexing action completion. In the
+       average case, this may not happen and the design has liveness.
+
+3. When the indexing process is about to complete (i.e. indexing up to
+   instant `t` is done but before completing indexing commit), it 
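
The rule for choosing `t` described above (the latest completed instant with no pending operation before it) can be sketched as a single pass over a time-ordered timeline. This is one reading of the RFC text, not the RFC's implementation: `Entry`, `State`, and `latestCompletedWithoutHoles` are hypothetical names, and the timeline is assumed to be sorted by instant time in ascending order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; Entry and State are stand-ins, not Hudi timeline classes.
class HoleFreeInstantSketch {

  enum State { REQUESTED, INFLIGHT, COMPLETED }

  static final class Entry {
    final String time;
    final State state;

    Entry(String time, State state) {
      this.time = time;
      this.state = state;
    }
  }

  // Assumes the timeline is sorted ascending by instant time.
  // Walks from the oldest entry and stops at the first pending one, so the
  // result is the latest completed instant with no pending entry before it.
  static Optional<String> latestCompletedWithoutHoles(List<Entry> timeline) {
    String latest = null;
    for (Entry e : timeline) {
      if (e.state != State.COMPLETED) {
        break;
      }
      latest = e.time;
    }
    return Optional.ofNullable(latest);
  }

  public static void main(String[] args) {
    List<Entry> timeline = Arrays.asList(
        new Entry("001", State.COMPLETED),
        new Entry("002", State.COMPLETED),
        new Entry("003", State.INFLIGHT),
        new Entry("004", State.COMPLETED));
    System.out.println(latestCompletedWithoutHoles(timeline).orElse("none"));
  }
}
```

In this reading, instant `004` is not eligible even though it is completed, because the pending instant `003` precedes it.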

[GitHub] [hudi] hudi-bot commented on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5013:
URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064845719


   
   ## CI report:
   
   * a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)
 
   * f50fc2686b0c3b7f17c741ca99db9629aafc6b66 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6832)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #5013:
URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064844072


   
   ## CI report:
   
   * a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)
 
   * f50fc2686b0c3b7f17c741ca99db9629aafc6b66 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5013:
URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064844072


   
   ## CI report:
   
   * a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)
 
   * f50fc2686b0c3b7f17c741ca99db9629aafc6b66 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #5013:
URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064395857


   
   ## CI report:
   
   * a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1064843916


   
   ## CI report:
   
   * 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)
 
   * 7119319af35fb23afa97e058cd2fbfaea18292a1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6831)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1064842219


   
   ## CI report:
   
   * 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)
 
   * 7119319af35fb23afa97e058cd2fbfaea18292a1 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5019:
URL: https://github.com/apache/hudi/pull/5019#issuecomment-1064842384


   
   ## CI report:
   
   * 3b6b326bb3650689e8ad78504ccaca3df2700998 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6830)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #5019:
URL: https://github.com/apache/hudi/pull/5019#issuecomment-1064840735


   
   ## CI report:
   
   * 3b6b326bb3650689e8ad78504ccaca3df2700998 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1064842219


   
   ## CI report:
   
   * 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)
 
   * 7119319af35fb23afa97e058cd2fbfaea18292a1 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4925:
URL: https://github.com/apache/hudi/pull/4925#issuecomment-1055496401


   
   ## CI report:
   
   * 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5019:
URL: https://github.com/apache/hudi/pull/5019#issuecomment-1064840735


   
   ## CI report:
   
   * 3b6b326bb3650689e8ad78504ccaca3df2700998 UNKNOWN
   
   




[jira] [Updated] (HUDI-3575) Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor

2022-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3575:
-
Labels: pull-request-available  (was: )

> Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in 
> TestSchemaPostProcessor
> 
>
> Key: HUDI-3575
> URL: https://issues.apache.org/jira/browse/HUDI-3575
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] wangxianghu opened a new pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor

2022-03-10 Thread GitBox


wangxianghu opened a new pull request #5019:
URL: https://github.com/apache/hudi/pull/5019


   ## What is the purpose of the pull request
   
   *Use a standard test schema in our UT instead of a schema taken from a specific 
enterprise's data*
   
   ## Brief change log
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests:
   
   org.apache.hudi.utilities.TestSchemaPostProcessor
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064838694


   
   ## CI report:
   
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
 
   * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6829)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064827082


   
   ## CI report:
   
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
 
   * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4888:
URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064831876


   
   ## CI report:
   
   * b07cca5112163e153385c690203603b74542ace6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6820)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4888:
URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064748224


   
   ## CI report:
   
   * e0afa9f1de90411220a6c1d25c0c9e43f09f6baf Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6815)
 
   * b07cca5112163e153385c690203603b74542ace6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6820)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064827082


   
   ## CI report:
   
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
 
   * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064825631


   
   ## CI report:
   
   * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
 
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
 
   * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064825631


   
   ## CI report:
   
   * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
 
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
 
   * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064806027


   
   ## CI report:
   
   * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
 
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
 
   
   




[jira] [Commented] (HUDI-3607) Support backend switch in HoodieFlinkStreamer

2022-03-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HUDI-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504744#comment-17504744
 ] 

刘方奇 commented on HUDI-3607:
---

[~wangxianghu] Could you take a look? You can assign it to me.

> Support backend switch in HoodieFlinkStreamer
> -
>
> Key: HUDI-3607
> URL: https://issues.apache.org/jira/browse/HUDI-3607
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: 刘方奇
> Priority: Major
>
> Currently, the HoodieFlinkStreamer utility supports only one state backend: 
> FsStateBackend. This is not flexible enough for application configuration. 
> Could we make the backend configurable?
> Moreover, in Flink 1.14, FsStateBackend is deprecated in favor of 
> org.apache.flink.runtime.state.hashmap.HashMapStateBackend and 
> org.apache.flink.runtime.state.storage.FileSystemCheckpointStorage.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] guanziyue commented on a change in pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


guanziyue commented on a change in pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#discussion_r824428011



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java
##
@@ -101,13 +101,13 @@ public void runMerge(HoodieTable>, JavaRDD
 } catch (Exception e) {
   throw new HoodieException(e);
 } finally {
+  if (null != wrapper) {

Review comment:
   Done








[GitHub] [hudi] guanziyue commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


guanziyue commented on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064821297


   > @guanziyue thank you for taking the time to troubleshoot this concurrency 
issue and implement the fix!
   > 
   > I echo @vinothchandar's concerns and I think we're taking a step a bit too 
far -- `ParquetWriter` is not assumed to be thread-safe, nor do I believe 
we should make it so.
   > 
   > Instead, I believe we should just resolve the problem with its concurrent 
access (which you already did) and make it clear that `ParquetWriter` is not 
thread-safe, so its usage needs to be properly guarded externally.
   
   Hi @alexeykudinkin, may I ask whether your concern is "adding a lock to 
parquetWriter" or "adding a lock to the hot path"? I'm afraid it is difficult 
to come up with a way to guarantee this problem is fully solved without adding 
some signal to the hot path. Either the producer needs to check whether the 
current thread has been interrupted and respond within a reasonable time, or 
the consumer needs to reject any write immediately after the close method is 
called, which also needs a lock on the hot path. For the producer solution, we 
can use a lock-free check. For the consumer, we might use a volatile rather 
than a lock. But either option adds something to the hot path.
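
The two options above can be sketched roughly as follows. This is only an 
illustration under assumptions, not the code in this PR: `GuardedWriter` and 
`produce` are hypothetical names, and the list-backed writer stands in for a 
real writer that is not thread-safe. The consumer side checks a volatile flag 
on every write; the producer side does a lock-free interrupt check per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GuardedWriterSketch {

  /** Hypothetical stand-in for a writer that is not thread-safe; not a Hudi or Parquet class. */
  static class GuardedWriter {
    private final List<String> records = new ArrayList<>();
    // volatile so that a close() issued by another thread is visible here without taking a lock
    private volatile boolean closed = false;

    void write(String record) {
      // Consumer-side check: reject writes once close() has been observed
      if (closed) {
        throw new IllegalStateException("writer is already closed");
      }
      records.add(record);
    }

    void close() {
      closed = true;
    }

    int size() {
      return records.size();
    }
  }

  /** Producer-side check: stop feeding the writer once the current thread is interrupted. */
  static int produce(GuardedWriter writer, List<String> input) {
    int written = 0;
    for (String record : input) {
      if (Thread.currentThread().isInterrupted()) {
        break; // lock-free check; the interrupt flag stays set for the caller
      }
      writer.write(record);
      written++;
    }
    return written;
  }

  public static void main(String[] args) {
    GuardedWriter writer = new GuardedWriter();
    System.out.println(produce(writer, Arrays.asList("a", "b", "c"))); // prints 3
    writer.close();
    try {
      writer.write("d");
    } catch (IllegalStateException e) {
      System.out.println("rejected after close");
    }
  }
}
```

Note that the volatile flag only narrows the window: a write that has already 
passed the check can still overlap with close(), so this sketch does not by 
itself make the writer thread-safe. Both checks also sit on the per-record 
path, which is the cost discussed above.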






[hudi] branch master updated (83cff3a -> 18cdad9)

2022-03-10 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 83cff3a  [HUDI-3522] Introduce DropColumnSchemaPostProcessor to 
support drop columns from schema (#4972)
 add 18cdad9  [HUDI-2999] [RFC-42] RFC for consistent hashing index (#4326)

No new revisions were added by this update.

Summary of changes:
 rfc/rfc-42/basic_bucket_hashing.png             | Bin 0 -> 26942 bytes
 rfc/rfc-42/bucket_resizing.png                  | Bin 0 -> 53114 bytes
 rfc/rfc-42/bucket_resizing_virtual_log_file.png | Bin 0 -> 42742 bytes
 rfc/rfc-42/consistent_hashing.png               | Bin 0 -> 38682 bytes
 rfc/rfc-42/rfc-42.md                            | 230
 5 files changed, 230 insertions(+)
 create mode 100644 rfc/rfc-42/basic_bucket_hashing.png
 create mode 100644 rfc/rfc-42/bucket_resizing.png
 create mode 100644 rfc/rfc-42/bucket_resizing_virtual_log_file.png
 create mode 100644 rfc/rfc-42/consistent_hashing.png
 create mode 100644 rfc/rfc-42/rfc-42.md


[GitHub] [hudi] garyli1019 merged pull request #4326: [HUDI-2999] [RFC-42] RFC for consistent hashing index

2022-03-10 Thread GitBox


garyli1019 merged pull request #4326:
URL: https://github.com/apache/hudi/pull/4326


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4982:
URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064816693


   
   ## CI report:
   
   * 282ca401f8e2a93d7703f592041b854959291d41 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4982:
URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064820630


   
   ## CI report:
   
   * 282ca401f8e2a93d7703f592041b854959291d41 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6827)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5015: [HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #5015:
URL: https://github.com/apache/hudi/pull/5015#issuecomment-1064739346


   
   ## CI report:
   
   * 16c497f48a922830b3fbcb833bca203c292158da Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6818)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5015: [HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5015:
URL: https://github.com/apache/hudi/pull/5015#issuecomment-1064819300


   
   ## CI report:
   
   * 16c497f48a922830b3fbcb833bca203c292158da Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6818)
 
   
   




[GitHub] [hudi] huberylee commented on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable

2022-03-10 Thread GitBox


huberylee commented on pull request #4982:
URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064819324


   @hudi-bot run azure






[jira] [Created] (HUDI-3607) Support backend switch in HoodieFlinkStreamer

2022-03-10 Thread Jira
刘方奇 created HUDI-3607:
-

 Summary: Support backend switch in HoodieFlinkStreamer
 Key: HUDI-3607
 URL: https://issues.apache.org/jira/browse/HUDI-3607
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: 刘方奇


Currently, the HoodieFlinkStreamer utility supports only one state backend: 
FsStateBackend.

This is not flexible enough for application configuration. Could we make the 
backend configurable?

Moreover, in Flink 1.14, FsStateBackend is deprecated in favor of 
org.apache.flink.runtime.state.hashmap.HashMapStateBackend and 
org.apache.flink.runtime.state.storage.FileSystemCheckpointStorage.
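
For illustration only, a configurable backend might be wired up along these 
lines. This is a sketch under assumptions, not the HoodieFlinkStreamer 
implementation: the `configureBackend` helper, the "hashmap"/"rocksdb" option 
values and the checkpoint path are hypothetical. It assumes Flink 1.13 or 
later on the classpath, plus the flink-statebackend-rocksdb module for the 
RocksDB branch.

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.runtime.state.storage.FileSystemCheckpointStorage;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSwitchSketch {

  // Hypothetical helper: picks a state backend from an option value.
  static void configureBackend(StreamExecutionEnvironment env,
                               String backendType,
                               String checkpointPath) {
    if ("rocksdb".equalsIgnoreCase(backendType)) {
      // Requires the flink-statebackend-rocksdb dependency
      env.setStateBackend(new EmbeddedRocksDBStateBackend());
    } else {
      // Heap-based backend that replaces the state-holding role of FsStateBackend
      env.setStateBackend(new HashMapStateBackend());
    }
    // The checkpoint location is configured separately from the state backend
    env.getCheckpointConfig()
        .setCheckpointStorage(new FileSystemCheckpointStorage(checkpointPath));
  }

  public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Placeholder values; a real job would read these from its own configuration
    configureBackend(env, "hashmap", "file:///tmp/hudi-checkpoints");
  }
}
```

Keeping the backend choice and the checkpoint storage as two separate 
settings mirrors the split that replaced FsStateBackend.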





[GitHub] [hudi] hudi-bot removed a comment on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4982:
URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064734674


   
   ## CI report:
   
   * 282ca401f8e2a93d7703f592041b854959291d41 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4982:
URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064816693


   
   ## CI report:
   
   * 282ca401f8e2a93d7703f592041b854959291d41 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4971:
URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064813820


   
   ## CI report:
   
   * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
 
   * 7367ebfc60119b4442988ebc7350e4daac15b65f UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4971:
URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064815294


   
   ## CI report:
   
   * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
 
   * 7367ebfc60119b4442988ebc7350e4daac15b65f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6826)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4971:
URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064813820


   
   ## CI report:
   
   * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
 
   * 7367ebfc60119b4442988ebc7350e4daac15b65f UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4971:
URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064772913


   
   ## CI report:
   
   * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
 
   
   




[GitHub] [hudi] boneanxs edited a comment on pull request #4999: [HUDI-3592] Fix NPE of DefaultHoodieRecordPayload if Property is empty

2022-03-10 Thread GitBox


boneanxs edited a comment on pull request #4999:
URL: https://github.com/apache/hudi/pull/4999#issuecomment-1064809186


   @nsivabalan @xushiyan @XuQianJin-Stars could you pls review this?






[GitHub] [hudi] boneanxs commented on pull request #4999: [HUDI-3592] Fix NPE of DefaultHoodieRecordPayload if Property is empty

2022-03-10 Thread GitBox


boneanxs commented on pull request #4999:
URL: https://github.com/apache/hudi/pull/4999#issuecomment-1064809186


   @nsivabalan @xushiyan could you pls review this?






[GitHub] [hudi] hudi-bot commented on pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper out of

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5018:
URL: https://github.com/apache/hudi/pull/5018#issuecomment-1064807946


   
   ## CI report:
   
   * b9e437b2c2942ba29945d1d21c7e214e350e4333 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6825)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #5018:
URL: https://github.com/apache/hudi/pull/5018#issuecomment-1064806485


   
   ## CI report:
   
   * b9e437b2c2942ba29945d1d21c7e214e350e4333 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper out of

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5018:
URL: https://github.com/apache/hudi/pull/5018#issuecomment-1064806485


   
   ## CI report:
   
   * b9e437b2c2942ba29945d1d21c7e214e350e4333 UNKNOWN
   
   




[GitHub] [hudi] wxplovecc commented on pull request #4981: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause o…

2022-03-10 Thread GitBox


wxplovecc commented on pull request #4981:
URL: https://github.com/apache/hudi/pull/4981#issuecomment-1064806140


   https://github.com/apache/hudi/pull/5018






[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064806027


   
   ## CI report:
   
   * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
 
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064804622


   
   ## CI report:
   
   * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
 
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 UNKNOWN
   
   




[GitHub] [hudi] wxplovecc opened a new pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper out of

2022-03-10 Thread GitBox


wxplovecc opened a new pull request #5018:
URL: https://github.com/apache/hudi/pull/5018


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   This pull request prevents the deduplicateRecords method in FlinkWriteHelper 
from running out of order.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064804622


   
   ## CI report:
   
   * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
 
   * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4264:
URL: https://github.com/apache/hudi/pull/4264#issuecomment-1044252671


   
   ## CI report:
   
   * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Closed] (HUDI-184) Integrate Hudi with Apache Flink

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-184.
-
Resolution: Implemented

This feature has been tracked via 
https://issues.apache.org/jira/browse/HUDI-1521

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink would be valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Reopened] (HUDI-184) Integrate Hudi with Apache Flink

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reopened HUDI-184:
---

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink would be valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]





[GitHub] [hudi] guanziyue commented on a change in pull request #4913: [HUDI-1517] create marker file for every log file

2022-03-10 Thread GitBox


guanziyue commented on a change in pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r824411709



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##
@@ -113,22 +116,37 @@
   // Header metadata for a log block
   protected final Map header = new HashMap<>();
   private SizeEstimator sizeEstimator;
+  protected final WriteMarkers writeMarkers;
+  private final IOType ioType;
 
   private Properties recordProperties = new Properties();
 
   public HoodieAppendHandle(HoodieWriteConfig config, String instantTime, 
HoodieTable hoodieTable,
-String partitionPath, String fileId, 
Iterator> recordItr, TaskContextSupplier taskContextSupplier) {
+String partitionPath, String fileId, 
Iterator> recordItr,
+TaskContextSupplier taskContextSupplier, IOType 
ioType) {
 super(config, instantTime, partitionPath, fileId, hoodieTable, 
taskContextSupplier);
 this.fileId = fileId;
 this.recordItr = recordItr;
 sizeEstimator = new DefaultSizeEstimator();
 this.statuses = new ArrayList<>();
 this.recordProperties.putAll(config.getProps());
+this.writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), 
hoodieTable, instantTime);
+this.ioType = ioType;
   }
 
+  // constructor used for creating new file group
   public HoodieAppendHandle(HoodieWriteConfig config, String instantTime, 
HoodieTable hoodieTable,
 String partitionPath, String fileId, 
TaskContextSupplier sparkTaskContextSupplier) {
-this(config, instantTime, hoodieTable, partitionPath, fileId, null, 
sparkTaskContextSupplier);
+this(config, instantTime, hoodieTable, partitionPath, fileId, null, 
sparkTaskContextSupplier,
+IOType.CREATE);

Review comment:
   This applies to indexes that have the `canIndexLogFiles` attribute. Currently, the 
HBase index, the Flink state index, and the in-memory index have this attribute.








[GitHub] [hudi] hudi-bot commented on pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5017:
URL: https://github.com/apache/hudi/pull/5017#issuecomment-1064800761


   
   ## CI report:
   
   * d1211dd592bcb9e3df60b80b9585d2eda9f0b8ab Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6823)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #5017:
URL: https://github.com/apache/hudi/pull/5017#issuecomment-1064799451


   
   ## CI report:
   
   * d1211dd592bcb9e3df60b80b9585d2eda9f0b8ab UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom

2022-03-10 Thread GitBox


hudi-bot commented on pull request #5017:
URL: https://github.com/apache/hudi/pull/5017#issuecomment-1064799451


   
   ## CI report:
   
   * d1211dd592bcb9e3df60b80b9585d2eda9f0b8ab UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-3606) ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy

2022-03-10 Thread cdmikechen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cdmikechen updated HUDI-3606:
-
Description: 
When using *hudi-timeline-server-bundle* on a Hadoop server (3.2.2), Hudi will 
occasionally encounter errors similar to this:
{code}
2022-03-11 05:28:48,223 [qtp818093527-18] ERROR javalin.Javalin: Exception 
occurred while servicing http-request
java.lang.NoClassDefFoundError: org/objenesis/strategy/InstantiatorStrategy
at 
org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.<init>(SerializationUtils.java:88)
at 
java.base/java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(Unknown Source)
at java.base/java.lang.ThreadLocal.setInitialValue(Unknown Source)
at java.base/java.lang.ThreadLocal.get(Unknown Source)
at 
org.apache.hudi.common.util.SerializationUtils.serialize(SerializationUtils.java:52)
at 
org.apache.hudi.common.util.collection.RocksDBDAO.serializePayload(RocksDBDAO.java:469)
at 
org.apache.hudi.common.util.collection.RocksDBDAO.putInBatch(RocksDBDAO.java:175)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$12(RocksDbBasedFileSystemView.java:237)
at 
java.base/java.util.TreeMap$ValueSpliterator.forEachRemaining(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(Unknown 
Source)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$13(RocksDbBasedFileSystemView.java:236)
at 
org.apache.hudi.common.util.collection.RocksDBDAO.writeBatch(RocksDBDAO.java:157)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$storePartitionView$14(RocksDbBasedFileSystemView.java:235)
at java.base/java.util.ArrayList.forEach(Unknown Source)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.storePartitionView(RocksDbBasedFileSystemView.java:234)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:146)
at java.base/java.util.HashMap.forEach(Unknown Source)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:308)
at 
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFilesBeforeOrOn(AbstractTableFileSystemView.java:489)
at 
org.apache.hudi.timeline.service.handlers.BaseFileHandler.getLatestDataFilesBeforeOrOn(BaseFileHandler.java:60)
at 
org.apache.hudi.timeline.service.RequestHandler.lambda$registerDataFilesAPI$6(RequestHandler.java:268)
at 
org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:497)
at 
io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
at 
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
at 
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
at 
io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
at 
io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at 
org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:502)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback

[jira] [Updated] (HUDI-3606) ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy

2022-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3606:
-
Labels: pull-request-available  (was: )

> ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy
> ---
>
> Key: HUDI-3606
> URL: https://issues.apache.org/jira/browse/HUDI-3606
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: timeline-server
>Affects Versions: 0.10.1
>Reporter: cdmikechen
>Priority: Major
>  Labels: pull-request-available
>
> When using *hudi-timeline-server-bundle* on a Hadoop server (3.2.2), Hudi will 
> occasionally encounter errors similar to this:
> {code}
> 2022-03-11 05:28:48,223 [qtp818093527-18] ERROR javalin.Javalin: Exception 
> occurred while servicing http-request
> java.lang.NoClassDefFoundError: org/objenesis/strategy/InstantiatorStrategy
>   at 
> org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.<init>(SerializationUtils.java:88)
>   at 
> java.base/java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(Unknown 
> Source)
>   at java.base/java.lang.ThreadLocal.setInitialValue(Unknown Source)
>   at java.base/java.lang.ThreadLocal.get(Unknown Source)
>   at 
> org.apache.hudi.common.util.SerializationUtils.serialize(SerializationUtils.java:52)
>   at 
> org.apache.hudi.common.util.collection.RocksDBDAO.serializePayload(RocksDBDAO.java:469)
>   at 
> org.apache.hudi.common.util.collection.RocksDBDAO.putInBatch(RocksDBDAO.java:175)
>   at 
> org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$12(RocksDbBasedFileSystemView.java:237)
>   at 
> java.base/java.util.TreeMap$ValueSpliterator.forEachRemaining(Unknown Source)
>   at java.base/java.util.stream.ReferencePipeline$Head.forEach(Unknown 
> Source)
>   at 
> org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$13(RocksDbBasedFileSystemView.java:236)
>   at 
> org.apache.hudi.common.util.collection.RocksDBDAO.writeBatch(RocksDBDAO.java:157)
>   at 
> org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$storePartitionView$14(RocksDbBasedFileSystemView.java:235)
>   at java.base/java.util.ArrayList.forEach(Unknown Source)
>   at 
> org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.storePartitionView(RocksDbBasedFileSystemView.java:234)
>   at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:146)
>   at java.base/java.util.HashMap.forEach(Unknown Source)
>   at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
>   at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:308)
>   at 
> java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown 
> Source)
>   at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
>   at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFilesBeforeOrOn(AbstractTableFileSystemView.java:489)
>   at 
> org.apache.hudi.timeline.service.handlers.BaseFileHandler.getLatestDataFilesBeforeOrOn(BaseFileHandler.java:60)
>   at 
> org.apache.hudi.timeline.service.RequestHandler.lambda$registerDataFilesAPI$6(RequestHandler.java:268)
>   at 
> org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:497)
>   at 
> io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
>   at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
>   at 
> io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
>   at 
> io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
>   at 
> io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
>   at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
>   at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
>   at 
> io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandle

[GitHub] [hudi] cdmikechen opened a new pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom

2022-03-10 Thread GitBox


cdmikechen opened a new pull request #5017:
URL: https://github.com/apache/hudi/pull/5017


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   This pull request adds `org.objenesis:objenesis` to 
hudi-timeline-server-bundle pom.
   
   ## Brief change log
   
   Add `org.objenesis:objenesis` include to hudi-timeline-server-bundle pom
   
   ## Verify this pull request
   
   In theory, a passing CI run should be enough to show that this change causes no problems.
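
   As a complement to CI, the missing class can also be probed directly on the runtime classpath of the bundle. The following is a minimal sketch, not part of this pull request: `ClasspathProbe` and `isLoadable` are hypothetical names introduced only for illustration, and the only name taken from the issue is the class `org.objenesis.strategy.InstantiatorStrategy` from the HUDI-3606 stack trace.

```java
import java.util.List;

// Hypothetical helper (not part of Hudi): reports whether classes can be loaded.
final class ClasspathProbe {

    // Returns true if the named class can be resolved by this class's loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and link the class, do not run static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> required = List.of(
            "org.objenesis.strategy.InstantiatorStrategy", // class named in the HUDI-3606 stack trace
            "java.lang.String");                            // control entry that is always present
        for (String name : required) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

   Running `main` with the bundle jar on the classpath should report the objenesis class as found once the dependency is included; on a classpath without objenesis it reports the class as missing.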
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Closed] (HUDI-609) Implement a Flink specific HoodieIndex

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-609.
-
Resolution: Won't Do

> Implement a Flink specific HoodieIndex
> --
>
> Key: HUDI-609
> URL: https://issues.apache.org/jira/browse/HUDI-609
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Indexing is a key step in Hudi's write flow. {{HoodieIndex}} is the abstract 
> superclass of all index implementations. Currently, {{HoodieIndex}} is 
> coupled to Spark by design. However, HUDI-538 is restructuring 
> hudi-client so that Hudi can be decoupled from Spark. After that, we 
> would get an engine-agnostic implementation of {{HoodieIndex}}. By 
> extending that class, we could implement a Flink-specific index.





[jira] [Closed] (HUDI-184) Integrate Hudi with Apache Flink

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-184.
-
Resolution: Won't Do

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink would be valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]





[jira] [Closed] (HUDI-608) Implement a flink datastream execution context

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-608.
-
Resolution: Won't Do

> Implement a flink datastream execution context
> --
>
> Key: HUDI-608
> URL: https://issues.apache.org/jira/browse/HUDI-608
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Currently {{HoodieWriteClient}} does something like 
> `hoodieRecordRDD.map().sort()` internally. If we want to support a Flink 
> DataStream as the object, then we need to define an abstraction like 
> {{HoodieExecutionContext}}, which will have a common set of map(T) -> T, 
> filter(), and repartition() methods. There will be a subclass like 
> {{HoodieFlinkDataStreamExecutionContext}}, which will implement them 
> in Flink-specific ways and hand back the transformed T object.
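
The abstraction described in this issue could look roughly like the sketch below. It is an illustration under stated assumptions, not Hudi code: the interface name {{HoodieExecutionContext}} and the map/filter/repartition methods come from the description above, while the `collect()` method and the in-memory `ListExecutionContext` class are hypothetical additions so the example runs without Spark or Flink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of the engine-neutral abstraction from HUDI-608; the exact method set is an assumption.
interface HoodieExecutionContext<T> {
    <R> HoodieExecutionContext<R> map(Function<T, R> fn);
    HoodieExecutionContext<T> filter(Predicate<T> predicate);
    HoodieExecutionContext<T> repartition(int parallelism);
    List<T> collect(); // hypothetical: materializes the result for inspection
}

// Hypothetical in-memory implementation; an engine-specific subclass would wrap
// an RDD or a DataStream instead of a List.
final class ListExecutionContext<T> implements HoodieExecutionContext<T> {
    private final List<T> data;

    ListExecutionContext(List<T> data) {
        this.data = new ArrayList<>(data);
    }

    @Override
    public <R> HoodieExecutionContext<R> map(Function<T, R> fn) {
        List<R> mapped = data.stream().map(fn).collect(Collectors.toList());
        return new ListExecutionContext<>(mapped);
    }

    @Override
    public HoodieExecutionContext<T> filter(Predicate<T> predicate) {
        List<T> kept = data.stream().filter(predicate).collect(Collectors.toList());
        return new ListExecutionContext<>(kept);
    }

    @Override
    public HoodieExecutionContext<T> repartition(int parallelism) {
        return this; // a single in-memory list has no partitions, so this is a no-op
    }

    @Override
    public List<T> collect() {
        return new ArrayList<>(data);
    }
}
```

Client code written against the interface, for example `ctx.map(...).filter(...).repartition(n)`, would then stay the same regardless of which engine-specific subclass backs it.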





[jira] [Created] (HUDI-3606) ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy

2022-03-10 Thread cdmikechen (Jira)
cdmikechen created HUDI-3606:


 Summary: ClassNotFoundException: 
org.objenesis.strategy.InstantiatorStrategy
 Key: HUDI-3606
 URL: https://issues.apache.org/jira/browse/HUDI-3606
 Project: Apache Hudi
  Issue Type: Bug
  Components: timeline-server
Affects Versions: 0.10.1
Reporter: cdmikechen


When using *hudi-timeline-server-bundle* on a Hadoop server (3.2.2), Hudi will 
occasionally encounter errors similar to this:
{code}
2022-03-11 05:28:48,223 [qtp818093527-18] ERROR javalin.Javalin: Exception 
occurred while servicing http-request
java.lang.NoClassDefFoundError: org/objenesis/strategy/InstantiatorStrategy
at 
org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.<init>(SerializationUtils.java:88)
at 
java.base/java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(Unknown Source)
at java.base/java.lang.ThreadLocal.setInitialValue(Unknown Source)
at java.base/java.lang.ThreadLocal.get(Unknown Source)
at 
org.apache.hudi.common.util.SerializationUtils.serialize(SerializationUtils.java:52)
at 
org.apache.hudi.common.util.collection.RocksDBDAO.serializePayload(RocksDBDAO.java:469)
at 
org.apache.hudi.common.util.collection.RocksDBDAO.putInBatch(RocksDBDAO.java:175)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$12(RocksDbBasedFileSystemView.java:237)
at 
java.base/java.util.TreeMap$ValueSpliterator.forEachRemaining(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(Unknown 
Source)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$13(RocksDbBasedFileSystemView.java:236)
at 
org.apache.hudi.common.util.collection.RocksDBDAO.writeBatch(RocksDBDAO.java:157)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$storePartitionView$14(RocksDbBasedFileSystemView.java:235)
at java.base/java.util.ArrayList.forEach(Unknown Source)
at 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.storePartitionView(RocksDbBasedFileSystemView.java:234)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:146)
at java.base/java.util.HashMap.forEach(Unknown Source)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:308)
at 
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFilesBeforeOrOn(AbstractTableFileSystemView.java:489)
at 
org.apache.hudi.timeline.service.handlers.BaseFileHandler.getLatestDataFilesBeforeOrOn(BaseFileHandler.java:60)
at 
org.apache.hudi.timeline.service.RequestHandler.lambda$registerDataFilesAPI$6(RequestHandler.java:268)
at 
org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:497)
at 
io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
at 
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
at 
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
at 
io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
at 
io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at 
org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:502)
at org.ec

[GitHub] [hudi] hudi-bot commented on pull request #4877: [HUDI-3457][Stacked on 4818] Refactored Spark DataSource Relations to avoid code duplication

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4877:
URL: https://github.com/apache/hudi/pull/4877#issuecomment-1064793172


   
   ## CI report:
   
   * 2940f46a133ca3142f7ebb26b8c6f20583d7f395 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6814)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4877: [HUDI-3457][Stacked on 4818] Refactored Spark DataSource Relations to avoid code duplication

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4877:
URL: https://github.com/apache/hudi/pull/4877#issuecomment-1064717467


   
   ## CI report:
   
   * d875e412abc29bf6a0e8a6fa7bef747ded15d60b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6284)
 
   * 2940f46a133ca3142f7ebb26b8c6f20583d7f395 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6814)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] wxplovecc closed pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true

2022-03-10 Thread GitBox


wxplovecc closed pull request #4654:
URL: https://github.com/apache/hudi/pull/4654


   






[jira] [Closed] (HUDI-3522) Introduce DropColumnSchemaPostProcessor to support drop columns from schema

2022-03-10 Thread Xianghu Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghu Wang closed HUDI-3522.
--
Resolution: Fixed

Resolved via master : 83cff3afee15e129034eb51e68a1734c55d85da2

> Introduce DropColumnSchemaPostProcessor to support drop columns from schema
> ---
>
> Key: HUDI-3522
> URL: https://issues.apache.org/jira/browse/HUDI-3522
> Project: Apache Hudi
>  Issue Type: Task
>  Components: deltastreamer
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> A SchemaPostProcessor that drops columns from a given schema.
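
The idea of such a post-processor can be sketched as follows. This is not the code merged for HUDI-3522 and does not use Hudi or Avro types: the schema is modeled, as an assumption for illustration only, as an ordered map from field name to type name, and `DropColumnsSketch` and `dropColumns` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: a schema is modeled as an ordered map of field name -> type name.
final class DropColumnsSketch {

    // Returns a copy of the schema without the listed columns, preserving field order.
    static Map<String, String> dropColumns(Map<String, String> schema, Set<String> toDrop) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : schema.entrySet()) {
            if (!toDrop.contains(field.getKey())) {
                result.put(field.getKey(), field.getValue());
            }
        }
        // Assumption: a schema with no remaining fields is treated as a configuration error.
        if (result.isEmpty()) {
            throw new IllegalArgumentException("All columns would be dropped from the schema");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "long");
        schema.put("name", "string");
        schema.put("ssn", "string");
        System.out.println(dropColumns(schema, Set.of("ssn")).keySet());
    }
}
```

Column names are matched exactly here; whether matching should be case-insensitive is a design choice this sketch leaves open.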





[GitHub] [hudi] wxplovecc closed pull request #4981: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause o…

2022-03-10 Thread GitBox


wxplovecc closed pull request #4981:
URL: https://github.com/apache/hudi/pull/4981


   






[hudi] branch master updated (9dc6df5 -> 83cff3a)

2022-03-10 Thread wangxianghu
This is an automated email from the ASF dual-hosted git repository.

wangxianghu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 9dc6df5  [HUDI-3595] Fixing NULL schema provider for empty batch (#5002)
 add 83cff3a  [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema (#4972)

No new revisions were added by this update.

Summary of changes:
 .../schema/DropColumnSchemaPostProcessor.java  | 88 ++
 .../hudi/utilities/TestSchemaPostProcessor.java| 25 ++
 2 files changed, 113 insertions(+)
 create mode 100644 hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/DropColumnSchemaPostProcessor.java


[GitHub] [hudi] wangxianghu merged pull request #4972: [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema

2022-03-10 Thread GitBox


wangxianghu merged pull request #4972:
URL: https://github.com/apache/hudi/pull/4972


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4996: [HUDI-3594][Stacked on 4948] Supporting Composite Expressions over Data Table Columns in Data Skipping flow

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4996:
URL: https://github.com/apache/hudi/pull/4996#issuecomment-1064734709


   
   ## CI report:
   
   * 25578be3436f3a95af26f99368dd581efc5062e0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6811)
 
   * 9de43c5d691fa4a4f383a4647ddefa4798fa127d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6816)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4996: [HUDI-3594][Stacked on 4948] Supporting Composite Expressions over Data Table Columns in Data Skipping flow

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4996:
URL: https://github.com/apache/hudi/pull/4996#issuecomment-1064776265


   
   ## CI report:
   
   * 9de43c5d691fa4a4f383a4647ddefa4798fa127d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6816)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Comment Edited] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException

2022-03-10 Thread shibei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719
 ] 

shibei edited comment on HUDI-3593 at 3/11/22, 5:04 AM:


Another failure
{code:java}
 [ERROR] Tests run: 46, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
853.859 s <<< FAILURE! - in JUnit Vintage
 [ERROR] String, String, String).[6] MERGE_ON_READ, linear, 
null(testLayoutOptimizationFunctional  Time elapsed: 6.185 s  <<< ERROR!
 org.apache.spark.SparkException: Writing job failed.
     at 
org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:87)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
     at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
     at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
     at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:260)
     at 
org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:502)
     at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:172)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
     at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
     at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
     at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
     at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
     at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
     at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
     at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
     at 
org.apache.hudi.functional.TestLayoutOptimization.testLayoutOptimizationFunctional(TestLayoutOptimization.scala:109)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
     at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(Metho

[jira] [Comment Edited] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException

2022-03-10 Thread shibei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719
 ] 

shibei edited comment on HUDI-3593 at 3/11/22, 5:03 AM:


Another failure
{code:java}
 [ERROR] Tests run: 46, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
853.859 s <<< FAILURE! - in JUnit Vintage
 [ERROR] String, String, String).[6] MERGE_ON_READ, linear, 
null(testLayoutOptimizationFunctional  Time elapsed: 6.185 s  <<< ERROR!
 org.apache.spark.SparkException: Writing job failed.
     at 
org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:87)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
     at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
     at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
     at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:260)
     at 
org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:502)
     at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:172)
     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
     at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
     at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
     at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
     at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
     at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
     at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
     at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
     at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
     at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
     at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
     at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
     at 
org.apache.hudi.functional.TestLayoutOptimization.testLayoutOptimizationFunctional(TestLayoutOptimization.scala:109)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
     at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(Metho

[GitHub] [hudi] xushiyan commented on a change in pull request #4962: [HUDI-3355] Issue with out of order commits in the timeline when ingestion writers using SparkAllowUpdateStrategy

2022-03-10 Thread GitBox


xushiyan commented on a change in pull request #4962:
URL: https://github.com/apache/hudi/pull/4962#discussion_r824386018



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/TransactionUtils.java
##
@@ -137,4 +165,20 @@
   throw new HoodieIOException("Unable to read metadata for instant " + hoodieInstantOption.get(), io);
 }
   }
+
+  /**
+   * Get pending clustering instant.
+   * Notice:
+   *   we return .requested instant here.
+   *
+   * @param metaClient
+   * @return
+   */
+  public static List getUncheckedPendingClusteringInstants(HoodieTableMetaClient metaClient) {

Review comment:
   Shall we call it "ReplaceRequestedInstant" to be specific? Also,
   "unchecked" only has meaning in the context of the write client;
   `TransactionUtils` does not know whether an instant is "unchecked" or not.
   
   ```suggestion
 public static List getPendingReplaceRequestedInstants(HoodieTableMetaClient metaClient) {
   ```








[GitHub] [hudi] hudi-bot commented on pull request #4948: [HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4948:
URL: https://github.com/apache/hudi/pull/4948#issuecomment-1064773990


   
   ## CI report:
   
   * 14366cac6e233cb85ee94307a7f62f6184ed5b34 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6812)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4948: [HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4948:
URL: https://github.com/apache/hudi/pull/4948#issuecomment-1064707297


   
   ## CI report:
   
   * 4421752bef3dd3b53cd896f7d3ca23bb49d22034 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6669)
 
   * 14366cac6e233cb85ee94307a7f62f6184ed5b34 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6812)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4971:
URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064705899


   
   ## CI report:
   
   * 8e89371fed3d147b43959a73e3e6a33cfaefd32c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6650)
 
   * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4971:
URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064772913


   
   ## CI report:
   
   * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Comment Edited] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException

2022-03-10 Thread shibei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719
 ] 

shibei edited comment on HUDI-3593 at 3/11/22, 4:10 AM:


{code:java}
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:371)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.map(RDD.scala:370)
at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:93)
at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45)
at org.apache.hudi.client.clustering.run.strategy.MultipleSparkJobExecutionStrategy.readRecordsForGroupBaseFiles(MultipleSparkJobExecutionStrategy.java:269)
at org.apache.hudi.client.clustering.run.strategy.MultipleSparkJobExecutionStrategy.readRecordsForGroup(MultipleSparkJobExecutionStrategy.java:191)
at org.apache.hudi.client.clustering.run.strategy.MultipleSparkJobExecutionStrategy.lambda$runClusteringForGroupAsync$4(MultipleSparkJobExecutionStrategy.java:171)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 1 more
Caused by: java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
at java.util.HashSet.writeObject(HashSet.java:287)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
... 16 more
 {code}
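
The `Caused by` frames above show `HashSet.writeObject` iterating a `LinkedHashMap`-backed set while Java serialization runs inside Spark's `ClosureCleaner`. A `ConcurrentModificationException` at that point suggests the set was structurally modified while it was being serialized. The sketch below only illustrates that failure mode and one common mitigation (serializing a snapshot copy). It is not the Hudi fix, and the names `CmeSketch`, `failsWhenMutatedDuringIteration`, `serializeSnapshot`, and `serializedSize` are hypothetical. It triggers the exception in a single thread so the result is deterministic, whereas the trace above presumably involves a second thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashSet;
import java.util.Set;

class CmeSketch {

    // Hypothetical helper: serialize with plain Java serialization and
    // return the number of bytes written.
    public static int serializedSize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Modifies the set while an iterator over it is still in use. The
    // fail-fast iterator detects the structural change on its next step.
    public static boolean failsWhenMutatedDuringIteration() {
        Set<String> shared = new LinkedHashSet<>();
        shared.add("a");
        shared.add("b");
        try {
            for (String element : shared) {
                shared.add(element + "-copy"); // structural change mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Mitigation sketch: copy the set under a lock, then serialize the copy.
    // Assumption: every writer synchronizes on the same object.
    public static int serializeSnapshot(Set<String> shared) {
        Set<String> snapshot;
        synchronized (shared) {
            snapshot = new LinkedHashSet<>(shared);
        }
        return serializedSize(snapshot);
    }

    public static void main(String[] args) {
        System.out.println("CME reproduced: " + failsWhenMutatedDuringIteration());
        Set<String> shared = new LinkedHashSet<>();
        shared.add("file-1");
        shared.add("file-2");
        System.out.println("snapshot bytes: " + serializeSnapshot(shared));
    }
}
```

The snapshot only helps if every writer takes the same lock; otherwise the copy constructor can hit the same exception while it iterates the shared set.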


was (Author: JIRAUSER279853):
Another failure

 
{code:java}
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
at 
java.io

[jira] [Commented] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException

2022-03-10 Thread shibei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719
 ] 

shibei commented on HUDI-3593:
--

Another failure

 
{code:java}
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
... 16 more {code}
 

 

> AsyncClustering failed because of ConcurrentModificationException
> -
>
> Key: HUDI-3593
> URL: https://issues.apache.org/jira/browse/HUDI-3593
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2022-03-10 at 9.53.13 AM.png
>
>
> The following is the stack trace I encountered:
> {code:java}
>  ERROR AsyncClusteringService: Clustering executor failed 
> java.util.concurrent.CompletionException: org.apache.spark.SparkException: 
> Task not serializable 
> at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>  
> at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>  
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
>  
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
>  
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 
> at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) 
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) 
> at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> Caused by: org.apache.spark.SparkException: Task not serializable 
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
>  
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406) 
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162) 
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2467) 
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$1(RDD.scala:912) 
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>  
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) 
> at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:911) 
> at 
> org.apache.spark.api.java.JavaRDDLike.mapPartitionsWithIndex(JavaRDDLike.scala:103)
>  
> at 

[GitHub] [hudi] hudi-bot removed a comment on pull request #4489: [HUDI-3135] Fix Delete partitions with metadata table and fix show partitions in spark sql

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4489:
URL: https://github.com/apache/hudi/pull/4489#issuecomment-1064705595


   
   ## CI report:
   
   * e74a30e1b9f4395780cfe412d3574dabe2ae9f57 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6795)
 
   * d17343318be38b5a9b0953004700aa72f4fed689 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6809)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4489: [HUDI-3135] Fix Delete partitions with metadata table and fix show partitions in spark sql

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4489:
URL: https://github.com/apache/hudi/pull/4489#issuecomment-1064751722


   
   ## CI report:
   
   * d17343318be38b5a9b0953004700aa72f4fed689 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6809)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] melin opened a new issue #5016: [SUPPORT] Add AS OF syntax support

2022-03-10 Thread GitBox


melin opened a new issue #5016:
URL: https://github.com/apache/hudi/issues/5016


   Use SQL to query a specified version of the data:
   ``` 
   SELECT * FROM default.people10m VERSION AS OF 0; 
   SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58'; 
   ``` 
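
   Hudi identifies commits on its timeline by instant-time strings, so a `TIMESTAMP AS OF` clause like the one requested above would at some point have to be mapped to such a string. The sketch below shows only that conversion, assuming the second-granularity `yyyyMMddHHmmss` instant format. The names `AsOfInstantSketch` and `toInstantTime` are hypothetical and not part of Hudi, and how a `VERSION AS OF` number would map to an instant is left open.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class AsOfInstantSketch {

    // Format used in the SQL literal from the request above.
    private static final DateTimeFormatter SQL_TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Assumed instant-time format (second granularity).
    private static final DateTimeFormatter INSTANT_TIME =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Hypothetical helper: convert a SQL timestamp literal to an instant string.
    public static String toInstantTime(String sqlTimestamp) {
        return LocalDateTime.parse(sqlTimestamp, SQL_TIMESTAMP).format(INSTANT_TIME);
    }

    public static void main(String[] args) {
        // prints 20190129003758
        System.out.println(toInstantTime("2019-01-29 00:37:58"));
    }
}
```

   The conversion ignores time zones; a real implementation would need to decide which zone the SQL literal and the timeline instants are expressed in.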






[GitHub] [hudi] hudi-bot commented on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns

2022-03-10 Thread GitBox


hudi-bot commented on pull request #4888:
URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064748224


   
   ## CI report:
   
   * e0afa9f1de90411220a6c1d25c0c9e43f09f6baf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6815)
   * b07cca5112163e153385c690203603b74542ace6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6820)
   
   
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns

2022-03-10 Thread GitBox


hudi-bot removed a comment on pull request #4888:
URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064723457


   
   ## CI report:
   
   * e0afa9f1de90411220a6c1d25c0c9e43f09f6baf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6815)
   * b07cca5112163e153385c690203603b74542ace6 UNKNOWN
   
   
   






[hudi] branch master updated (fa5e750 -> 9dc6df5)

2022-03-10 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from fa5e750  [HUDI-3586] Add Trino Queries in integration tests (#4988)
 add 9dc6df5  [HUDI-3595] Fixing NULL schema provider for empty batch (#5002)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/common/util/CommitUtils.java   |  5 ++-
 .../functional/TestHoodieDeltaStreamer.java        | 28 -
 .../sources/TestParquetDFSSourceEmptyBatch.java    | 49 ++
 3 files changed, 80 insertions(+), 2 deletions(-)
 create mode 100644 hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestParquetDFSSourceEmptyBatch.java


[GitHub] [hudi] nsivabalan merged pull request #5002: [HUDI-3595] Fixing NULL schema provider for empty batch

2022-03-10 Thread GitBox


nsivabalan merged pull request #5002:
URL: https://github.com/apache/hudi/pull/5002


   





