[GitHub] [hudi] hudi-bot removed a comment on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException
hudi-bot removed a comment on pull request #4984: URL: https://github.com/apache/hudi/pull/4984#issuecomment-1064863475

## CI report:

* 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)
* e30a63cc90f3afbea7ee36c37283f2f21ea7998f UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException
hudi-bot commented on pull request #4984: URL: https://github.com/apache/hudi/pull/4984#issuecomment-1064865517

## CI report:

* 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)
* e30a63cc90f3afbea7ee36c37283f2f21ea7998f UNKNOWN
* c0a0e141561d1d75150aab046090e1ccd1c9e2c2 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException
hudi-bot removed a comment on pull request #4984: URL: https://github.com/apache/hudi/pull/4984#issuecomment-1061697101

## CI report:

* 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4984: [HUDI-3583] Fix MarkerBasedRollbackStrategy NoSuchElementException
hudi-bot commented on pull request #4984: URL: https://github.com/apache/hudi/pull/4984#issuecomment-1064863475

## CI report:

* 015f7f0e07d3f0efbd8d3a728f802fc5572a8f52 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6694)
* e30a63cc90f3afbea7ee36c37283f2f21ea7998f UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] danny0405 commented on a change in pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteH
danny0405 commented on a change in pull request #5018: URL: https://github.com/apache/hudi/pull/5018#discussion_r824461742

File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkWriteHelper.java

```diff
@@ -113,5 +114,10 @@ public static FlinkWriteHelper newInstance() {
       hoodieRecord.setCurrentLocation(rec1.getCurrentLocation());
       return hoodieRecord;
     }).orElse(null)).filter(Objects::nonNull).collect(Collectors.toList());
+
+    if (hasInsert) {
+      recordList.get(0).getCurrentLocation().setInstantTime("I");
+    }
+    return recordList;
```

Review comment: In line 114 we already reset the location, so after reduction each record list under the same key should have the same instant-time type as before. Why is the extra set needed?
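To make the question above concrete, here is a minimal, self-contained sketch of the reduce-style deduplication being discussed. The types and method are illustrative only (not the actual Hudi `HoodieRecord`/`FlinkWriteHelper` classes): records sharing a key are merged, and the merged record carries over the existing location's instant time, which is the reason an additional `setInstantTime` call may be redundant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified types; the real logic lives in FlinkWriteHelper#deduplicateRecords.
public class DedupSketch {
  record Rec(String key, String payload, String instantTime) {}

  // Merge records by key: keep the newest payload, but inherit the
  // existing instant time (the "reset location" step in the diff above).
  static List<Rec> deduplicate(List<Rec> input) {
    Map<String, Rec> merged = new LinkedHashMap<>();
    for (Rec r : input) {
      merged.merge(r.key(), r, (prev, next) ->
          new Rec(prev.key(), next.payload(), prev.instantTime()));
    }
    return new ArrayList<>(merged.values());
  }

  public static void main(String[] args) {
    List<Rec> out = deduplicate(List.of(
        new Rec("k1", "v1", "I"),
        new Rec("k1", "v2", "I")));
    // One record survives per key; its instant time is the one it started with.
    System.out.println(out.size() + " " + out.get(0).payload() + " " + out.get(0).instantTime());
  }
}
```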
[GitHub] [hudi] hudi-bot commented on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service
hudi-bot commented on pull request #4872: URL: https://github.com/apache/hudi/pull/4872#issuecomment-1064853531

## CI report:

* 0fd561ae050f39c022862eae351c73b323a61e05 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)
* c662e400cd71c1dbba9b4f37512ca5e748736f03 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6833)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service
hudi-bot removed a comment on pull request #4872: URL: https://github.com/apache/hudi/pull/4872#issuecomment-1064851908

## CI report:

* 0fd561ae050f39c022862eae351c73b323a61e05 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)
* c662e400cd71c1dbba9b4f37512ca5e748736f03 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service
hudi-bot commented on pull request #4872: URL: https://github.com/apache/hudi/pull/4872#issuecomment-1064851908

## CI report:

* 0fd561ae050f39c022862eae351c73b323a61e05 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)
* c662e400cd71c1dbba9b4f37512ca5e748736f03 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4872: [HUDI-3475] Support run compaction / clustering job in Service
hudi-bot removed a comment on pull request #4872: URL: https://github.com/apache/hudi/pull/4872#issuecomment-1048383460

## CI report:

* 0fd561ae050f39c022862eae351c73b323a61e05 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6209)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] wangxianghu edited a comment on pull request #4969: [HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at one time
wangxianghu edited a comment on pull request #4969: URL: https://github.com/apache/hudi/pull/4969#issuecomment-1064849682

hi @nsivabalan, can we add this processor? It is very useful in scenarios with diversified data requirements. In our company, we have used this feature to add multiple processors to our pipeline:

1. Maxwell post processor: extracts data from a Maxwell JSON string; this is a standard processor.
2. Encrypt post processor: encrypts some fields for safety purposes.
3. Flag post processor: this is quite a business-related processor.

With ChainedJsonKafkaSourcePostProcessor we can make data processing more flexible; it makes up for the lack of expressive ability of `Transformer`.
[GitHub] [hudi] wangxianghu edited a comment on pull request #4969: [HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at one time
wangxianghu edited a comment on pull request #4969: URL: https://github.com/apache/hudi/pull/4969#issuecomment-1064849682

hi @nsivabalan, can we add this processor? It is very useful in scenarios with diversified data requirements. In our company, we have used this feature to add multiple processors to our pipeline:

1. Maxwell post processor: extracts data from a Maxwell JSON string; this is a standard processor.
2. Encrypt post processor: encrypts some fields for safety purposes.
3. Flag post processor: this is quite a business-related processor.

With ChainedJsonKafkaSourcePostProcessor we can make data processing more flexible; it makes up for the lack of expressive ability of `Transformer`.
[GitHub] [hudi] wangxianghu commented on pull request #4969: [HUDI-3569] Introduce ChainedJsonKafkaSourePostProcessor to support setting multi processors at one time
wangxianghu commented on pull request #4969: URL: https://github.com/apache/hudi/pull/4969#issuecomment-1064849682

hi @nsivabalan, can we add this processor? It is very useful in scenarios with diversified data requirements. In our company, we have used this feature to add multiple processors in our pipeline:

1. Maxwell post processor: extracts data from a Maxwell JSON string; this is a standard processor.
2. Encrypt post processor: encrypts some fields for safety purposes.
3. Flag post processor: this is quite a business-related processor.

With ChainedJsonKafkaSourcePostProcessor we can make data processing more flexible; it makes up for the lack of expressive ability of `Transformer`.
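The chaining idea in the comment above can be sketched with a few lines of Java. This is a hedged illustration only: the names below are hypothetical, and the real `ChainedJsonKafkaSourcePostProcessor` in hudi-utilities operates on Spark RDDs of JSON strings rather than single strings. The point is the pattern: each processor transforms a record, and the chain applies them in declaration order.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative chain-of-processors sketch (not the actual Hudi API).
public class ChainSketch {
  // A post processor is just a String -> String transform here.
  interface JsonPostProcessor extends UnaryOperator<String> {}

  // Compose a list of processors into one, applied left to right.
  static JsonPostProcessor chain(List<JsonPostProcessor> processors) {
    return s -> {
      String out = s;
      for (JsonPostProcessor p : processors) {
        out = p.apply(out);
      }
      return out;
    };
  }

  public static void main(String[] args) {
    // Stand-ins for the processors named in the comment:
    JsonPostProcessor extract = s -> s.trim();                // e.g. Maxwell extraction
    JsonPostProcessor encrypt = s -> s.replace("ssn", "***"); // e.g. field encryption
    String result = chain(List.of(extract, encrypt)).apply("  {\"ssn\":1} ");
    System.out.println(result); // prints {"***":1}
  }
}
```

Because each step only sees the output of the previous one, business-specific processors (like the "flag" processor mentioned) can be appended without touching the standard ones.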
[GitHub] [hudi] prashantwason commented on a change in pull request #4640: [HUDI-3225] [RFC-45] for async metadata indexing
prashantwason commented on a change in pull request #4640: URL: https://github.com/apache/hudi/pull/4640#discussion_r824447974

File path: rfc/rfc-45/rfc-45.md (@@ -0,0 +1,264 @@)

# RFC-45: Asynchronous Metadata Indexing

## Proposers

- @codope
- @manojpec

## Approvers

- @nsivabalan
- @vinothchandar

## Status

JIRA: [HUDI-2488](https://issues.apache.org/jira/browse/HUDI-2488)

## Abstract

Metadata indexing (aka metadata bootstrapping) is the process of creating one or more metadata-based indexes, e.g. the data-partitions-to-files index, that are stored in the Hudi metadata table. Currently, the metadata table (referred to as MDT hereafter) supports a single partition, which is created synchronously with the corresponding data table, i.e. commits are first applied to the metadata table and then to the data table. Our goal for the MDT is to support multiple partitions to boost the performance of existing index and record lookups. However, the synchronous manner of metadata indexing is not very scalable as we add more partitions to the MDT, because the regular writers (writing to the data table) have to wait until the MDT commit completes. In this RFC, we propose a design to support asynchronous metadata indexing.

## Background

We can read more about the MDT design in [RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements). Here is a quick summary of the current state (Hudi v0.10.1). The MDT is an internal Merge-on-Read (MOR) table that has a single partition called `files`, which stores the data-partitions-to-files index used in file listing. The MDT is co-located with the data table (inside the `.hoodie/metadata` directory under the basepath). In order to handle multi-writer scenarios, users configure a lock provider, and only one writer can access the MDT in read-write mode. Hence, any write to the MDT is guarded by the data table lock. This ensures only one write is committed to the MDT at any point in time and thus guarantees serializability. However, the locking overhead adversely affects write throughput and will reach its scalability limits as we add more partitions to the MDT.

## Goals

- Support indexing one or more partitions in the MDT while regular writers and table services (such as cleaning or compaction) are in progress.
- Locking to be as lightweight as possible.
- Keep required config changes to a minimum to simplify deployment / upgrade in production.
- Do not require specific ordering of how writers and table service pipelines need to be upgraded / restarted.
- If an external long-running process is being used to initialize the index, the process should be made idempotent so it can handle errors from previous runs.
- To re-initialize the index, make it as simple as running the external initialization process again without having to change configs.

## Implementation

### A new Hudi action: INDEX

We introduce a new action `index` which will denote the index-building process, the mechanics of which are as follows:

1. From an external process, users can issue a CREATE INDEX or similar statement to trigger indexing for an existing table.
   1. This will schedule the INDEX action and add a `.index.requested` to the timeline, which contains the indexing plan. Index scheduling will also initialize the filegroup for the partitions for which indexing is planned.
   2. From here on, the index-building process will continue to build an index up to instant time `t`, where `t` is the latest completed instant time on the timeline without any "holes", i.e. no pending async operations prior to it.
   3. The indexing process will write these out as base files within the corresponding metadata partition. A metadata partition cannot be used if there is any pending indexing action against it. As and when indexing is completed for a partition, the table config (`hoodie.properties`) will be updated to indicate that the partition is available for reads or synchronous updates. The Hudi table config will be the source of truth for the current state of the metadata index.
2. Any inflight writers (i.e. with instant time `t'` > `t`) will check for any new indexing request on the timeline prior to preparing to commit.
   1. Such writers will proceed to additionally add log entries corresponding to each such indexing request into the metadata partition.
   2. There is always a TOCTOU issue here, where the inflight writer may not see an indexing request that was just added and proceed to commit without it. We will correct this during indexing action completion. In the average case, this may not happen and the design has liveness.
3. When the indexing process is about to complete (i.e. indexing up to instant `t` is done but before completing the indexing commit), it
[GitHub] [hudi] hudi-bot commented on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config
hudi-bot commented on pull request #5013: URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064845719

## CI report:

* a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)
* f50fc2686b0c3b7f17c741ca99db9629aafc6b66 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6832)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config
hudi-bot removed a comment on pull request #5013: URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064844072

## CI report:

* a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)
* f50fc2686b0c3b7f17c741ca99db9629aafc6b66 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config
hudi-bot commented on pull request #5013: URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064844072

## CI report:

* a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)
* f50fc2686b0c3b7f17c741ca99db9629aafc6b66 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #5013: [HUDI-3593] Restore TypedProperties and flush checksum in table config
hudi-bot removed a comment on pull request #5013: URL: https://github.com/apache/hudi/pull/5013#issuecomment-1064395857

## CI report:

* a2e2b2ecd3ffe2974fac5e6472c2ab273f4d13c4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6800)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables
hudi-bot commented on pull request #4925: URL: https://github.com/apache/hudi/pull/4925#issuecomment-1064843916

## CI report:

* 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)
* 7119319af35fb23afa97e058cd2fbfaea18292a1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6831)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables
hudi-bot removed a comment on pull request #4925: URL: https://github.com/apache/hudi/pull/4925#issuecomment-1064842219

## CI report:

* 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)
* 7119319af35fb23afa97e058cd2fbfaea18292a1 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor
hudi-bot commented on pull request #5019: URL: https://github.com/apache/hudi/pull/5019#issuecomment-1064842384

## CI report:

* 3b6b326bb3650689e8ad78504ccaca3df2700998 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6830)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor
hudi-bot removed a comment on pull request #5019: URL: https://github.com/apache/hudi/pull/5019#issuecomment-1064840735

## CI report:

* 3b6b326bb3650689e8ad78504ccaca3df2700998 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables
hudi-bot commented on pull request #4925: URL: https://github.com/apache/hudi/pull/4925#issuecomment-1064842219

## CI report:

* 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)
* 7119319af35fb23afa97e058cd2fbfaea18292a1 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4925: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single sink table from multiple source tables
hudi-bot removed a comment on pull request #4925: URL: https://github.com/apache/hudi/pull/4925#issuecomment-1055496401

## CI report:

* 018bb851445f7eabaa0bd4cc2b362f269d6fec59 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6436)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor
hudi-bot commented on pull request #5019: URL: https://github.com/apache/hudi/pull/5019#issuecomment-1064840735

## CI report:

* 3b6b326bb3650689e8ad78504ccaca3df2700998 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-3575) Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor
[ https://issues.apache.org/jira/browse/HUDI-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3575: Labels: pull-request-available (was: )

> Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor
>
> Key: HUDI-3575
> URL: https://issues.apache.org/jira/browse/HUDI-3575
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Xianghu Wang
> Assignee: Xianghu Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.11.0

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] wangxianghu opened a new pull request #5019: [HUDI-3575] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in TestSchemaPostProcessor
wangxianghu opened a new pull request #5019: URL: https://github.com/apache/hudi/pull/5019

## What is the purpose of the pull request

*Use the standard test schema in our UT instead of a schema from a specific enterprise's data*

## Brief change log

## Verify this pull request

This pull request is already covered by existing tests: org.apache.hudi.utilities.TestSchemaPostProcessor

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot commented on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064838694

## CI report:

* 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
* 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6829)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot removed a comment on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064827082

## CI report:

* 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
* 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns
hudi-bot commented on pull request #4888: URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064831876

## CI report:

* b07cca5112163e153385c690203603b74542ace6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6820)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns
hudi-bot removed a comment on pull request #4888: URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064748224

## CI report:

* e0afa9f1de90411220a6c1d25c0c9e43f09f6baf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6815)
* b07cca5112163e153385c690203603b74542ace6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6820)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot commented on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064827082 ## CI report: * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824) * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot removed a comment on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064825631 ## CI report: * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120) * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824) * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot commented on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064825631 ## CI report: * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120) * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824) * 4a3662cb03d0fbf4f5041b9b27eebd03cd132783 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot removed a comment on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064806027 ## CI report: * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120) * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
[jira] [Commented] (HUDI-3607) Support backend switch in HoodieFlinkStreamer
[ https://issues.apache.org/jira/browse/HUDI-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504744#comment-17504744 ] 刘方奇 commented on HUDI-3607: [~wangxianghu] Could you help take a glance? You can assign it to me. > Support backend switch in HoodieFlinkStreamer > Key: HUDI-3607 > URL: https://issues.apache.org/jira/browse/HUDI-3607 > Project: Apache Hudi > Issue Type: Improvement > Components: flink > Reporter: 刘方奇 > Priority: Major > Now, the HoodieFlinkStreamer utility only supports one backend, FsStateBackend. > I think that's not flexible for application configuration. Could we make the > backend configurable? Moreover, as of Flink 1.14, FsStateBackend is deprecated in favor of > org.apache.flink.runtime.state.hashmap.HashMapStateBackend and > org.apache.flink.runtime.state.storage.FileSystemCheckpointStorage. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] guanziyue commented on a change in pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
guanziyue commented on a change in pull request #4264: URL: https://github.com/apache/hudi/pull/4264#discussion_r824428011 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java ## @@ -101,13 +101,13 @@ public void runMerge(HoodieTable>, JavaRDD } catch (Exception e) { throw new HoodieException(e); } finally { + if (null != wrapper) { Review comment: Done
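The null guard added in the diff above (`if (null != wrapper)`) matters because the wrapper may never be assigned if the try block throws before construction completes. A minimal sketch of the pattern, with hypothetical names (not Hudi's actual `SparkMergeHelper` code):

```java
// Sketch of null-guarded cleanup in a finally block.
// "Resource" and its methods are illustrative stand-ins, not Hudi APIs.
public class GuardedCleanup {
    static class Resource {
        boolean shutdown = false;
        void shutdownNow() { shutdown = true; }
    }

    // Returns the resource that was (or would have been) cleaned up,
    // or null if construction failed before the assignment.
    public static Resource run(boolean failBeforeConstruction) {
        Resource wrapper = null;
        try {
            if (failBeforeConstruction) {
                throw new IllegalStateException("failed before wrapper was built");
            }
            wrapper = new Resource();
            // ... do work with wrapper ...
        } catch (Exception e) {
            // swallowed for the sketch; real code would wrap and rethrow
        } finally {
            // Without this guard, an early failure would cause an NPE here.
            if (null != wrapper) {
                wrapper.shutdownNow();
            }
        }
        return wrapper;
    }

    public static void main(String[] args) {
        GuardedCleanup.run(true);  // early failure path: no NPE in finally
        GuardedCleanup.run(false); // normal path: wrapper shut down
    }
}
```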
[GitHub] [hudi] guanziyue commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
guanziyue commented on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064821297 > @guanziyue thank you for taking the time to troubleshoot this concurrency issue and implement the fix! > > I echo @vinothchandar's concerns and I think we're taking a step a bit too far -- `ParquetWriter` is not assumed to be thread-safe, nor do I believe we should make it such. > > Instead, I believe we should just resolve the problem with its concurrent access (which you already did) and make sure we make it clear that `ParquetWriter` is not thread-safe, so its usage needs to be properly guarded externally. Hi @alexeykudinkin, may I know whether your concern is "adding a lock to parquetWriter" or "adding a lock to the hot path"? I'm afraid it is difficult to come up with a method that guarantees this problem is totally solved without adding a signal to the hot path. Either the producer needs to check whether the current thread is interrupted and respond to it in a reasonable time, or the consumer needs to immediately reject any write once the close method is called, which also needs a lock on the hot path. For the producer solution, we can have a lock-free check. For the consumer, we may use volatile rather than a lock? But either of them adds something to the hot path.
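The consumer-side option floated in the comment above — rejecting writes after `close()` via a `volatile` flag instead of taking a lock on the hot path — can be sketched as follows. All names here are illustrative assumptions, not the actual `HoodieParquetWriter` change:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a writer that rejects appends after close() using a volatile
// flag, so the hot write path pays only a volatile read, not a lock.
public class ClosableWriter {
    private volatile boolean closed = false;
    private final List<String> buffer = new ArrayList<>();

    public void write(String record) {
        // Hot-path check: a volatile read, no lock taken.
        if (closed) {
            throw new IllegalStateException("writer already closed");
        }
        buffer.add(record);
    }

    public void close() {
        closed = true; // volatile write: visible to other threads
    }

    public int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        ClosableWriter w = new ClosableWriter();
        w.write("a");
        w.close();
        try {
            w.write("b");
        } catch (IllegalStateException expected) {
            System.out.println("write after close rejected");
        }
    }
}
```

Note the caveat implicit in the discussion: the check-then-act above is not atomic, so a write that passed the check can still land just after `close()`. That is exactly why the commenter says the problem cannot be "totally solved" without putting some synchronization cost on the hot path.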
[hudi] branch master updated (83cff3a -> 18cdad9)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 83cff3a [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema (#4972) add 18cdad9 [HUDI-2999] [RFC-42] RFC for consistent hashing index (#4326) No new revisions were added by this update. Summary of changes:
 rfc/rfc-42/basic_bucket_hashing.png             | Bin 0 -> 26942 bytes
 rfc/rfc-42/bucket_resizing.png                  | Bin 0 -> 53114 bytes
 rfc/rfc-42/bucket_resizing_virtual_log_file.png | Bin 0 -> 42742 bytes
 rfc/rfc-42/consistent_hashing.png               | Bin 0 -> 38682 bytes
 rfc/rfc-42/rfc-42.md                            | 230
 5 files changed, 230 insertions(+)
 create mode 100644 rfc/rfc-42/basic_bucket_hashing.png
 create mode 100644 rfc/rfc-42/bucket_resizing.png
 create mode 100644 rfc/rfc-42/bucket_resizing_virtual_log_file.png
 create mode 100644 rfc/rfc-42/consistent_hashing.png
 create mode 100644 rfc/rfc-42/rfc-42.md
[GitHub] [hudi] garyli1019 merged pull request #4326: [HUDI-2999] [RFC-42] RFC for consistent hashing index
garyli1019 merged pull request #4326: URL: https://github.com/apache/hudi/pull/4326
[GitHub] [hudi] hudi-bot removed a comment on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable
hudi-bot removed a comment on pull request #4982: URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064816693 ## CI report: * 282ca401f8e2a93d7703f592041b854959291d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817)
[GitHub] [hudi] hudi-bot commented on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable
hudi-bot commented on pull request #4982: URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064820630 ## CI report: * 282ca401f8e2a93d7703f592041b854959291d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6827)
[GitHub] [hudi] hudi-bot removed a comment on pull request #5015: [HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state
hudi-bot removed a comment on pull request #5015: URL: https://github.com/apache/hudi/pull/5015#issuecomment-1064739346 ## CI report: * 16c497f48a922830b3fbcb833bca203c292158da Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6818)
[GitHub] [hudi] hudi-bot commented on pull request #5015: [HUDI-3513] Make sure Column Stats does not fail in case it fails to load previous Index Table state
hudi-bot commented on pull request #5015: URL: https://github.com/apache/hudi/pull/5015#issuecomment-1064819300 ## CI report: * 16c497f48a922830b3fbcb833bca203c292158da Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6818)
[GitHub] [hudi] huberylee commented on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable
huberylee commented on pull request #4982: URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064819324 @hudi-bot run azure
[jira] [Created] (HUDI-3607) Support backend switch in HoodieFlinkStreamer
刘方奇 created HUDI-3607: Summary: Support backend switch in HoodieFlinkStreamer Key: HUDI-3607 URL: https://issues.apache.org/jira/browse/HUDI-3607 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: 刘方奇 Now, the HoodieFlinkStreamer utility only supports one backend, FsStateBackend. I think that's not flexible for application configuration. Could we make the backend configurable? Moreover, as of Flink 1.14, FsStateBackend is deprecated in favor of org.apache.flink.runtime.state.hashmap.HashMapStateBackend and org.apache.flink.runtime.state.storage.FileSystemCheckpointStorage.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable
hudi-bot removed a comment on pull request #4982: URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064734674 ## CI report: * 282ca401f8e2a93d7703f592041b854959291d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817)
[GitHub] [hudi] hudi-bot commented on pull request #4982: [HUDI-3567] Refactor HoodieCommonUtils to make code more reasonable
hudi-bot commented on pull request #4982: URL: https://github.com/apache/hudi/pull/4982#issuecomment-1064816693 ## CI report: * 282ca401f8e2a93d7703f592041b854959291d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6805) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6817)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way
hudi-bot removed a comment on pull request #4971: URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064813820 ## CI report: * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810) * 7367ebfc60119b4442988ebc7350e4daac15b65f UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way
hudi-bot commented on pull request #4971: URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064815294 ## CI report: * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810) * 7367ebfc60119b4442988ebc7350e4daac15b65f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6826)
[GitHub] [hudi] hudi-bot commented on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way
hudi-bot commented on pull request #4971: URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064813820 ## CI report: * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810) * 7367ebfc60119b4442988ebc7350e4daac15b65f UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way
hudi-bot removed a comment on pull request #4971: URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064772913 ## CI report: * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
[GitHub] [hudi] boneanxs edited a comment on pull request #4999: [HUDI-3592] Fix NPE of DefaultHoodieRecordPayload if Property is empty
boneanxs edited a comment on pull request #4999: URL: https://github.com/apache/hudi/pull/4999#issuecomment-1064809186 @nsivabalan @xushiyan @XuQianJin-Stars could you pls review this?
[GitHub] [hudi] boneanxs commented on pull request #4999: [HUDI-3592] Fix NPE of DefaultHoodieRecordPayload if Property is empty
boneanxs commented on pull request #4999: URL: https://github.com/apache/hudi/pull/4999#issuecomment-1064809186 @nsivabalan @xushiyan could you pls review this?
[GitHub] [hudi] hudi-bot commented on pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper out of
hudi-bot commented on pull request #5018: URL: https://github.com/apache/hudi/pull/5018#issuecomment-1064807946 ## CI report: * b9e437b2c2942ba29945d1d21c7e214e350e4333 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6825)
[GitHub] [hudi] hudi-bot removed a comment on pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper
hudi-bot removed a comment on pull request #5018: URL: https://github.com/apache/hudi/pull/5018#issuecomment-1064806485 ## CI report: * b9e437b2c2942ba29945d1d21c7e214e350e4333 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper out of
hudi-bot commented on pull request #5018: URL: https://github.com/apache/hudi/pull/5018#issuecomment-1064806485 ## CI report: * b9e437b2c2942ba29945d1d21c7e214e350e4333 UNKNOWN
[GitHub] [hudi] wxplovecc commented on pull request #4981: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause o…
wxplovecc commented on pull request #4981: URL: https://github.com/apache/hudi/pull/4981#issuecomment-1064806140 https://github.com/apache/hudi/pull/5018
[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot commented on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064806027 ## CI report: * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120) * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6824)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot removed a comment on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064804622 ## CI report: * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120) * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 UNKNOWN
[GitHub] [hudi] wxplovecc opened a new pull request #5018: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause of deduplicateRecords method in FlinkWriteHelper out of
wxplovecc opened a new pull request #5018: URL: https://github.com/apache/hudi/pull/5018

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request
This pull request avoids the deduplicateRecords method in FlinkWriteHelper running out of order.

## Brief change log
*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request
*(Please pick either of the following options)*
This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
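To illustrate the ordering issue this PR targets: deduplicating records by key with a plain `HashMap` can emit them out of input order, while a `LinkedHashMap` preserves first-seen order. The sketch below uses hypothetical types and is not the actual `FlinkWriteHelper.deduplicateRecords` signature:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: deduplicate by record key while preserving first-seen input order.
// Each input entry is a {key, payload} pair; later payloads for the same key
// overwrite the earlier one, but the key keeps its original position.
public class OrderPreservingDedup {
    public static List<String> dedup(List<String[]> keyedRecords) {
        // LinkedHashMap iterates in insertion order; put() on an existing
        // key replaces the value without moving the key.
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String[] kv : keyedRecords) {
            byKey.put(kv[0], kv[1]);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String[]> in = new ArrayList<>();
        in.add(new String[]{"key1", "v1"});
        in.add(new String[]{"key2", "v2"});
        in.add(new String[]{"key1", "v3"}); // duplicate key: payload wins, position kept
        System.out.println(dedup(in));
    }
}
```

With a `HashMap` in place of the `LinkedHashMap`, the result set would be the same but its iteration order would be unspecified, which is the kind of out-of-order emission the PR description refers to.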
[GitHub] [hudi] hudi-bot commented on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot commented on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1064804622 ## CI report: * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120) * 4a9c78781cc4efcf3f13d6f12836b6fc3e738878 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4264: [HUDI-2875] Make HoodieParquetWriter Thread safe and memory executor …
hudi-bot removed a comment on pull request #4264: URL: https://github.com/apache/hudi/pull/4264#issuecomment-1044252671 ## CI report: * 6f55461f206b4608607bc8ce706d9fa451dd2ab7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6120)
[jira] [Closed] (HUDI-184) Integrate Hudi with Apache Flink
[ https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-184. Resolution: Implemented This feature has been tracked via https://issues.apache.org/jira/browse/HUDI-1521 > Integrate Hudi with Apache Flink > Key: HUDI-184 > URL: https://issues.apache.org/jira/browse/HUDI-184 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core > Reporter: vinoyang > Assignee: vinoyang > Priority: Major > Apache Flink is a popular stream processing engine. > Integrating Hudi with Flink is valuable work. > The discussion mailing thread is here: > [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]
[jira] [Reopened] (HUDI-184) Integrate Hudi with Apache Flink
[ https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reopened HUDI-184: > Integrate Hudi with Apache Flink > Key: HUDI-184 > URL: https://issues.apache.org/jira/browse/HUDI-184 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core > Reporter: vinoyang > Assignee: vinoyang > Priority: Major > Apache Flink is a popular stream processing engine. > Integrating Hudi with Flink is valuable work. > The discussion mailing thread is here: > [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]
[GitHub] [hudi] guanziyue commented on a change in pull request #4913: [HUDI-1517] create marker file for every log file
guanziyue commented on a change in pull request #4913: URL: https://github.com/apache/hudi/pull/4913#discussion_r824411709 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java ## @@ -113,22 +116,37 @@ // Header metadata for a log block protected final Map header = new HashMap<>(); private SizeEstimator sizeEstimator; + protected final WriteMarkers writeMarkers; + private final IOType ioType; private Properties recordProperties = new Properties(); public HoodieAppendHandle(HoodieWriteConfig config, String instantTime, HoodieTable hoodieTable, -String partitionPath, String fileId, Iterator> recordItr, TaskContextSupplier taskContextSupplier) { +String partitionPath, String fileId, Iterator> recordItr, +TaskContextSupplier taskContextSupplier, IOType ioType) { super(config, instantTime, partitionPath, fileId, hoodieTable, taskContextSupplier); this.fileId = fileId; this.recordItr = recordItr; sizeEstimator = new DefaultSizeEstimator(); this.statuses = new ArrayList<>(); this.recordProperties.putAll(config.getProps()); +this.writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), hoodieTable, instantTime); +this.ioType = ioType; } + // constructor used for creating new file group public HoodieAppendHandle(HoodieWriteConfig config, String instantTime, HoodieTable hoodieTable, String partitionPath, String fileId, TaskContextSupplier sparkTaskContextSupplier) { -this(config, instantTime, hoodieTable, partitionPath, fileId, null, sparkTaskContextSupplier); +this(config, instantTime, hoodieTable, partitionPath, fileId, null, sparkTaskContextSupplier, +IOType.CREATE); Review comment: For indexes which have attribute canindexLogFile. Currently, HbaseIndex, Flink State index and memory Index has this attribute. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
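The change under review above wires an IOType into HoodieAppendHandle so that a marker is written for every log file, which lets marker-based rollback discover files an append touched. Below is a minimal, self-contained sketch of the marker-file idea only — the directory layout, names (`AppendMarkerSketch`, `createMarker`), and suffix convention are illustrative assumptions, not Hudi's actual WriteMarkers implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: before a handle writes to a data/log file, it drops an empty
// marker file named after the target file plus an IOType suffix. Rollback
// can then enumerate markers instead of listing the whole partition.
public class AppendMarkerSketch {
  enum IOType { CREATE, APPEND, MERGE }

  static Path createMarker(Path tableBase, String instantTime,
                           String partitionPath, String dataFileName,
                           IOType ioType) throws IOException {
    // Markers live under <table>/.hoodie/.temp/<instantTime>/<partitionPath>/
    // (layout assumed here for illustration).
    Path markerDir = tableBase.resolve(".hoodie").resolve(".temp")
        .resolve(instantTime).resolve(partitionPath);
    Files.createDirectories(markerDir);
    // e.g. "f1.log.1.marker.APPEND"
    Path marker = markerDir.resolve(dataFileName + ".marker." + ioType.name());
    Files.createFile(marker);
    return marker;
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("hudi-marker-sketch");
    Path m = createMarker(base, "20220311052848", "2022/03/11",
        "f1.log.1", IOType.APPEND);
    System.out.println(Files.exists(m));
  }
}
```

With the review's change, the same mechanism would fire for APPEND handles as well as CREATE handles, which is why indexes that can index log files (and therefore append to new file groups) are the case the reviewer calls out.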
[GitHub] [hudi] hudi-bot commented on pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom
hudi-bot commented on pull request #5017: URL: https://github.com/apache/hudi/pull/5017#issuecomment-1064800761 ## CI report: * d1211dd592bcb9e3df60b80b9585d2eda9f0b8ab Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6823) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom
hudi-bot removed a comment on pull request #5017: URL: https://github.com/apache/hudi/pull/5017#issuecomment-1064799451 ## CI report: * d1211dd592bcb9e3df60b80b9585d2eda9f0b8ab UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom
hudi-bot commented on pull request #5017: URL: https://github.com/apache/hudi/pull/5017#issuecomment-1064799451 ## CI report: * d1211dd592bcb9e3df60b80b9585d2eda9f0b8ab UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-3606) ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy
[ https://issues.apache.org/jira/browse/HUDI-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cdmikechen updated HUDI-3606:
- Description:
When using *hudi-timeline-server-bundle* on a Hadoop server (3.2.2), Hudi will occasionally encounter errors similar to the following.
{code}
2022-03-11 05:28:48,223 [qtp818093527-18] ERROR javalin.Javalin: Exception occurred while servicing http-request
java.lang.NoClassDefFoundError: org/objenesis/strategy/InstantiatorStrategy
    at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.<init>(SerializationUtils.java:88)
    at java.base/java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(Unknown Source)
    at java.base/java.lang.ThreadLocal.setInitialValue(Unknown Source)
    at java.base/java.lang.ThreadLocal.get(Unknown Source)
    at org.apache.hudi.common.util.SerializationUtils.serialize(SerializationUtils.java:52)
    at org.apache.hudi.common.util.collection.RocksDBDAO.serializePayload(RocksDBDAO.java:469)
    at org.apache.hudi.common.util.collection.RocksDBDAO.putInBatch(RocksDBDAO.java:175)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$12(RocksDbBasedFileSystemView.java:237)
    at java.base/java.util.TreeMap$ValueSpliterator.forEachRemaining(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$13(RocksDbBasedFileSystemView.java:236)
    at org.apache.hudi.common.util.collection.RocksDBDAO.writeBatch(RocksDBDAO.java:157)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$storePartitionView$14(RocksDbBasedFileSystemView.java:235)
    at java.base/java.util.ArrayList.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.storePartitionView(RocksDbBasedFileSystemView.java:234)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:146)
    at java.base/java.util.HashMap.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:308)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFilesBeforeOrOn(AbstractTableFileSystemView.java:489)
    at org.apache.hudi.timeline.service.handlers.BaseFileHandler.getLatestDataFilesBeforeOrOn(BaseFileHandler.java:60)
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerDataFilesAPI$6(RequestHandler.java:268)
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:497)
    at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
    at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
    at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
    at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
    at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
    at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
    at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
    at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:502)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback
[jira] [Updated] (HUDI-3606) ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy
[ https://issues.apache.org/jira/browse/HUDI-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3606: - Labels: pull-request-available (was: ) > ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy > --- > > Key: HUDI-3606 > URL: https://issues.apache.org/jira/browse/HUDI-3606 > Project: Apache Hudi > Issue Type: Bug > Components: timeline-server >Affects Versions: 0.10.1 >Reporter: cdmikechen >Priority: Major > Labels: pull-request-available > > When user *hudi-timeline-server-bundle* in hadoop server (3.2.2), hudi will > occasionally encounter errors similar to the this. > {code} > 2022-03-11 05:28:48,223 [qtp818093527-18] ERROR javalin.Javalin: Exception > occurred while servicing http-request > java.lang.NoClassDefFoundError: org/objenesis/strategy/InstantiatorStrategy > at > org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.(SerializationUtils.java:88) > at > java.base/java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(Unknown > Source) > at java.base/java.lang.ThreadLocal.setInitialValue(Unknown Source) > at java.base/java.lang.ThreadLocal.get(Unknown Source) > at > org.apache.hudi.common.util.SerializationUtils.serialize(SerializationUtils.java:52) > at > org.apache.hudi.common.util.collection.RocksDBDAO.serializePayload(RocksDBDAO.java:469) > at > org.apache.hudi.common.util.collection.RocksDBDAO.putInBatch(RocksDBDAO.java:175) > at > org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$12(RocksDbBasedFileSystemView.java:237) > at > java.base/java.util.TreeMap$ValueSpliterator.forEachRemaining(Unknown Source) > at java.base/java.util.stream.ReferencePipeline$Head.forEach(Unknown > Source) > at > org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$13(RocksDbBasedFileSystemView.java:236) > at > org.apache.hudi.common.util.collection.RocksDBDAO.writeBatch(RocksDBDAO.java:157) > at > 
org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$storePartitionView$14(RocksDbBasedFileSystemView.java:235) > at java.base/java.util.ArrayList.forEach(Unknown Source) > at > org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.storePartitionView(RocksDbBasedFileSystemView.java:234) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:146) > at java.base/java.util.HashMap.forEach(Unknown Source) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:308) > at > java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown > Source) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFilesBeforeOrOn(AbstractTableFileSystemView.java:489) > at > org.apache.hudi.timeline.service.handlers.BaseFileHandler.getLatestDataFilesBeforeOrOn(BaseFileHandler.java:60) > at > org.apache.hudi.timeline.service.RequestHandler.lambda$registerDataFilesAPI$6(RequestHandler.java:268) > at > org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:497) > at > io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22) > at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606) > at > io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46) > at > io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17) > at > io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143) > at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41) > at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107) > at > 
io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668) > at > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) > at > org.eclipse.jetty.server.handler.ScopedHandle
[GitHub] [hudi] cdmikechen opened a new pull request #5017: [HUDI-3606] Add `org.objenesis:objenesis` to hudi-timeline-server-bundle pom
cdmikechen opened a new pull request #5017: URL: https://github.com/apache/hudi/pull/5017

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request
This pull request adds `org.objenesis:objenesis` to the hudi-timeline-server-bundle pom.

## Brief change log
Add an `org.objenesis:objenesis` include to the hudi-timeline-server-bundle pom.

## Verify this pull request
In theory, as long as CI passes, it can be proven that there is no problem.

## Committer checklist
- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
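The fix described in this PR is a bundling change: Kryo (used by Hudi's SerializationUtils) needs org.objenesis on the classpath at runtime, so the timeline-server bundle must shade it in. The fragment below sketches the kind of pom change involved — the exact plugin configuration and surrounding elements in hudi-timeline-server-bundle/pom.xml are assumptions, not the PR's verbatim diff:

```xml
<!-- Hypothetical sketch, not the PR's verbatim diff: declare the dependency
     and list it in the maven-shade-plugin artifactSet so the objenesis
     classes end up inside the shaded timeline-server bundle jar. -->
<dependency>
  <groupId>org.objenesis</groupId>
  <artifactId>objenesis</artifactId>
</dependency>

<!-- inside the maven-shade-plugin <configuration> element -->
<artifactSet>
  <includes>
    <include>org.objenesis:objenesis</include>
  </includes>
</artifactSet>
```

Without the `<include>`, the shade plugin silently drops the transitive jar from the bundle, which is why the failure only surfaces at runtime as a `NoClassDefFoundError`.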
[jira] [Closed] (HUDI-609) Implement a Flink specific HoodieIndex
[ https://issues.apache.org/jira/browse/HUDI-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-609. - Resolution: Won't Do > Implement a Flink specific HoodieIndex > -- > > Key: HUDI-609 > URL: https://issues.apache.org/jira/browse/HUDI-609 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > Indexing is a key step in hudi's write flow. {{HoodieIndex}} is the super > abstract class of all the implement of the index. Currently, {{HoodieIndex}} > couples with Spark in the design. However, HUDI-538 is doing the restructure > for hudi-client so that hudi can be decoupled with Spark. After that, we > would get an engine-irrelevant implementation of {{HoodieIndex}}. And > extending that class, we could implement a Flink specific index. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-184) Integrate Hudi with Apache Flink
[ https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-184. - Resolution: Won't Do > Integrate Hudi with Apache Flink > > > Key: HUDI-184 > URL: https://issues.apache.org/jira/browse/HUDI-184 > Project: Apache Hudi > Issue Type: New Feature > Components: writer-core >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > Apache Flink is a popular streaming processing engine. > Integrating Hudi with Flink is a valuable work. > The discussion mailing thread is here: > [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-608) Implement a flink datastream execution context
[ https://issues.apache.org/jira/browse/HUDI-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-608. - Resolution: Won't Do > Implement a flink datastream execution context > -- > > Key: HUDI-608 > URL: https://issues.apache.org/jira/browse/HUDI-608 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > Currently {{HoodieWriteClient}} does something like > `hoodieRecordRDD.map().sort()` internally.. if we want to support Flink > DataStream as the object, then we need to somehow define an abstraction like > {{HoodieExecutionContext}} which will have a common set of map(T) -> T, > filter(), repartition() methods. There will be subclass like > {{HoodieFlinkDataStreamExecutionContext}} which will implement it > in Flink specific ways and hand back the transformed T object. -- This message was sent by Atlassian Jira (v8.20.1#820001)
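HUDI-608 above sketches an engine-agnostic execution context exposing map/filter/repartition so the write client never calls RDD or DataStream operations directly. A toy, self-contained illustration of that abstraction follows — the names (`ExecutionContextSketch`, `LocalExecutionContext`) are hypothetical, not Hudi's eventual API; a local list-backed implementation stands in for the Spark and Flink ones:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Engine-agnostic contract: the write client codes against these generic
// transformations; each engine supplies its own implementation.
interface ExecutionContextSketch {
  <I, O> List<O> map(List<I> data, Function<I, O> fn);
  <I> List<I> filter(List<I> data, Predicate<I> pred);
}

// Local in-memory implementation. A Flink variant would delegate to
// DataStream transformations instead of java.util.stream.
class LocalExecutionContext implements ExecutionContextSketch {
  public <I, O> List<O> map(List<I> data, Function<I, O> fn) {
    return data.stream().map(fn).collect(Collectors.toList());
  }
  public <I> List<I> filter(List<I> data, Predicate<I> pred) {
    return data.stream().filter(pred).collect(Collectors.toList());
  }
}

public class ExecutionContextDemo {
  public static void main(String[] args) {
    ExecutionContextSketch ctx = new LocalExecutionContext();
    // Client code is identical regardless of which engine backs the context.
    List<Integer> doubled = ctx.map(Arrays.asList(1, 2, 3), x -> x * 2);
    System.out.println(ctx.filter(doubled, x -> x > 2));
  }
}
```

The issue was closed "Won't Do" because Hudi ultimately took a different route to engine decoupling, but the interface shape above is the design idea the ticket describes.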
[jira] [Created] (HUDI-3606) ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy
cdmikechen created HUDI-3606:
Summary: ClassNotFoundException: org.objenesis.strategy.InstantiatorStrategy
Key: HUDI-3606
URL: https://issues.apache.org/jira/browse/HUDI-3606
Project: Apache Hudi
Issue Type: Bug
Components: timeline-server
Affects Versions: 0.10.1
Reporter: cdmikechen

When using *hudi-timeline-server-bundle* on a Hadoop server (3.2.2), Hudi will occasionally encounter errors similar to the following.
{code}
2022-03-11 05:28:48,223 [qtp818093527-18] ERROR javalin.Javalin: Exception occurred while servicing http-request
java.lang.NoClassDefFoundError: org/objenesis/strategy/InstantiatorStrategy
    at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.<init>(SerializationUtils.java:88)
    at java.base/java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(Unknown Source)
    at java.base/java.lang.ThreadLocal.setInitialValue(Unknown Source)
    at java.base/java.lang.ThreadLocal.get(Unknown Source)
    at org.apache.hudi.common.util.SerializationUtils.serialize(SerializationUtils.java:52)
    at org.apache.hudi.common.util.collection.RocksDBDAO.serializePayload(RocksDBDAO.java:469)
    at org.apache.hudi.common.util.collection.RocksDBDAO.putInBatch(RocksDBDAO.java:175)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$12(RocksDbBasedFileSystemView.java:237)
    at java.base/java.util.TreeMap$ValueSpliterator.forEachRemaining(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$null$13(RocksDbBasedFileSystemView.java:236)
    at org.apache.hudi.common.util.collection.RocksDBDAO.writeBatch(RocksDBDAO.java:157)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.lambda$storePartitionView$14(RocksDbBasedFileSystemView.java:235)
    at java.base/java.util.ArrayList.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.RocksDbBasedFileSystemView.storePartitionView(RocksDbBasedFileSystemView.java:234)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$addFilesToView$2(AbstractTableFileSystemView.java:146)
    at java.base/java.util.HashMap.forEach(Unknown Source)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:134)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:308)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFilesBeforeOrOn(AbstractTableFileSystemView.java:489)
    at org.apache.hudi.timeline.service.handlers.BaseFileHandler.getLatestDataFilesBeforeOrOn(BaseFileHandler.java:60)
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerDataFilesAPI$6(RequestHandler.java:268)
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:497)
    at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
    at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
    at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
    at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
    at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
    at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
    at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
    at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:502)
    at org.ec
[GitHub] [hudi] hudi-bot commented on pull request #4877: [HUDI-3457][Stacked on 4818] Refactored Spark DataSource Relations to avoid code duplication
hudi-bot commented on pull request #4877: URL: https://github.com/apache/hudi/pull/4877#issuecomment-1064793172 ## CI report: * 2940f46a133ca3142f7ebb26b8c6f20583d7f395 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6814) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4877: [HUDI-3457][Stacked on 4818] Refactored Spark DataSource Relations to avoid code duplication
hudi-bot removed a comment on pull request #4877: URL: https://github.com/apache/hudi/pull/4877#issuecomment-1064717467 ## CI report: * d875e412abc29bf6a0e8a6fa7bef747ded15d60b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6284) * 2940f46a133ca3142f7ebb26b8c6f20583d7f395 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6814) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] wxplovecc closed pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true
wxplovecc closed pull request #4654: URL: https://github.com/apache/hudi/pull/4654 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-3522) Introduce DropColumnSchemaPostProcessor to support drop columns from schema
[ https://issues.apache.org/jira/browse/HUDI-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianghu Wang closed HUDI-3522. -- Resolution: Fixed Resolved via master : 83cff3afee15e129034eb51e68a1734c55d85da2 > Introduce DropColumnSchemaPostProcessor to support drop columns from schema > --- > > Key: HUDI-3522 > URL: https://issues.apache.org/jira/browse/HUDI-3522 > Project: Apache Hudi > Issue Type: Task > Components: deltastreamer >Reporter: Xianghu Wang >Assignee: Xianghu Wang >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > A SchemaPostProcessor to drop columns from given schema -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] wxplovecc closed pull request #4981: [HUDI-3559] fix flink Bucket Index with COW table type `NoSuchElementException` cause o…
wxplovecc closed pull request #4981: URL: https://github.com/apache/hudi/pull/4981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (9dc6df5 -> 83cff3a)
This is an automated email from the ASF dual-hosted git repository. wangxianghu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 9dc6df5 [HUDI-3595] Fixing NULL schema provider for empty batch (#5002) add 83cff3a [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema (#4972) No new revisions were added by this update. Summary of changes: .../schema/DropColumnSchemaPostProcessor.java | 88 ++ .../hudi/utilities/TestSchemaPostProcessor.java| 25 ++ 2 files changed, 113 insertions(+) create mode 100644 hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/DropColumnSchemaPostProcessor.java
[GitHub] [hudi] wangxianghu merged pull request #4972: [HUDI-3522] Introduce DropColumnSchemaPostProcessor to support drop columns from schema
wangxianghu merged pull request #4972: URL: https://github.com/apache/hudi/pull/4972 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4996: [HUDI-3594][Stacked on 4948] Supporting Composite Expressions over Data Table Columns in Data Skipping flow
hudi-bot removed a comment on pull request #4996: URL: https://github.com/apache/hudi/pull/4996#issuecomment-1064734709 ## CI report: * 25578be3436f3a95af26f99368dd581efc5062e0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6811) * 9de43c5d691fa4a4f383a4647ddefa4798fa127d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6816) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4996: [HUDI-3594][Stacked on 4948] Supporting Composite Expressions over Data Table Columns in Data Skipping flow
hudi-bot commented on pull request #4996: URL: https://github.com/apache/hudi/pull/4996#issuecomment-1064776265 ## CI report: * 9de43c5d691fa4a4f383a4647ddefa4798fa127d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6816) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719 ]

shibei edited comment on HUDI-3593 at 3/11/22, 5:04 AM:
Another failure:
{code:java}
[ERROR] Tests run: 46, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 853.859 s <<< FAILURE! - in JUnit Vintage
[ERROR] testLayoutOptimizationFunctional(String, String, String).[6] MERGE_ON_READ, linear, null  Time elapsed: 6.185 s <<< ERROR!
org.apache.spark.SparkException: Writing job failed.
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:87)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:260)
    at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:502)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:172)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at org.apache.hudi.functional.TestLayoutOptimization.testLayoutOptimizationFunctional(TestLayoutOptimization.scala:109)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(Metho
[jira] [Comment Edited] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719 ]

shibei edited comment on HUDI-3593 at 3/11/22, 5:03 AM:

Another failure
{code:java}
[ERROR] Tests run: 46, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 853.859 s <<< FAILURE! - in JUnit Vintage
[ERROR] testLayoutOptimizationFunctional(String, String, String).[6] MERGE_ON_READ, linear, null  Time elapsed: 6.185 s <<< ERROR!
org.apache.spark.SparkException: Writing job failed.
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:87)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:260)
at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:502)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:172)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at org.apache.hudi.functional.TestLayoutOptimization.testLayoutOptimizationFunctional(TestLayoutOptimization.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
at org.junit.jupiter.engine.execution.MethodInvocation.proceed(Metho
[GitHub] [hudi] xushiyan commented on a change in pull request #4962: [HUDI-3355] Issue with out of order commits in the timeline when ingestion writers using SparkAllowUpdateStrategy
xushiyan commented on a change in pull request #4962: URL: https://github.com/apache/hudi/pull/4962#discussion_r824386018

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/TransactionUtils.java

## @@ -137,4 +165,20 @@
       throw new HoodieIOException("Unable to read metadata for instant " + hoodieInstantOption.get(), io);
     }
   }
+
+  /**
+   * Get pending clustering instant.
+   * Notice:
+   * we return .requested instant here.
+   *
+   * @param metaClient
+   * @return
+   */
+  public static List getUncheckedPendingClusteringInstants(HoodieTableMetaClient metaClient) {

Review comment: shall we call it "ReplaceRequestedInstant" to be specific? Also "unchecked" is only in the context of write client; `TransactionUtils` does not know "unchecked" or not.
```suggestion
  public static List getPendingReplaceRequestedInstants(HoodieTableMetaClient metaClient) {
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
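The rename the reviewer suggests can be sketched with a simplified, hypothetical model of the timeline. This is plain Java standing in for Hudi's actual types (the `Instant` record below is not Hudi's `HoodieInstant` API): the helper returns only `replacecommit` instants still in the `REQUESTED` state, i.e. clustering plans that have been scheduled but not started.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PendingReplaceExample {
    // Hypothetical stand-in for Hudi's HoodieInstant: real instants carry a
    // state (REQUESTED/INFLIGHT/COMPLETED), an action (clustering writes a
    // "replacecommit"), and a timestamp.
    record Instant(String state, String action, String timestamp) {}

    // Sketch of the renamed helper: keep only .requested replacecommit
    // instants, i.e. pending clustering plans that have not started yet.
    static List<Instant> getPendingReplaceRequestedInstants(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> "replacecommit".equals(i.action()))
                .filter(i -> "REQUESTED".equals(i.state()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
                new Instant("COMPLETED", "commit", "001"),
                new Instant("REQUESTED", "replacecommit", "002"),
                new Instant("INFLIGHT", "replacecommit", "003"));
        // Only the REQUESTED replacecommit at "002" qualifies.
        System.out.println(getPendingReplaceRequestedInstants(timeline).size()); // prints 1
    }
}
```

The name `getPendingReplaceRequestedInstants` encodes both filters, which is the point of the review: "unchecked" described the caller's intent, not anything `TransactionUtils` can observe.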
[GitHub] [hudi] hudi-bot commented on pull request #4948: [HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index
hudi-bot commented on pull request #4948: URL: https://github.com/apache/hudi/pull/4948#issuecomment-1064773990 ## CI report: * 14366cac6e233cb85ee94307a7f62f6184ed5b34 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6812) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4948: [HUDI-3514] Rebase Data Skipping flow to rely on MT Column Stats index
hudi-bot removed a comment on pull request #4948: URL: https://github.com/apache/hudi/pull/4948#issuecomment-1064707297 ## CI report: * 4421752bef3dd3b53cd896f7d3ca23bb49d22034 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6669) * 14366cac6e233cb85ee94307a7f62f6184ed5b34 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6812)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way
hudi-bot removed a comment on pull request #4971: URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064705899 ## CI report: * 8e89371fed3d147b43959a73e3e6a33cfaefd32c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6650) * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
[GitHub] [hudi] hudi-bot commented on pull request #4971: [HUDI-3556] Re-use rollback instant for rolling back of clustering and compaction if rollback failed mid-way
hudi-bot commented on pull request #4971: URL: https://github.com/apache/hudi/pull/4971#issuecomment-1064772913 ## CI report: * 74ace6ca3f717a41d54047bb44ea52fedb94e1ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6810)
[jira] [Comment Edited] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719 ]

shibei edited comment on HUDI-3593 at 3/11/22, 4:10 AM:

{code:java}
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:371)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.map(RDD.scala:370)
at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:93)
at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:45)
at org.apache.hudi.client.clustering.run.strategy.MultipleSparkJobExecutionStrategy.readRecordsForGroupBaseFiles(MultipleSparkJobExecutionStrategy.java:269)
at org.apache.hudi.client.clustering.run.strategy.MultipleSparkJobExecutionStrategy.readRecordsForGroup(MultipleSparkJobExecutionStrategy.java:191)
at org.apache.hudi.client.clustering.run.strategy.MultipleSparkJobExecutionStrategy.lambda$runClusteringForGroupAsync$4(MultipleSparkJobExecutionStrategy.java:171)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 1 more
Caused by: java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
at java.util.HashSet.writeObject(HashSet.java:287)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
... 16 more
{code}

was (Author: JIRAUSER279853):
Another failure
{code:java}
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
at java.io
[jira] [Commented] (HUDI-3593) AsyncClustering failed because of ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504719#comment-17504719 ]

shibei commented on HUDI-3593:
--

Another failure
{code:java}
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
... 16 more
{code}

> AsyncClustering failed because of ConcurrentModificationException
> -
>
> Key: HUDI-3593
> URL: https://issues.apache.org/jira/browse/HUDI-3593
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
> Labels: pull-request-available
> Attachments: Screen Shot 2022-03-10 at 9.53.13 AM.png
>
> Following is the stacktrace I met,
> {code:java}
> ERROR AsyncClusteringService: Clustering executor failed
> java.util.concurrent.CompletionException: org.apache.spark.SparkException: Task not serializable
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> Caused by: org.apache.spark.SparkException: Task not serializable
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
>     at org.apache.spark.SparkContext.clean(SparkContext.scala:2467)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$1(RDD.scala:912)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>     at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:911)
>     at org.apache.spark.api.java.JavaRDDLike.mapPartitionsWithIndex(JavaRDDLike.scala:103)
>     at
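The stack traces above all bottom out in the same place: Spark's ClosureCleaner java-serializes the clustering closure, `HashSet.writeObject` iterates the set's backing `LinkedHashMap`, and a concurrent structural modification trips the iterator's fail-fast `modCount` check in `LinkedHashMap$LinkedHashIterator.nextNode`, surfacing as "Task not serializable". The sketch below is an illustrative, single-threaded reproduction of that same fail-fast check (mutating during iteration rather than during serialization), not the clustering code itself:

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

public class CmeDemo {
    // Returns true if a structural modification made while iterating the map
    // throws ConcurrentModificationException -- the same modCount check that
    // fires in LinkedHashMap$LinkedHashIterator.nextNode in the stack trace.
    static boolean mutateDuringIteration() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                // Adding a new key mid-iteration bumps modCount, so the
                // iterator's next call detects the change and fails fast.
                map.put(key + "-copy", 0);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(mutateDuringIteration()); // prints "true"
    }
}
```

In the async-clustering case the mutation comes from another thread touching a non-thread-safe collection captured by the closure while serialization is iterating it; the usual fixes are to snapshot the collection before handing it to the closure or to guard it with a concurrent/immutable structure.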
[GitHub] [hudi] hudi-bot removed a comment on pull request #4489: [HUDI-3135] Fix Delete partitions with metadata table and fix show partitions in spark sql
hudi-bot removed a comment on pull request #4489: URL: https://github.com/apache/hudi/pull/4489#issuecomment-1064705595 ## CI report: * e74a30e1b9f4395780cfe412d3574dabe2ae9f57 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6795) * d17343318be38b5a9b0953004700aa72f4fed689 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6809)
[GitHub] [hudi] hudi-bot commented on pull request #4489: [HUDI-3135] Fix Delete partitions with metadata table and fix show partitions in spark sql
hudi-bot commented on pull request #4489: URL: https://github.com/apache/hudi/pull/4489#issuecomment-1064751722 ## CI report: * d17343318be38b5a9b0953004700aa72f4fed689 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6809)
[GitHub] [hudi] melin opened a new issue #5016: [SUPPORT] Add AS OF syntax support
melin opened a new issue #5016: URL: https://github.com/apache/hudi/issues/5016

Use SQL to query data as of a specified version or timestamp:

```
SELECT * FROM default.people10m VERSION AS OF 0;
SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58';
```
[GitHub] [hudi] hudi-bot commented on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns
hudi-bot commented on pull request #4888: URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064748224 ## CI report: * e0afa9f1de90411220a6c1d25c0c9e43f09f6baf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6815) * b07cca5112163e153385c690203603b74542ace6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6820)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4888: [HUDI-3396][Stacked on 4877] Refactoring `MergeOnReadRDD` to avoid duplication, fetch only projected columns
hudi-bot removed a comment on pull request #4888: URL: https://github.com/apache/hudi/pull/4888#issuecomment-1064723457 ## CI report: * e0afa9f1de90411220a6c1d25c0c9e43f09f6baf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6815) * b07cca5112163e153385c690203603b74542ace6 UNKNOWN
[hudi] branch master updated (fa5e750 -> 9dc6df5)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git.

from fa5e750 [HUDI-3586] Add Trino Queries in integration tests (#4988)
add 9dc6df5 [HUDI-3595] Fixing NULL schema provider for empty batch (#5002)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/common/util/CommitUtils.java  |  5 ++-
 .../functional/TestHoodieDeltaStreamer.java       | 28 -
 .../sources/TestParquetDFSSourceEmptyBatch.java   | 49 ++
 3 files changed, 80 insertions(+), 2 deletions(-)
 create mode 100644 hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestParquetDFSSourceEmptyBatch.java
[GitHub] [hudi] nsivabalan merged pull request #5002: [HUDI-3595] Fixing NULL schema provider for empty batch
nsivabalan merged pull request #5002: URL: https://github.com/apache/hudi/pull/5002