[GitHub] [hudi] xiarixiaoyao commented on pull request #4308: [HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
xiarixiaoyao commented on pull request #4308: URL: https://github.com/apache/hudi/pull/4308#issuecomment-997159950 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4308: [HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
hudi-bot commented on pull request #4308: URL: https://github.com/apache/hudi/pull/4308#issuecomment-99715 ## CI report: * a28311298525e0713ef000c79633a73162a304bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4438) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4474) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4308: [HUDI-3008] Fixing HoodieFileIndex partition column parsing for nested fields
hudi-bot removed a comment on pull request #4308: URL: https://github.com/apache/hudi/pull/4308#issuecomment-996819976 ## CI report: * a28311298525e0713ef000c79633a73162a304bc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4438)
[GitHub] [hudi] hudi-bot commented on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan
hudi-bot commented on pull request #4346: URL: https://github.com/apache/hudi/pull/4346#issuecomment-997153375 ## CI report: * 2227d98a76c74d94538a57467fe4d72f0a0daeae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4399) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4406) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4408) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4425) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4430) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4435) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4458) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4473)
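The title of PR #4346 says only that the new ClusteringPlanStrategy chooses partitions by regex when it builds the clustering plan. As a hedged illustration of that idea, and not Hudi's actual ClusteringPlanStrategy code or configuration, here is a sketch of regex-based partition selection. The helper name `select_partitions`, the full-match semantics, and the date-style partition paths are all assumptions made for the example.

```python
import re
from typing import Iterable, List


def select_partitions(partition_paths: Iterable[str], pattern: str) -> List[str]:
    """Return the partitions whose entire relative path matches `pattern`.

    Hypothetical helper for illustration only; it is not Hudi's
    ClusteringPlanStrategy API.
    """
    compiled = re.compile(pattern)
    # fullmatch (rather than search) so that a pattern like 2021/12/\d{2}
    # cannot accidentally match a longer path such as x/2021/12/01.
    return [path for path in partition_paths if compiled.fullmatch(path)]


# Hypothetical date-style partition paths.
partitions = ["2021/12/01", "2021/12/02", "2021/11/30", "2020/12/01"]
selected = select_partitions(partitions, r"2021/12/\d{2}")
print(selected)  # ['2021/12/01', '2021/12/02']
```

Full matching is a deliberate choice in this sketch: it makes the pattern describe the whole partition path, so narrowing the selection never depends on where a substring happens to appear.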
[GitHub] [hudi] hudi-bot removed a comment on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan
hudi-bot removed a comment on pull request #4346: URL: https://github.com/apache/hudi/pull/4346#issuecomment-997147548 ## CI report: * 2227d98a76c74d94538a57467fe4d72f0a0daeae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4399) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4406) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4408) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4425) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4430) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4435) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4458) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4473)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-997147506 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * 9b9620a298b45a57af6e596c9305a49ccc69345a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4432) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4427) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4457) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4472)
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-997153336 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * 9b9620a298b45a57af6e596c9305a49ccc69345a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4432) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4427) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4457) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4472)
[jira] [Created] (HUDI-3060) DROP TABLE for spark sql
Forward Xu created HUDI-3060: Summary: DROP TABLE for spark sql Key: HUDI-3060 URL: https://issues.apache.org/jira/browse/HUDI-3060 Project: Apache Hudi Issue Type: New Feature Components: Spark Integration Reporter: Forward Xu Assignee: Forward Xu drop table [if exists] ; -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot commented on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot commented on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997149401 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * 8ebdbe56f2cec0198f5f19a518906d4d9b834b73 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4471)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot removed a comment on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997143022 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467) * 8ebdbe56f2cec0198f5f19a518906d4d9b834b73 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4471)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan
hudi-bot removed a comment on pull request #4346: URL: https://github.com/apache/hudi/pull/4346#issuecomment-997110367 ## CI report: * 2227d98a76c74d94538a57467fe4d72f0a0daeae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4399) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4406) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4408) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4425) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4430) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4435) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4458)
[GitHub] [hudi] hudi-bot commented on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan
hudi-bot commented on pull request #4346: URL: https://github.com/apache/hudi/pull/4346#issuecomment-997147548 ## CI report: * 2227d98a76c74d94538a57467fe4d72f0a0daeae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4399) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4406) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4408) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4425) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4430) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4435) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4458) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4473)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-997105300 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * 9b9620a298b45a57af6e596c9305a49ccc69345a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4432) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4427) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4457)
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-997147506 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * 9b9620a298b45a57af6e596c9305a49ccc69345a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4432) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4427) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4457) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4472)
[GitHub] [hudi] zhangyue19921010 commented on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan
zhangyue19921010 commented on pull request #4346: URL: https://github.com/apache/hudi/pull/4346#issuecomment-997147500 @hudi-bot run azure
[GitHub] [hudi] zhangyue19921010 commented on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.
zhangyue19921010 commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-997147483 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot commented on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997146359 ## CI report: * 7e13f359550fed056bc3315d245736f7596d320b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4470)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot removed a comment on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997140256 ## CI report: * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465) * 7e13f359550fed056bc3315d245736f7596d320b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4470)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4333: [HUDI-431] Adding support for Parquet in MOR `LogBlock`s
hudi-bot removed a comment on pull request #4333: URL: https://github.com/apache/hudi/pull/4333#issuecomment-997131996 ## CI report: * 286aa8b95627eaaa01114567797186263a830774 UNKNOWN * e722499ee75403ab62f646fdabca1a2c59570164 UNKNOWN * de0d4385394dc5d820964cefc872f099cee7a02b UNKNOWN * 67cbb2f4ab421fb7a90e4c5d1061613ed331c837 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4460) * cecde3b6734576c5f2863ec2b4b90689600cb746 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4469)
[GitHub] [hudi] hudi-bot commented on pull request #4333: [HUDI-431] Adding support for Parquet in MOR `LogBlock`s
hudi-bot commented on pull request #4333: URL: https://github.com/apache/hudi/pull/4333#issuecomment-997144563 ## CI report: * 286aa8b95627eaaa01114567797186263a830774 UNKNOWN * e722499ee75403ab62f646fdabca1a2c59570164 UNKNOWN * de0d4385394dc5d820964cefc872f099cee7a02b UNKNOWN * cecde3b6734576c5f2863ec2b4b90689600cb746 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4469)
[GitHub] [hudi] hudi-bot commented on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot commented on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997143022 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467) * 8ebdbe56f2cec0198f5f19a518906d4d9b834b73 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4471)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot removed a comment on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997139942 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467) * 8ebdbe56f2cec0198f5f19a518906d4d9b834b73 UNKNOWN
[GitHub] [hudi] zztttt edited a comment on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6
zztttt edited a comment on issue #4072: URL: https://github.com/apache/hudi/issues/4072#issuecomment-997140947 > Yes, I read the related documentation, https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hive-metastore.html, and found a sentence saying "You can configure javax.jdo.option properties in hive-site.xml or using options with spark.hadoop prefix.", which let me achieve the goal. These configs are hard-coded in Scala.
[GitHub] [hudi] zztttt edited a comment on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6
zztttt edited a comment on issue #4072: URL: https://github.com/apache/hudi/issues/4072#issuecomment-997140947 > Yes, I read the related documentation, https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hive-metastore.html, and found a sentence saying "You can configure javax.jdo.option properties in hive-site.xml or using options with spark.hadoop prefix.", which let me achieve the goal. This config is hard-coded in Scala.
[GitHub] [hudi] zztttt commented on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6
zztttt commented on issue #4072: URL: https://github.com/apache/hudi/issues/4072#issuecomment-997140947 > Yes, I read the related documentation, https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hive-metastore.html, and found a sentence saying "You can configure javax.jdo.option properties in hive-site.xml or using options with spark.hadoop prefix.", which let me achieve the goal.
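The quoted tip relies on Spark copying every `spark.hadoop.*` setting into its Hadoop configuration with the prefix removed, so metastore properties normally kept in hive-site.xml can be passed as ordinary Spark options. Below is a minimal sketch of building such prefixed keys. The helper name `to_spark_hadoop_conf` and the connection values are hypothetical placeholders, not the commenter's actual settings. The PySpark wiring is left as comments so the snippet runs without Spark installed.

```python
from typing import Dict

SPARK_HADOOP_PREFIX = "spark.hadoop."


def to_spark_hadoop_conf(props: Dict[str, str]) -> Dict[str, str]:
    """Add the spark.hadoop. prefix to each key that does not already carry it."""
    return {
        (key if key.startswith(SPARK_HADOOP_PREFIX) else SPARK_HADOOP_PREFIX + key): value
        for key, value in props.items()
    }


# Hypothetical metastore settings that would otherwise live in hive-site.xml.
hive_site = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://metastore-host:3306/hive",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
}

spark_conf = to_spark_hadoop_conf(hive_site)
for key in sorted(spark_conf):
    print(key)

# With PySpark available, the prefixed keys could be applied like this:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.enableHiveSupport()
# for key, value in spark_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Keys that already start with `spark.hadoop.` are left unchanged, so the helper can be applied safely to a mix of raw and already-prefixed properties.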
[GitHub] [hudi] hudi-bot removed a comment on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot removed a comment on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997139964 ## CI report: * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465) * 7e13f359550fed056bc3315d245736f7596d320b UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot commented on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997140256 ## CI report: * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465) * 7e13f359550fed056bc3315d245736f7596d320b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4470)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot removed a comment on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997133775 ## CI report: * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465)
[GitHub] [hudi] hudi-bot commented on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot commented on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997139964 ## CI report: * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465) * 7e13f359550fed056bc3315d245736f7596d320b UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot commented on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997139942 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467) * 8ebdbe56f2cec0198f5f19a518906d4d9b834b73 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot removed a comment on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997138989 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467)
[GitHub] [hudi] hudi-bot commented on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot commented on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997138989 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot removed a comment on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997129617 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * 6ae6b2781237d7e4af95bd78062c3da765ebe9a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4434) * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467)
[GitHub] [hudi] esaeki commented on issue #4348: [SUPPORT] How to set timezone for "_hoodie_commit_time" column?
esaeki commented on issue #4348: URL: https://github.com/apache/hudi/issues/4348#issuecomment-997135786 Thank you for your response. I am developing a data lake for a Japanese client, and Japan's standard time zone is UTC+9. That is why it would be better to adjust the time zone for proper data management. I would appreciate it if you could suggest an alternative.
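One possible read-side workaround, sketched only as an illustration: if the writer generated `_hoodie_commit_time` values in a known zone (UTC is assumed below), a consumer can re-express them as Japan Standard Time after reading. `CommitTimeZoneConverter` and `convert` are hypothetical helpers, not Hudi APIs, and the sketch assumes commit times are 14-digit (`yyyyMMddHHmmss`) or 17-digit (`yyyyMMddHHmmssSSS`) strings; check which format your Hudi version actually writes.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class CommitTimeZoneConverter {
  // Assumed layout for second-granularity commit times, e.g. "20211218153000".
  private static final DateTimeFormatter SECONDS_FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  // Assumed layout for millisecond-granularity commit times, e.g. "20211218153000123".
  // appendValue(MILLI_OF_SECOND, 3) keeps adjacent-value parsing working on Java 8,
  // where the "SSS" pattern letter does not parse when glued to other digits.
  private static final DateTimeFormatter MILLIS_FORMAT = new DateTimeFormatterBuilder()
      .appendPattern("yyyyMMddHHmmss")
      .appendValue(ChronoField.MILLI_OF_SECOND, 3)
      .toFormatter();

  /**
   * Re-expresses a commit time that was generated as wall-clock time in writerZone
   * as the wall-clock time in targetZone, keeping the same string layout.
   */
  public static String convert(String commitTime, ZoneId writerZone, ZoneId targetZone) {
    DateTimeFormatter format = commitTime.length() == 17 ? MILLIS_FORMAT : SECONDS_FORMAT;
    LocalDateTime writerLocal = LocalDateTime.parse(commitTime, format);
    return writerLocal.atZone(writerZone)   // pin the string to the writer's zone
        .withZoneSameInstant(targetZone)    // same instant, different wall clock
        .toLocalDateTime()
        .format(format);
  }

  public static void main(String[] args) {
    ZoneId utc = ZoneId.of("UTC");
    ZoneId tokyo = ZoneId.of("Asia/Tokyo");
    System.out.println(convert("20211218153000", utc, tokyo));    // 20211219003000
    System.out.println(convert("20211218153000123", utc, tokyo)); // 20211219003000123
  }
}
```

Asia/Tokyo observes no daylight saving time, so the shift is a fixed +9 hours; going through `ZoneId` rather than a hard-coded offset keeps the helper correct for zones that do change offsets.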
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363:
URL: https://github.com/apache/hudi/pull/4363#discussion_r771776844

## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java
##
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.transaction;
+
+import org.apache.hudi.client.transaction.lock.InProcessLockProvider;
+import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy;
+import org.apache.hudi.common.model.WriteConcurrencyMode;
+import org.apache.hudi.common.testutils.HoodieCommonTestHarness;
+import org.apache.hudi.config.HoodieCompactionConfig;
+import org.apache.hudi.config.HoodieLockConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieLockException;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.IOException;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+public class TestTransactionManager extends HoodieCommonTestHarness {
+  HoodieWriteConfig writeConfig;
+  TransactionManager transactionManager;
+
+  @BeforeEach
+  private void init() throws IOException {
+    initPath();
+    initMetaClient();
+    this.writeConfig = getWriteConfig();
+    this.transactionManager = new TransactionManager(this.writeConfig, this.metaClient.getFs());
+  }
+
+  private HoodieWriteConfig getWriteConfig() {
+    return HoodieWriteConfig.newBuilder()
+        .withPath(basePath)
+        .withCompactionConfig(HoodieCompactionConfig.newBuilder()
+            .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.LAZY)
+            .build())
+        .withWriteConcurrencyMode(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL)
+        .withLockConfig(HoodieLockConfig.newBuilder()
+            .withLockProvider(InProcessLockProvider.class)
+            .build())
+        .build();
+  }
+
+  @Test
+  public void testSingleWriterTransaction() {
+    transactionManager.beginTransaction();
+    transactionManager.endTransaction();
+  }
+
+  @Test
+  public void testSingleWriterNestedTransaction() {
+    transactionManager.beginTransaction();
+    assertThrows(HoodieLockException.class, () -> {
+      transactionManager.beginTransaction();
+    });
+
+    transactionManager.endTransaction();
+    assertThrows(HoodieLockException.class, () -> {
+      transactionManager.endTransaction();
+    });
+  }
+
+  @Test
+  public void testSingleWriterMultipleTransactions() {
+    for (int i = 0; i < 32; i++) {
+      transactionManager.beginTransaction();
+      transactionManager.endTransaction();
+    }
+  }
+
+  @Test
+  public void testMultiWriterTransactions() {
+    final int threadCount = 3;
+    final long awaitMaxTimeoutMs = 2000L;
+    final CountDownLatch latch = new CountDownLatch(threadCount);
+    final AtomicBoolean writer1Completed = new AtomicBoolean(false);
+    final AtomicBoolean writer2Completed = new AtomicBoolean(false);
+
+    // Let writer1 get the lock first, then wait for others
+    // to join the sync up point.
+    Thread writer1 = new Thread(() -> {
+      assertDoesNotThrow(() -> {
+        transactionManager.beginTransaction();
+      });
+      latch.countDown();
+      try {
+        latch.await(awaitMaxTimeoutMs, TimeUnit.MILLISECONDS);
+        // Following sleep is to make sure writer2 attempts
+        // to try lock and to get bocked on the lock which
+        // this thread is currently holding.
+        Thread.sleep(50);
+      } catch (InterruptedException e) {
+        //
+      }
+      assertDoesNotThrow(() -> {
+        transactionManager.endTransaction();
+      });
+      writer1Completed.set(true);
+    });
+    writer1.start();
+
+    // Writer2 will block on trying to acquire the lock
+    // and will eventually get the lock before the timeout.
+    Thread writer2 = new Thread(() -> {
+      latch.countDown()
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363:
URL: https://github.com/apache/hudi/pull/4363#discussion_r771776738

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java
##
@@ -55,28 +55,32 @@ public void beginTransaction() {
   public void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) {
-    if (supportsOptimisticConcurrency) {
+    if (isOptimisticConcurrencyControlEnabled) {
       LOG.info("Transaction starting for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant);
       lockManager.lock();
-      this.currentTxnOwnerInstant = currentTxnOwnerInstant;
-      this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
+      reset(currentTxnOwnerInstant, lastCompletedTxnOwnerInstant);
       LOG.info("Transaction started for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant);
     }
   }
   public void endTransaction() {
-    if (supportsOptimisticConcurrency) {
+    if (isOptimisticConcurrencyControlEnabled) {

Review comment:
Good catch, I will close the gap in the reset with a CAS-like operation.
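A CAS-like guard of the kind mentioned in the review could look roughly like the following sketch. It is an assumption about the approach, not the code that landed in the PR: `TxnOwnerState` is a hypothetical class, and owner instants are modeled as plain `String`s instead of the `Option` values used by `TransactionManager`. Ownership is claimed with `compareAndSet(null, owner)` and cleared only if the caller is still the recorded owner.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for the current transaction owner; not Hudi's TransactionManager. */
public class TxnOwnerState {
  // null means "no transaction in flight".
  private final AtomicReference<String> currentOwner = new AtomicReference<>(null);

  /** Claims ownership atomically; returns false if another owner is already recorded. */
  public boolean claim(String ownerInstant) {
    return currentOwner.compareAndSet(null, ownerInstant);
  }

  /**
   * Clears ownership only if ownerInstant is still the recorded owner.
   * AtomicReference.compareAndSet compares references with ==, so we first read the
   * current value, compare it with equals, and then CAS against that exact reference.
   */
  public boolean release(String ownerInstant) {
    String current = currentOwner.get();
    return current != null
        && current.equals(ownerInstant)
        && currentOwner.compareAndSet(current, null);
  }

  public String owner() {
    return currentOwner.get();
  }

  public static void main(String[] args) {
    TxnOwnerState state = new TxnOwnerState();
    System.out.println(state.claim("001"));   // true: no owner yet
    System.out.println(state.claim("002"));   // false: "001" still owns it
    System.out.println(state.release("002")); // false: "002" is not the owner
    System.out.println(state.release("001")); // true: owner clears its own state
  }
}
```

The point of the CAS is that a stale or foreign caller can never wipe out another writer's ownership, even if it races with the legitimate owner.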
[GitHub] [hudi] nsivabalan commented on issue #3879: [SUPPORT] Incomplete Table Migration
nsivabalan commented on issue #3879: URL: https://github.com/apache/hudi/issues/3879#issuecomment-997135231 @jardel-lima : let us know if you have any updates or if you can share the dataset.
[GitHub] [hudi] hudi-bot commented on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot commented on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997134928 ## CI report: * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * a10ec8a603b8297e0a69246b4d33866c9b7f5ad6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4466)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot removed a comment on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997128379 ## CI report: * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * 46bfd4cb47cb7cba1185b9e146cfc8396a91af88 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4464) * a10ec8a603b8297e0a69246b4d33866c9b7f5ad6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4466)
[GitHub] [hudi] nsivabalan commented on issue #3890: [SUPPORT] Hudi Sync did not add previous partitions
nsivabalan commented on issue #3890: URL: https://github.com/apache/hudi/issues/3890#issuecomment-997134256 @stym06: Can you respond to my questions above? I would like to get to the bottom of this. Hive sync in general keeps track of the last synced time, so I am not sure how this could happen. If you were able to resolve the issue, feel free to close it out.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot removed a comment on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997126936 ## CI report: * 1677eab2ead6910016c2ed0b67640c97757633bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4410) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4431) * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465)
[GitHub] [hudi] hudi-bot commented on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot commented on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997133775 ## CI report: * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465)
[GitHub] [hudi] YannByron commented on issue #4200: spark-sql query timestamp partition error
YannByron commented on issue #4200: URL: https://github.com/apache/hudi/issues/4200#issuecomment-997132424 @nsivabalan I'll look into this in the next few days and reply ASAP.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4333: [HUDI-431] Adding support for Parquet in MOR `LogBlock`s
hudi-bot removed a comment on pull request #4333: URL: https://github.com/apache/hudi/pull/4333#issuecomment-997129628 ## CI report: * 286aa8b95627eaaa01114567797186263a830774 UNKNOWN * e722499ee75403ab62f646fdabca1a2c59570164 UNKNOWN * de0d4385394dc5d820964cefc872f099cee7a02b UNKNOWN * 67cbb2f4ab421fb7a90e4c5d1061613ed331c837 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4460) * cecde3b6734576c5f2863ec2b4b90689600cb746 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4333: [HUDI-431] Adding support for Parquet in MOR `LogBlock`s
hudi-bot commented on pull request #4333: URL: https://github.com/apache/hudi/pull/4333#issuecomment-997131996 ## CI report: * 286aa8b95627eaaa01114567797186263a830774 UNKNOWN * e722499ee75403ab62f646fdabca1a2c59570164 UNKNOWN * de0d4385394dc5d820964cefc872f099cee7a02b UNKNOWN * 67cbb2f4ab421fb7a90e4c5d1061613ed331c837 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4460) * cecde3b6734576c5f2863ec2b4b90689600cb746 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4469)
[GitHub] [hudi] YannByron commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
YannByron commented on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-997131446 @nsivabalan I was unable to reproduce this. @danny0405, can you reproduce this issue? And @BenjMaq, did you execute only these three steps: `create table`, `insert into`, and `insert overwrite`? Were there any other commits?
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
alexeykudinkin commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771773363 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java
[jira] [Updated] (HUDI-3029) TransactionManager synchronized begin/endTransaction() leading to deadlock
[ https://issues.apache.org/jira/browse/HUDI-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HUDI-3029:
Description:
I see the TransactionManager has begin and end transactions as synchronized methods. Depending on the lock provider implementation, this can have adverse effects. Say the lock provider has a blocking call for lock() or tryLock() (which is generally the case); then the following sequence will lead to a deadlock:

Client 1: beginTransaction() => txn manager instance lock acquired, lock() went through, instance lock released
Client 2: beginTransaction() => txn manager instance lock acquired, lock() is blocking
Client 1: endTransaction() => waiting to lock the txn manager instance (still held by Client 2) to enter the synchronized method

{noformat}
public synchronized void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) {
  if (supportsOptimisticConcurrency) {
    this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
    lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
    LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
    this.currentTxnOwnerInstant = currentTxnOwnerInstant;
    LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
    lockManager.lock();
    LOG.info("Transaction started");
  }
}

public synchronized void endTransaction() {
  if (supportsOptimisticConcurrency) {
    LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
    lockManager.unlock();
    LOG.info("Transaction ended");
    this.lastCompletedTxnOwnerInstant = Option.empty();
    lockManager.resetLatestCompletedWriteInstant();
  }
}
{noformat}

The current model may only be working because the lock provider's tryLock() implementation has a sleep() or a retry with timeout. But we can't make assumptions about the lock provider implementation at the transaction manager layer.
cc: [~nishith29] [~shivnarayan]

was:
I see the TransactionManager has begin and end transactions as synchronized methods. Based on the lock provider implementation, this can have adverse effects. Say the lock provider has the blocking call for the lock() or tryLock() (which is generally the case), then the following sequence will lead to a deadlock.

Client 1: beginTransaction() => txn manager instance lock acquired, lock() went through, instance lock released
Client 2: beginTransaction() => txn manager instance lock acquired, lock() is blocking
Client 3: endTransaction() => Waiting to lock the txn manager instance to enter the synchronized method

{noformat}
public synchronized void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) {
  if (supportsOptimisticConcurrency) {
    this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant;
    lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant);
    LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant);
    this.currentTxnOwnerInstant = currentTxnOwnerInstant;
    LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant);
    lockManager.lock();
    LOG.info("Transaction started");
  }
}

public synchronized void endTransaction() {
  if (supportsOptimisticConcurrency) {
    LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant);
    lockManager.unlock();
    LOG.info("Transaction ended");
    this.lastCompletedTxnOwnerInstant = Option.empty();
    lockManager.resetLatestCompletedWriteInstant();
  }
}
{noformat}

The reason why it may be working with the current model is when the lock provider implementation of tryLock() has sleep() or retry with timeout etc. But we can't make assumptions about the lock provider implementation at the transaction manager layer.
cc: [~nishith29] [~shivnarayan]

> TransactionManager synchronized begin/endTransaction() leading to deadlock
> ---
>
> Key: HUDI-3029
> URL: https://issues.apache.org/jira/browse/HUDI-3029
> Project: Apache Hudi
> Issue Type: Task
> Components: Writer Core
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I see the TransactionManager has begin and end transactions as synchronized
> methods. Based on the lock provider implementation, this can have adverse
> effects. Say the lock provider has the blocking call for the lock() or
> tryLock() (which is genereally the case), then the following sequence will
> lead to a deadlock.
> Client 1: beginTransaction() => txn manager instance lock acquired, lock()
> went through, insta
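The deadlock described above comes from holding the TransactionManager monitor while blocking inside the lock provider. A minimal sketch of the alternative, assuming a `java.util.concurrent.locks.ReentrantLock` as a stand-in for a blocking lock provider: drop `synchronized` and mutate the owner state only while holding the provider lock. `NonBlockingMonitorTxnManager` is a hypothetical name, and this is not Hudi's actual fix.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: begin/end are NOT synchronized, so a writer blocked in
 * beginTransaction() holds no monitor that the current owner needs in endTransaction().
 */
public class NonBlockingMonitorTxnManager {
  // Stand-in for a blocking lock provider (assumption, not a Hudi LockProvider).
  private final ReentrantLock providerLock = new ReentrantLock();
  // Written only by the thread holding providerLock; volatile for visibility to readers.
  private volatile String currentOwner;

  public void beginTransaction(String ownerInstant) {
    providerLock.lock();          // may block, but no other monitor is held here
    currentOwner = ownerInstant;  // state changes happen only under providerLock
  }

  public void endTransaction() {
    currentOwner = null;
    providerLock.unlock();        // IllegalMonitorStateException if the caller is not the holder
  }

  public String currentOwner() {
    return currentOwner;
  }

  public static void main(String[] args) throws InterruptedException {
    NonBlockingMonitorTxnManager manager = new NonBlockingMonitorTxnManager();
    manager.beginTransaction("writer1");            // Client 1 acquires the lock
    Thread writer2 = new Thread(() -> {
      manager.beginTransaction("writer2");          // Client 2 blocks here
      manager.endTransaction();
    });
    writer2.start();
    Thread.sleep(50);                               // give writer2 time to block
    manager.endTransaction();                       // would deadlock with synchronized begin/end
    writer2.join(TimeUnit.SECONDS.toMillis(2));
    System.out.println("writer2 finished: " + !writer2.isAlive());
  }
}
```

Because `ReentrantLock` is reentrant, a nested `beginTransaction()` from the same thread would succeed here rather than fail; a real lock provider may behave differently.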
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
alexeykudinkin commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771772917 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.client.transaction; + +import org.apache.hudi.client.transaction.lock.InProcessLockProvider; +import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy; +import org.apache.hudi.common.model.WriteConcurrencyMode; +import org.apache.hudi.common.testutils.HoodieCommonTestHarness; +import org.apache.hudi.config.HoodieCompactionConfig; +import org.apache.hudi.config.HoodieLockConfig; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieLockException; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +public class TestTransactionManager extends HoodieCommonTestHarness { + HoodieWriteConfig writeConfig; + TransactionManager transactionManager; + + @BeforeEach + private void init() throws IOException { +initPath(); +initMetaClient(); +this.writeConfig = getWriteConfig(); +this.transactionManager = new TransactionManager(this.writeConfig, this.metaClient.getFs()); + } + + private HoodieWriteConfig getWriteConfig() { +return HoodieWriteConfig.newBuilder() +.withPath(basePath) +.withCompactionConfig(HoodieCompactionConfig.newBuilder() + .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.LAZY) +.build()) + .withWriteConcurrencyMode(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL) +.withLockConfig(HoodieLockConfig.newBuilder() +.withLockProvider(InProcessLockProvider.class) +.build()) +.build(); + } + + @Test + public void testSingleWriterTransaction() { +transactionManager.beginTransaction(); +transactionManager.endTransaction(); + } + + @Test + public void testSingleWriterNestedTransaction() { 
+transactionManager.beginTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.beginTransaction(); +}); + +transactionManager.endTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.endTransaction(); +}); + } + + @Test + public void testSingleWriterMultipleTransactions() { +for (int i = 0; i < 32; i++) { + transactionManager.beginTransaction(); + transactionManager.endTransaction(); +} + } + + @Test + public void testMultiWriterTransactions() { +final int threadCount = 3; +final long awaitMaxTimeoutMs = 2000L; +final CountDownLatch latch = new CountDownLatch(threadCount); +final AtomicBoolean writer1Completed = new AtomicBoolean(false); +final AtomicBoolean writer2Completed = new AtomicBoolean(false); + +// Let writer1 get the lock first, then wait for others +// to join the sync up point. +Thread writer1 = new Thread(() -> { + assertDoesNotThrow(() -> { +transactionManager.beginTransaction(); + }); + latch.countDown(); + try { +latch.await(awaitMaxTimeoutMs, TimeUnit.MILLISECONDS); +// Following sleep is to make sure writer2 attempts +// to try lock and to get bocked on the lock which +// this thread is currently holding. +Thread.sleep(50); + } catch (InterruptedException e) { +// + } + assertDoesNotThrow(() -> { +transactionManager.endTransaction(); + }); + writer1Completed.set(true); +}); +writer1.start(); + +// Writer2 will block on trying to acquire the lock +// and will eventually get the lock before the timeout. +Thread writer2 = new Thread(() -> { + latch.count
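The `testMultiWriterTransactions` flow quoted above works like this. writer1 takes the lock. Both threads then meet at a `CountDownLatch`. writer2 blocks until writer1 releases the lock and should get it before the timeout. The sketch below shows that handoff with plain JDK primitives, under stated assumptions. It is not Hudi code: a `java.util.concurrent.locks.ReentrantLock` stands in for the lock behind `TransactionManager`/`InProcessLockProvider`, and the class name `LockHandoffSketch` is hypothetical. The sketch also uses two threads, whereas the quoted test sets `threadCount = 3`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a JDK ReentrantLock stands in for Hudi's lock provider.
final class LockHandoffSketch {
  private static final long AWAIT_MAX_TIMEOUT_MS = 2000L;

  static boolean run() {
    final ReentrantLock lock = new ReentrantLock();
    final CountDownLatch latch = new CountDownLatch(2);
    final AtomicBoolean writer2Acquired = new AtomicBoolean(false);

    // writer1 takes the lock before counting down, so writer2 cannot pass
    // the latch until the lock is already held.
    Thread writer1 = new Thread(() -> {
      lock.lock();
      try {
        latch.countDown();
        latch.await(AWAIT_MAX_TIMEOUT_MS, TimeUnit.MILLISECONDS);
        Thread.sleep(50); // give writer2 time to block on the held lock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.unlock();
      }
    });

    // writer2 waits in tryLock until writer1 releases, well within the timeout.
    Thread writer2 = new Thread(() -> {
      latch.countDown();
      try {
        latch.await(AWAIT_MAX_TIMEOUT_MS, TimeUnit.MILLISECONDS);
        if (lock.tryLock(AWAIT_MAX_TIMEOUT_MS, TimeUnit.MILLISECONDS)) {
          try {
            writer2Acquired.set(true);
          } finally {
            lock.unlock();
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    writer1.start();
    writer2.start();
    try {
      writer1.join();
      writer2.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return writer2Acquired.get();
  }
}
```

`LockHandoffSketch.run()` returns `true` when writer2 acquires the lock after writer1 releases it. Counting down only after `lock()` is the step that guarantees writer2 actually contends for a held lock instead of winning the race.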
[GitHub] [hudi] nsivabalan commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
nsivabalan commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771772721 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -55,28 +55,32 @@ public void beginTransaction() { public void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) { -if (supportsOptimisticConcurrency) { +if (isOptimisticConcurrencyControlEnabled) { LOG.info("Transaction starting for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); lockManager.lock(); - this.currentTxnOwnerInstant = currentTxnOwnerInstant; - this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant; + reset(currentTxnOwnerInstant, lastCompletedTxnOwnerInstant); LOG.info("Transaction started for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); } } public void endTransaction() { -if (supportsOptimisticConcurrency) { +if (isOptimisticConcurrencyControlEnabled) { Review comment: Yes, but the failure happens only at L72, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771772619 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -55,28 +55,32 @@ public void beginTransaction() { public void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) { -if (supportsOptimisticConcurrency) { +if (isOptimisticConcurrencyControlEnabled) { LOG.info("Transaction starting for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); lockManager.lock(); - this.currentTxnOwnerInstant = currentTxnOwnerInstant; - this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant; + reset(currentTxnOwnerInstant, lastCompletedTxnOwnerInstant); LOG.info("Transaction started for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); } } public void endTransaction() { -if (supportsOptimisticConcurrency) { +if (isOptimisticConcurrencyControlEnabled) { Review comment: writer2's endTransaction() will fail because it doesn't hold the lock.
[GitHub] [hudi] nsivabalan commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
nsivabalan commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771772587 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -55,28 +55,32 @@ public void beginTransaction() { public void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) { -if (supportsOptimisticConcurrency) { +if (isOptimisticConcurrencyControlEnabled) { LOG.info("Transaction starting for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); lockManager.lock(); - this.currentTxnOwnerInstant = currentTxnOwnerInstant; - this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant; + reset(currentTxnOwnerInstant, lastCompletedTxnOwnerInstant); LOG.info("Transaction started for " + currentTxnOwnerInstant + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); } } public void endTransaction() { -if (supportsOptimisticConcurrency) { +if (isOptimisticConcurrencyControlEnabled) { Review comment: Help me understand something. Let's say writer1 acquires the lock and takes a long time to release it. writer2 tries to acquire the lock but times out. In the finally block of any transaction-handling code we call endTransaction, right? In this case, when writer2 fails to acquire the lock, will endTransaction still be called? If yes, wouldn't writer2 reset the transaction owner at L71?
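The question in this thread depends on one ordering. Suppose a writer times out in `beginTransaction` and a `finally` block then calls `endTransaction`. That writer clobbers the holder's state only if `endTransaction` resets the owner fields before it checks who holds the lock. The sketch below is a minimal illustration under stated assumptions, not Hudi's `TransactionManager` API. `GuardedTxnManager`, `owner()`, and the use of `ReentrantLock.tryLock` with a timeout are hypothetical stand-ins. Rejecting the call with `IllegalStateException` mirrors manojpec's point that writer2's end call fails.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified transaction manager (not Hudi's TransactionManager).
final class GuardedTxnManager {
  private final ReentrantLock lock = new ReentrantLock();
  private final long lockTimeoutMs;
  // Owner state is written only by the thread that holds the lock.
  private volatile Optional<String> currentTxnOwner = Optional.empty();

  GuardedTxnManager(long lockTimeoutMs) {
    this.lockTimeoutMs = lockTimeoutMs;
  }

  /** Returns false, leaving the holder's state untouched, if the lock times out. */
  boolean beginTransaction(String owner) {
    try {
      if (!lock.tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS)) {
        return false;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    currentTxnOwner = Optional.of(owner);
    return true;
  }

  /** Fails fast, before resetting anything, when the caller does not hold the lock. */
  void endTransaction() {
    if (!lock.isHeldByCurrentThread()) {
      throw new IllegalStateException("endTransaction called without holding the lock");
    }
    currentTxnOwner = Optional.empty();
    lock.unlock();
  }

  Optional<String> owner() {
    return currentTxnOwner;
  }
}
```

For example, suppose one thread holds the lock for `"c1"` and a second thread's `beginTransaction("c2")` times out. That second thread's `endTransaction()` in a `finally` block throws, and `owner()` still reports `"c1"`. The design choice is that the ownership check runs before any state is reset. A variant could instead return silently, as long as it also skips the reset.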
[GitHub] [hudi] hudi-bot removed a comment on pull request #4333: [HUDI-431] Adding support for Parquet in MOR `LogBlock`s
hudi-bot removed a comment on pull request #4333: URL: https://github.com/apache/hudi/pull/4333#issuecomment-997118146 ## CI report: * 286aa8b95627eaaa01114567797186263a830774 UNKNOWN * e722499ee75403ab62f646fdabca1a2c59570164 UNKNOWN * de0d4385394dc5d820964cefc872f099cee7a02b UNKNOWN * 67cbb2f4ab421fb7a90e4c5d1061613ed331c837 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4460) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot commented on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997129617 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * 6ae6b2781237d7e4af95bd78062c3da765ebe9a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4434) * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4467)
[GitHub] [hudi] hudi-bot commented on pull request #4333: [HUDI-431] Adding support for Parquet in MOR `LogBlock`s
hudi-bot commented on pull request #4333: URL: https://github.com/apache/hudi/pull/4333#issuecomment-997129628 ## CI report: * 286aa8b95627eaaa01114567797186263a830774 UNKNOWN * e722499ee75403ab62f646fdabca1a2c59570164 UNKNOWN * de0d4385394dc5d820964cefc872f099cee7a02b UNKNOWN * 67cbb2f4ab421fb7a90e4c5d1061613ed331c837 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4460) * cecde3b6734576c5f2863ec2b4b90689600cb746 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot removed a comment on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997128969 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * 6ae6b2781237d7e4af95bd78062c3da765ebe9a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4434) * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 UNKNOWN
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771772308 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.client.transaction; + +import org.apache.hudi.client.transaction.lock.InProcessLockProvider; +import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy; +import org.apache.hudi.common.model.WriteConcurrencyMode; +import org.apache.hudi.common.testutils.HoodieCommonTestHarness; +import org.apache.hudi.config.HoodieCompactionConfig; +import org.apache.hudi.config.HoodieLockConfig; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieLockException; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +public class TestTransactionManager extends HoodieCommonTestHarness { + HoodieWriteConfig writeConfig; + TransactionManager transactionManager; + + @BeforeEach + private void init() throws IOException { +initPath(); +initMetaClient(); +this.writeConfig = getWriteConfig(); +this.transactionManager = new TransactionManager(this.writeConfig, this.metaClient.getFs()); + } + + private HoodieWriteConfig getWriteConfig() { +return HoodieWriteConfig.newBuilder() +.withPath(basePath) +.withCompactionConfig(HoodieCompactionConfig.newBuilder() + .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.LAZY) +.build()) + .withWriteConcurrencyMode(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL) +.withLockConfig(HoodieLockConfig.newBuilder() +.withLockProvider(InProcessLockProvider.class) +.build()) +.build(); + } + + @Test + public void testSingleWriterTransaction() { +transactionManager.beginTransaction(); +transactionManager.endTransaction(); + } + + @Test + public void testSingleWriterNestedTransaction() { 
+transactionManager.beginTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.beginTransaction(); +}); + +transactionManager.endTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.endTransaction(); +}); + } + + @Test + public void testSingleWriterMultipleTransactions() { +for (int i = 0; i < 32; i++) { + transactionManager.beginTransaction(); + transactionManager.endTransaction(); +} + } + + @Test + public void testMultiWriterTransactions() { +final int threadCount = 3; +final long awaitMaxTimeoutMs = 2000L; +final CountDownLatch latch = new CountDownLatch(threadCount); +final AtomicBoolean writer1Completed = new AtomicBoolean(false); +final AtomicBoolean writer2Completed = new AtomicBoolean(false); + +// Let writer1 get the lock first, then wait for others +// to join the sync up point. +Thread writer1 = new Thread(() -> { + assertDoesNotThrow(() -> { +transactionManager.beginTransaction(); + }); + latch.countDown(); + try { +latch.await(awaitMaxTimeoutMs, TimeUnit.MILLISECONDS); +// Following sleep is to make sure writer2 attempts +// to try lock and to get bocked on the lock which +// this thread is currently holding. +Thread.sleep(50); + } catch (InterruptedException e) { +// + } + assertDoesNotThrow(() -> { +transactionManager.endTransaction(); + }); + writer1Completed.set(true); +}); +writer1.start(); + +// Writer2 will block on trying to acquire the lock +// and will eventually get the lock before the timeout. +Thread writer2 = new Thread(() -> { + latch.countDown()
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771772205 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.client.transaction; + +import org.apache.hudi.client.transaction.lock.InProcessLockProvider; +import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy; +import org.apache.hudi.common.model.WriteConcurrencyMode; +import org.apache.hudi.common.testutils.HoodieCommonTestHarness; +import org.apache.hudi.config.HoodieCompactionConfig; +import org.apache.hudi.config.HoodieLockConfig; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieLockException; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +public class TestTransactionManager extends HoodieCommonTestHarness { + HoodieWriteConfig writeConfig; + TransactionManager transactionManager; + + @BeforeEach + private void init() throws IOException { +initPath(); +initMetaClient(); +this.writeConfig = getWriteConfig(); +this.transactionManager = new TransactionManager(this.writeConfig, this.metaClient.getFs()); + } + + private HoodieWriteConfig getWriteConfig() { +return HoodieWriteConfig.newBuilder() +.withPath(basePath) +.withCompactionConfig(HoodieCompactionConfig.newBuilder() + .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.LAZY) +.build()) + .withWriteConcurrencyMode(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL) +.withLockConfig(HoodieLockConfig.newBuilder() +.withLockProvider(InProcessLockProvider.class) +.build()) +.build(); + } + + @Test + public void testSingleWriterTransaction() { +transactionManager.beginTransaction(); +transactionManager.endTransaction(); + } + + @Test + public void testSingleWriterNestedTransaction() { 
+transactionManager.beginTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.beginTransaction(); +}); + +transactionManager.endTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.endTransaction(); +}); + } + + @Test + public void testSingleWriterMultipleTransactions() { +for (int i = 0; i < 32; i++) { + transactionManager.beginTransaction(); + transactionManager.endTransaction(); +} + } + + @Test + public void testMultiWriterTransactions() { +final int threadCount = 3; +final long awaitMaxTimeoutMs = 2000L; +final CountDownLatch latch = new CountDownLatch(threadCount); +final AtomicBoolean writer1Completed = new AtomicBoolean(false); +final AtomicBoolean writer2Completed = new AtomicBoolean(false); + +// Let writer1 get the lock first, then wait for others +// to join the sync up point. +Thread writer1 = new Thread(() -> { + assertDoesNotThrow(() -> { +transactionManager.beginTransaction(); + }); + latch.countDown(); + try { +latch.await(awaitMaxTimeoutMs, TimeUnit.MILLISECONDS); +// Following sleep is to make sure writer2 attempts +// to try lock and to get bocked on the lock which +// this thread is currently holding. +Thread.sleep(50); + } catch (InterruptedException e) { +// + } + assertDoesNotThrow(() -> { +transactionManager.endTransaction(); + }); + writer1Completed.set(true); +}); +writer1.start(); + +// Writer2 will block on trying to acquire the lock +// and will eventually get the lock before the timeout. +Thread writer2 = new Thread(() -> { + latch.countDown()
[GitHub] [hudi] hudi-bot commented on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot commented on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997128969 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * 6ae6b2781237d7e4af95bd78062c3da765ebe9a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4434) * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN * a3e3b87be58f705d665f73e938977ac13b314657 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot removed a comment on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997126921 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * 6ae6b2781237d7e4af95bd78062c3da765ebe9a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4434) * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN
[hudi] branch master updated (7784249 -> 4785244)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 7784249 [HUDI-2962] InProcess lock provider to guard single writer process with async table operations (#4259) add 4785244 [HUDI-3043] De-coupling multi writer tests (#4362) No new revisions were added by this update. Summary of changes: .../TestHoodieDeltaStreamerWithMultiWriter.java| 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-)
[GitHub] [hudi] nsivabalan merged pull request #4362: [HUDI-3043] De-coupling multi writer tests
nsivabalan merged pull request #4362: URL: https://github.com/apache/hudi/pull/4362
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771771785 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.client.transaction; + +import org.apache.hudi.client.transaction.lock.InProcessLockProvider; +import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy; +import org.apache.hudi.common.model.WriteConcurrencyMode; +import org.apache.hudi.common.testutils.HoodieCommonTestHarness; +import org.apache.hudi.config.HoodieCompactionConfig; +import org.apache.hudi.config.HoodieLockConfig; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieLockException; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +public class TestTransactionManager extends HoodieCommonTestHarness { + HoodieWriteConfig writeConfig; + TransactionManager transactionManager; + + @BeforeEach + private void init() throws IOException { +initPath(); +initMetaClient(); +this.writeConfig = getWriteConfig(); +this.transactionManager = new TransactionManager(this.writeConfig, this.metaClient.getFs()); + } + + private HoodieWriteConfig getWriteConfig() { +return HoodieWriteConfig.newBuilder() +.withPath(basePath) +.withCompactionConfig(HoodieCompactionConfig.newBuilder() + .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.LAZY) +.build()) + .withWriteConcurrencyMode(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL) +.withLockConfig(HoodieLockConfig.newBuilder() +.withLockProvider(InProcessLockProvider.class) +.build()) +.build(); + } + + @Test + public void testSingleWriterTransaction() { +transactionManager.beginTransaction(); +transactionManager.endTransaction(); + } + + @Test + public void testSingleWriterNestedTransaction() { 
+transactionManager.beginTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.beginTransaction(); +}); + +transactionManager.endTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.endTransaction(); +}); + } + + @Test + public void testSingleWriterMultipleTransactions() { Review comment: The same thread is able to run multiple transactions.
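The quoted tests set two expectations. One thread can run many begin/end pairs back to back, yet a nested `beginTransaction` or a second `endTransaction` raises `HoodieLockException`. A JDK `ReentrantLock` on its own would let the same thread re-acquire, so a non-reentrant guard has to reject that case explicitly. The sketch below assumes that model. `NonReentrantTxn` is an illustrative name, and `IllegalStateException` stands in for `HoodieLockException`. It does not claim to be how Hudi's lock provider is implemented.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical non-reentrant transaction guard (not Hudi's implementation).
final class NonReentrantTxn {
  private final ReentrantLock lock = new ReentrantLock();

  void begin() {
    // Reject nesting instead of letting ReentrantLock bump its hold count.
    if (lock.isHeldByCurrentThread()) {
      throw new IllegalStateException("nested transaction is not allowed");
    }
    lock.lock();
  }

  void end() {
    // Reject an end without a matching begin (e.g. a second end call).
    if (!lock.isHeldByCurrentThread()) {
      throw new IllegalStateException("no transaction in progress on this thread");
    }
    lock.unlock();
  }
}
```

Calling `begin()`/`end()` 32 times in a loop succeeds, matching `testSingleWriterMultipleTransactions`. `begin(); begin();` and `begin(); end(); end();` each fail on the second offending call, matching `testSingleWriterNestedTransaction`.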
[GitHub] [hudi] hudi-bot removed a comment on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot removed a comment on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997126961 ## CI report: * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * 46bfd4cb47cb7cba1185b9e146cfc8396a91af88 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4464) * a10ec8a603b8297e0a69246b4d33866c9b7f5ad6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot commented on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997128379 ## CI report: * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * 46bfd4cb47cb7cba1185b9e146cfc8396a91af88 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4464) * a10ec8a603b8297e0a69246b4d33866c9b7f5ad6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4466)
[jira] [Assigned] (HUDI-3059) save point rollback not working with hudi-cli
[ https://issues.apache.org/jira/browse/HUDI-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-3059: - Assignee: sivabalan narayanan > save point rollback not working with hudi-cli > - > > Key: HUDI-3059 > URL: https://issues.apache.org/jira/browse/HUDI-3059 > Project: Apache Hudi > Issue Type: Bug > Components: Usability >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: sev:critical > > Ref issue: > [https://github.com/apache/hudi/issues/3870] > > # create Hudi dataset > # add some data so there are multiple commits > # create a savepoint > # try to rollback savepoint > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3059) save point rollback not working with hudi-cli
[ https://issues.apache.org/jira/browse/HUDI-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3059: -- Labels: sev:critical (was: ) > save point rollback not working with hudi-cli > - > > Key: HUDI-3059 > URL: https://issues.apache.org/jira/browse/HUDI-3059 > Project: Apache Hudi > Issue Type: Bug > Components: Usability >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:critical > > Ref issue: > [https://github.com/apache/hudi/issues/3870] > > # create Hudi dataset > # add some data so there are multiple commits > # create a savepoint > # try to rollback savepoint > >
[jira] [Created] (HUDI-3059) save point rollback not working with hudi-cli
sivabalan narayanan created HUDI-3059: - Summary: save point rollback not working with hudi-cli Key: HUDI-3059 URL: https://issues.apache.org/jira/browse/HUDI-3059 Project: Apache Hudi Issue Type: Bug Components: Usability Reporter: sivabalan narayanan Ref issue: [https://github.com/apache/hudi/issues/3870] # create Hudi dataset # add some data so there are multiple commits # create a savepoint # try to rollback savepoint
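For reference, restoring a table to a savepoint conceptually means undoing every instant that completed after the savepoint, newest first. The Java sketch below illustrates only that selection step. The class and method names are hypothetical, it is not the hudi-cli or Hudi restore implementation, and it assumes instant times are fixed-width timestamps (like `20211124114945342`) so that string order matches commit order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not Hudi's savepoint/restore code.
final class SavepointRestoreSketch {
    private SavepointRestoreSketch() {}

    /**
     * Returns the instants that would have to be rolled back to get the table
     * back to the given savepoint, newest first. Assumes fixed-width timestamp
     * instants, so lexicographic order equals chronological order.
     */
    static List<String> instantsToRollBack(List<String> completedInstants, String savepoint) {
        if (!completedInstants.contains(savepoint)) {
            throw new IllegalArgumentException("savepoint " + savepoint + " is not on the timeline");
        }
        List<String> toRollBack = new ArrayList<>();
        for (String instant : completedInstants) {
            if (instant.compareTo(savepoint) > 0) {
                toRollBack.add(instant);
            }
        }
        // Undo the most recent commit first.
        toRollBack.sort(Collections.reverseOrder());
        return toRollBack;
    }
}
```

With a timeline of three commits and a savepoint on the first one, the sketch selects the third and then the second commit for rollback; a savepoint on the latest commit selects nothing.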
[jira] [Assigned] (HUDI-3058) SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-3058: - Assignee: satish > SqlQueryEqualityPreCommitValidator errors with > java.util.ConcurrentModificationException > > > Key: HUDI-3058 > URL: https://issues.apache.org/jira/browse/HUDI-3058 > Project: Apache Hudi > Issue Type: Bug > Components: Usability >Affects Versions: 0.10.0 >Reporter: sivabalan narayanan >Assignee: satish >Priority: Major > Labels: sev:high > Fix For: 0.11.0 > > > Ref issue: [https://github.com/apache/hudi/issues/4109] > > Faced ConcurrentModificationException when trying to test > SqlQueryEqualityPreCommitValidator in the quickstart guide > *To Reproduce* > Steps to reproduce the behavior: > # Insert data without any pre-commit validations > # Update data (ensured the updates don't touch the fare column in the quickstart > example) with the following pre-commit validator props > {{option("hoodie.precommit.validators", > "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator"). 
> option("hoodie.precommit.validators.equality.sql.queries", "select sum(fare) > from ").}} > stacktrace: > {code:java} > org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit > time 20211124114945342 > at > org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62) > at > org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:111) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:95) > at > org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:174) > at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) > at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:276) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) > ... 70 elided > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1633) > at > java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.forEach(Referen
[jira] [Updated] (HUDI-3058) SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3058: -- Labels: sev:high (was: ) > SqlQueryEqualityPreCommitValidator errors with > java.util.ConcurrentModificationException > > > Key: HUDI-3058 > URL: https://issues.apache.org/jira/browse/HUDI-3058 > Project: Apache Hudi > Issue Type: Bug > Components: Usability >Affects Versions: 0.10.0 >Reporter: sivabalan narayanan >Priority: Major > Labels: sev:high > Fix For: 0.11.0 > > > Ref issue: [https://github.com/apache/hudi/issues/4109] > > Faced ConcurrentModificationException when trying to test > SqlQueryEqualityPreCommitValidator in the quickstart guide > *To Reproduce* > Steps to reproduce the behavior: > # Insert data without any pre-commit validations > # Update data (ensured the updates don't touch the fare column in the quickstart > example) with the following pre-commit validator props > {{option("hoodie.precommit.validators", > "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator"). 
> option("hoodie.precommit.validators.equality.sql.queries", "select sum(fare) > from ").}} > stacktrace: > {code:java} > org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit > time 20211124114945342 > at > org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62) > at > org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:111) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:95) > at > org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:174) > at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) > at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:276) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) > ... 70 elided > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1633) > at > java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) > at ja
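The innermost frames of this trace (`HashMap$ValueSpliterator.forEachRemaining` called from `Streams$ConcatSpliterator`) are the signature of a `Stream.concat` over a `HashMap`'s `values()` whose map is structurally modified while the stream is being consumed. The minimal Java sketch below reproduces that pattern and a common fix, streaming over a snapshot of the values. It is a hypothetical illustration, not the validator's code, and the trace alone does not establish that this is the root cause of HUDI-3058.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Minimal reproduction of the failure pattern seen in the stack trace; not Hudi code.
final class CmeDemo {
    private CmeDemo() {}

    /** Adds new keys to the map while a concatenated stream over its values is traversed. */
    static boolean reproduces() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            Stream.concat(map.values().stream(), Stream.of(0))
                  .forEach(v -> map.put("copy-" + v, v)); // structural modification mid-traversal
            return false;
        } catch (ConcurrentModificationException e) {
            // HashMap's value spliterator detects the modCount change and fails fast.
            return true;
        }
    }

    /** Same logic, but traverses a snapshot of the values, so updating the map is safe. */
    static int withSnapshot() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        List<Integer> snapshot = new ArrayList<>(map.values());
        Stream.concat(snapshot.stream(), Stream.of(0))
              .forEach(v -> map.put("copy-" + v, v));
        return map.size(); // "a", "b", "copy-1", "copy-2", "copy-0"
    }
}
```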
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
alexeykudinkin commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771769971 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.client.transaction; + +import org.apache.hudi.client.transaction.lock.InProcessLockProvider; +import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy; +import org.apache.hudi.common.model.WriteConcurrencyMode; +import org.apache.hudi.common.testutils.HoodieCommonTestHarness; +import org.apache.hudi.config.HoodieCompactionConfig; +import org.apache.hudi.config.HoodieLockConfig; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieLockException; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.assertDoesNotThrow; +import static org.junit.jupiter.api.Assertions.assertThrows; + +public class TestTransactionManager extends HoodieCommonTestHarness { + HoodieWriteConfig writeConfig; + TransactionManager transactionManager; + + @BeforeEach + private void init() throws IOException { +initPath(); +initMetaClient(); +this.writeConfig = getWriteConfig(); +this.transactionManager = new TransactionManager(this.writeConfig, this.metaClient.getFs()); + } + + private HoodieWriteConfig getWriteConfig() { +return HoodieWriteConfig.newBuilder() +.withPath(basePath) +.withCompactionConfig(HoodieCompactionConfig.newBuilder() + .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.LAZY) +.build()) + .withWriteConcurrencyMode(WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL) +.withLockConfig(HoodieLockConfig.newBuilder() +.withLockProvider(InProcessLockProvider.class) +.build()) +.build(); + } + + @Test + public void testSingleWriterTransaction() { +transactionManager.beginTransaction(); +transactionManager.endTransaction(); + } + + @Test + public void testSingleWriterNestedTransaction() { 
+transactionManager.beginTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.beginTransaction(); +}); + +transactionManager.endTransaction(); +assertThrows(HoodieLockException.class, () -> { + transactionManager.endTransaction(); +}); + } + + @Test + public void testSingleWriterMultipleTransactions() { Review comment: Not sure I understand what exactly we're testing with this one ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestTransactionManager.java ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.client.transaction; + +import org.apache.hudi.client.transaction.lock.InProcessLockProvider; +import org.apache.hudi.common.model.HoodieFailedWritesCleaningPolicy; +import org.apache.hudi.common.model.WriteConcurrencyMode; +import org.apache.hudi.common.testutils.HoodieCommonTestHarness; +import org.apac
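The test under review expects a nested `beginTransaction()` and a repeated `endTransaction()` on the same thread to fail with `HoodieLockException`. A minimal sketch of that contract, under stated assumptions: the class name and the `occEnabled` flag are hypothetical, `IllegalStateException` stands in for `HoodieLockException`, and a `java.util.concurrent.locks.ReentrantLock` is used only for its per-thread ownership check. It is not Hudi's `TransactionManager`.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the begin/end contract exercised by the test; not Hudi code.
final class SimpleTransactionManager {
    private final ReentrantLock lock = new ReentrantLock();
    private final boolean occEnabled; // hypothetical stand-in for an OCC-enabled config check

    SimpleTransactionManager(boolean occEnabled) {
        this.occEnabled = occEnabled;
    }

    void beginTransaction() {
        if (!occEnabled) {
            return; // single-writer mode: no lock is taken
        }
        if (lock.isHeldByCurrentThread()) {
            // Reject nesting instead of silently re-entering the lock.
            throw new IllegalStateException("transaction already in progress on this thread");
        }
        lock.lock();
    }

    void endTransaction() {
        if (!occEnabled) {
            return;
        }
        if (!lock.isHeldByCurrentThread()) {
            // Ending without a matching begin (or ending twice) is an error.
            throw new IllegalStateException("no transaction in progress on this thread");
        }
        lock.unlock();
    }
}
```

Failing fast on nesting keeps a caller from holding the lock longer than it believes it does, which is one way a begin/end mismatch can otherwise turn into a deadlock for other writers.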
[GitHub] [hudi] nsivabalan commented on issue #4135: [SUPPORT] Zordering clustering on a moderate size dataset taking large amounts of time.
nsivabalan commented on issue #4135: URL: https://github.com/apache/hudi/issues/4135#issuecomment-997127574 Hey folks, are there any pending things to be resolved? If not, can we close this one out?
[GitHub] [hudi] manojpec commented on a change in pull request #3989: [HUDI-2589] RFC-37: Metadata table based bloom index
manojpec commented on a change in pull request #3989: URL: https://github.com/apache/hudi/pull/3989#discussion_r771612579 ## File path: rfc/rfc-37/rfc-37.md ## @@ -0,0 +1,286 @@ + +# RFC-37: Metadata based Bloom Index + +## Proposers +- @nsivabalan +- @manojpec + +## Approvers + - @vinothchandar + - @satishkotha + +## Status +JIRA: https://issues.apache.org/jira/browse/HUDI-2703 + +## Abstract +Hudi maintains several indices to locate/map incoming records to file groups during writes. The most commonly +used record index is the HoodieBloomIndex. Larger tables and the global index have performance issues +because the bloom filters from a large number of data files need to be read and looked up. Reading from several +files over cloud object storage like S3 also faces request throttling issues. We are proposing to +build a new Metadata index (metadata table based bloom index) to boost the performance of the existing bloom index. + +## Background +HoodieBloomIndex is used to find the location of incoming records during every write. The bloom index assists Hudi in +deterministically routing records to a given file group and in distinguishing inserts from updates. This aggregate bloom +index is built from several bloom filters stored in the base file footers. Prior to the bloom filter lookup, file +pruning for the incoming records is also done based on the record key min/max stats stored in the base file footers. +In this RFC, we plan to build a new index for the bloom filters under the metadata table to assist in +bloom index based record location tagging. + +## Design +HoodieBloomIndex involves the following steps to find the right location of incoming records: +1. Find all the interested partitions and list all their data files. +2. File Pruning: Load record key min/max details from all the interested data file footers. 
Filter files and generate + files to keys mapping for the incoming records based on the key ranges, using a range interval tree built from + previously loaded min/max details. +3. Bloom Filter lookup: Filter files and prune the files to keys mapping for the incoming keys based on the bloom + filter key lookup. +4. Final lookup in actual data files to find the right location of every incoming record. + +As we can see from steps 1 and 2, we need the min and max values for "_hoodie_record_key" and the bloom filters +from all interested data files to perform the location tagging. In this design, we will add these key stats and +bloom filters to the metadata table and thereby be able to quickly load the interested details and do faster lookups. + +The metadata table already has one partition `files` to help in partition file listing. For the metadata table based +indices, we are proposing to add the following two new partitions: +1. `bloom_filter` - for the file level bloom filter +2. `column_stats` - for the key range stats + +Why metadata table: +The metadata table uses HBase HFile, a map file format, to store and retrieve data. HFile is an indexed file format and +supports fast map-like lookups by key. Since we will be storing stats/bloom filters for every file and the index will do +lookups based on files, we should be able to benefit from the faster lookups in HFile. + + + +The following sections will talk about the different partitions and key formats, and then dive into the data and control flows. + +### MetaIndex/BloomFilter: + +A new partition `bloom_filter` will be added under the metadata table. Bloom filters from all the base files in the +data table will be added here. The metadata table is already in the HFile format. The existing metadata payload schema will +be extended and shared for this partition as well. The type field will be used to detect the bloom filter payload record. +Here is the schema for the bloom filter payload record.
+``` + { +"doc": "Metadata about base file bloom filters", +"name": "BloomFilterMetadata", +"type": [ +"null", +{ +"doc": "Base FileID and its BloomFilter details", +"name": "HoodieMetadataBloomFilter", +"type": "record", +"fields": [ +{ +"doc": "Version/type of the bloom filter metadata", +"name": "version", +"type": "string" +}, +{ +"doc": "Instant timestamp when this metadata was created/updated", +"name": "timestamp", +"type": "string" +}, +{ +"doc": "Bloom filter binary byte array", +"name": "bloomfilter", +"type": "bytes" +}, +{ +"doc": "True if
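Steps 2 and 3 of the RFC-37 design (prune files by record-key min/max, then probe each surviving file's bloom filter) can be sketched in Java as follows. This is a simplified, hypothetical illustration: the class names are invented, the bloom filter uses a toy double-hashing scheme over `String.hashCode()`, files are scanned linearly rather than through the interval tree the RFC mentions, and step 4 (the final lookup in the data files) is omitted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of range pruning followed by a bloom filter probe; not HoodieBloomIndex.
final class BloomIndexSketch {
    private BloomIndexSketch() {}

    /** Toy bloom filter: k probes derived from String.hashCode() by double hashing. */
    static final class Bloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        Bloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        private int index(String key, int i) {
            int h1 = key.hashCode();
            int h2 = (h1 >>> 16) | 1; // force an odd step so the probes differ
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false; // definitely absent
                }
            }
            return true; // possibly present (false positives are allowed)
        }
    }

    /** Per-file key stats and bloom filter, as the RFC proposes to keep in the metadata table. */
    static final class FileStats {
        final String fileId;
        final String minKey;
        final String maxKey;
        final Bloom bloom;

        FileStats(String fileId, String minKey, String maxKey, Bloom bloom) {
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
            this.bloom = bloom;
        }
    }

    /** Returns ids of files that may hold the key: range check first, then bloom probe. */
    static List<String> candidateFiles(String key, List<FileStats> files) {
        List<String> result = new ArrayList<>();
        for (FileStats f : files) {
            boolean inRange = key.compareTo(f.minKey) >= 0 && key.compareTo(f.maxKey) <= 0;
            if (inRange && f.bloom.mightContain(key)) {
                result.add(f.fileId);
            }
        }
        return result;
    }
}
```

Because a bloom filter can return false positives but never false negatives, a file that survives both checks still needs the final lookup, while a file that fails either check can be skipped safely.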
[GitHub] [hudi] nsivabalan commented on issue #4184: [SUPPORT]parquet is not a Parquet file (too small length:4)
nsivabalan commented on issue #4184: URL: https://github.com/apache/hudi/issues/4184#issuecomment-997127027 @bhasudha @bvaradar @leesf @danny0405: have you folks encountered this before? A Parquet file of size 4 bytes.
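Some context on why 4 bytes is fatal: per the Parquet format specification, a file starts with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by a trailing `PAR1`, so a reader needs at least 12 bytes before it can even locate the footer. A 4-byte file can hold at most the leading magic, which is consistent with a write that stopped right after the header (one plausible cause, not a diagnosis). The Java sketch below checks that framing on an in-memory byte array; the class and method names are hypothetical and it is not parquet-mr's reader.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical framing check mirroring the Parquet file layout; not parquet-mr code.
final class ParquetFraming {
    private ParquetFraming() {}

    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    static final int FOOTER_LENGTH_SIZE = 4;
    /** Leading magic + 4-byte footer length + trailing magic. */
    static final int MIN_LENGTH = MAGIC.length + FOOTER_LENGTH_SIZE + MAGIC.length; // 12

    /** Returns "OK" if the framing looks valid, otherwise a short reason. */
    static String check(byte[] file) {
        if (file.length < MIN_LENGTH) {
            return "too small length: " + file.length;
        }
        if (!Arrays.equals(Arrays.copyOfRange(file, 0, MAGIC.length), MAGIC)) {
            return "missing leading magic";
        }
        int tail = file.length - MAGIC.length; // start of the trailing magic
        if (!Arrays.equals(Arrays.copyOfRange(file, tail, file.length), MAGIC)) {
            return "missing trailing magic";
        }
        // The footer length is a little-endian int stored just before the trailing magic.
        int footerLen = ByteBuffer.wrap(file, tail - FOOTER_LENGTH_SIZE, FOOTER_LENGTH_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        // The footer must fit between the leading magic and the footer-length field.
        if (footerLen < 0 || footerLen > tail - FOOTER_LENGTH_SIZE - MAGIC.length) {
            return "corrupt footer length: " + footerLen;
        }
        return "OK";
    }
}
```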
[GitHub] [hudi] hudi-bot removed a comment on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot removed a comment on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997126501 ## CI report: * 1677eab2ead6910016c2ed0b67640c97757633bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4410) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4431) * 29113e6aff644be7511d84ae8428a8597a5b10b2 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot removed a comment on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-996752326 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * 6ae6b2781237d7e4af95bd78062c3da765ebe9a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4434)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot removed a comment on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997126514 ## CI report: * d017173b44682dd26fa7238635ba9eb8fd750a1a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4461) * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * 46bfd4cb47cb7cba1185b9e146cfc8396a91af88 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4464) * a10ec8a603b8297e0a69246b4d33866c9b7f5ad6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot commented on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997126936 ## CI report: * 1677eab2ead6910016c2ed0b67640c97757633bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4410) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4431) * 29113e6aff644be7511d84ae8428a8597a5b10b2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4465)
[GitHub] [hudi] hudi-bot commented on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot commented on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997126961 ## CI report: * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * 46bfd4cb47cb7cba1185b9e146cfc8396a91af88 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4464) * a10ec8a603b8297e0a69246b4d33866c9b7f5ad6 UNKNOWN
[GitHub] [hudi] yihua commented on issue #4230: [SUPPORT] org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file
yihua commented on issue #4230: URL: https://github.com/apache/hudi/issues/4230#issuecomment-997126934 > This is happening also in 'Delete archive instants' > @h7kanna This could be due to FS timeout. The writer may still proceed with retries after the exception. Do you see this failing the write actions constantly?
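On the retry point: a transient FS or remote timeout is commonly handled by retrying the call a bounded number of times with backoff. The Java helper below is a generic, hypothetical sketch of that pattern; it is not how Hudi's marker creation is implemented, and the attempt count and backoff values are illustrative only.

```java
import java.util.concurrent.Callable;

// Hypothetical bounded-retry helper; not Hudi's marker-creation code.
final class Retry {
    private Retry() {}

    /**
     * Runs the call up to maxAttempts times, doubling the sleep between attempts.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T withRetries(Callable<T> call, int maxAttempts, long initialBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // e.g. a transient remote or FS timeout
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff); // an interrupt here propagates to the caller
                    backoff *= 2;          // exponential backoff
                }
            }
        }
        throw last;
    }
}
```

If the failure is persistent rather than transient, retries only delay the error, which is why it matters whether the write actions fail every time.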
[GitHub] [hudi] hudi-bot commented on pull request #4306: [HUDI-3014] add table option to set utc timezone
hudi-bot commented on pull request #4306: URL: https://github.com/apache/hudi/pull/4306#issuecomment-997126921 ## CI report: * a39258ca69c6302da42cdb1fe1a0794676480952 UNKNOWN * a1ba1e2c81b74948a93589c3192ab24ef320107b UNKNOWN * c347bb78b3c799dce34db7a00c7f6a07c95ec777 UNKNOWN * d6a0ac9027bf12362b56729a86e9755dbe1c21db UNKNOWN * 4fd974b6b45e337f75bfaa9e6d54dc7e82cf1473 UNKNOWN * 0afe75ecfe523bdc74c8c37ba50de0cb0601166d UNKNOWN * 70457e0ba0b8dfd4ae63fd8c096abbbf051d6256 UNKNOWN * 6ae6b2781237d7e4af95bd78062c3da765ebe9a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4434) * bfdbb7db27a02f6c414769e58aa8cb1e841c3a21 UNKNOWN
[GitHub] [hudi] nsivabalan commented on issue #4200: spark-sql query timestamp partition error
nsivabalan commented on issue #4200: URL: https://github.com/apache/hudi/issues/4200#issuecomment-997126706 @YannByron: Can we please follow up on this one? If it's a bug, please do file a tracking Jira and close this one out. But let's try to work towards a fix if it's a valid bug.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot removed a comment on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997119164 ## CI report: * d017173b44682dd26fa7238635ba9eb8fd750a1a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4461) * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * 46bfd4cb47cb7cba1185b9e146cfc8396a91af88 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4464)
[GitHub] [hudi] hudi-bot commented on pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
hudi-bot commented on pull request #4363: URL: https://github.com/apache/hudi/pull/4363#issuecomment-997126514 ## CI report: * d017173b44682dd26fa7238635ba9eb8fd750a1a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4461) * f0555fa1c09b27744084d20199683a1f8e68d9b7 UNKNOWN * 46bfd4cb47cb7cba1185b9e146cfc8396a91af88 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4464) * a10ec8a603b8297e0a69246b4d33866c9b7f5ad6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot commented on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-997126501 ## CI report: * 1677eab2ead6910016c2ed0b67640c97757633bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4410) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4431) * 29113e6aff644be7511d84ae8428a8597a5b10b2 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4349: [MINOR] remove unused import in HoodieFileIndex
hudi-bot removed a comment on pull request #4349: URL: https://github.com/apache/hudi/pull/4349#issuecomment-996699650 ## CI report: * 1677eab2ead6910016c2ed0b67640c97757633bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4410) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4431)
[GitHub] [hudi] nsivabalan commented on issue #4221: [SUPPORT] hudi mor table has a lack of data
nsivabalan commented on issue #4221: URL: https://github.com/apache/hudi/issues/4221#issuecomment-997126353 If the conversation has been taken offline, can we close this out? But please file a tracking JIRA if it is triaged as a bug.
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771769770 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -35,44 +35,43 @@ public class TransactionManager implements Serializable { private static final Logger LOG = LogManager.getLogger(TransactionManager.class); - private final LockManager lockManager; + private final boolean supportsOptimisticConcurrency; Review comment: Right. Without changing the original config and all its usages, I am going with an `isOptimisticConcurrencyControlEnabled` flag in this class.
[GitHub] [hudi] manojpec commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
manojpec commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771769708 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -35,44 +35,43 @@ public class TransactionManager implements Serializable { private static final Logger LOG = LogManager.getLogger(TransactionManager.class); - private final LockManager lockManager; + private final boolean supportsOptimisticConcurrency; private Option currentTxnOwnerInstant; private Option lastCompletedTxnOwnerInstant; - private boolean supportsOptimisticConcurrency; public TransactionManager(HoodieWriteConfig config, FileSystem fs) { this.lockManager = new LockManager(config, fs); this.supportsOptimisticConcurrency = config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl(); } - public synchronized void beginTransaction() { + public void beginTransaction() { if (supportsOptimisticConcurrency) { LOG.info("Transaction starting without a transaction owner"); lockManager.lock(); - LOG.info("Transaction started"); + LOG.info("Transaction started without a transaction owner"); } } - public synchronized void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) { + public void beginTransaction(Option currentTxnOwnerInstant, + Option lastCompletedTxnOwnerInstant) { if (supportsOptimisticConcurrency) { - this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant; - lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant); - LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant); - this.currentTxnOwnerInstant = currentTxnOwnerInstant; - LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant); + LOG.info("Transaction starting for " + currentTxnOwnerInstant + + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); lockManager.lock(); - LOG.info("Transaction started"); + 
this.currentTxnOwnerInstant = currentTxnOwnerInstant; + this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant; + LOG.info("Transaction started for " + currentTxnOwnerInstant + + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); } } - public synchronized void endTransaction() { + public void endTransaction() { if (supportsOptimisticConcurrency) { LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant); - lockManager.unlock(); - LOG.info("Transaction ended"); this.lastCompletedTxnOwnerInstant = Option.empty(); - lockManager.resetLatestCompletedWriteInstant(); + lockManager.unlock(); Review comment: sounds good, fixed it.
[GitHub] [hudi] nsivabalan commented on issue #4241: [SUPPORT] Disaster Recovery (DR) Setup? Questions.
nsivabalan commented on issue #4241: URL: https://github.com/apache/hudi/issues/4241#issuecomment-997125988 @xushiyan @bhasudha @bvaradar @yanghua : Do you folks have any pointers in this regard?
[GitHub] [hudi] nsivabalan commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
nsivabalan commented on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-997125124 @YannByron @danny0405 : Can either of you triage this? We might need a fix if it's a bug. Feel free to file a tracking JIRA and work towards it.
[GitHub] [hudi] nsivabalan commented on issue #4131: [SUPPORT] org.apache.hudi.exception.HoodieException: The value of can not be null
nsivabalan commented on issue #4131: URL: https://github.com/apache/hudi/issues/4131#issuecomment-997122358 @YannByron : Can you look into this issue, please?
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4363: [HUDI-3029] Transaction manager: avoid deadlock when doing begin and end transactions
alexeykudinkin commented on a change in pull request #4363: URL: https://github.com/apache/hudi/pull/4363#discussion_r771766731 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -35,44 +35,43 @@ public class TransactionManager implements Serializable { private static final Logger LOG = LogManager.getLogger(TransactionManager.class); - private final LockManager lockManager; + private final boolean supportsOptimisticConcurrency; Review comment: The name of the flag is misleading: I had to go and check what it actually refers to in order to fully understand its semantics. This one is rather about enabling/disabling concurrency control (CC), which you can disable if you only have a single writer. ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -35,44 +35,43 @@ public class TransactionManager implements Serializable { private static final Logger LOG = LogManager.getLogger(TransactionManager.class); - private final LockManager lockManager; + private final boolean supportsOptimisticConcurrency; private Option currentTxnOwnerInstant; private Option lastCompletedTxnOwnerInstant; - private boolean supportsOptimisticConcurrency; public TransactionManager(HoodieWriteConfig config, FileSystem fs) { this.lockManager = new LockManager(config, fs); this.supportsOptimisticConcurrency = config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl(); } - public synchronized void beginTransaction() { + public void beginTransaction() { if (supportsOptimisticConcurrency) { LOG.info("Transaction starting without a transaction owner"); lockManager.lock(); - LOG.info("Transaction started"); + LOG.info("Transaction started without a transaction owner"); } } - public synchronized void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) { + public void beginTransaction(Option currentTxnOwnerInstant, + Option 
lastCompletedTxnOwnerInstant) {
if (supportsOptimisticConcurrency) { - this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant; - lockManager.setLatestCompletedWriteInstant(lastCompletedTxnOwnerInstant); - LOG.info("Latest completed transaction instant " + lastCompletedTxnOwnerInstant); - this.currentTxnOwnerInstant = currentTxnOwnerInstant; - LOG.info("Transaction starting with transaction owner " + currentTxnOwnerInstant); + LOG.info("Transaction starting for " + currentTxnOwnerInstant + + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); lockManager.lock(); - LOG.info("Transaction started"); + this.currentTxnOwnerInstant = currentTxnOwnerInstant; + this.lastCompletedTxnOwnerInstant = lastCompletedTxnOwnerInstant; + LOG.info("Transaction started for " + currentTxnOwnerInstant + + "with latest completed transaction instant " + lastCompletedTxnOwnerInstant); } } - public synchronized void endTransaction() { + public void endTransaction() { if (supportsOptimisticConcurrency) { LOG.info("Transaction ending with transaction owner " + currentTxnOwnerInstant); - lockManager.unlock(); - LOG.info("Transaction ended"); this.lastCompletedTxnOwnerInstant = Option.empty(); - lockManager.resetLatestCompletedWriteInstant(); + lockManager.unlock(); Review comment: I would suggest creating a `reset(Instant, Instant)` method that you can invoke from both `lock` and `unlock`. ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java ## @@ -35,44 +35,43 @@ public class TransactionManager implements Serializable { private static final Logger LOG = LogManager.getLogger(TransactionManager.class); - private final LockManager lockManager; + private final boolean supportsOptimisticConcurrency; private Option currentTxnOwnerInstant; private Option lastCompletedTxnOwnerInstant; - private boolean supportsOptimisticConcurrency; public TransactionManager(HoodieWriteConfig config, FileSystem fs) { 
this.lockManager = new
LockManager(config, fs); this.supportsOptimisticConcurrency = config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl(); } - public synchronized void beginTransaction() { + public void beginTransaction() { if (supportsOptimisticConcurrency) { LOG.info("Transaction starting without a transaction owner"); lockManager.lock(); - LOG.info("Transaction started"); + LOG.info("Transaction started without a transaction owner"); } } - public synchronized void beginTransaction(Option currentTxnOwnerInstant, Option lastCompletedTxnOwnerInstant) { + public void beginTransaction(Option
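The `reset(Instant, Instant)` suggestion in this review can be sketched in isolation together with the lock ordering in the diff. In that ordering, the public methods are not `synchronized`, and the owner fields are written only after the lock is acquired and cleared before it is released. This is a minimal sketch, not Hudi's actual class. `TransactionManagerSketch` is a hypothetical name, a `java.util.concurrent.locks.ReentrantLock` stands in for `LockManager`, and plain `String` instants stand in for `Option<HoodieInstant>`. The flag uses the `isOptimisticConcurrencyControlEnabled` name proposed in the thread.

```java
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for TransactionManager: ReentrantLock replaces LockManager,
// and String instants replace Option<HoodieInstant>.
class TransactionManagerSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final boolean isOptimisticConcurrencyControlEnabled;
    private Optional<String> currentTxnOwnerInstant = Optional.empty();
    private Optional<String> lastCompletedTxnOwnerInstant = Optional.empty();

    TransactionManagerSketch(boolean isOptimisticConcurrencyControlEnabled) {
        this.isOptimisticConcurrencyControlEnabled = isOptimisticConcurrencyControlEnabled;
    }

    // Not synchronized: a writer blocked in lock.lock() holds no monitor on this
    // object, so the current lock holder can still reach endTransaction().
    void beginTransaction(Optional<String> currentOwner, Optional<String> lastCompleted) {
        if (isOptimisticConcurrencyControlEnabled) {
            lock.lock();
            reset(currentOwner, lastCompleted); // owner state written only while the lock is held
        }
    }

    void endTransaction() {
        if (isOptimisticConcurrencyControlEnabled) {
            reset(Optional.empty(), Optional.empty()); // cleared before the lock is released
            lock.unlock();
        }
    }

    // Hypothetical helper along the lines of the suggested reset(Instant, Instant).
    private void reset(Optional<String> currentOwner, Optional<String> lastCompleted) {
        this.currentTxnOwnerInstant = currentOwner;
        this.lastCompletedTxnOwnerInstant = lastCompleted;
    }

    Optional<String> getCurrentTxnOwnerInstant() { return currentTxnOwnerInstant; }

    Optional<String> getLastCompletedTxnOwnerInstant() { return lastCompletedTxnOwnerInstant; }

    public static void main(String[] args) throws InterruptedException {
        TransactionManagerSketch txn = new TransactionManagerSketch(true);
        txn.beginTransaction(Optional.of("001"), Optional.empty());
        Thread secondWriter = new Thread(() -> {
            // Blocks in lock.lock() until the first writer ends its transaction.
            txn.beginTransaction(Optional.of("002"), Optional.of("001"));
            System.out.println("second writer owns " + txn.getCurrentTxnOwnerInstant().orElse("none"));
            txn.endTransaction();
        });
        secondWriter.start();
        Thread.sleep(100);
        // With synchronized begin/end methods, this call could block forever on the
        // object's monitor, held by the second writer while it waits for the lock.
        txn.endTransaction();
        secondWriter.join();
    }
}
```

Running `main` prints `second writer owns 002`. The second writer parks inside `beginTransaction` waiting for the lock, and the first writer can still call `endTransaction` to release it. If both methods were `synchronized`, the second writer would hold the object's monitor while waiting for the lock, and that is the begin/end deadlock the PR title refers to.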
[GitHub] [hudi] nsivabalan commented on issue #4340: [SUPPORT] Incremental read fails when no commit in the particular zone
nsivabalan commented on issue #4340: URL: https://github.com/apache/hudi/issues/4340#issuecomment-997121672 @fireking77 : May I know what you mean by timezone here? Do you mean there is no commit between the begin time and the end time?
[GitHub] [hudi] hudi-bot commented on pull request #4361: [WIP][DO_NOT_MERGE] Test failure testing5
hudi-bot commented on pull request #4361: URL: https://github.com/apache/hudi/pull/4361#issuecomment-997120007 ## CI report: * 3f3780a32d11ac67c935870beaa460b67363dbbe UNKNOWN
[GitHub] [hudi] nsivabalan closed issue #4109: [SUPPORT] SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException
nsivabalan closed issue #4109: URL: https://github.com/apache/hudi/issues/4109
[GitHub] [hudi] nsivabalan commented on issue #4109: [SUPPORT] SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException
nsivabalan commented on issue #4109: URL: https://github.com/apache/hudi/issues/4109#issuecomment-997119344 I have filed a tracking [JIRA](https://issues.apache.org/jira/browse/HUDI-3058) and will close this out. @satishkotha : Once you have a PR, let me know and I can help review it.
[jira] [Commented] (HUDI-3058) SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461747#comment-17461747 ] sivabalan narayanan commented on HUDI-3058: --- Proposed fix: The ConcurrentModificationException seems to be coming from here [https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTablePreCommitFileSystemView.java#L83] We need to redo this logic to avoid newFilesWrittenForPartition.remove(...). A simple option to try out: replace line 72 with {{Map newFilesWrittenForPartition = new ConcurrentHashMap(filesWritten.stream().filter(file -> partitionStr.equals(file.getPartitionPath())).collect(Collectors.toMap(HoodieWriteStat::getFileId, writeStat -> new HoodieBaseFile(new Path(tableMetaClient.getBasePath(), writeStat.getPath()).toString()))));}} The above is more of a short-term workaround. A better option is probably to avoid modifying the Map in the first place. This can be done by grouping based on fileId, i.e., replace lines 78-88 with: {{Map baseFilesForCommittedFileIds = committedBaseFiles /* Remove files replaced by current inflight commit */ .filter(baseFile -> !replacedFileIdsForPartition.contains(baseFile.getFileId())).collect(Collectors.toMap(HoodieBaseFile::getFileId, baseFile -> baseFile)); baseFilesForCommittedFileIds.putAll(newFilesWrittenForPartition); return baseFilesForCommittedFileIds.values().stream();}} This needs some more testing. I can send a PR next week.
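The grouping-by-fileId approach in this comment can be sketched with plain Java collections. This is a minimal sketch under stated assumptions. `BaseFile` is a hypothetical stand-in for `HoodieBaseFile`, file IDs and paths are plain strings, and `reproducesCme` and `latestBaseFiles` are illustrative names. The first method reproduces the failure mode that `HashMap$ValueSpliterator.forEachRemaining` reports in the stack trace for this issue: a `HashMap` structurally modified while a stream is traversing its values. The second method keys the committed files by file ID and drops replaced ones. It then overlays the newly written files with `putAll`, so no map is modified while it is being iterated.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PreCommitViewSketch {
    // Hypothetical stand-in for HoodieBaseFile: only a file ID and a path.
    static final class BaseFile {
        private final String fileId;
        private final String path;

        BaseFile(String fileId, String path) {
            this.fileId = fileId;
            this.path = path;
        }

        String getFileId() { return fileId; }

        String getPath() { return path; }
    }

    // Failure mode: removing entries from a HashMap while a stream over its values is running.
    // HashMap's value spliterator detects the modCount change and throws.
    static boolean reproducesCme() {
        Map<String, BaseFile> newFiles = new HashMap<>();
        newFiles.put("f1", new BaseFile("f1", "f1_v2.parquet"));
        newFiles.put("f2", new BaseFile("f2", "f2_v2.parquet"));
        try {
            newFiles.values().stream().forEach(f -> newFiles.remove(f.getFileId()));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Shape of the proposed fix: key committed files by file ID (dropping replaced ones),
    // overlay the files written by the inflight commit, then stream the map's values.
    static Stream<BaseFile> latestBaseFiles(Stream<BaseFile> committedBaseFiles,
                                            Set<String> replacedFileIds,
                                            Map<String, BaseFile> newFilesWrittenForPartition) {
        Map<String, BaseFile> byFileId = committedBaseFiles
            .filter(f -> !replacedFileIds.contains(f.getFileId()))
            // HashMap::new guarantees a mutable map for putAll; on a duplicate file ID the later entry wins.
            .collect(Collectors.toMap(BaseFile::getFileId, f -> f, (a, b) -> b, HashMap::new));
        byFileId.putAll(newFilesWrittenForPartition); // newly written files replace committed ones
        return byFileId.values().stream();
    }

    public static void main(String[] args) {
        System.out.println("CME reproduced: " + reproducesCme());

        Stream<BaseFile> committed = Stream.of(
            new BaseFile("f1", "f1_v1.parquet"),
            new BaseFile("f2", "f2_v1.parquet"),
            new BaseFile("f3", "f3_v1.parquet"));
        Set<String> replaced = new HashSet<>(Arrays.asList("f3"));
        Map<String, BaseFile> written = new HashMap<>();
        written.put("f1", new BaseFile("f1", "f1_v2.parquet"));

        List<String> paths = latestBaseFiles(committed, replaced, written)
            .map(BaseFile::getPath)
            .sorted()
            .collect(Collectors.toList());
        System.out.println(paths); // [f1_v2.parquet, f2_v1.parquet]
    }
}
```

This sketch deliberately deviates from the snippet in the comment by using the four-argument `Collectors.toMap` with `HashMap::new`. The two-argument form makes no guarantee that the returned map is mutable, and `putAll` needs a mutable map. The merge function also means a duplicate file ID keeps the later entry instead of throwing `IllegalStateException`.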
> SqlQueryEqualityPreCommitValidator errors with > java.util.ConcurrentModificationException > > > Key: HUDI-3058 > URL: https://issues.apache.org/jira/browse/HUDI-3058 > Project: Apache Hudi > Issue Type: Bug > Components: Usability >Affects Versions: 0.10.0 >Reporter: sivabalan narayanan >Priority: Major > Fix For: 0.11.0 > > > Ref issue: [https://github.com/apache/hudi/issues/4109] > > Faced concurrentModificationException when trying to test > SqlQueryEqualityPreCommitValidator in quickstart guide > *To Reproduce* > Steps to reproduce the behavior: > # Insert data without any pre commit validations > # Update data (ensured the updates dont touch the fare column in quickstart > example) with the following precommit validator props > {{option("hoodie.precommit.validators", > "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator"). > option("hoodie.precommit.validators.equality.sql.queries", "select sum(fare) > from ").}} > stacktrace: > {code:java} > org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit > time 20211124114945342 > at > org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62) > at > org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:111) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:95) > at > org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:174) > at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) > at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:276) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at
[jira] [Updated] (HUDI-3058) SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3058: -- Affects Version/s: 0.10.0 > SqlQueryEqualityPreCommitValidator errors with > java.util.ConcurrentModificationException > > > Key: HUDI-3058 > URL: https://issues.apache.org/jira/browse/HUDI-3058 > Project: Apache Hudi > Issue Type: Bug > Components: Usability >Affects Versions: 0.10.0 >Reporter: sivabalan narayanan >Priority: Major > Fix For: 0.11.0 > > > Faced concurrentModificationException when trying to test > SqlQueryEqualityPreCommitValidator in quickstart guide > *To Reproduce* > Steps to reproduce the behavior: > # Insert data without any pre commit validations > # Update data (ensured the updates dont touch the fare column in quickstart > example) with the following precommit validator props > {{option("hoodie.precommit.validators", > "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator"). > option("hoodie.precommit.validators.equality.sql.queries", "select sum(fare) > from ").}} > stacktrace: > {code:java} > org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit > time 20211124114945342 > at > org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62) > at > org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:111) > at > org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:95) > at > org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:174) > at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) > at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:276) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164) > at > 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) > ... 
70 elided > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1633) > at > java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) > at > java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) > at > java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) > at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272) > at java.util.HashMap
[jira] [Updated] (HUDI-3058) SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HUDI-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3058: -- Description: Ref issue: [https://github.com/apache/hudi/issues/4109] Faced concurrentModificationException when trying to test SqlQueryEqualityPreCommitValidator in quickstart guide *To Reproduce* Steps to reproduce the behavior: # Insert data without any pre commit validations # Update data (ensured the updates dont touch the fare column in quickstart example) with the following precommit validator props {{option("hoodie.precommit.validators", "org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator"). option("hoodie.precommit.validators.equality.sql.queries", "select sum(fare) from ").}} stacktrace: {code:java} org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20211124114945342 at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62) at org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46) at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:111) at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:95) at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:174) at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:276) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) ... 
70 elided Caused by: java.util.ConcurrentModificationException at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1633) at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272) at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) at org.apache.hudi.client.utils.SparkValidatorUtils.getRecordsFromPendingCommits(SparkValidatorUtils.java:159) at org.apache.hudi.client.utils.SparkValidatorUtils.runValidators(SparkValidatorUtils.java:78)