[GitHub] [hudi] bvaradar closed issue #2432: [SUPPORT] write hudi data failed when using Deltastreamer
bvaradar closed issue #2432: URL: https://github.com/apache/hudi/issues/2432 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bvaradar commented on issue #2432: [SUPPORT] write hudi data failed when using Deltastreamer
bvaradar commented on issue #2432: URL: https://github.com/apache/hudi/issues/2432#issuecomment-765213298 Closing due to inactivity !! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bvaradar commented on issue #2423: Performance Issues due to significant Parallel Create-Dir being issued to Azure ADLS_V2
bvaradar commented on issue #2423: URL: https://github.com/apache/hudi/issues/2423#issuecomment-765213042 @nsivabalan : Can you open a PR with code changes in https://github.com/apache/hudi/issues/2423#issuecomment-758433327 to have it landed ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bvaradar commented on issue #2397: [SUPPORT]
bvaradar commented on issue #2397: URL: https://github.com/apache/hudi/issues/2397#issuecomment-765212048 Please reopen if this is still an issue ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bvaradar closed issue #2397: [SUPPORT]
bvaradar closed issue #2397: URL: https://github.com/apache/hudi/issues/2397 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bvaradar commented on issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists
bvaradar commented on issue #2448: URL: https://github.com/apache/hudi/issues/2448#issuecomment-765211520 @peng-xin : Can you enable hoodie.compact.inline -> true and hoodie.auto.commit -> true. The log files are growing because they need to be compacted and if you set the first config, it will periodically run compactions. Cleaner will eventually remove old log files and parquet files after that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] bvaradar commented on issue #2463: [SUPPORT] Tuning Hudi Upsert Job
bvaradar commented on issue #2463: URL: https://github.com/apache/hudi/issues/2463#issuecomment-765206339 @rubenssoto : The time taken is coming from the shuffling the data to route for writing to parquet file and the writing part. I think if you increase the number of executors (reduce cores per executor to 2) and try, it would be better. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on issue #2461: All records are present in athena query result on glue crawled Hudi tables
vinothchandar commented on issue #2461: URL: https://github.com/apache/hudi/issues/2461#issuecomment-765188290 @vrtrepp you could potentially modify your existing glue step to run the hive-sync tool as an additonal step? At a high level, without the `HoodieParquetInputFormat` as the registered format, its impossible for athena to do the filtering necessary to just give you the latest records. Just saw this today : https://aws.amazon.com/blogs/big-data/writing-to-apache-hudi-tables-using-aws-glue-connector/ , not sure if its directly helpful for you This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua merged pull request #2459: [MINOR] Improve code readability,remove the continue keyword
yanghua merged pull request #2459: URL: https://github.com/apache/hudi/pull/2459 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on pull request #2450: [HUDI-1538] Try to init class trying different signatures instead of checking its name.
yanghua commented on pull request #2450: URL: https://github.com/apache/hudi/pull/2450#issuecomment-765144294 @vburenin Would you please rebase with the master branch and see if the Travis is OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua merged pull request #2471: [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFli…
yanghua merged pull request #2471: URL: https://github.com/apache/hudi/pull/2471 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2471: [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFli…
codecov-io edited a comment on pull request #2471: URL: https://github.com/apache/hudi/pull/2471#issuecomment-765116001 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=h1) Report > Merging [#2471](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=desc) (ef62c24) into [master](https://codecov.io/gh/apache/hudi/commit/976420c49a3fe764f7cecd30b6cdb32861be5537?el=desc) (976420c) will **increase** coverage by `19.20%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2471/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2471 +/- ## = + Coverage 50.27% 69.48% +19.20% + Complexity 3050 358 -2692 = Files 419 53 -366 Lines 18897 1930-16967 Branches 1937 230 -1707 = - Hits 9500 1341 -8159 + Misses 8622 456 -8166 + Partials775 133 -642 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.48% <ø> (+0.05%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=) | | | | | [.../java/org/apache/hudi/hadoop/InputPathHandler.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0lucHV0UGF0aEhhbmRsZXIuamF2YQ==) | | | | | [...udi/common/table/timeline/dto/FSPermissionDTO.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GU1Blcm1pc3Npb25EVE8uamF2YQ==) | | | | | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | | | | | [...adoop/realtime/HoodieHFileRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUhGaWxlUmVhbHRpbWVJbnB1dEZvcm1hdC5qYXZh) | | | | | [...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZUhpdmVJbnB1dEZvcm1hdC5qYXZh) | | | | | [...di/timeline/service/handlers/FileSliceHandler.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvRmlsZVNsaWNlSGFuZGxlci5qYXZh) | | | | | [...ain/scala/org/apache/hudi/cli/DedupeSparkJob.scala](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9EZWR1cGVTcGFya0pvYi5zY2FsYQ==) | | | | | [...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=) | | | | | [...sioning/clean/CleanMetadataV1MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYxTWlncmF0aW9uSGFuZGxlci5qYXZh) | | | | | ... and [355 more](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact
[GitHub] [hudi] leesf commented on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
leesf commented on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-765010589 > @leesf Hello, I have a doubt now. I did not modify the code of `hudi-ineg-test`, but every time the check fails because of it, do you know why? would you please rebase to master since there is a flaky test fix against the master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 commented on pull request #2412: [HUDI-1512] Fix spark 2 unit tests failure with Spark 3
garyli1019 commented on pull request #2412: URL: https://github.com/apache/hudi/pull/2412#issuecomment-764705432 should be ok to merge since we cut the release This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua merged pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism
yanghua merged pull request #2434: URL: https://github.com/apache/hudi/pull/2434 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jiangjiguang commented on issue #143: Tracking ticket for folks to be added to slack group
jiangjiguang commented on issue #143: URL: https://github.com/apache/hudi/issues/143#issuecomment-764596209 Hi, could you add me to the slack group? My email is jiangjiguang...@163.com This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2388: [HUDI-1353] add incremental timeline support for pending clustering ops
codecov-io edited a comment on pull request #2388: URL: https://github.com/apache/hudi/pull/2388#issuecomment-751907604 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=h1) Report > Merging [#2388](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=desc) (4423c0c) into [master](https://codecov.io/gh/apache/hudi/commit/81ccb0c71ad2c17c5613698d4fb50f3b49b21fb4?el=desc) (81ccb0c) will **increase** coverage by `19.15%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2388/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2388 +/- ## = + Coverage 50.27% 69.43% +19.15% + Complexity 3050 357 -2693 = Files 419 53 -366 Lines 18897 1930-16967 Branches 1937 230 -1707 = - Hits 9500 1340 -8160 + Misses 8622 456 -8166 + Partials775 134 -641 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../java/org/apache/hudi/common/util/CommitUtils.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29tbWl0VXRpbHMuamF2YQ==) | | | | | [...rg/apache/hudi/common/bloom/SimpleBloomFilter.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL1NpbXBsZUJsb29tRmlsdGVyLmphdmE=) | | | | | [...apache/hudi/common/model/HoodieCommitMetadata.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUNvbW1pdE1ldGFkYXRhLmphdmE=) | | | | | [...pache/hudi/io/storage/HoodieFileReaderFactory.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVGaWxlUmVhZGVyRmFjdG9yeS5qYXZh) | | | | | [...i/common/table/log/block/HoodieHFileDataBlock.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVIRmlsZURhdGFCbG9jay5qYXZh) | | | | | [...hudi/common/config/DFSPropertiesConfiguration.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9ERlNQcm9wZXJ0aWVzQ29uZmlndXJhdGlvbi5qYXZh) | | | | | [.../org/apache/hudi/hadoop/utils/HoodieHiveUtils.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZUhpdmVVdGlscy5qYXZh) | | | | | [...e/hudi/common/model/HoodieRollingStatMetadata.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJvbGxpbmdTdGF0TWV0YWRhdGEuamF2YQ==) | | | | | [...pache/hudi/common/model/HoodieArchivedLogFile.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUFyY2hpdmVkTG9nRmlsZS5qYXZh) | | | | | [.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=) | | | | | ... and [346 more](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #2468: [MINOR] Disabling problematic tests temporarily to stabilize CI
vinothchandar commented on pull request #2468: URL: https://github.com/apache/hudi/pull/2468#issuecomment-763990996 https://travis-ci.com/github/vinothchandar/hudi/builds/213836867 Build passes here (ran on my fork, since apache queue is overloaded now). Landing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2426: [HUDI-304] Configure spotless and java style
codecov-io edited a comment on pull request #2426: URL: https://github.com/apache/hudi/pull/2426#issuecomment-757443274 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=h1) Report > Merging [#2426](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=desc) (9cb1268) into [master](https://codecov.io/gh/apache/hudi/commit/e926c1a45ca95fa1911f6f88a0577554f2797760?el=desc) (e926c1a) will **decrease** coverage by `0.20%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2426/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2426 +/- ## - Coverage 50.73% 50.52% -0.21% + Complexity 3064 3032 -32 Files 419 417 -2 Lines 1879718725 -72 Branches 1922 1917 -5 - Hits 9536 9461 -75 - Misses 8485 8489 +4 + Partials776 775 -1 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.28% <ø> (+0.01%)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.61% <ø> (-0.41%)` | `0.00 <ø> (ø)` | | | hudiflink | `10.20% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.00% <ø> (-0.06%)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `66.07% <ø> (+0.16%)` | `0.00 <ø> (ø)` | | | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `66.84% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.41% <ø> (-0.02%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/common/engine/TaskContextSupplier.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9UYXNrQ29udGV4dFN1cHBsaWVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==) | `0.00% <0.00%> (-94.60%)` | `0.00% <0.00%> (-13.00%)` | | | [...apache/hudi/common/engine/HoodieEngineContext.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Ib29kaWVFbmdpbmVDb250ZXh0LmphdmE=) | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-1.00%)` | | | [...g/apache/hudi/common/function/FunctionWrapper.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Z1bmN0aW9uL0Z1bmN0aW9uV3JhcHBlci5qYXZh) | `0.00% <0.00%> (-10.53%)` | `0.00% <0.00%> (-2.00%)` | | | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | | | [...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==) | `45.71% <0.00%> (-6.18%)` | `55.00% <0.00%> (-6.00%)` | | | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `68.96% <0.00%> (-1.73%)` | `43.00% <0.00%> (-2.00%)` | | | [...i/common/table/timeline/TimelineMetadataUtils.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTWV0YWRhdGFVdGlscy5qYXZh) | `71.69% <0.00%> (-1.03%)` | `17.00% <0.00%> (ø%)` | | | [...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=) | `63.15% <0.00%> (-0.48%)` | `12.00% <0.00%> (ø%)` | | |
[GitHub] [hudi] bvaradar commented on issue #2464: [SUPPORT] ExecutorLostFailure - Try Processing 1TB Of Data
bvaradar commented on issue #2464: URL: https://github.com/apache/hudi/issues/2464#issuecomment-763888673 @rubenssoto : This would need trial and error. Can you try with 6GB+ and see if the error goes away ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rubenssoto commented on issue #2463: [SUPPORT] Tuning Hudi Upsert Job
rubenssoto commented on issue #2463: URL: https://github.com/apache/hudi/issues/2463#issuecomment-764683609 Hi @bvaradar , Could you helpme with this ? :) thank you This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jessica0530 commented on issue #143: Tracking ticket for folks to be added to slack group
jessica0530 commented on issue #143: URL: https://github.com/apache/hudi/issues/143#issuecomment-764415027 please add me wjxdtc10...@gmail.com thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] teeyog commented on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
teeyog commented on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-764544455 @leesf Hello, I have a doubt now. I did not modify the code of ```hudi-ineg-test```, but every time the check fails because of it, do you know why? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Trevor-zhang commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive
Trevor-zhang commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r561621697 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java ## @@ -284,6 +284,9 @@ public static HiveSyncConfig buildHiveSyncConfig(TypedProperties props, String b props.getString(DataSourceWriteOptions.HIVE_PASS_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_PASS_OPT_VAL()); hiveSyncConfig.jdbcUrl = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_URL_OPT_VAL()); +if (hiveSyncConfig.hiveMetaStoreUri != null) { + hiveSyncConfig.hiveMetaStoreUri = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_METASTORE_URI_OPT_VAL()); +} Review comment: Because this is not a required option . ## File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java ## @@ -284,6 +284,9 @@ public static HiveSyncConfig buildHiveSyncConfig(TypedProperties props, String b props.getString(DataSourceWriteOptions.HIVE_PASS_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_PASS_OPT_VAL()); hiveSyncConfig.jdbcUrl = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_URL_OPT_VAL()); +if (hiveSyncConfig.hiveMetaStoreUri != null) { + hiveSyncConfig.hiveMetaStoreUri = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_METASTORE_URI_OPT_VAL()); +} Review comment: (1) When synchronizing hudi data with local hive,` hiveSyncConfig.hiveMetaStoreUri` can be set to null. (2) There is no priority distinction between `hiveSyncConfig` and `props`. Because there is no props attribute in `hiveSyncConfig`, only a single attribute is set. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on pull request #2454: [HUDI-393] Set up CI with Azure Pipelines
xushiyan commented on pull request #2454: URL: https://github.com/apache/hudi/pull/2454#issuecomment-764489544 auto-closed due to personal repo force update... will re-create PR once tried out some settings This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism
vinothchandar commented on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-764820382 @yanghua should be good to land now if you are happy with it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 merged pull request #2412: [HUDI-1512] Fix spark 2 unit tests failure with Spark 3
garyli1019 merged pull request #2412: URL: https://github.com/apache/hudi/pull/2412 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2459: [MINOR] Improve code readability,remove the continue keyword
codecov-io edited a comment on pull request #2459: URL: https://github.com/apache/hudi/pull/2459#issuecomment-763007666 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rubenssoto commented on issue #2464: [SUPPORT] ExecutorLostFailure - Try Processing 1TB Of Data
rubenssoto commented on issue #2464: URL: https://github.com/apache/hudi/issues/2464#issuecomment-764682521 Thank you so much for your help @bvaradar , it worked. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io commented on pull request #2471: [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFli…
codecov-io commented on pull request #2471: URL: https://github.com/apache/hudi/pull/2471#issuecomment-765116001 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=h1) Report > Merging [#2471](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=desc) (ef62c24) into [master](https://codecov.io/gh/apache/hudi/commit/976420c49a3fe764f7cecd30b6cdb32861be5537?el=desc) (976420c) will **decrease** coverage by `40.58%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2471/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) ```diff @@ Coverage Diff @@ ## master #2471 +/- ## - Coverage 50.27% 9.68% -40.59% + Complexity 3050 48 -3002 Files 419 53 -366 Lines 188971930-16967 Branches 1937 230 -1707 - Hits 9500 187 -9313 + Misses 86221730 -6892 + Partials775 13 -762 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%>
[GitHub] [hudi] wangxianghu commented on a change in pull request #2375: [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client
wangxianghu commented on a change in pull request #2375: URL: https://github.com/apache/hudi/pull/2375#discussion_r562308054 ## File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/bloom/HoodieFlinkBloomIndexCheckFunction.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.index.bloom; + +import org.apache.hudi.client.utils.LazyIterableIterator; +import org.apache.hudi.common.model.HoodieKey; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.exception.HoodieIndexException; +import org.apache.hudi.io.HoodieKeyLookupHandle; +import org.apache.hudi.io.HoodieKeyLookupHandle.KeyLookupResult; +import org.apache.hudi.table.HoodieTable; + +import java.util.function.Function; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; + +import scala.Tuple2; + +/** + * Function performing actual checking of List partition containing (fileId, hoodieKeys) against the actual files. + */ +//TODO we can move this class into the hudi-client-common and reuse it for spark client +public class HoodieFlinkBloomIndexCheckFunction +implements Function>, Iterator>> { + + private final HoodieTable hoodieTable; + + private final HoodieWriteConfig config; + + public HoodieFlinkBloomIndexCheckFunction(HoodieTable hoodieTable, HoodieWriteConfig config) { +this.hoodieTable = hoodieTable; +this.config = config; + } + + @Override + public Iterator> apply(Iterator> fileParitionRecordKeyTripletItr) { +return new LazyKeyCheckIterator(fileParitionRecordKeyTripletItr); + } + + @Override + public Function>> compose(Function>> before) { +return null; + } + + @Override + public Function>, V> andThen(Function>, ? extends V> after) { +return null; + } + + class LazyKeyCheckIterator extends LazyIterableIterator, List> { Review comment: > maybe we can move HoodieFlinkBloomIndexCheckFunction into the hudi-client-common later then spark can reuse it. yes, could be annother pr This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive
danny0405 commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r561587235 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java ## @@ -284,6 +284,9 @@ public static HiveSyncConfig buildHiveSyncConfig(TypedProperties props, String b props.getString(DataSourceWriteOptions.HIVE_PASS_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_PASS_OPT_VAL()); hiveSyncConfig.jdbcUrl = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_URL_OPT_VAL()); +if (hiveSyncConfig.hiveMetaStoreUri != null) { + hiveSyncConfig.hiveMetaStoreUri = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_METASTORE_URI_OPT_VAL()); +} Review comment: What is the purpose of the decision `hiveSyncConfig.hiveMetaStoreUri != null` ? ## File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java ## @@ -284,6 +284,9 @@ public static HiveSyncConfig buildHiveSyncConfig(TypedProperties props, String b props.getString(DataSourceWriteOptions.HIVE_PASS_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_PASS_OPT_VAL()); hiveSyncConfig.jdbcUrl = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_URL_OPT_VAL()); +if (hiveSyncConfig.hiveMetaStoreUri != null) { + hiveSyncConfig.hiveMetaStoreUri = props.getString(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), DataSourceWriteOptions.DEFAULT_HIVE_METASTORE_URI_OPT_VAL()); +} Review comment: Looks weird, in which case the `hiveSyncConfig.hiveMetaStoreUri` can be null and what is the priority between the `hiveSyncConfig` and the `props` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand
codecov-io edited a comment on pull request #2325: URL: https://github.com/apache/hudi/pull/2325#issuecomment-742860619 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar merged pull request #2469: [MINOR] Make a separate travis CI job for hudi-utilities
vinothchandar merged pull request #2469: URL: https://github.com/apache/hudi/pull/2469 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism
yanghua commented on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-765047901 > @yanghua should be good to land now if you are happy with it ack, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r561768669 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/StreamWriteFunction.java ## @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.client.FlinkTaskContextSupplier; +import org.apache.hudi.client.HoodieFlinkWriteClient; +import org.apache.hudi.client.WriteStatus; +import org.apache.hudi.client.common.HoodieFlinkEngineContext; +import org.apache.hudi.common.config.SerializableConfiguration; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.common.model.WriteOperationType; +import org.apache.hudi.common.util.ObjectSizeCalculator; +import org.apache.hudi.keygen.KeyGenerator; +import org.apache.hudi.operator.event.BatchWriteSuccessEvent; +import org.apache.hudi.util.StreamerUtil; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.formats.avro.RowDataToAvroConverters; +import org.apache.flink.runtime.operators.coordination.OperatorEventGateway; +import org.apache.flink.runtime.state.FunctionInitializationContext; +import org.apache.flink.runtime.state.FunctionSnapshotContext; +import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction; +import org.apache.flink.streaming.api.functions.KeyedProcessFunction; +import org.apache.flink.table.types.logical.RowType; +import org.apache.flink.util.Collector; +import org.apache.flink.util.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.concurrent.locks.Condition; +import java.util.concurrent.locks.ReentrantLock; +import java.util.function.BiFunction; + +/** + * Sink function to write the data to the underneath filesystem. + * + * Work Flow + * + * The function firstly buffers the data as a batch of {@link HoodieRecord}s, + * It flushes(write) the records batch when a Flink checkpoint starts. After a batch has been written successfully, + * the function notifies its operator coordinator {@link StreamWriteOperatorCoordinator} to mark a successful write. + * + * Exactly-once Semantics + * + * The task implements exactly-once semantics by buffering the data between checkpoints. The operator coordinator + * starts a new instant on the time line when a checkpoint triggers, the coordinator checkpoints always + * start before its operator, so when this function starts a checkpoint, a REQUESTED instant already exists. + * The function process thread then block data buffering and the checkpoint thread starts flushing the existing data buffer. + * When the existing data buffer write successfully, the process thread unblock and start buffering again for the next round checkpoint. + * Because any checkpoint failures would trigger the write rollback, it implements the exactly-once semantics. + * + * Fault Tolerance + * + * The operator coordinator checks the validity for the last instant when it starts a new one. The operator rolls back + * the written data and throws when any error occurs. This means any checkpoint or task failure would trigger a failover. + * The operator coordinator would try several times when committing the writestatus. + * + * Note: The function task requires the input stream be partitioned by the partition fields to avoid different write tasks + * write to the same file group that conflict. The general case for partition path is a datetime field, + * so the sink task is very possible to have IO bottleneck, the more flexible solution is to shuffle the + * data by the file group IDs. + * + * @param Type of the input record + * @see StreamWriteOperatorCoordinator + */ +public class StreamWriteFunction extends KeyedProcessFunction implements CheckpointedFunction { + + private static
[GitHub] [hudi] vinothchandar commented on pull request #2469: [MINOR] Make a separate travis CI job for hudi-utilities
vinothchandar commented on pull request #2469: URL: https://github.com/apache/hudi/pull/2469#issuecomment-764395733 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
codecov-io edited a comment on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-760949326 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] teeyog commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field
teeyog commented on a change in pull request #2431: URL: https://github.com/apache/hudi/pull/2431#discussion_r561454150 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -46,6 +46,11 @@ class DefaultSource extends RelationProvider with StreamSinkProvider with Serializable { + SparkSession.getActiveSession.foreach { spark => +// Enable "passPartitionByAsOptions" to support "write.partitionBy(...)" +spark.conf.set("spark.sql.legacy.sources.write.passPartitionByAsOptions", "true") Review comment: Thank you for your review, todo has been added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] wangxianghu commented on pull request #2471: [MINOR]Remove InstantGeneratorOperator parallelism limit in HoodieFli…
wangxianghu commented on pull request #2471: URL: https://github.com/apache/hudi/pull/2471#issuecomment-765067817 @yanghua @loukey-lj please take a look when free This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (748dcc9 -> 048633d)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 748dcc9 [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFlinkStreamer and update docs (#2471) add 048633d [MINOR] Improve code readability,remove the continue keyword (#2459) No new revisions were added by this update. Summary of changes: .../java/org/apache/hudi/integ/testsuite/reader/DFSDeltaInputReader.java | 1 - 1 file changed, 1 deletion(-)
[hudi] branch master updated (641abe8 -> 748dcc9)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 641abe8 [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client (#2375) add 748dcc9 [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFlinkStreamer and update docs (#2471) No new revisions were added by this update. Summary of changes: .../src/main/java/org/apache/hudi/HoodieFlinkStreamer.java | 3 +-- .../org/apache/hudi/operator/InstantGenerateOperator.java | 13 ++--- 2 files changed, 7 insertions(+), 9 deletions(-)
[GitHub] [hudi] xushiyan closed pull request #2454: [HUDI-393] Set up CI with Azure Pipelines
xushiyan closed pull request #2454: URL: https://github.com/apache/hudi/pull/2454 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers
codecov-io edited a comment on pull request #2374: URL: https://github.com/apache/hudi/pull/2374#issuecomment-750782300 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=h1) Report > Merging [#2374](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=desc) (7ababea) into [master](https://codecov.io/gh/apache/hudi/commit/a38612b10f6ae04644519270f9b5eb631a77c148?el=desc) (a38612b) will **decrease** coverage by `41.00%`. > The diff coverage is `100.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2374/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree) ```diff @@ Coverage Diff @@ ## master #2374 +/- ## - Coverage 50.69% 9.68% -41.01% + Complexity 3059 48 -3011 Files 419 53 -366 Lines 188101930-16880 Branches 1924 230 -1694 - Hits 9535 187 -9348 + Misses 84981730 -6768 + Partials777 13 -764 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.68% <100.00%> (-59.80%)` | `0.00 <1.00> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `83.62% <100.00%> (-5.18%)` | `28.00 <1.00> (ø)` | | | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)`
[jira] [Commented] (HUDI-1503) Implement a Hash(Bucket)-based Index
[ https://issues.apache.org/jira/browse/HUDI-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269828#comment-17269828 ] Mihir Shah commented on HUDI-1503: -- h4. [Shimin Yang|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=dangdangdang] Hello Mr. Yang, I would be interested in working on this issue, I was wondering if there is some documentation about the index or the project's design so I could understand the problem better? Thank you! > Implement a Hash(Bucket)-based Index > > > Key: HUDI-1503 > URL: https://issues.apache.org/jira/browse/HUDI-1503 > Project: Apache Hudi > Issue Type: Wish > Components: Index, Performance >Reporter: Shimin Yang >Priority: Major > > This ticket is to introduce a new hash based index, which can improve the > performance of write operations and speed up the queries at the same > time(removing shuffle for Spark/Hive). > The new hash-based index works with a customized hash-based partitioner, > which partition records based on the hash value of index keys and a fixed > bucket number. So there's no need to visit the existing files to determine > which file group each record belongs. > Meanwhile, the file group id, hash mode and bucket num can be used by the > query engines to eliminate shuffle introduced by aggregation and join. > We implemented an HoodieIndex based on hive hash function which used on > production environment of ByteDance for many very-large volume dataset, and > we hope this feature can be contributed to the community soon. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] vrtrepp commented on issue #2461: All records are present in athena query result on glue crawled Hudi tables
vrtrepp commented on issue #2461: URL: https://github.com/apache/hudi/issues/2461#issuecomment-76438 Hi Rubenssoto, That is how we are planning but it will involve writing few more steps in the pipeline.However our current architecture is based on running glue crawlers and removing Glue crawlers will come with making changes in many pipelines again a month's task atleast. What I was curious about will there be any support that Hudi is going to add in future ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 merged pull request #2375: [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client
garyli1019 merged pull request #2375: URL: https://github.com/apache/hudi/pull/2375 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] zhedoubushishi commented on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field
zhedoubushishi commented on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-764178360 LGTM! This is also something I plan to do. Let's wait for others' review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r561716925 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator; + +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.streamer.FlinkStreamerConfig; +import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload; +import org.apache.hudi.keygen.SimpleAvroKeyGenerator; +import org.apache.hudi.keygen.constant.KeyGeneratorOptions; +import org.apache.hudi.util.StreamerUtil; + +import org.apache.flink.configuration.ConfigOption; +import org.apache.flink.configuration.ConfigOptions; +import org.apache.flink.configuration.Configuration; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Hoodie Flink config options. + * + * It has the options for Hoodie table read and write. It also defines some utilities. + */ +public class FlinkOptions { + private FlinkOptions() { + } + + // + // Base Options + // + public static final ConfigOption PATH = ConfigOptions + .key("path") + .stringType() + .noDefaultValue() + .withDescription("Base path for the target hoodie table." + + "\nThe path would be created if it does not exist,\n" + + "otherwise a Hoodie table expects to be initialized successfully"); + + public static final ConfigOption PROPS_FILE_PATH = ConfigOptions + .key("properties-file.path") + .stringType() + .noDefaultValue() + .withDescription("Path to properties file on local-fs or dfs, with configurations for \n" + + "hoodie client, schema provider, key generator and data source. For hoodie client props, sane defaults are\n" + + "used, but recommend use to provide basic things like metrics endpoints, hive configs etc. For sources, refer\n" + + "to individual classes, for supported properties"); + + // + // Read Options + // + public static final ConfigOption READ_SCHEMA_FILE_PATH = ConfigOptions + .key("read.schema.file.path") + .stringType() + .noDefaultValue() + .withDescription("Avro schema file path, the parsed schema is used for deserializing"); + + // + // Write Options + // + public static final ConfigOption TABLE_NAME = ConfigOptions + .key(HoodieWriteConfig.TABLE_NAME) + .stringType() + .noDefaultValue() + .withDescription("Table name to register to Hive metastore"); + + public static final ConfigOption TABLE_TYPE = ConfigOptions Review comment: So can we make `COPY_ON_WRITE` and `copy_on_write` equivalence? ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator; + +import org.apache.hudi.config.HoodieWriteConfig;
[GitHub] [hudi] rubenssoto closed issue #2464: [SUPPORT] ExecutorLostFailure - Try Processing 1TB Of Data
rubenssoto closed issue #2464: URL: https://github.com/apache/hudi/issues/2464 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field
codecov-io edited a comment on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-757929313 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2431?src=pr=h1) Report > Merging [#2431](https://codecov.io/gh/apache/hudi/pull/2431?src=pr=desc) (f1d0fda) into [master](https://codecov.io/gh/apache/hudi/commit/a38612b10f6ae04644519270f9b5eb631a77c148?el=desc) (a38612b) will **decrease** coverage by `41.00%`. > The diff coverage is `100.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2431/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2431?src=pr=tree) ```diff @@ Coverage Diff @@ ## master #2431 +/- ## - Coverage 50.69% 9.68% -41.01% + Complexity 3059 48 -3011 Files 419 53 -366 Lines 188101930-16880 Branches 1924 230 -1694 - Hits 9535 187 -9348 + Misses 84981730 -6768 + Partials777 13 -764 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.68% <100.00%> (-59.80%)` | `0.00 <1.00> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2431?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `83.62% <100.00%> (-5.18%)` | `28.00 <1.00> (ø)` | | | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2431/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) |
[GitHub] [hudi] vinothchandar commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper
vinothchandar commented on issue #2470: URL: https://github.com/apache/hudi/issues/2470#issuecomment-764870104 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar merged pull request #2468: [MINOR] Disabling problematic tests temporarily to stabilize CI
vinothchandar merged pull request #2468: URL: https://github.com/apache/hudi/pull/2468 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field
zhedoubushishi commented on a change in pull request #2431: URL: https://github.com/apache/hudi/pull/2431#discussion_r561334344 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -46,6 +46,11 @@ class DefaultSource extends RelationProvider with StreamSinkProvider with Serializable { + SparkSession.getActiveSession.foreach { spark => +// Enable "passPartitionByAsOptions" to support "write.partitionBy(...)" +spark.conf.set("spark.sql.legacy.sources.write.passPartitionByAsOptions", "true") Review comment: Could you also add a TODO comment here to indicate that we can remove this line after upgrading to Spark 3? I think in the future, Hudi will move to Spark 3. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1389) [UMBRELLA] Survey indexing technique for better query performance
[ https://issues.apache.org/jira/browse/HUDI-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269794#comment-17269794 ] Mihir Shah commented on HUDI-1389: -- [Raymond Xu|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=xushiyan] Hello, Mr. Xu, This project seems very interesting and I would be interested in working on it. I am experienced with C/C++, Java, Python (and ML packages), have worked with SQL, Bash, etc as well as some functional languages like Haskell, and have some experience with data mining. Could you please let me know how I could start on this project? Thank you! > [UMBRELLA] Survey indexing technique for better query performance > - > > Key: HUDI-1389 > URL: https://issues.apache.org/jira/browse/HUDI-1389 > Project: Apache Hudi > Issue Type: Improvement > Components: Index, Performance >Reporter: Raymond Xu >Priority: Major > Labels: gsoc, gsoc2021, mentor > > (More details to be added) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] codecov-io edited a comment on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
codecov-io edited a comment on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-760949326 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2471: [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFli…
codecov-io edited a comment on pull request #2471: URL: https://github.com/apache/hudi/pull/2471#issuecomment-765116001 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=h1) Report > Merging [#2471](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=desc) (ef62c24) into [master](https://codecov.io/gh/apache/hudi/commit/976420c49a3fe764f7cecd30b6cdb32861be5537?el=desc) (976420c) will **increase** coverage by `19.20%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2471/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2471 +/- ## = + Coverage 50.27% 69.48% +19.20% + Complexity 3050 358 -2692 = Files 419 53 -366 Lines 18897 1930-16967 Branches 1937 230 -1707 = - Hits 9500 1341 -8159 + Misses 8622 456 -8166 + Partials775 133 -642 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.48% <ø> (+0.05%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=) | | | | | [.../java/org/apache/hudi/hadoop/InputPathHandler.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0lucHV0UGF0aEhhbmRsZXIuamF2YQ==) | | | | | [...udi/common/table/timeline/dto/FSPermissionDTO.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GU1Blcm1pc3Npb25EVE8uamF2YQ==) | | | | | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | | | | | [...adoop/realtime/HoodieHFileRealtimeInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZUhGaWxlUmVhbHRpbWVJbnB1dEZvcm1hdC5qYXZh) | | | | | [...hudi/hadoop/hive/HoodieCombineHiveInputFormat.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2hpdmUvSG9vZGllQ29tYmluZUhpdmVJbnB1dEZvcm1hdC5qYXZh) | | | | | [...di/timeline/service/handlers/FileSliceHandler.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvRmlsZVNsaWNlSGFuZGxlci5qYXZh) | | | | | [...ain/scala/org/apache/hudi/cli/DedupeSparkJob.scala](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL2NsaS9EZWR1cGVTcGFya0pvYi5zY2FsYQ==) | | | | | [...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=) | | | | | [...sioning/clean/CleanMetadataV1MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYxTWlncmF0aW9uSGFuZGxlci5qYXZh) | | | | | ... and [355 more](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact
[GitHub] [hudi] codecov-io edited a comment on pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand
codecov-io edited a comment on pull request #2325: URL: https://github.com/apache/hudi/pull/2325#issuecomment-742860619 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2325?src=pr=h1) Report > Merging [#2325](https://codecov.io/gh/apache/hudi/pull/2325?src=pr=desc) (333436e) into [master](https://codecov.io/gh/apache/hudi/commit/749f6578561cbf065c7f74ab51b1c01881a1bd97?el=desc) (749f657) will **increase** coverage by `18.71%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2325/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2325?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2325 +/- ## = + Coverage 50.71% 69.43% +18.71% + Complexity 3060 357 -2703 = Files 419 53 -366 Lines 18796 1930-16866 Branches 1922 230 -1692 = - Hits 9533 1340 -8193 + Misses 8488 456 -8032 + Partials775 134 -641 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.43% <ø> (-0.06%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2325?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | | | [...di/common/table/log/block/HoodieAvroDataBlock.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVBdnJvRGF0YUJsb2NrLmphdmE=) | | | | | [...hadoop/LocatedFileStatusWithBootstrapBaseFile.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0xvY2F0ZWRGaWxlU3RhdHVzV2l0aEJvb3RzdHJhcEJhc2VGaWxlLmphdmE=) | | | | | [...rg/apache/hudi/cli/commands/SavepointsCommand.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1NhdmVwb2ludHNDb21tYW5kLmphdmE=) | | | | | [...til/jvm/HotSpotMemoryLayoutSpecification64bit.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvanZtL0hvdFNwb3RNZW1vcnlMYXlvdXRTcGVjaWZpY2F0aW9uNjRiaXQuamF2YQ==) | | | | | [...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=) | | | | | [...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh) | | | | | [...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==) | | | | | [...hudi/common/model/HoodieReplaceCommitMetadata.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlcGxhY2VDb21taXRNZXRhZGF0YS5qYXZh) | | | | | [...che/hudi/common/table/timeline/dto/InstantDTO.java](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9JbnN0YW50RFRPLmphdmE=) | | | | | ... and [352 more](https://codecov.io/gh/apache/hudi/pull/2325/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this
[GitHub] [hudi] codecov-io edited a comment on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
codecov-io edited a comment on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-760949326 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=h1) Report > Merging [#2447](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=desc) (b5aad46) into [master](https://codecov.io/gh/apache/hudi/commit/a38612b10f6ae04644519270f9b5eb631a77c148?el=desc) (a38612b) will **increase** coverage by `18.73%`. > The diff coverage is `100.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2447/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2447 +/- ## = + Coverage 50.69% 69.43% +18.73% + Complexity 3059 357 -2702 = Files 419 53 -366 Lines 18810 1930-16880 Branches 1924 230 -1694 = - Hits 9535 1340 -8195 + Misses 8498 456 -8042 + Partials777 134 -643 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.43% <100.00%> (-0.06%)` | `0.00 <1.00> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `88.79% <100.00%> (ø)` | `28.00 <1.00> (ø)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | | | [...a/org/apache/hudi/common/util/Base64CodecUtil.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQmFzZTY0Q29kZWNVdGlsLmphdmE=) | | | | | [...rg/apache/hudi/metadata/MetadataPartitionType.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvTWV0YWRhdGFQYXJ0aXRpb25UeXBlLmphdmE=) | | | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | | | | | [...ommon/util/queue/BoundedInMemoryQueueConsumer.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5UXVldWVDb25zdW1lci5qYXZh) | | | | | [...java/org/apache/hudi/common/util/NumericUtils.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvTnVtZXJpY1V0aWxzLmphdmE=) | | | | | [.../common/table/timeline/HoodieArchivedTimeline.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFyY2hpdmVkVGltZWxpbmUuamF2YQ==) | | | | | [.../apache/hudi/exception/TableNotFoundException.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1RhYmxlTm90Rm91bmRFeGNlcHRpb24uamF2YQ==) | | | | | [...udi/common/util/queue/BoundedInMemoryExecutor.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvQm91bmRlZEluTWVtb3J5RXhlY3V0b3IuamF2YQ==) | | | | | ... and [336 more](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For
[GitHub] [hudi] codecov-io commented on pull request #2471: [MINOR] Remove InstantGeneratorOperator parallelism limit in HoodieFli…
codecov-io commented on pull request #2471: URL: https://github.com/apache/hudi/pull/2471#issuecomment-765116001 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=h1) Report > Merging [#2471](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=desc) (ef62c24) into [master](https://codecov.io/gh/apache/hudi/commit/976420c49a3fe764f7cecd30b6cdb32861be5537?el=desc) (976420c) will **decrease** coverage by `40.58%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2471/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) ```diff @@ Coverage Diff @@ ## master #2471 +/- ## - Coverage 50.27% 9.68% -40.59% + Complexity 3050 48 -3002 Files 419 53 -366 Lines 188971930-16967 Branches 1937 230 -1707 - Hits 9500 187 -9313 + Misses 86221730 -6892 + Partials775 13 -762 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2471?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2471/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%>
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562375198 ## File path: hudi-flink/src/test/resources/test_source.data ## @@ -0,0 +1,8 @@ +{"uuid": "id1", "name": "Danny", "age": 23, "ts": "1970-01-01T00:00:01", "partition": "par1"} Review comment: No, i'm old This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562371379 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/StreamWriteFunction.java ## @@ -0,0 +1,342 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.client.FlinkTaskContextSupplier; +import org.apache.hudi.client.HoodieFlinkWriteClient; +import org.apache.hudi.client.WriteStatus; +import org.apache.hudi.client.common.HoodieFlinkEngineContext; +import org.apache.hudi.common.config.SerializableConfiguration; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.common.model.WriteOperationType; +import org.apache.hudi.common.util.ObjectSizeCalculator; +import org.apache.hudi.keygen.KeyGenerator; +import org.apache.hudi.operator.event.BatchWriteSuccessEvent; +import org.apache.hudi.util.StreamerUtil; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.formats.avro.RowDataToAvroConverters; +import org.apache.flink.runtime.operators.coordination.OperatorEventGateway; +import org.apache.flink.runtime.state.FunctionInitializationContext; +import org.apache.flink.runtime.state.FunctionSnapshotContext; +import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction; +import org.apache.flink.streaming.api.functions.KeyedProcessFunction; +import org.apache.flink.table.types.logical.RowType; +import org.apache.flink.util.Collector; +import org.apache.flink.util.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.concurrent.locks.Condition; +import java.util.concurrent.locks.ReentrantLock; +import java.util.function.BiFunction; + +/** + * Sink function to write the data to the underneath filesystem. + * + * Work Flow + * + * The function firstly buffers the data as a batch of {@link HoodieRecord}s, + * It flushes(write) the records batch when a Flink checkpoint starts. After a batch has been written successfully, + * the function notifies its operator coordinator {@link StreamWriteOperatorCoordinator} to mark a successful write. + * + * Exactly-once Semantics + * + * The task implements exactly-once semantics by buffering the data between checkpoints. The operator coordinator + * starts a new instant on the time line when a checkpoint triggers, the coordinator checkpoints always + * start before its operator, so when this function starts a checkpoint, a REQUESTED instant already exists. + * The function process thread then block data buffering and the checkpoint thread starts flushing the existing data buffer. + * When the existing data buffer write successfully, the process thread unblock and start buffering again for the next round checkpoint. + * Because any checkpoint failures would trigger the write rollback, it implements the exactly-once semantics. + * + * Fault Tolerance + * + * The operator coordinator checks the validity for the last instant when it starts a new one. The operator rolls back + * the written data and throws when any error occurs. This means any checkpoint or task failure would trigger a failover. + * The operator coordinator would try several times when committing the writestatus. + * + * Note: The function task requires the input stream be partitioned by the partition fields to avoid different write tasks + * write to the same file group that conflict. The general case for partition path is a datetime field, + * so the sink task is very possible to have IO bottleneck, the more flexible solution is to shuffle the + * data by the file group IDs. + * + * @param Type of the input record + * @see StreamWriteOperatorCoordinator + */ +public class StreamWriteFunction +extends KeyedProcessFunction +implements CheckpointedFunction { + +
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562371014 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator; + +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.streamer.FlinkStreamerConfig; +import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload; +import org.apache.hudi.keygen.SimpleAvroKeyGenerator; +import org.apache.hudi.keygen.constant.KeyGeneratorOptions; +import org.apache.hudi.util.StreamerUtil; + +import org.apache.flink.configuration.ConfigOption; +import org.apache.flink.configuration.ConfigOptions; +import org.apache.flink.configuration.Configuration; + +import java.util.Arrays; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Hoodie Flink config options. + * + * It has the options for Hoodie table read and write. It also defines some utilities. + */ +public class FlinkOptions { + private FlinkOptions() { + } + + // + // Base Options + // + public static final ConfigOption PATH = ConfigOptions + .key("path") + .stringType() + .noDefaultValue() + .withDescription("Base path for the target hoodie table." + + "\nThe path would be created if it does not exist,\n" + + "otherwise a Hoodie table expects to be initialized successfully"); + + public static final ConfigOption PROPS_FILE_PATH = ConfigOptions + .key("properties-file.path") + .stringType() + .noDefaultValue() + .withDescription("Path to properties file on local-fs or dfs, with configurations for \n" + + "hoodie client, schema provider, key generator and data source. For hoodie client props, sane defaults are\n" + + "used, but recommend use to provide basic things like metrics endpoints, hive configs etc. For sources, refer\n" + + "to individual classes, for supported properties"); + + // + // Read Options + // + public static final ConfigOption READ_SCHEMA_FILE_PATH = ConfigOptions + .key("read.schema.file.path") + .stringType() + .noDefaultValue() + .withDescription("Avro schema file path, the parsed schema is used for deserializing"); + + // + // Write Options + // + public static final ConfigOption TABLE_NAME = ConfigOptions + .key(HoodieWriteConfig.TABLE_NAME) + .stringType() + .noDefaultValue() + .withDescription("Table name to register to Hive metastore"); + + public static final ConfigOption TABLE_TYPE = ConfigOptions + .key("write.table.type") + .stringType() + .defaultValue("COPY_ON_WRITE") + .withDescription("Type of table to write. COPY_ON_WRITE (or) MERGE_ON_READ"); + + public static final ConfigOption OPERATION = ConfigOptions + .key("write.operation") + .stringType() + .defaultValue("upsert") + .withDescription("The write operation, that this write should do"); + + public static final ConfigOption PRECOMBINE_FIELD = ConfigOptions + .key("write.precombine.field") Review comment: We can, but i would suggest to do this is a separate issue, we need to define some rules for config option name. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562367018 ## File path: hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamerV2.java ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.streamer; + +import org.apache.hudi.operator.FlinkOptions; +import org.apache.hudi.operator.StreamWriteOperatorFactory; +import org.apache.hudi.util.StreamerUtil; + +import com.beust.jcommander.JCommander; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.formats.avro.typeutils.AvroSchemaConverter; +import org.apache.flink.formats.json.JsonRowDataDeserializationSchema; +import org.apache.flink.formats.json.TimestampFormat; +import org.apache.flink.runtime.state.filesystem.FsStateBackend; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.runtime.typeutils.InternalTypeInfo; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.RowType; + +import java.util.Properties; + +/** + * An Utility which can incrementally consume data from Kafka and apply it to the target table. + * currently, it only support COW table and insert, upsert operation. + */ +public class HoodieFlinkStreamerV2 { + public static void main(String[] args) throws Exception { +StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + +final FlinkStreamerConfig cfg = new FlinkStreamerConfig(); +JCommander cmd = new JCommander(cfg, null, args); +if (cfg.help || args.length == 0) { + cmd.usage(); + System.exit(1); +} +env.enableCheckpointing(cfg.checkpointInterval); +env.getConfig().setGlobalJobParameters(cfg); +// We use checkpoint to trigger write operation, including instant generating and committing, +// There can only be one checkpoint at one time. +env.getCheckpointConfig().setMaxConcurrentCheckpoints(1); +env.disableOperatorChaining(); Review comment: Not necessary, removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562366909 ## File path: hudi-flink/src/main/java/org/apache/hudi/streamer/FlinkStreamerConfig.java ## @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.streamer; + +import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload; +import org.apache.hudi.common.model.WriteOperationType; +import org.apache.hudi.keygen.SimpleAvroKeyGenerator; + +import com.beust.jcommander.Parameter; +import org.apache.flink.configuration.Configuration; + +import java.util.ArrayList; +import java.util.List; + +/** + * Configurations for Hoodie Flink streamer. + */ +public class FlinkStreamerConfig extends Configuration { + @Parameter(names = {"--kafka-topic"}, description = "Kafka topic name.", required = true) + public String kafkaTopic; + + @Parameter(names = {"--kafka-group-id"}, description = "Kafka consumer group id.", required = true) + public String kafkaGroupId; + + @Parameter(names = {"--kafka-bootstrap-servers"}, description = "Kafka bootstrap.servers.", required = true) + public String kafkaBootstrapServers; + + @Parameter(names = {"--flink-checkpoint-path"}, description = "Flink checkpoint path.") + public String flinkCheckPointPath; + + @Parameter(names = {"--flink-block-retry-times"}, description = "Times to retry when latest instant has not completed.") Review comment: Copied from the old code, how about `instant-retry-times` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562366032 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/event/BatchWriteSuccessEvent.java ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator.event; + +import org.apache.flink.runtime.operators.coordination.OperatorEvent; + +import org.apache.hudi.client.WriteStatus; + +import java.util.List; + +/** + * An operator even to mark successful checkpoint batch write. + */ +public class BatchWriteSuccessEvent implements OperatorEvent { + private static final long serialVersionUID = 1L; + + private final List writeStatuses; + private final int taskID; + private final String instantTime; + + /** + * Creates an event. + * + * @param taskIDThe task ID + * @param instantTime The instant time under which to write the data + * @param writeStatuses The write statues list + */ + public BatchWriteSuccessEvent( Review comment: Removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562365771 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/StreamWriteFunction.java ## @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.client.FlinkTaskContextSupplier; +import org.apache.hudi.client.HoodieFlinkWriteClient; +import org.apache.hudi.client.WriteStatus; +import org.apache.hudi.client.common.HoodieFlinkEngineContext; +import org.apache.hudi.common.config.SerializableConfiguration; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.common.model.WriteOperationType; +import org.apache.hudi.common.util.ObjectSizeCalculator; +import org.apache.hudi.keygen.KeyGenerator; +import org.apache.hudi.operator.event.BatchWriteSuccessEvent; +import org.apache.hudi.util.StreamerUtil; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.formats.avro.RowDataToAvroConverters; +import org.apache.flink.runtime.operators.coordination.OperatorEventGateway; +import org.apache.flink.runtime.state.FunctionInitializationContext; +import org.apache.flink.runtime.state.FunctionSnapshotContext; +import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction; +import org.apache.flink.streaming.api.functions.KeyedProcessFunction; +import org.apache.flink.table.types.logical.RowType; +import org.apache.flink.util.Collector; +import org.apache.flink.util.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.concurrent.locks.Condition; +import java.util.concurrent.locks.ReentrantLock; +import java.util.function.BiFunction; + +/** + * Sink function to write the data to the underneath filesystem. + * + * Work Flow + * + * The function firstly buffers the data as a batch of {@link HoodieRecord}s, + * It flushes(write) the records batch when a Flink checkpoint starts. After a batch has been written successfully, + * the function notifies its operator coordinator {@link StreamWriteOperatorCoordinator} to mark a successful write. + * + * Exactly-once Semantics + * + * The task implements exactly-once semantics by buffering the data between checkpoints. The operator coordinator + * starts a new instant on the time line when a checkpoint triggers, the coordinator checkpoints always + * start before its operator, so when this function starts a checkpoint, a REQUESTED instant already exists. + * The function process thread then block data buffering and the checkpoint thread starts flushing the existing data buffer. + * When the existing data buffer write successfully, the process thread unblock and start buffering again for the next round checkpoint. + * Because any checkpoint failures would trigger the write rollback, it implements the exactly-once semantics. + * + * Fault Tolerance + * + * The operator coordinator checks the validity for the last instant when it starts a new one. The operator rolls back + * the written data and throws when any error occurs. This means any checkpoint or task failure would trigger a failover. + * The operator coordinator would try several times when committing the writestatus. + * + * Note: The function task requires the input stream be partitioned by the partition fields to avoid different write tasks + * write to the same file group that conflict. The general case for partition path is a datetime field, + * so the sink task is very possible to have IO bottleneck, the more flexible solution is to shuffle the + * data by the file group IDs. + * + * @param Type of the input record + * @see StreamWriteOperatorCoordinator + */ +public class StreamWriteFunction extends KeyedProcessFunction implements CheckpointedFunction { + + private static
[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r562365771 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/StreamWriteFunction.java ## @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.client.FlinkTaskContextSupplier; +import org.apache.hudi.client.HoodieFlinkWriteClient; +import org.apache.hudi.client.WriteStatus; +import org.apache.hudi.client.common.HoodieFlinkEngineContext; +import org.apache.hudi.common.config.SerializableConfiguration; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.common.model.WriteOperationType; +import org.apache.hudi.common.util.ObjectSizeCalculator; +import org.apache.hudi.keygen.KeyGenerator; +import org.apache.hudi.operator.event.BatchWriteSuccessEvent; +import org.apache.hudi.util.StreamerUtil; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.formats.avro.RowDataToAvroConverters; +import org.apache.flink.runtime.operators.coordination.OperatorEventGateway; +import org.apache.flink.runtime.state.FunctionInitializationContext; +import org.apache.flink.runtime.state.FunctionSnapshotContext; +import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction; +import org.apache.flink.streaming.api.functions.KeyedProcessFunction; +import org.apache.flink.table.types.logical.RowType; +import org.apache.flink.util.Collector; +import org.apache.flink.util.Preconditions; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Random; +import java.util.concurrent.locks.Condition; +import java.util.concurrent.locks.ReentrantLock; +import java.util.function.BiFunction; + +/** + * Sink function to write the data to the underneath filesystem. + * + * Work Flow + * + * The function firstly buffers the data as a batch of {@link HoodieRecord}s, + * It flushes(write) the records batch when a Flink checkpoint starts. After a batch has been written successfully, + * the function notifies its operator coordinator {@link StreamWriteOperatorCoordinator} to mark a successful write. + * + * Exactly-once Semantics + * + * The task implements exactly-once semantics by buffering the data between checkpoints. The operator coordinator + * starts a new instant on the time line when a checkpoint triggers, the coordinator checkpoints always + * start before its operator, so when this function starts a checkpoint, a REQUESTED instant already exists. + * The function process thread then block data buffering and the checkpoint thread starts flushing the existing data buffer. + * When the existing data buffer write successfully, the process thread unblock and start buffering again for the next round checkpoint. + * Because any checkpoint failures would trigger the write rollback, it implements the exactly-once semantics. + * + * Fault Tolerance + * + * The operator coordinator checks the validity for the last instant when it starts a new one. The operator rolls back + * the written data and throws when any error occurs. This means any checkpoint or task failure would trigger a failover. + * The operator coordinator would try several times when committing the writestatus. + * + * Note: The function task requires the input stream be partitioned by the partition fields to avoid different write tasks + * write to the same file group that conflict. The general case for partition path is a datetime field, + * so the sink task is very possible to have IO bottleneck, the more flexible solution is to shuffle the + * data by the file group IDs. + * + * @param Type of the input record + * @see StreamWriteOperatorCoordinator + */ +public class StreamWriteFunction extends KeyedProcessFunction implements CheckpointedFunction { + + private static
[GitHub] [hudi] codecov-io edited a comment on pull request #2426: [HUDI-304] Configure spotless and java style
codecov-io edited a comment on pull request #2426: URL: https://github.com/apache/hudi/pull/2426#issuecomment-757443274 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=h1) Report > Merging [#2426](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=desc) (9cb1268) into [master](https://codecov.io/gh/apache/hudi/commit/e926c1a45ca95fa1911f6f88a0577554f2797760?el=desc) (e926c1a) will **decrease** coverage by `0.20%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2426/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2426 +/- ## - Coverage 50.73% 50.52% -0.21% + Complexity 3064 3032 -32 Files 419 417 -2 Lines 1879718725 -72 Branches 1922 1917 -5 - Hits 9536 9461 -75 - Misses 8485 8489 +4 + Partials776 775 -1 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.28% <ø> (+0.01%)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.61% <ø> (-0.41%)` | `0.00 <ø> (ø)` | | | hudiflink | `10.20% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.00% <ø> (-0.06%)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `66.07% <ø> (+0.16%)` | `0.00 <ø> (ø)` | | | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `66.84% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.41% <ø> (-0.02%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2426?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/common/engine/TaskContextSupplier.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9UYXNrQ29udGV4dFN1cHBsaWVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==) | `0.00% <0.00%> (-94.60%)` | `0.00% <0.00%> (-13.00%)` | | | [...apache/hudi/common/engine/HoodieEngineContext.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Ib29kaWVFbmdpbmVDb250ZXh0LmphdmE=) | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-1.00%)` | | | [...g/apache/hudi/common/function/FunctionWrapper.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Z1bmN0aW9uL0Z1bmN0aW9uV3JhcHBlci5qYXZh) | `0.00% <0.00%> (-10.53%)` | `0.00% <0.00%> (-2.00%)` | | | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | | | [...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==) | `45.71% <0.00%> (-6.18%)` | `55.00% <0.00%> (-6.00%)` | | | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `68.96% <0.00%> (-1.73%)` | `43.00% <0.00%> (-2.00%)` | | | [...i/common/table/timeline/TimelineMetadataUtils.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTWV0YWRhdGFVdGlscy5qYXZh) | `71.69% <0.00%> (-1.03%)` | `17.00% <0.00%> (ø%)` | | | [...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/hudi/pull/2426/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=) | `63.15% <0.00%> (-0.48%)` | `12.00% <0.00%> (ø%)` | | |
[jira] [Closed] (HUDI-1332) Introduce FlinkHoodieBloomIndex to hudi-flink-client
[ https://issues.apache.org/jira/browse/HUDI-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li closed HUDI-1332. - > Introduce FlinkHoodieBloomIndex to hudi-flink-client > > > Key: HUDI-1332 > URL: https://issues.apache.org/jira/browse/HUDI-1332 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: wangxianghu >Assignee: Xiang Yang >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > > a flink implementation for bloom index -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1332) Introduce FlinkHoodieBloomIndex to hudi-flink-client
[ https://issues.apache.org/jira/browse/HUDI-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Li resolved HUDI-1332. --- Resolution: Fixed > Introduce FlinkHoodieBloomIndex to hudi-flink-client > > > Key: HUDI-1332 > URL: https://issues.apache.org/jira/browse/HUDI-1332 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: wangxianghu >Assignee: Xiang Yang >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > > a flink implementation for bloom index -- This message was sent by Atlassian Jira (v8.3.4#803005)
[hudi] branch master updated (b64d22e -> 641abe8)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from b64d22e [HUDI-1511] InstantGenerateOperator support multiple parallelism (#2434) add 641abe8 [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client (#2375) No new revisions were added by this update. Summary of changes: .../org/apache/hudi/index/FlinkHoodieIndex.java| 3 + .../hudi/index/bloom/FlinkHoodieBloomIndex.java| 267 + .../bloom/HoodieFlinkBloomIndexCheckFunction.java} | 31 ++- .../src/main}/resources/log4j-surefire.properties | 0 .../index/bloom/TestFlinkHoodieBloomIndex.java}| 203 .../testutils/HoodieFlinkClientTestHarness.java| 94 .../testutils/HoodieFlinkWriteableTestTable.java | 136 +++ 7 files changed, 626 insertions(+), 108 deletions(-) create mode 100644 hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/bloom/FlinkHoodieBloomIndex.java copy hudi-client/{hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndexCheckFunction.java => hudi-flink-client/src/main/java/org/apache/hudi/index/bloom/HoodieFlinkBloomIndexCheckFunction.java} (77%) copy {hudi-spark-datasource/hudi-spark/src/test => hudi-client/hudi-flink-client/src/main}/resources/log4j-surefire.properties (100%) copy hudi-client/{hudi-spark-client/src/test/java/org/apache/hudi/index/bloom/TestHoodieBloomIndex.java => hudi-flink-client/src/test/java/org/apache/hudi/index/bloom/TestFlinkHoodieBloomIndex.java} (72%) create mode 100644 hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/testutils/HoodieFlinkWriteableTestTable.java
[GitHub] [hudi] garyli1019 merged pull request #2375: [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client
garyli1019 merged pull request #2375: URL: https://github.com/apache/hudi/pull/2375 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] wangxianghu commented on pull request #2471: [MINOR]Remove InstantGeneratorOperator parallelism limit in HoodieFli…
wangxianghu commented on pull request #2471: URL: https://github.com/apache/hudi/pull/2471#issuecomment-765067817 @yanghua @loukey-lj please take a look when free This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] wangxianghu opened a new pull request #2471: [MINOR]Remove InstantGeneratorOperator parallelism limit in HoodieFli…
wangxianghu opened a new pull request #2471: URL: https://github.com/apache/hudi/pull/2471 …nkStreamer and update docs ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] wangxianghu commented on a change in pull request #2375: [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client
wangxianghu commented on a change in pull request #2375: URL: https://github.com/apache/hudi/pull/2375#discussion_r562308054 ## File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/bloom/HoodieFlinkBloomIndexCheckFunction.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.index.bloom; + +import org.apache.hudi.client.utils.LazyIterableIterator; +import org.apache.hudi.common.model.HoodieKey; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.exception.HoodieIndexException; +import org.apache.hudi.io.HoodieKeyLookupHandle; +import org.apache.hudi.io.HoodieKeyLookupHandle.KeyLookupResult; +import org.apache.hudi.table.HoodieTable; + +import java.util.function.Function; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; + +import scala.Tuple2; + +/** + * Function performing actual checking of List partition containing (fileId, hoodieKeys) against the actual files. + */ +//TODO we can move this class into the hudi-client-common and reuse it for spark client +public class HoodieFlinkBloomIndexCheckFunction +implements Function>, Iterator>> { + + private final HoodieTable hoodieTable; + + private final HoodieWriteConfig config; + + public HoodieFlinkBloomIndexCheckFunction(HoodieTable hoodieTable, HoodieWriteConfig config) { +this.hoodieTable = hoodieTable; +this.config = config; + } + + @Override + public Iterator> apply(Iterator> fileParitionRecordKeyTripletItr) { +return new LazyKeyCheckIterator(fileParitionRecordKeyTripletItr); + } + + @Override + public Function>> compose(Function>> before) { +return null; + } + + @Override + public Function>, V> andThen(Function>, ? extends V> after) { +return null; + } + + class LazyKeyCheckIterator extends LazyIterableIterator, List> { Review comment: > maybe we can move HoodieFlinkBloomIndexCheckFunction into the hudi-client-common later then spark can reuse it. yes, could be annother pr This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-1511) InstantGenerateOperator support multiple parallelism
[ https://issues.apache.org/jira/browse/HUDI-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1511. -- Resolution: Implemented Fixed via master branch: b64d22e0478b967c83be22509beaaf8c5f114e19 > InstantGenerateOperator support multiple parallelism > > > Key: HUDI-1511 > URL: https://issues.apache.org/jira/browse/HUDI-1511 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: wangxianghu >Assignee: loukey_j >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[hudi] branch master updated: [HUDI-1511] InstantGenerateOperator support multiple parallelism (#2434)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b64d22e [HUDI-1511] InstantGenerateOperator support multiple parallelism (#2434) b64d22e is described below commit b64d22e0478b967c83be22509beaaf8c5f114e19 Author: luokey <854194...@qq.com> AuthorDate: Fri Jan 22 09:17:50 2021 +0800 [HUDI-1511] InstantGenerateOperator support multiple parallelism (#2434) --- .../hudi/operator/InstantGenerateOperator.java | 170 +++-- 1 file changed, 123 insertions(+), 47 deletions(-) diff --git a/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java b/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java index 4e32ec7..7879243 100644 --- a/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java +++ b/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java @@ -38,10 +38,13 @@ import org.apache.flink.runtime.state.StateInitializationContext; import org.apache.flink.runtime.state.StateSnapshotContext; import org.apache.flink.streaming.api.operators.AbstractStreamOperator; import org.apache.flink.streaming.api.operators.OneInputStreamOperator; +import org.apache.flink.streaming.api.operators.StreamingRuntimeContext; import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.PathFilter; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -49,9 +52,9 @@ import org.slf4j.LoggerFactory; import java.io.IOException; import java.util.ArrayList; import java.util.Iterator; -import java.util.LinkedList; import java.util.List; import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicLong; /** * Operator helps to generate globally unique instant, it must be executed in one parallelism. Before generate a new @@ -71,16 +74,20 @@ public class InstantGenerateOperator extends AbstractStreamOperator latestInstantList = new ArrayList<>(1); private transient ListState latestInstantState; - private List bufferedRecords = new LinkedList(); - private transient ListState recordsState; private Integer retryTimes; private Integer retryInterval; + private static final String DELIMITER = "_"; + private static final String INSTANT_MARKER_FOLDER_NAME = ".instant_marker"; + private transient boolean isMain = false; + private transient AtomicLong recordCounter = new AtomicLong(0); + private StreamingRuntimeContext runtimeContext; + private int indexOfThisSubtask; @Override public void processElement(StreamRecord streamRecord) throws Exception { if (streamRecord.getValue() != null) { - bufferedRecords.add(streamRecord); output.collect(streamRecord); + recordCounter.incrementAndGet(); } } @@ -88,7 +95,7 @@ public class InstantGenerateOperator extends AbstractStreamOperator latestInstantStateDescriptor = new ListStateDescriptor("latestInstant", String.class); -latestInstantState = context.getOperatorStateStore().getListState(latestInstantStateDescriptor); - -// recordState -ListStateDescriptor recordsStateDescriptor = new ListStateDescriptor("recordsState", StreamRecord.class); -recordsState = context.getOperatorStateStore().getListState(recordsStateDescriptor); - -if (context.isRestored()) { - Iterator latestInstantIterator = latestInstantState.get().iterator(); - latestInstantIterator.forEachRemaining(x -> latestInstant = x); - LOG.info("InstantGenerateOperator initializeState get latestInstant [{}]", latestInstant); - - Iterator recordIterator = recordsState.get().iterator(); - bufferedRecords.clear(); - recordIterator.forEachRemaining(x -> bufferedRecords.add(x)); +runtimeContext = getRuntimeContext(); +indexOfThisSubtask = runtimeContext.getIndexOfThisSubtask(); +isMain = indexOfThisSubtask == 0; + +if (isMain) { + // instantState + ListStateDescriptor latestInstantStateDescriptor = new ListStateDescriptor<>("latestInstant", String.class); + latestInstantState = context.getOperatorStateStore().getListState(latestInstantStateDescriptor); + + if (context.isRestored()) { +Iterator latestInstantIterator = latestInstantState.get().iterator(); +latestInstantIterator.forEachRemaining(x -> latestInstant = x); +LOG.info("Restoring the latest instant [{}] from the state", latestInstant); + } } } @Override public void snapshotState(StateSnapshotContext functionSnapshotContext) throws Exception { -if (latestInstantList.isEmpty()) { - latestInstantList.add(latestInstant); +long
[GitHub] [hudi] yanghua merged pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism
yanghua merged pull request #2434: URL: https://github.com/apache/hudi/pull/2434 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism
yanghua commented on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-765047901 > @yanghua should be good to land now if you are happy with it ack, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
leesf commented on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-765010589 > @leesf Hello, I have a doubt now. I did not modify the code of `hudi-ineg-test`, but every time the check fails because of it, do you know why? would you please rebase to master since there is a flaky test fix against the master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper
vinothchandar commented on issue #2470: URL: https://github.com/apache/hudi/issues/2470#issuecomment-764870104 @jtmzheng can you just give the marker based rollbacks a shot? We intend to make it the default in the next release. If the issue is from listing, then it would help out a lot. `hoodie.rollback.using.markers=true` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper
vinothchandar commented on issue #2470: URL: https://github.com/apache/hudi/issues/2470#issuecomment-764870271 we parallelize by partitions, so it must be fast. not sure where the skew comes from. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] vinothchandar commented on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism
vinothchandar commented on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-764820382 @yanghua should be good to land now if you are happy with it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jtmzheng opened a new issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper
jtmzheng opened a new issue #2470: URL: https://github.com/apache/hudi/issues/2470 **Describe the problem you faced** We're seeing heavy skew in our Spark Streaming job when processing rollbacks (`mapToPair at ListingBasedRollbackHelper.java:100`). The example below took 2.2h to complete with a p75 of 15 minutes and a max of 2.2h (100 tasks, long tail of 8 tasks that took > 1 hour) https://user-images.githubusercontent.com/3466206/105383583-4184f800-5bdf-11eb-8368-c345d452a6eb.png;> What can cause this skew and what can we do to alleviate/investigate this? We have Hudi configured with: ``` hudi_options = { "hoodie.table.name": "transactions", "hoodie.datasource.write.recordkey.field": "id.value", "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator", "hoodie.datasource.write.partitionpath.field": "year,month,day", "hoodie.datasource.write.table.name": "transactions", "hoodie.datasource.write.table.type": "MERGE_ON_READ", "hoodie.datasource.write.operation": "upsert", "hoodie.consistency.check.enabled": "true", "hoodie.datasource.write.precombine.field": "publishedAtUnixNano", "hoodie.compact.inline": True, "hoodie.compact.inline.max.delta.commits": 10, "hoodie.cleaner.commits.retained": 1, } ``` **Environment Description** * Hudi version : 0.6.0 * Spark version : 2.4.6 (EMR 5.31) * Hive version : 2.3.7 * Hadoop version : Amazon 2.10.0 * Storage (HDFS/S3/GCS..) : S3 * Running on Docker? (yes/no) : no This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r561953785 ## File path: hudi-flink/src/test/resources/test_read_schema.avsc ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +{ + "type" : "record", + "name" : "record", + "fields" : [ { Review comment: Can we follow the same style about the test schema, align with the existed file, e.g. `HoodieCleanerPlan.avsc` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: [HUDI-1512] Fix spark 2 unit tests failure with Spark 3 (#2412)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 976420c [HUDI-1512] Fix spark 2 unit tests failure with Spark 3 (#2412) 976420c is described below commit 976420c49a3fe764f7cecd30b6cdb32861be5537 Author: wenningd AuthorDate: Thu Jan 21 07:04:28 2021 -0800 [HUDI-1512] Fix spark 2 unit tests failure with Spark 3 (#2412) * [HUDI-1512] Fix spark 2 unit tests failure with Spark 3 * resolve comments Co-authored-by: Wenning Ding --- hudi-spark-datasource/hudi-spark2/pom.xml | 9 - pom.xml | 9 ++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark2/pom.xml b/hudi-spark-datasource/hudi-spark2/pom.xml index 8668984..1f6f792 100644 --- a/hudi-spark-datasource/hudi-spark2/pom.xml +++ b/hudi-spark-datasource/hudi-spark2/pom.xml @@ -125,6 +125,13 @@ +org.apache.maven.plugins +maven-surefire-plugin + + ${skip.hudi-spark2.unit.tests} + + + org.apache.rat apache-rat-plugin @@ -144,7 +151,7 @@ org.scala-lang scala-library - ${scala.version} + ${scala11.version} diff --git a/pom.xml b/pom.xml index 6a39cef..4f8b153 100644 --- a/pom.xml +++ b/pom.xml @@ -105,14 +105,15 @@ 4.1.1 0.8.0 4.4.1 -2.4.4 +${spark2.version} 1.12.0 2.4.4 3.0.0 1.8.2 -2.11.12 -2.11 +2.11.12 2.12.10 +${scala11.version} +2.11 0.12 3.3.1 3.0.1 @@ -140,6 +141,7 @@ compile org.apache.hudi. true +false @@ -1361,6 +1363,7 @@ ${fasterxml.spark3.version} ${fasterxml.spark3.version} ${fasterxml.spark3.version} +true
[GitHub] [hudi] garyli1019 merged pull request #2412: [HUDI-1512] Fix spark 2 unit tests failure with Spark 3
garyli1019 merged pull request #2412: URL: https://github.com/apache/hudi/pull/2412 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garyli1019 commented on pull request #2412: [HUDI-1512] Fix spark 2 unit tests failure with Spark 3
garyli1019 commented on pull request #2412: URL: https://github.com/apache/hudi/pull/2412#issuecomment-764705432 should be ok to merge since we cut the release This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r561943377 ## File path: hudi-flink/src/test/resources/test_source.data ## @@ -0,0 +1,8 @@ +{"uuid": "id1", "name": "Danny", "age": 23, "ts": "1970-01-01T00:00:01", "partition": "par1"} Review comment: Are you really so young? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2459: [MINOR] Improve code readability,remove the continue keyword
codecov-io edited a comment on pull request #2459: URL: https://github.com/apache/hudi/pull/2459#issuecomment-763007666 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2459?src=pr=h1) Report > Merging [#2459](https://codecov.io/gh/apache/hudi/pull/2459?src=pr=desc) (4429a75) into [master](https://codecov.io/gh/apache/hudi/commit/81ccb0c71ad2c17c5613698d4fb50f3b49b21fb4?el=desc) (81ccb0c) will **increase** coverage by `0.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2459/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2459?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2459 +/- ## + Coverage 50.27% 50.28% +0.01% - Complexity 3050 3051 +1 Files 419 419 Lines 1889718897 Branches 1937 1937 + Hits 9500 9503 +3 + Misses 8622 8619 -3 Partials775 775 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.52% <ø> (+0.03%)` | `0.00 <ø> (ø)` | | | hudiflink | `0.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `65.85% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2459?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2459/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `89.65% <0.00%> (+10.34%)` | `16.00% <0.00%> (+1.00%)` | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2459: [MINOR] Improve code readability,remove the continue keyword
codecov-io edited a comment on pull request #2459: URL: https://github.com/apache/hudi/pull/2459#issuecomment-763007666 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rubenssoto commented on issue #2463: [SUPPORT] Tuning Hudi Upsert Job
rubenssoto commented on issue #2463: URL: https://github.com/apache/hudi/issues/2463#issuecomment-764683609 Hi @bvaradar , Could you helpme with this ? :) thank you This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rubenssoto closed issue #2464: [SUPPORT] ExecutorLostFailure - Try Processing 1TB Of Data
rubenssoto closed issue #2464: URL: https://github.com/apache/hudi/issues/2464 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rubenssoto commented on issue #2464: [SUPPORT] ExecutorLostFailure - Try Processing 1TB Of Data
rubenssoto commented on issue #2464: URL: https://github.com/apache/hudi/issues/2464#issuecomment-764682521 Thank you so much for your help @bvaradar , it worked. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2388: [HUDI-1353] add incremental timeline support for pending clustering ops
codecov-io edited a comment on pull request #2388: URL: https://github.com/apache/hudi/pull/2388#issuecomment-751907604 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=h1) Report > Merging [#2388](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=desc) (4423c0c) into [master](https://codecov.io/gh/apache/hudi/commit/81ccb0c71ad2c17c5613698d4fb50f3b49b21fb4?el=desc) (81ccb0c) will **increase** coverage by `19.15%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2388/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#2388 +/- ## = + Coverage 50.27% 69.43% +19.15% + Complexity 3050 357 -2693 = Files 419 53 -366 Lines 18897 1930-16967 Branches 1937 230 -1707 = - Hits 9500 1340 -8160 + Misses 8622 456 -8166 + Partials775 134 -641 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2388?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../java/org/apache/hudi/common/util/CommitUtils.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29tbWl0VXRpbHMuamF2YQ==) | | | | | [...rg/apache/hudi/common/bloom/SimpleBloomFilter.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL1NpbXBsZUJsb29tRmlsdGVyLmphdmE=) | | | | | [...apache/hudi/common/model/HoodieCommitMetadata.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUNvbW1pdE1ldGFkYXRhLmphdmE=) | | | | | [...pache/hudi/io/storage/HoodieFileReaderFactory.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVGaWxlUmVhZGVyRmFjdG9yeS5qYXZh) | | | | | [...i/common/table/log/block/HoodieHFileDataBlock.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVIRmlsZURhdGFCbG9jay5qYXZh) | | | | | [...hudi/common/config/DFSPropertiesConfiguration.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9ERlNQcm9wZXJ0aWVzQ29uZmlndXJhdGlvbi5qYXZh) | | | | | [.../org/apache/hudi/hadoop/utils/HoodieHiveUtils.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZUhpdmVVdGlscy5qYXZh) | | | | | [...e/hudi/common/model/HoodieRollingStatMetadata.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJvbGxpbmdTdGF0TWV0YWRhdGEuamF2YQ==) | | | | | [...pache/hudi/common/model/HoodieArchivedLogFile.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUFyY2hpdmVkTG9nRmlsZS5qYXZh) | | | | | [.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=) | | | | | ... and [346 more](https://codecov.io/gh/apache/hudi/pull/2388/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r561848906 ## File path: hudi-flink/src/main/java/org/apache/hudi/streamer/FlinkStreamerConfig.java ## @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.streamer; + +import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload; +import org.apache.hudi.common.model.WriteOperationType; +import org.apache.hudi.keygen.SimpleAvroKeyGenerator; + +import com.beust.jcommander.Parameter; +import org.apache.flink.configuration.Configuration; + +import java.util.ArrayList; +import java.util.List; + +/** + * Configurations for Hoodie Flink streamer. + */ +public class FlinkStreamerConfig extends Configuration { + @Parameter(names = {"--kafka-topic"}, description = "Kafka topic name.", required = true) + public String kafkaTopic; + + @Parameter(names = {"--kafka-group-id"}, description = "Kafka consumer group id.", required = true) + public String kafkaGroupId; + + @Parameter(names = {"--kafka-bootstrap-servers"}, description = "Kafka bootstrap.servers.", required = true) + public String kafkaBootstrapServers; + + @Parameter(names = {"--flink-checkpoint-path"}, description = "Flink checkpoint path.") + public String flinkCheckPointPath; + + @Parameter(names = {"--flink-block-retry-times"}, description = "Times to retry when latest instant has not completed.") Review comment: `block` seems hard to understand. Any better word? ## File path: hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamerV2.java ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.streamer; + +import org.apache.hudi.operator.FlinkOptions; +import org.apache.hudi.operator.StreamWriteOperatorFactory; +import org.apache.hudi.util.StreamerUtil; + +import com.beust.jcommander.JCommander; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.formats.avro.typeutils.AvroSchemaConverter; +import org.apache.flink.formats.json.JsonRowDataDeserializationSchema; +import org.apache.flink.formats.json.TimestampFormat; +import org.apache.flink.runtime.state.filesystem.FsStateBackend; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.runtime.typeutils.InternalTypeInfo; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.RowType; + +import java.util.Properties; + +/** + * An Utility which can incrementally consume data from Kafka and apply it to the target table. + * currently, it only support COW table and insert, upsert operation. + */ +public class HoodieFlinkStreamerV2 { + public static void main(String[] args) throws Exception { +StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + +final FlinkStreamerConfig cfg = new FlinkStreamerConfig(); +JCommander cmd = new JCommander(cfg, null, args); +if (cfg.help || args.length == 0) { + cmd.usage(); + System.exit(1); +} +env.enableCheckpointing(cfg.checkpointInterval); +env.getConfig().setGlobalJobParameters(cfg); +// We use checkpoint to
[GitHub] [hudi] codecov-io edited a comment on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers
codecov-io edited a comment on pull request #2374: URL: https://github.com/apache/hudi/pull/2374#issuecomment-750782300 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=h1) Report > Merging [#2374](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=desc) (7ababea) into [master](https://codecov.io/gh/apache/hudi/commit/a38612b10f6ae04644519270f9b5eb631a77c148?el=desc) (a38612b) will **decrease** coverage by `41.00%`. > The diff coverage is `100.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2374/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree) ```diff @@ Coverage Diff @@ ## master #2374 +/- ## - Coverage 50.69% 9.68% -41.01% + Complexity 3059 48 -3011 Files 419 53 -366 Lines 188101930-16880 Branches 1924 230 -1694 - Hits 9535 187 -9348 + Misses 84981730 -6768 + Partials777 13 -764 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.68% <100.00%> (-59.80%)` | `0.00 <1.00> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `83.62% <100.00%> (-5.18%)` | `28.00 <1.00> (ø)` | | | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)`