[jira] [Created] (HUDI-1551) Support Partition with BigDecimal field

2021-01-25 Thread Chanh Le (Jira)
Chanh Le created HUDI-1551:
Summary: Support Partition with BigDecimal field
Key: HUDI-1551
URL: https://issues.apache.org/jira/browse/HUDI-1551
Project: Apache Hudi
Issue Type: New Feature

[GitHub] [hudi] shenh062326 commented on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-01-25 Thread GitBox
shenh062326 commented on pull request #2382: URL: https://github.com/apache/hudi/pull/2382#issuecomment-767297371 > @shenh062326 Thanks for your contribution, would you please add some tests to verify the java client functionally? Add TestJavaCopyOnWriteActionExecutor.

[GitHub] [hudi] vinothchandar commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox
vinothchandar commented on issue #2013: URL: https://github.com/apache/hudi/issues/2013#issuecomment-767265499 This is now out in the 0.7.0 release. See

[GitHub] [hudi] codecov-io edited a comment on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2487: URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2021-01-25 Thread GitBox
nsivabalan commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175 @bvaradar : guess you missed to follow up on this thread. can you check it out and respond when you can.

[GitHub] [hudi] nsivabalan commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2021-01-25 Thread GitBox
nsivabalan commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-767206596 @vinothchandar @umehrot2 : can either of you respond here wrt metadata support(rfc-15) in Athena. when can we possibly expect.

[GitHub] [hudi] jingweiz2017 commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer

2021-01-25 Thread GitBox
jingweiz2017 commented on issue #1971: URL: https://github.com/apache/hudi/issues/1971#issuecomment-767242422 @nsivabalan @bvaradar , thanks for the reply. The commit mentioned by bvaradar should work for me case.

[GitHub] [hudi] wangxianghu commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-25 Thread GitBox
wangxianghu commented on a change in pull request #2431: URL: https://github.com/apache/hudi/pull/2431#discussion_r563537637 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -181,16 +183,33 @@ object

[GitHub] [hudi] nsivabalan closed issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox
nsivabalan closed issue #1958: URL: https://github.com/apache/hudi/issues/1958

[GitHub] [hudi] codecov-io edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-759677298

[GitHub] [hudi] rubenssoto commented on issue #2484: [SUPPORT] Hudi Write Performance

2021-01-25 Thread GitBox
rubenssoto commented on issue #2484: URL: https://github.com/apache/hudi/issues/2484#issuecomment-767143513

[GitHub] [hudi] nsivabalan commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.

2021-01-25 Thread GitBox
nsivabalan commented on issue #1982: URL: https://github.com/apache/hudi/issues/1982#issuecomment-767205667 @Ac-Rush : would you mind update the ticket.

[GitHub] [hudi] vinothchandar commented on pull request #2488: 0.7.0 Doc Revamp

2021-01-25 Thread GitBox
vinothchandar commented on pull request #2488: URL: https://github.com/apache/hudi/pull/2488#issuecomment-767158167 I am going to also cut the release versions for the doc, once I finalize everything w.r.t the release.

[GitHub] [hudi] nsivabalan commented on issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox
nsivabalan commented on issue #1958: URL: https://github.com/apache/hudi/issues/1958#issuecomment-767210126 https://github.com/apache/hudi/pull/1978 have fixed it.

[GitHub] [hudi] Karl-WangSK commented on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-25 Thread GitBox
Karl-WangSK commented on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-767261660 cc @yanghua

[GitHub] [hudi] nsivabalan commented on a change in pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox
nsivabalan commented on a change in pull request #2487: URL: https://github.com/apache/hudi/pull/2487#discussion_r564142151 ## File path: hudi-common/src/main/java/org/apache/hudi/index/HoodieRecordLevelIndexPayload.java ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] rubenssoto closed issue #2484: [SUPPORT] Hudi Write Performance

2021-01-25 Thread GitBox
rubenssoto closed issue #2484: URL: https://github.com/apache/hudi/issues/2484

[GitHub] [hudi] codecov-io commented on pull request #2486: Filtering abnormal data which the recordKeyField or precombineField is null in avro format

2021-01-25 Thread GitBox
codecov-io commented on pull request #2486: URL: https://github.com/apache/hudi/pull/2486#issuecomment-766863772 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2486?src=pr=h1) Report > Merging [#2486](https://codecov.io/gh/apache/hudi/pull/2486?src=pr=desc) (5476bf0) into

[GitHub] [hudi] vinothchandar commented on pull request #2442: Adding new configurations in 0.7.0

2021-01-25 Thread GitBox
vinothchandar commented on pull request #2442: URL: https://github.com/apache/hudi/pull/2442#issuecomment-767102394 Will close this and open a new one

[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411

[GitHub] [hudi] vinothchandar commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-25 Thread GitBox
vinothchandar commented on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766593559 cc @garyli1019 mind taking a first pass at this PR? :)

[GitHub] [hudi] codecov-io edited a comment on pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2443: URL: https://github.com/apache/hudi/pull/2443#issuecomment-760147630

[GitHub] [hudi] codecov-io edited a comment on pull request #2486: Filtering abnormal data which the recordKeyField or precombineField is null in avro format

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2486: URL: https://github.com/apache/hudi/pull/2486#issuecomment-766863772

[GitHub] [hudi] codecov-io commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox
codecov-io commented on pull request #2487: URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=h1) Report > Merging [#2487](https://codecov.io/gh/apache/hudi/pull/2487?src=pr=desc) (8b07157) into

[GitHub] [hudi] nsivabalan commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox
nsivabalan commented on issue #2013: URL: https://github.com/apache/hudi/issues/2013#issuecomment-767204986 @garyli1019 : can you give any updates you have on on this regard.

[GitHub] [hudi] vinothchandar closed issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox
vinothchandar closed issue #2013: URL: https://github.com/apache/hudi/issues/2013

[GitHub] [hudi] kirkuz commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time

2021-01-25 Thread GitBox
kirkuz commented on issue #2323: URL: https://github.com/apache/hudi/issues/2323#issuecomment-766649165 Hi @nsivabalan, I think we can close this issue for now. I've changed from GLOBAL_BLOOM to SIMPLE index with static partition keys, cause GLOBAL_BLOOM was too slow in my use

[GitHub] [hudi] vinothchandar commented on issue #2484: [SUPPORT] Hudi Write Performance

2021-01-25 Thread GitBox
vinothchandar commented on issue #2484: URL: https://github.com/apache/hudi/issues/2484#issuecomment-767154231

[GitHub] [hudi] nsivabalan commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer

2021-01-25 Thread GitBox
nsivabalan commented on issue #1971: URL: https://github.com/apache/hudi/issues/1971#issuecomment-767208636 @jingweiz2017 : can you please check above response and let us know if you need anything more from Hudi community.

[GitHub] [hudi] vinothchandar closed pull request #2442: Adding new configurations in 0.7.0

2021-01-25 Thread GitBox
vinothchandar closed pull request #2442: URL: https://github.com/apache/hudi/pull/2442

[GitHub] [hudi] rubenssoto commented on pull request #2283: [HUDI-1415] Incorrect query result for hudi hive table when using spa…

2021-01-25 Thread GitBox
rubenssoto commented on pull request #2283: URL: https://github.com/apache/hudi/pull/2283#issuecomment-767117951 I had the same problem, but I saw less rows not more. Reading with spark datasource I have more than 30 million rows and using spark sql with hive only 4 million. I

[GitHub] [hudi] pengzhiwei2018 commented on pull request #1880: [WIP] [HUDI-1125] build framework to support structured streaming

2021-01-25 Thread GitBox
pengzhiwei2018 commented on pull request #1880: URL: https://github.com/apache/hudi/pull/1880#issuecomment-766562247 > Hello, > > Hudi will have nice features like clustering and clustering probably will rewrite a lot of data, so is it possible this rewrites without new data doesn't

[GitHub] [hudi] teeyog commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-25 Thread GitBox
teeyog commented on a change in pull request #2431: URL: https://github.com/apache/hudi/pull/2431#discussion_r563598187 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -181,16 +183,33 @@ object

[GitHub] [hudi] vinothchandar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-25 Thread GitBox
vinothchandar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766590769

[GitHub] [hudi] satishkotha commented on a change in pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation

2021-01-25 Thread GitBox
satishkotha commented on a change in pull request #2483: URL: https://github.com/apache/hudi/pull/2483#discussion_r563962124 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala ## @@ -198,6 +198,31 @@ class

[GitHub] [hudi] vinothchandar commented on pull request #2111: [HUDI-1234] Insert new records to data files without merging for "Insert" operation.

2021-01-25 Thread GitBox
vinothchandar commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-767103157 @nsivabalan I thought we were going to get this in to 0.7.0? checked back again, to see why this was missing

[GitHub] [hudi] vburenin commented on pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-25 Thread GitBox
vburenin commented on pull request #2476: URL: https://github.com/apache/hudi/pull/2476#issuecomment-766947415 Can anybody merge this PR, please?

[GitHub] [hudi] codecov-io edited a comment on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-757929313

[GitHub] [hudi] cadl closed issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed

2021-01-25 Thread GitBox
cadl closed issue #2063: URL: https://github.com/apache/hudi/issues/2063

[GitHub] [hudi] nsivabalan commented on issue #2204: [SUPPORT] Hive count(*) query on _rt table failing with exception

2021-01-25 Thread GitBox
nsivabalan commented on issue #2204: URL: https://github.com/apache/hudi/issues/2204#issuecomment-766437535 @BalaMahesh : Would you mind updating the ticket. We will close this out in a weeks time if there are no activity. But feel free to re-open or create a new ticket if you have more

[GitHub] [hudi] nsivabalan commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-01-25 Thread GitBox
nsivabalan commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-766436747 @sanket-khedikar : can you please respond if the suggested approaches work for you. or you still need more enhancements from Hudi? If it's solved, would appreciate if you can close

[GitHub] [hudi] nsivabalan commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-25 Thread GitBox
nsivabalan commented on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-766433970 @vinothchandar @borislitvak : since we have a tracking jira, do you think we can close this? or is there anything pending to be resolved or discussed.

[GitHub] [hudi] nsivabalan closed issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on

2021-01-25 Thread GitBox
nsivabalan closed issue #2429: URL: https://github.com/apache/hudi/issues/2429

[GitHub] [hudi] xushiyan merged pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-25 Thread GitBox
xushiyan merged pull request #2478: URL: https://github.com/apache/hudi/pull/2478

[GitHub] [hudi] vinothchandar merged pull request #2481: [MINOR] Removing spring repos from pom

2021-01-25 Thread GitBox
vinothchandar merged pull request #2481: URL: https://github.com/apache/hudi/pull/2481

[GitHub] [hudi] git-raj commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-01-25 Thread GitBox
git-raj commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-766523668 using AWS Glue pySpark and Hudi and S3 as data store: i'm trying to do the traditional SCD Type 2 where old record gets updated with the insert datetime on 'effective to' field,

[GitHub] [hudi] nsivabalan commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time

2021-01-25 Thread GitBox
nsivabalan commented on issue #2323: URL: https://github.com/apache/hudi/issues/2323#issuecomment-766435871 @Kirkuz: Do you have any updates in this regard. Can you please respond or let us know if you have more questions.

[GitHub] [hudi] nsivabalan closed issue #2480: [SUPPORT] The Docker demo document description is incorrect

2021-01-25 Thread GitBox
nsivabalan closed issue #2480: URL: https://github.com/apache/hudi/issues/2480

[GitHub] [hudi] rubenssoto edited a comment on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-25 Thread GitBox
rubenssoto edited a comment on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187

[GitHub] [hudi] vinothchandar commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-25 Thread GitBox
vinothchandar commented on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-766441408 we can close this out

[GitHub] [hudi] vinothchandar commented on issue #2479: [SUPPORT] Dependency Issue When I Try Build Hudi From Source

2021-01-25 Thread GitBox
vinothchandar commented on issue #2479: URL: https://github.com/apache/hudi/issues/2479#issuecomment-766369989 Great. No thank you for catching :). eventually as m2 caches are lost, I think build would have failed. may be month or so from now :). Will merge the fix

[GitHub] [hudi] rubenssoto commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-25 Thread GitBox
rubenssoto commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187

[GitHub] [hudi] zherenyu831 commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox
zherenyu831 commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766482729 @bvaradar Hi Bavaradar, it will be little difficult to replicate the problem, since it only happens on huge amount of data.

[GitHub] [hudi] codecov-io edited a comment on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2382: URL: https://github.com/apache/hudi/pull/2382#issuecomment-751367927

[GitHub] [hudi] nsivabalan edited a comment on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox
nsivabalan edited a comment on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364 @zherenyu831 : can you please respond with any updates on your end. @n3nash : can you please take a look when you have time. If you were able to narrow down the issue,

[GitHub] [hudi] nsivabalan commented on issue #2135: [SUPPORT] GDPR safe deletes is complex

2021-01-25 Thread GitBox
nsivabalan commented on issue #2135: URL: https://github.com/apache/hudi/issues/2135#issuecomment-766439085 @andaag : I have created a Hudi ticket for this. Feel free to update the desc of the ticket with more details https://issues.apache.org/jira/browse/HUDI-1549

[GitHub] [hudi] nsivabalan commented on issue #2123: Timestamp not parsed correctly on Athena

2021-01-25 Thread GitBox
nsivabalan commented on issue #2123: URL: https://github.com/apache/hudi/issues/2123#issuecomment-766439219 @satishkotha : when you get a chance, can you please follow up on this.

[GitHub] [hudi] nsivabalan commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox
nsivabalan commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364 @zherenyu831 : can you please respond with any updates on your end. @n3nash : can you take a look when you have time.

[GitHub] [hudi] nsivabalan commented on issue #2467: [Travis issue] TestJsonStringToHoodieRecordMapFunction.testMapFunction failed

2021-01-25 Thread GitBox
nsivabalan commented on issue #2467: URL: https://github.com/apache/hudi/issues/2467#issuecomment-766427684 Have created a tracking jira https://issues.apache.org/jira/browse/HUDI-1547

[GitHub] [hudi] nsivabalan commented on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer

2021-01-25 Thread GitBox
nsivabalan commented on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932 @liujinhui1994 : We already have an [example in our

[GitHub] [hudi] vinothchandar closed issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-25 Thread GitBox
vinothchandar closed issue #2330: URL: https://github.com/apache/hudi/issues/2330

[GitHub] [hudi] nsivabalan commented on issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on

2021-01-25 Thread GitBox
nsivabalan commented on issue #2429: URL: https://github.com/apache/hudi/issues/2429#issuecomment-766428773 @vinothchandar : closing this for now. feel free to re-open if you see more issues.

[GitHub] [hudi] vinothchandar commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-25 Thread GitBox
vinothchandar commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766450275 cc @garyli1019 as well

[GitHub] [hudi] nsivabalan commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited

2021-01-25 Thread GitBox
nsivabalan commented on issue #2399: URL: https://github.com/apache/hudi/issues/2399#issuecomment-766431496 @afeldman1 : can you respond when you can.

[GitHub] [hudi] codecov-io commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-25 Thread GitBox
codecov-io commented on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=h1) Report > Merging [#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=desc) (91cf083) into

[GitHub] [hudi] nsivabalan edited a comment on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-25 Thread GitBox
nsivabalan edited a comment on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534 @n3nash @bhasudha : sorry the thread is bit long, so couldn't gauge correctly. I see some workarounds have been proposed and it worked. But do we need to fixes in Hudi in

[GitHub] [hudi] nsivabalan commented on issue #2329: [SUPPORT] Time Travel (querying the historical versions of data) ability for Hudi Table

2021-01-25 Thread GitBox
nsivabalan commented on issue #2329: URL: https://github.com/apache/hudi/issues/2329#issuecomment-766435383 https://issues.apache.org/jira/browse/HUDI-1460

[GitHub] [hudi] nsivabalan commented on issue #2480: [SUPPORT] The Docker demo document description is incorrect

2021-01-25 Thread GitBox
nsivabalan commented on issue #2480: URL: https://github.com/apache/hudi/issues/2480#issuecomment-766427153 Sure, will take it up. Closing it as we have a tracking jira. https://issues.apache.org/jira/browse/HUDI-1546

[GitHub] [hudi] nsivabalan edited a comment on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer

2021-01-25 Thread GitBox
nsivabalan edited a comment on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932 @liujinhui1994 : Sorry about the delay. We already have an [example in our

[GitHub] [hudi] nsivabalan commented on issue #2066: [SUPPORT] Hudi is increasing the storage size big time

2021-01-25 Thread GitBox
nsivabalan commented on issue #2066: URL: https://github.com/apache/hudi/issues/2066#issuecomment-766449665 @KarthickAN : did you get a chance to try out the suggestion from Balaji. please do update the issue w/ any updates. If the issue is resolved, feel free to close it out.

[GitHub] [hudi] nsivabalan commented on issue #2367: [SUPPORT] Seek error when querying MOR Tables in GCP

2021-01-25 Thread GitBox
nsivabalan commented on issue #2367: URL: https://github.com/apache/hudi/issues/2367#issuecomment-766431687 Sure. sorry about the delay. will get to this in a day or two.

[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-25 Thread GitBox
codecov-io edited a comment on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181

[GitHub] [hudi] nsivabalan commented on issue #2331: Why does Hudi not support field deletions?

2021-01-25 Thread GitBox
nsivabalan commented on issue #2331: URL: https://github.com/apache/hudi/issues/2331#issuecomment-766432877 @prashantwason : In lieu of this ticket, do you think we can update our documentation wrt schema evolution. If you don't mind can you take it up and fix our documentation.

[GitHub] [hudi] nsivabalan commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2021-01-25 Thread GitBox
nsivabalan commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-766438221 @KarthickAN : hope you got a chance to go through our [blog on indexes in Hudi](https://hudi.apache.org/blog/hudi-indexing-mechanisms/). Wrt this gh issue, please do let us know if

[GitHub] [hudi] nsivabalan commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-25 Thread GitBox
nsivabalan commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534 @n3nash @bhasudha : sorry the thread is bit long. I see some workarounds have been proposed and it worked. But do we need to fixes in Hudi in general? if yes, can you file a jira

[GitHub] [hudi] nsivabalan commented on issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed

2021-01-25 Thread GitBox
nsivabalan commented on issue #2063: URL: https://github.com/apache/hudi/issues/2063#issuecomment-766449860 @cadl : did you get a chance to try out the setting? We plan to close out this issue due to inactivity in a weeks time. But feel free to reopen to create a new ticket if you find

[GitHub] [hudi] lshg opened a new issue #2490: spark read hudi data from hive

2021-01-25 Thread GitBox
lshg opened a new issue #2490: URL: https://github.com/apache/hudi/issues/2490 package com.gjr.recommend import org.apache.spark.sql.hive.HiveContext import org.apache.spark.sql.{Row, SparkSession} import org.apache.spark.{SparkConf, SparkContext} object DWDTenderLog {

[GitHub] [hudi] lshg opened a new issue #2489: [SUPPORT]

2021-01-25 Thread GitBox
lshg opened a new issue #2489: URL: https://github.com/apache/hudi/issues/2489 hive (app)> SELECT > projectid, > provinceid, > typeId, > antistop > FROM > app.dwd_recommend_tender_ds > WHERE

svn commit: r45595 - in /release/hudi: 0.7.0/ hudi-0.7.0/

2021-01-25 Thread vinoth
Author: vinoth Date: Tue Jan 26 01:37:48 2021 New Revision: 45595 Log: Renaming for Hudi 0.7.0 Added: release/hudi/0.7.0/ - copied from r45594, release/hudi/hudi-0.7.0/ Removed: release/hudi/hudi-0.7.0/

[jira] [Commented] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction

2021-01-25 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271766#comment-17271766 ] wangxianghu commented on HUDI-1547: [~vinoth] I can take it

[jira] [Assigned] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction

2021-01-25 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu reassigned HUDI-1547: Assignee: wangxianghu

[GitHub] [hudi] nsivabalan closed issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox
nsivabalan closed issue #1958: URL: https://github.com/apache/hudi/issues/1958

[GitHub] [hudi] nsivabalan commented on issue #1958: [SUPPORT] Global Indexes return old partition value when querying Hive tables

2021-01-25 Thread GitBox
nsivabalan commented on issue #1958: URL: https://github.com/apache/hudi/issues/1958#issuecomment-767210126 https://github.com/apache/hudi/pull/1978 have fixed it.

[GitHub] [hudi] nsivabalan commented on a change in pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

2021-01-25 Thread GitBox
nsivabalan commented on a change in pull request #2487: URL: https://github.com/apache/hudi/pull/2487#discussion_r564142151 ## File path: hudi-common/src/main/java/org/apache/hudi/index/HoodieRecordLevelIndexPayload.java ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2021-01-25 Thread GitBox
nsivabalan commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175 @bvaradar : guess you missed to follow up on this thread. can you check it out and respond when you can.

[GitHub] [hudi] nsivabalan commented on issue #1971: Schema evoluation causes issue when using kafka source in hudi deltastreamer

2021-01-25 Thread GitBox
nsivabalan commented on issue #1971: URL: https://github.com/apache/hudi/issues/1971#issuecomment-767208636 @jingweiz2017 : can you please check above response and let us know if you need anything more from Hudi community.

[GitHub] [hudi] nsivabalan commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2021-01-25 Thread GitBox
nsivabalan commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-767206596 @vinothchandar @umehrot2 : can either of you respond here wrt metadata support(rfc-15) in Athena. when can we possibly expect.

[GitHub] [hudi] nsivabalan commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.

2021-01-25 Thread GitBox
nsivabalan commented on issue #1982: URL: https://github.com/apache/hudi/issues/1982#issuecomment-767205667 @Ac-Rush : would you mind update the ticket.

[GitHub] [hudi] nsivabalan commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

2021-01-25 Thread GitBox
nsivabalan commented on issue #2013: URL: https://github.com/apache/hudi/issues/2013#issuecomment-767204986 @garyli1019 : can you give any updates you have on on this regard.

[jira] [Reopened] (HUDI-284) Need Tests for Hudi handling of schema evolution

2021-01-25 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reopened HUDI-284.

[jira] [Resolved] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2021-01-25 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-575. Resolution: Fixed

[jira] [Resolved] (HUDI-284) Need Tests for Hudi handling of schema evolution

2021-01-25 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-284. Resolution: Fixed

[jira] [Reopened] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2021-01-25 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reopened HUDI-575.

[jira] [Resolved] (HUDI-791) Replace null by Option in Delta Streamer

2021-01-25 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-791. Resolution: Fixed