[GitHub] [hudi] bhasudha commented on issue #1800: [SUPPORT] finalize errors "at org.apache.hudi.table.HoodieTable.cleanFailedWrites"

2020-07-08 Thread GitBox
bhasudha commented on issue #1800: URL: https://github.com/apache/hudi/issues/1800#issuecomment-655908700 @tooptoop4 are you enabling this config - https://hudi.apache.org/docs/configurations.html#withConsistencyCheckEnabled ? Also is this inflight commit one of the lingering pending

[GitHub] [hudi] vinothchandar commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-07-08 Thread GitBox
vinothchandar commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-655886870 > for our use-case some of our tables will be almost exclusively inserts, so I'm worried the current behavior will result in many parquet files and degrading performance.

[hudi] branch master updated (086853c -> d58644b)

2020-07-08 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 086853c [HUDI-1080] Fix backward compatibility for com.uber inputformats add d58644b [HUDI-1062]Remove

[GitHub] [hudi] vinothchandar merged pull request #1779: [HUDI-1062]Remove unnecessary maxEvent check and add some log in KafkaOffsetGen

2020-07-08 Thread GitBox
vinothchandar merged pull request #1779: URL: https://github.com/apache/hudi/pull/1779 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] garyli1019 commented on a change in pull request #1810: [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync

2020-07-08 Thread GitBox
garyli1019 commented on a change in pull request #1810: URL: https://github.com/apache/hudi/pull/1810#discussion_r451945500 ## File path: hudi-sync/hudi-hive-sync/pom.xml ## @@ -43,6 +45,11 @@ hudi-hadoop-mr ${project.version} + + org.apache.hudi +

[GitHub] [hudi] vinothchandar commented on issue #1811: Deltastreamer Offset exception -Prod

2020-07-08 Thread GitBox
vinothchandar commented on issue #1811: URL: https://github.com/apache/hudi/issues/1811#issuecomment-655885342 Seems related to spark bug https://issues.apache.org/jira/browse/SPARK-17147 It must be fixed in 2.4. Can you upgrade spark and try?

[jira] [Commented] (HUDI-1007) When earliestOffsets is greater than checkpoint, Hudi will not be able to successfully consume data

2020-07-08 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154174#comment-17154174 ] Vinoth Chandar commented on HUDI-1007: -- [~liujinhui] please take a look at

[GitHub] [hudi] vinothchandar commented on issue #1811: Deltastreamer Offset exception -Prod

2020-07-08 Thread GitBox
vinothchandar commented on issue #1811: URL: https://github.com/apache/hudi/issues/1811#issuecomment-655883408 This seems related to HUDI-1007 .. great we have a stacktrace now

[GitHub] [hudi] vinothchandar commented on issue #1794: [SUPPORT] Hudi delete operation but HiveSync failed

2020-07-08 Thread GitBox
vinothchandar commented on issue #1794: URL: https://github.com/apache/hudi/issues/1794#issuecomment-655881102 @nsivabalan might be worth reproducing this in the docker environment?

[GitHub] [hudi] vinothchandar commented on pull request #1793: [HUDI-1068] Fixing deletes in global bloom

2020-07-08 Thread GitBox
vinothchandar commented on pull request #1793: URL: https://github.com/apache/hudi/pull/1793#issuecomment-655880737 @nsivabalan can we also address simple global index

[GitHub] [hudi] vinothchandar commented on issue #1798: Question reading partition path with less level is more faster than what document mentioned

2020-07-08 Thread GitBox
vinothchandar commented on issue #1798: URL: https://github.com/apache/hudi/issues/1798#issuecomment-655880193 @zherenyu831 one thing I don’t understand from your original description is what you mean by 4000+ files vs 600+ files. If it’s the same result then how can the files be different

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #333

2020-07-08 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.34 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [hudi] vinothchandar commented on issue #1798: Question reading partition path with less level is more faster than what document mentioned

2020-07-08 Thread GitBox
vinothchandar commented on issue #1798: URL: https://github.com/apache/hudi/issues/1798#issuecomment-655878475 @umehrot2 any ideas?

[jira] [Commented] (HUDI-1079) Cannot upsert on schema with Array of Record with single field

2020-07-08 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154167#comment-17154167 ] Vinoth Chandar commented on HUDI-1079: -- thanks for the methodical analysis [~tase].. at first glance,

[jira] [Updated] (HUDI-1079) Cannot upsert on schema with Array of Record with single field

2020-07-08 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1079: - Fix Version/s: 0.6.0 > Cannot upsert on schema with Array of Record with single field >

[GitHub] [hudi] vinothchandar commented on pull request #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-07-08 Thread GitBox
vinothchandar commented on pull request #1406: URL: https://github.com/apache/hudi/pull/1406#issuecomment-655873626 @umehrot2 can you please chime in ? @aditanase let me check out the jira as well

[GitHub] [hudi] vinothchandar commented on issue #1786: [SUPPORT] Bulk insert slow on MOR

2020-07-08 Thread GitBox
vinothchandar commented on issue #1786: URL: https://github.com/apache/hudi/issues/1786#issuecomment-655871974 @rvd8345 part of the issue here is the sort we do in bulk_insert to seed the dataset such that files are ordered by keys. This helps later in upsert performance. we are
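Editor's illustration, not part of the original message: the comment above says the bulk_insert sort orders files by key, which helps later upserts. The sketch below is a toy model of that idea and not Hudi's implementation. The helpers `write_files` and `candidate_files`, the key format, and the file size are all hypothetical. It assumes each file records its minimum and maximum key, and that an upsert must inspect every file whose range could contain the incoming key.

```python
from typing import List, Tuple

KeyRange = Tuple[str, str]


def write_files(keys: List[str], records_per_file: int) -> List[KeyRange]:
    """Split keys into fixed-size files and return each file's (min_key, max_key)."""
    ranges = []
    for start in range(0, len(keys), records_per_file):
        chunk = keys[start:start + records_per_file]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def candidate_files(ranges: List[KeyRange], key: str) -> int:
    """Count the files whose key range could contain the given key."""
    return sum(1 for low, high in ranges if low <= key <= high)


# Hypothetical record keys; zero-padding keeps string order equal to numeric order.
keys = [f"key-{i:04d}" for i in range(1000)]

# A deterministic shuffle (7 and 1000 are coprime) stands in for unsorted arrival order.
unsorted_keys = [keys[(i * 7) % 1000] for i in range(1000)]

sorted_ranges = write_files(sorted(unsorted_keys), records_per_file=100)
unsorted_ranges = write_files(unsorted_keys, records_per_file=100)

lookup = "key-0500"
print("files to check after sorted write:", candidate_files(sorted_ranges, lookup))
print("files to check after unsorted write:", candidate_files(unsorted_ranges, lookup))
```

In this toy model the sorted write produces non-overlapping key ranges, so a lookup touches a single file, while the unsorted write leaves overlapping ranges that all have to be checked.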

[jira] [Commented] (HUDI-684) Introduce abstraction for writing and reading and compacting from FileGroups

2020-07-08 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154158#comment-17154158 ] liwei commented on HUDI-684: [~pwason] okay, I will do HUDI-1084 later. And also I think HUDI-957 is very

[jira] [Resolved] (HUDI-1080) Fix backward compatiblity for com.uber input formats

2020-07-08 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish resolved HUDI-1080. -- Resolution: Fixed > Fix backward compatiblity for com.uber input formats >

[jira] [Assigned] (HUDI-1080) Fix backward compatiblity for com.uber input formats

2020-07-08 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish reassigned HUDI-1080: Assignee: satish > Fix backward compatiblity for com.uber input formats >

[jira] [Updated] (HUDI-1080) Fix backward compatiblity for com.uber input formats

2020-07-08 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1080: - Status: Open (was: New) > Fix backward compatiblity for com.uber input formats >

[GitHub] [hudi] leesf closed pull request #1810: [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync

2020-07-08 Thread GitBox
leesf closed pull request #1810: URL: https://github.com/apache/hudi/pull/1810

[GitHub] [hudi] prashanthpdesai opened a new issue #1811: Deltastreamer Offset exception -Prod

2020-07-08 Thread GitBox
prashanthpdesai opened a new issue #1811: URL: https://github.com/apache/hudi/issues/1811 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? Yes - Join the mailing list to engage in conversations and

[GitHub] [hudi] prashanthpdesai commented on issue #1775: INCREMETNAL QUERY-Null value Exception

2020-07-08 Thread GitBox
prashanthpdesai commented on issue #1775: URL: https://github.com/apache/hudi/issues/1775#issuecomment-655791558 @bhasudha : if I understood correctly, it's not backward compatible; we will try to check with 2.4.4 if it's available in any of our environments.

[GitHub] [hudi] n3nash merged pull request #1809: [HUDI-1080] Fix backward compatibility for com.uber inputformats

2020-07-08 Thread GitBox
n3nash merged pull request #1809: URL: https://github.com/apache/hudi/pull/1809

[hudi] branch master updated (7b2a947 -> 086853c)

2020-07-08 Thread nagarwal
nagarwal pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 7b2a947 [HUDI-1069] Remove duplicate assertNoWriteErrors() (#1797) add 086853c [HUDI-1080] Fix backward

[jira] [Commented] (HUDI-684) Introduce abstraction for writing and reading and compacting from FileGroups

2020-07-08 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153856#comment-17153856 ] Prashant Wason commented on HUDI-684: - I see you have filed issue 1084 for this change. I support the

[jira] [Commented] (HUDI-684) Introduce abstraction for writing and reading and compacting from FileGroups

2020-07-08 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153855#comment-17153855 ] Prashant Wason commented on HUDI-684: - Hi [~309637554], Using HoodieTable.getBaseFileExtension is

[GitHub] [hudi] satishkotha commented on pull request #1809: [HUDI-1080] Fix backward compatibility for com.uber inputformats

2020-07-08 Thread GitBox
satishkotha commented on pull request #1809: URL: https://github.com/apache/hudi/pull/1809#issuecomment-655652601 > @satishkotha I have COW tables with data written by hoodie 0.4.6, I've now followed

[GitHub] [hudi] lw309637554 opened a new pull request #1810: [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync

2020-07-08 Thread GitBox
lw309637554 opened a new pull request #1810: URL: https://github.com/apache/hudi/pull/1810 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] tooptoop4 commented on pull request #1809: [HUDI-1080] Fix backward compatibility for com.uber inputformats

2020-07-08 Thread GitBox
tooptoop4 commented on pull request #1809: URL: https://github.com/apache/hudi/pull/1809#issuecomment-655609363 @satishkotha I have COW tables with data written by hoodie 0.4.6, I've now followed

[jira] [Commented] (HUDI-684) Introduce abstraction for writing and reading and compacting from FileGroups

2020-07-08 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153696#comment-17153696 ] liwei commented on HUDI-684: [~pwason] when this issue "Introduce abstraction for writing and reading and

[jira] [Created] (HUDI-1084) modify the getFileExtension in some tests to use HoodieTable.getBaseFileExtension

2020-07-08 Thread liwei (Jira)
liwei created HUDI-1084: --- Summary: modify the getFileExtension in some tests to use HoodieTable.getBaseFileExtension Key: HUDI-1084 URL: https://issues.apache.org/jira/browse/HUDI-1084 Project: Apache Hudi

[jira] [Assigned] (HUDI-1084) modify the getFileExtension in some tests to use HoodieTable.getBaseFileExtension

2020-07-08 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reassigned HUDI-1084: --- Assignee: liwei > modify the getFileExtension in some tests to use > HoodieTable.getBaseFileExtension >

[GitHub] [hudi] lw309637554 removed a comment on pull request #1807: [HUDI-875] Abstract hudi-sync-common ,and support hudi-sync-hive

2020-07-08 Thread GitBox
lw309637554 removed a comment on pull request #1807: URL: https://github.com/apache/hudi/pull/1807#issuecomment-655577007 > @lw309637554 can you please check CI failure? > > Before diving in, are there any backwards compatible changes/special upgrade instructions needed for users

[GitHub] [hudi] lw309637554 commented on pull request #1807: [HUDI-875] Abstract hudi-sync-common ,and support hudi-sync-hive

2020-07-08 Thread GitBox
lw309637554 commented on pull request #1807: URL: https://github.com/apache/hudi/pull/1807#issuecomment-655577007 > @lw309637554 can you please check CI failure? > > Before diving in, are there any backwards compatible changes/special upgrade instructions needed for users with this

[GitHub] [hudi] lw309637554 closed pull request #1807: [HUDI-875] Abstract hudi-sync-common ,and support hudi-sync-hive

2020-07-08 Thread GitBox
lw309637554 closed pull request #1807: URL: https://github.com/apache/hudi/pull/1807

[jira] [Updated] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-08 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1082: -- Description: In

[jira] [Created] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key

2020-07-08 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1083: - Summary: Minor optimization in Determining insert bucket location for a given key Key: HUDI-1083 URL: https://issues.apache.org/jira/browse/HUDI-1083

[jira] [Created] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-08 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1082: - Summary: Bug in deciding the upsert/insert buckets Key: HUDI-1082 URL: https://issues.apache.org/jira/browse/HUDI-1082 Project: Apache Hudi Issue

[GitHub] [hudi] nsivabalan commented on a change in pull request #1792: [HUDI-802] Fixing deletes for inserts in same batch in write path

2020-07-08 Thread GitBox
nsivabalan commented on a change in pull request #1792: URL: https://github.com/apache/hudi/pull/1792#discussion_r451583378 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java ## @@ -59,24 +59,28 @@ public

[GitHub] [hudi] RajasekarSribalan edited a comment on issue #1794: [SUPPORT] Hudi delete operation but HiveSync failed

2020-07-08 Thread GitBox
RajasekarSribalan edited a comment on issue #1794: URL: https://github.com/apache/hudi/issues/1794#issuecomment-655261425 Thanks @vinothchandar @bhasudha . I'll try to fetch the commit file, but for now I have disabled hive sync for the delete operation and I no longer get this error

[GitHub] [hudi] sbernauer commented on issue #1806: [SUPPORT] Deltastreamer can`t validate rewritten record that is valid

2020-07-08 Thread GitBox
sbernauer commented on issue #1806: URL: https://github.com/apache/hudi/issues/1806#issuecomment-655461841 I tracked the failing validation down to this line:

[jira] [Commented] (HUDI-1079) Cannot upsert on schema with Array of Record with single field

2020-07-08 Thread Adrian Tanase (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153364#comment-17153364 ] Adrian Tanase commented on HUDI-1079: - Spent just a bit more time going through the parquet spec.

[GitHub] [hudi] sbernauer edited a comment on issue #1806: [SUPPORT] Deltastreamer can`t validate rewritten record that is valid

2020-07-08 Thread GitBox
sbernauer edited a comment on issue #1806: URL: https://github.com/apache/hudi/issues/1806#issuecomment-655375517 I just wanted to point out that I use a patched version of avro 1.8.2. If I don't patch avro 1.8.2 the following exception occurs: "org.apache.avro.AvroRuntimeException:

[GitHub] [hudi] sbernauer commented on issue #1806: [SUPPORT] Deltastreamer can`t validate rewritten record that is valid

2020-07-08 Thread GitBox
sbernauer commented on issue #1806: URL: https://github.com/apache/hudi/issues/1806#issuecomment-655375517 I just wanted to point out that I use a patched version of avro 1.8.2. If I don't patch avro 1.8.2 the following exception occurs: "org.apache.avro.AvroRuntimeException: Unknown

[GitHub] [hudi] bhasudha commented on issue #1785: [SUPPORT] please rollback greater commits first

2020-07-08 Thread GitBox
bhasudha commented on issue #1785: URL: https://github.com/apache/hudi/issues/1785#issuecomment-655352986 > @bhasudha what if i don't clean them up? ie if i just leave them there then what is the issue? It shouldn't have any impact on queries, since inflight commits are not

[GitHub] [hudi] bhasudha commented on issue #1775: INCREMETNAL QUERY-Null value Exception

2020-07-08 Thread GitBox
bhasudha commented on issue #1775: URL: https://github.com/apache/hudi/issues/1775#issuecomment-655350766 @prashanthpdesai This definitely looks like Avro incompatibility. The spark-avro and spark version must match - https://hudi.apache.org/docs/quick-start-guide.html#setup has a note

[GitHub] [hudi] bhasudha closed issue #1325: presto - querying nested object in parquet file created by hudi

2020-07-08 Thread GitBox
bhasudha closed issue #1325: URL: https://github.com/apache/hudi/issues/1325

[GitHub] [hudi] bhasudha commented on issue #1325: presto - querying nested object in parquet file created by hudi

2020-07-08 Thread GitBox
bhasudha commented on issue #1325: URL: https://github.com/apache/hudi/issues/1325#issuecomment-655317880 @vinothchandar We can close this due to inactivity. I haven't been able to reproduce it. @adamjoneill Please feel free to re-open this anytime should you see this resurface again.

[jira] [Created] (HUDI-1081) Document AWS Hudi integration

2020-07-08 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1081: --- Summary: Document AWS Hudi integration Key: HUDI-1081 URL: https://issues.apache.org/jira/browse/HUDI-1081 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-07-08 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-655305119