[jira] [Created] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-07-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1098: Summary: Marker file finalizing may block on a data file that was never written Key: HUDI-1098 URL: https://issues.apache.org/jira/browse/HUDI-1098 Project: Apache

[jira] [Created] (HUDI-1097) Integration test for prestosql queries

2020-07-14 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1097: --- Summary: Integration test for prestosql queries Key: HUDI-1097 URL: https://issues.apache.org/jira/browse/HUDI-1097 Project: Apache Hudi Issue Type: Sub-task

[jira] [Updated] (HUDI-1094) Docker demo integration of Prestosql queries

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1094: Status: Open (was: New) > Docker demo integration of Prestosql queries >

[jira] [Updated] (HUDI-1095) Add documentation for prestosql support

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1095: Component/s: Presto Integration > Add documentation for prestosql support >

[jira] [Updated] (HUDI-1093) Add support for COW tables from Prestosql

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1093: Status: Open (was: New) > Add support for COW tables from Prestosql >

[jira] [Updated] (HUDI-1096) MOR queries support from Prestosql

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1096: Status: Open (was: New) > MOR queries support from Prestosql > --

[jira] [Updated] (HUDI-1095) Add documentation for prestosql support

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1095: Status: Open (was: New) > Add documentation for prestosql support >

[jira] [Updated] (HUDI-1094) Docker demo integration of Prestosql queries

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1094: Component/s: Presto Integration > Docker demo integration of Prestosql queries >

[jira] [Created] (HUDI-1096) MOR queries support from Prestosql

2020-07-14 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1096: --- Summary: MOR queries support from Prestosql Key: HUDI-1096 URL: https://issues.apache.org/jira/browse/HUDI-1096 Project: Apache Hudi Issue Type: Sub-task

[jira] [Created] (HUDI-1095) Add documentation for prestosql support

2020-07-14 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1095: --- Summary: Add documentation for prestosql support Key: HUDI-1095 URL: https://issues.apache.org/jira/browse/HUDI-1095 Project: Apache Hudi Issue Type: Sub-task

[jira] [Created] (HUDI-1094) Docker demo integration of Prestosql queries

2020-07-14 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1094: --- Summary: Docker demo integration of Prestosql queries Key: HUDI-1094 URL: https://issues.apache.org/jira/browse/HUDI-1094 Project: Apache Hudi Issue Type:

[jira] [Created] (HUDI-1093) Add support for COW tables from Prestosql

2020-07-14 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1093: --- Summary: Add support for COW tables from Prestosql Key: HUDI-1093 URL: https://issues.apache.org/jira/browse/HUDI-1093 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-1092) Hudi support from prestosql

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1092: Status: Open (was: New) > Hudi support from prestosql > --- > >

[jira] [Created] (HUDI-1092) Hudi support from prestosql

2020-07-14 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1092: --- Summary: Hudi support from prestosql Key: HUDI-1092 URL: https://issues.apache.org/jira/browse/HUDI-1092 Project: Apache Hudi Issue Type: Improvement

[jira] [Updated] (HUDI-1092) Hudi support from prestosql

2020-07-14 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1092: Issue Type: New Feature (was: Improvement) > Hudi support from prestosql >

[GitHub] [hudi] Mathieu1124 commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
Mathieu1124 commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-658535590 > > Hi, @vinothchandar @yanghua @leesf as the refactor is finished, I have filed a Jira ticket to track this work, > > please review the refactor work on this pr :) >

[GitHub] [hudi] Mathieu1124 commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
Mathieu1124 commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-658535113 > @leesf @Mathieu1124 @lw309637554 so this replaces #1727 right? yes, https://github.com/apache/hudi/pull/1727 can be closed now

[GitHub] [hudi] vinothchandar commented on pull request #1831: [HUDI-1087] Handle decimal type for realtime record reader with SparkSQL

2020-07-14 Thread GitBox
vinothchandar commented on pull request #1831: URL: https://github.com/apache/hudi/pull/1831#issuecomment-658532736 actually nvm.. its a small PR.. LGTM.. This is an automated message from the Apache Git Service. To respond

[GitHub] [hudi] vinothchandar commented on pull request #1831: [HUDI-1087] Handle decimal type for realtime record reader with SparkSQL

2020-07-14 Thread GitBox
vinothchandar commented on pull request #1831: URL: https://github.com/apache/hudi/pull/1831#issuecomment-658532404 cc @umehrot2 want to review this one?

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #339

2020-07-14 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.33 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-14 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r454762535 ## File path: hudi-spark/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetRealtimeFileFormat.scala ## @@ -0,0 +1,188 @@

[GitHub] [hudi] bvaradar commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-14 Thread GitBox
bvaradar commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-658506263 @garyli1019 @vinothchandar : Yes, I am planning to address the bootstrap PR comments and also give review comments for @umehrot2 changes by this weekend. @umehrot2 : I know

[GitHub] [hudi] vinothchandar commented on pull request #1824: [HUDI-996] Add functional test suite in hudi-client

2020-07-14 Thread GitBox
vinothchandar commented on pull request #1824: URL: https://github.com/apache/hudi/pull/1824#issuecomment-658506081 No all good.. was waiting for you actually :)

[GitHub] [hudi] vinothchandar commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
vinothchandar commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-658505880 Good to get @n3nash 's review here as well to make sure we are not breaking anything for the RDD client users..

[GitHub] [hudi] vinothchandar commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
vinothchandar commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-658505687 @leesf @Mathieu1124 @lw309637554 so this replaces #1727 right?

[GitHub] [hudi] xushiyan commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-14 Thread GitBox
xushiyan commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-658505404 @shenh062326 @nsivabalan got it. yup, making it through the constructor looks good. thanks for clarifying.

[GitHub] [hudi] vinothchandar commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-14 Thread GitBox
vinothchandar commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r454749227 ## File path: hudi-spark/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetRealtimeFileFormat.scala ## @@ -0,0 +1,188 @@

[jira] [Commented] (HUDI-1090) Presto reads Hudi data misaligned (presto读取hudi数据错位)

2020-07-14 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157818#comment-17157818 ] Hong Shen commented on HUDI-1090: - [~710514878] Please describe the issue in English. > Presto reads Hudi data misaligned (presto读取hudi数据错位)

[GitHub] [hudi] nsivabalan commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-14 Thread GitBox
nsivabalan commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-658502484 @xushiyan : nope. As I have mentioned above, we need it in getInsertValue() as well, which is called from a lot of classes. Hence I suggested adding it as part of the constructor.

[GitHub] [hudi] vinothchandar commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-14 Thread GitBox
vinothchandar commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-658501011 I am fine with doing that.. not sure if thats more work for @umehrot2 .. wdyt ? @bvaradar in general, can we get more of the bootstrap landed and work on the follow

[GitHub] [hudi] shenh062326 commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-14 Thread GitBox
shenh062326 commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-658495234 @xushiyan It seems good for OverwriteWithLatestAvroPayload. But for AWSDmsAvroPayload, users need to define not only the deletion column, but also the processing method of

[GitHub] [hudi] leesf commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
leesf commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-658490359 > Hi, @vinothchandar @yanghua @leesf as the refactor is finished, I have filed a Jira ticket to track this work, > please review the refactor work on this pr :) ack.

[jira] [Comment Edited] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157341#comment-17157341 ] Hong Shen edited comment on HUDI-1082 at 7/15/20, 1:12 AM: --- It seems it does not

[jira] [Comment Edited] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157341#comment-17157341 ] Hong Shen edited comment on HUDI-1082 at 7/15/20, 1:08 AM: --- It seems it does not

[hudi] branch master updated: [HUDI-996] Add functional test in hudi-client (#1824)

2020-07-14 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b399b4a [HUDI-996] Add functional test in

[GitHub] [hudi] yanghua merged pull request #1824: [HUDI-996] Add functional test suite in hudi-client

2020-07-14 Thread GitBox
yanghua merged pull request #1824: URL: https://github.com/apache/hudi/pull/1824

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-14 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r454694800 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -58,26 +60,28 @@ class DefaultSource extends RelationProvider

[GitHub] [hudi] n3nash commented on pull request #1704: [HUDI-115] Enhance OverwriteWithLatestAvroPayload to also respect ordering value of record in storage

2020-07-14 Thread GitBox
n3nash commented on pull request #1704: URL: https://github.com/apache/hudi/pull/1704#issuecomment-658453437 @bhasudha let me know if you need any clarification

[GitHub] [hudi] vinothchandar commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-14 Thread GitBox
vinothchandar commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r454670467 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -58,26 +60,28 @@ class DefaultSource extends RelationProvider

[GitHub] [hudi] vinothchandar commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-07-14 Thread GitBox
vinothchandar commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r454670023 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -65,7 +66,7 @@ object DataSourceReadOptions { * This eases

[GitHub] [hudi] michetti edited a comment on issue #1789: [SUPPORT] What jars are needed to run on AWS Glue 1.0 ?

2020-07-14 Thread GitBox
michetti edited a comment on issue #1789: URL: https://github.com/apache/hudi/issues/1789#issuecomment-658432371 Thanks for clarifying @umehrot2. I actually just found out that shading org.eclipse.jetty was recently merged in Hudi, so we should be good without changes from the next

[GitHub] [hudi] michetti commented on issue #1789: [SUPPORT] What jars are needed to run on AWS Glue 1.0 ?

2020-07-14 Thread GitBox
michetti commented on issue #1789: URL: https://github.com/apache/hudi/issues/1789#issuecomment-658432371 Thanks for clarifying @umehrot2. I actually just found out that shading org.eclipse.jetty was actually recently merged in Hudi, so we should be good without changes from the next

[GitHub] [hudi] asheeshgarg commented on issue #1825: [SUPPORT] Compaction of parquet and meta file

2020-07-14 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-658422907 @bvaradar you are right, we are looking for clustering. Do you have any timeline in mind for when this will be available, or any branch to look at?

[GitHub] [hudi] michetti edited a comment on issue #1789: [SUPPORT] What jars are needed to run on AWS Glue 1.0 ?

2020-07-14 Thread GitBox
michetti edited a comment on issue #1789: URL: https://github.com/apache/hudi/issues/1789#issuecomment-658373503 Hey @GrigorievNick, I saw the issue was closed but if I understood correctly, the link you posted is about AWS Athena and how it can work with Hudi tables registered in the AWS

[GitHub] [hudi] umehrot2 commented on issue #1789: [SUPPORT] What jars are needed to run on AWS Glue 1.0 ?

2020-07-14 Thread GitBox
umehrot2 commented on issue #1789: URL: https://github.com/apache/hudi/issues/1789#issuecomment-658387970 @michetti Thanks for the pointers here. You are right that the link posted is from AWS Athena, whereas what you folks are trying is to run Hudi on AWS Glue jobs. Since AWS Glue does not

[GitHub] [hudi] umehrot2 commented on issue #1798: Question reading partition path with less level is more faster than what document mentioned

2020-07-14 Thread GitBox
umehrot2 commented on issue #1798: URL: https://github.com/apache/hudi/issues/1798#issuecomment-658385297 Like @bvaradar mentioned, in the first query the glob pattern matches 950 folders, which are then listed in parallel across the cluster using the Spark context. In the second query the

[GitHub] [hudi] michetti commented on issue #1789: [SUPPORT] What jars are needed to run on AWS Glue 1.0 ?

2020-07-14 Thread GitBox
michetti commented on issue #1789: URL: https://github.com/apache/hudi/issues/1789#issuecomment-658373503 Hey @GrigorievNick, I saw the issue was closed but if I understood correctly, the link you posted is about AWS Athena and how it can work with Hudi tables registered in the AWS Glue

[GitHub] [hudi] srsteinmetz edited a comment on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-14 Thread GitBox
srsteinmetz edited a comment on issue #1830: URL: https://github.com/apache/hudi/issues/1830#issuecomment-658363753 Accidentally posted early. Closed and reopened after editing post.

[GitHub] [hudi] srsteinmetz commented on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-14 Thread GitBox
srsteinmetz commented on issue #1830: URL: https://github.com/apache/hudi/issues/1830#issuecomment-658363753 Accidentally posted

[jira] [Updated] (HUDI-1087) Realtime Record Reader needs to handle decimal types

2020-07-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1087: - Labels: pull-request-available (was: ) > Realtime Record Reader needs to handle decimal types >

[GitHub] [hudi] zhedoubushishi opened a new pull request #1831: [HUDI-1087] Handle decimal type for realtime record reader with SparkSQL

2020-07-14 Thread GitBox
zhedoubushishi opened a new pull request #1831: URL: https://github.com/apache/hudi/pull/1831 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] srsteinmetz closed issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-14 Thread GitBox
srsteinmetz closed issue #1830: URL: https://github.com/apache/hudi/issues/1830

[GitHub] [hudi] srsteinmetz opened a new issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-14 Thread GitBox
srsteinmetz opened a new issue #1830: URL: https://github.com/apache/hudi/issues/1830 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? - Join the mailing list to engage in conversations and get faster

[GitHub] [hudi] prashantwason commented on pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-07-14 Thread GitBox
prashantwason commented on pull request #1804: URL: https://github.com/apache/hudi/pull/1804#issuecomment-658325538 @vinothchandar Am not currently blocked on this. Take a fine-combed look so we can make this a model for adding new file formats to HUDI. Please suggest better

[GitHub] [hudi] zuyanton opened a new issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2020-07-14 Thread GitBox
zuyanton opened a new issue #1829: URL: https://github.com/apache/hudi/issues/1829 Hudi MoR reading performance gets slower on tables with many (1000+) partitions stored in S3. When running a simple ```spark.sql("select * from table_ro").count``` command, we observe in the Spark UI that first

[GitHub] [hudi] bhasudha commented on issue #1828: [SUPPORT] Cannot force hudi to retain only last commit

2020-07-14 Thread GitBox
bhasudha commented on issue #1828: URL: https://github.com/apache/hudi/issues/1828#issuecomment-658282320 > Hi Guys, > > Is it possible to retain only last commit? When I put 'hoodie.cleaner.commits.retained': 1 in hudi_options I still have two last commits. One that is being

[GitHub] [hudi] kirkuz opened a new issue #1828: [SUPPORT] Cannot force hudi to retain only last commit

2020-07-14 Thread GitBox
kirkuz opened a new issue #1828: URL: https://github.com/apache/hudi/issues/1828 Hi Guys, Is it possible to retain only last commit? When I put 'hoodie.cleaner.commits.retained': 1 in hudi_options I still have two last commits. One that is being written and the previous one. What

[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

2020-07-14 Thread GitBox
bvaradar commented on issue #1787: URL: https://github.com/apache/hudi/issues/1787#issuecomment-658248015 @asheeshgarg : If the table is represented as simple parquet table, presto queries will start showing duplicates when there are multiple file versions present or could fail when

[jira] [Updated] (HUDI-1087) Realtime Record Reader needs to handle decimal types

2020-07-14 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1087: - Priority: Blocker (was: Major) > Realtime Record Reader needs to handle decimal types >

[GitHub] [hudi] bvaradar closed issue #1789: [SUPPORT] What jars are needed to run on AWS Glue 1.0 ?

2020-07-14 Thread GitBox
bvaradar closed issue #1789: URL: https://github.com/apache/hudi/issues/1789

[GitHub] [hudi] bvaradar commented on issue #1790: [SUPPORT] Querying MoR tables with DecimalType columns via Spark SQL fails

2020-07-14 Thread GitBox
bvaradar commented on issue #1790: URL: https://github.com/apache/hudi/issues/1790#issuecomment-658245237 Closing this ticket as we have a jira to track this issue.

[GitHub] [hudi] bvaradar commented on issue #1798: Question reading partition path with less level is more faster than what document mentioned

2020-07-14 Thread GitBox
bvaradar commented on issue #1798: URL: https://github.com/apache/hudi/issues/1798#issuecomment-658244220 The related code (HoodieROTablePathFilter) does not seem to have any relevant recent changes. @zherenyu831 From the control flow, since Spark deciphers the glob-path, it is

[GitHub] [hudi] bvaradar closed issue #1806: [SUPPORT] Deltastreamer can`t validate rewritten record that is valid

2020-07-14 Thread GitBox
bvaradar closed issue #1806: URL: https://github.com/apache/hudi/issues/1806

[GitHub] [hudi] bvaradar commented on issue #1813: ERROR HoodieDeltaStreamer: Got error running delta sync once.

2020-07-14 Thread GitBox
bvaradar commented on issue #1813: URL: https://github.com/apache/hudi/issues/1813#issuecomment-658237088 @jcunhafonte : This could happen when there are no more files to be ingested when running in non-continuous mode. I have opened a jira to get it fixed in 0.6.0 :

[jira] [Assigned] (HUDI-1091) Handle empty input batch gracefully in ParquetDFSSource

2020-07-14 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-1091: Assignee: Balaji Varadarajan > Handle empty input batch gracefully in

[jira] [Updated] (HUDI-1091) Handle empty input batch gracefully in ParquetDFSSource

2020-07-14 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1091: - Status: Open (was: New) > Handle empty input batch gracefully in ParquetDFSSource >

[jira] [Created] (HUDI-1091) Handle empty input batch gracefully in ParquetDFSSource

2020-07-14 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1091: Summary: Handle empty input batch gracefully in ParquetDFSSource Key: HUDI-1091 URL: https://issues.apache.org/jira/browse/HUDI-1091 Project: Apache Hudi

[GitHub] [hudi] bvaradar commented on issue #1825: [SUPPORT] Compaction of parquet and meta file

2020-07-14 Thread GitBox
bvaradar commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-658221173 @asheeshgarg : I think what you are looking for is clustering (not compaction) of files which is under development (Please see

[GitHub] [hudi] aditanase commented on pull request #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-07-14 Thread GitBox
aditanase commented on pull request #1406: URL: https://github.com/apache/hudi/pull/1406#issuecomment-658221068 @umehrot2 agreed and it's a valid point for when you control the schema, which in this case is defined by a customer upstream from us.

[GitHub] [hudi] tooptoop4 commented on issue #506: explodeRecordRDDWithFileComparisons is costly with HoodieBloomIndex/range pruning=on

2020-07-14 Thread GitBox
tooptoop4 commented on issue #506: URL: https://github.com/apache/hudi/issues/506#issuecomment-658220450 I seem to have a similar issue, running an upsert of a 700MB csv (twice, i.e. repeating the same csv upsert the next day) with 16gb executor memory and shuffle parallelism of 16

[jira] [Commented] (HUDI-1079) Cannot upsert on schema with Array of Record with single field

2020-07-14 Thread Adrian Tanase (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157429#comment-17157429 ] Adrian Tanase commented on HUDI-1079: - [~vinoth] - thanks for the pointer, I'll take a look around the

[GitHub] [hudi] xushiyan commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-14 Thread GitBox
xushiyan commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-658217251 @nsivabalan actually what i commented here is the 3rd option > @shenh062326 Maybe we shouldn't do the check in the payload class itself. Maybe

[jira] [Comment Edited] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157418#comment-17157418 ] sivabalan narayanan edited comment on HUDI-1082 at 7/14/20, 2:26 PM: -

[jira] [Commented] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157418#comment-17157418 ] sivabalan narayanan commented on HUDI-1082: --- I ran some simulations. By and large, distribution

[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

2020-07-14 Thread GitBox
asheeshgarg commented on issue #1787: URL: https://github.com/apache/hudi/issues/1787#issuecomment-658191364 @bhasudha it works with Presto and I am able to query data fine, and the data seems to be correct based on my queries. The only concern I have is whether it is missing anything that might hit in

[GitHub] [hudi] asheeshgarg commented on issue #1825: [SUPPORT] Compaction of parquet and meta file

2020-07-14 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-658188686 @bvaradar Balaji please let me know if I need to assign additional properties to achieve the behavior.

[jira] [Comment Edited] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157346#comment-17157346 ] sivabalan narayanan edited comment on HUDI-1082 at 7/14/20, 1:21 PM: -

[jira] [Commented] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157346#comment-17157346 ] sivabalan narayanan commented on HUDI-1082: --- Guess you are missing out partially filled buckets.

[jira] [Comment Edited] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157341#comment-17157341 ] Hong Shen edited comment on HUDI-1082 at 7/14/20, 12:53 PM: It seems it does not

[jira] [Comment Edited] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157341#comment-17157341 ] Hong Shen edited comment on HUDI-1082 at 7/14/20, 12:52 PM: It seems it does not

[jira] [Commented] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157341#comment-17157341 ] Hong Shen commented on HUDI-1082: - It seems it does not matter; we just need to ensure the distribution of

[GitHub] [hudi] Mathieu1124 edited a comment on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
Mathieu1124 edited a comment on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-658151869 Hi, @vinothchandar @yanghua @leesf as the refactor is finished, I have filed a Jira ticket to track this work, please review the refactor work on this pr :)

[GitHub] [hudi] Mathieu1124 commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
Mathieu1124 commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-658151869 Hi, @vinothchandar @yanghua @leesf as the refactor is finished, I have filed a Jira ticket to track this work, please review this on this pr :)

[GitHub] [hudi] Mathieu1124 commented on pull request #1727: [Review] refactor hudi-client

2020-07-14 Thread GitBox
Mathieu1124 commented on pull request #1727: URL: https://github.com/apache/hudi/pull/1727#issuecomment-658150685 refactor is finished, review goes to https://github.com/apache/hudi/pull/1827 closing this

[GitHub] [hudi] Mathieu1124 edited a comment on pull request #1727: [Review] refactor hudi-client

2020-07-14 Thread GitBox
Mathieu1124 edited a comment on pull request #1727: URL: https://github.com/apache/hudi/pull/1727#issuecomment-658150685 refactor is finished, review goes to https://github.com/apache/hudi/pull/1827

[GitHub] [hudi] Mathieu1124 opened a new pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-14 Thread GitBox
Mathieu1124 opened a new pull request #1827: URL: https://github.com/apache/hudi/pull/1827 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[jira] [Updated] (HUDI-1089) Refactor hudi-client to support multi-engine

2020-07-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1089: - Labels: pull-request-available (was: ) > Refactor hudi-client to support multi-engine >

[GitHub] [hudi] nsivabalan commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-14 Thread GitBox
nsivabalan commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-658130555 Looks like there are 2 options here. Option1: Change interface for combineAndGetUpdateValue and getInsertValue to take in delete field. check

[GitHub] [hudi] yanghua commented on pull request #1824: [HUDI-996] Add functional test suite in hudi-client

2020-07-14 Thread GitBox
yanghua commented on pull request #1824: URL: https://github.com/apache/hudi/pull/1824#issuecomment-658128510 @vinothchandar Do you still have any concerns? This is an automated message from the Apache Git Service. To

[jira] [Commented] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-14 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157284#comment-17157284 ] Hong Shen commented on HUDI-1082: - OK, I will add testcase and fix it. > Bug in deciding the

[jira] [Assigned] (HUDI-1067) Replace the integer version field with HoodieLogBlockVersion data structure

2020-07-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang reassigned HUDI-1067: - Assignee: Trevorzhang > Replace the integer version field with HoodieLogBlockVersion data

[jira] [Created] (HUDI-1090) Data misalignment when Presto reads Hudi data

2020-07-14 Thread obar (Jira)
obar created HUDI-1090: -- Summary: Data misalignment when Presto reads Hudi data Key: HUDI-1090 URL: https://issues.apache.org/jira/browse/HUDI-1090 Project: Apache Hudi Issue Type: Bug Reporter: obar

[jira] [Updated] (HUDI-909) Introduce hudi-client-flink module to support flink engine

2020-07-14 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-909: - Description: Introduce hudi-client-flink module to support flink engine based on new abstraction (was:

[jira] [Updated] (HUDI-909) Introduce hudi-client-flink module to support flink engine

2020-07-14 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-909: - Status: In Progress (was: Open) > Introduce hudi-client-flink module to support flink engine >

[jira] [Updated] (HUDI-909) Introduce hudi-client-flink module to support flink engine

2020-07-14 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-909: - Fix Version/s: 0.6.0 Issue Type: Task (was: Wish) > Introduce hudi-client-flink module to support

[jira] [Updated] (HUDI-909) Introduce hudi-client-flink module to support flink engine

2020-07-14 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-909: - Status: Open (was: New) > Introduce hudi-client-flink module to support flink engine >

[jira] [Updated] (HUDI-909) Introduce hudi-client-flink module to support flink engine

2020-07-14 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-909: - Description: Introduce hudi-client-flink module to support flink engine based on new abstraction

[jira] [Updated] (HUDI-909) Introduce hudi-client-flink module to support flink engine

2020-07-14 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-909: - Summary: Introduce hudi-client-flink module to support flink engine (was: Introduce high level

[jira] [Commented] (HUDI-1088) hive version 1.1.0 integrated with hudi,select * from hudi_table error in HUE

2020-07-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157246#comment-17157246 ] Trevorzhang commented on HUDI-1088: --- hi [~hainanzhongjian],Hue does not support many views of HUDI, this

[jira] [Updated] (HUDI-1089) Refactor hudi-client to support multi-engine

2020-07-14 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1089: -- Status: Open (was: New) > Refactor hudi-client to support multi-engine >
