[GitHub] [hudi] bvaradar commented on issue #2075: [SUPPORT] hoodie.datasource.write.precombine.field not working as expected
bvaradar commented on issue #2075: URL: https://github.com/apache/hudi/issues/2075#issuecomment-690024060 @rajgowtham24 : This is a known issue in 0.5.x and was fixed in the 0.6.0 release. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
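As background for the precombine discussion above: `hoodie.datasource.write.precombine.field` names the field Hudi uses to pick a winner when an incoming batch contains several records with the same key. A minimal Python sketch of that deduplication semantics (the `id`/`ts` record shape is illustrative, not Hudi's API):

```python
def precombine(records, key_field="id", precombine_field="ts"):
    """Among records sharing key_field, keep the one with the largest
    precombine_field value; this models what Hudi's precombine step does."""
    best = {}
    for rec in records:
        key = rec[key_field]
        if key not in best or rec[precombine_field] > best[key][precombine_field]:
            best[key] = rec
    return list(best.values())

batch = [
    {"id": 1, "ts": 1, "val": "old"},
    {"id": 1, "ts": 3, "val": "new"},   # larger ts: this record should win for id=1
    {"id": 2, "ts": 2, "val": "only"},
]
deduped = precombine(batch)
```

Per the comment above, 0.5.x did not always honor the configured field at this step; 0.6.0 fixed it.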
[GitHub] [hudi] bvaradar commented on issue #2068: [SUPPORT]Deltastreamer Upsert Very Slow / Never Completes After Initial Data Load
bvaradar commented on issue #2068: URL: https://github.com/apache/hudi/issues/2068#issuecomment-690012667 @bradleyhurley : The errors are due to shuffle fetch failures. Increasing executor memory and resources in general helps.
[jira] [Resolved] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karl wang resolved HUDI-1255. - Resolution: Fixed > Combine and get updateValue in multiFields > -- > > Key: HUDI-1255 > URL: https://issues.apache.org/jira/browse/HUDI-1255 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: karl wang >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > Update the current value for only the fields you want to change. > The default payload OverwriteWithLatestAvroPayload overwrites the whole record > when comparing by orderingVal. This doesn't meet our needs when we just want to change > specified fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karl wang updated HUDI-1255: Fix Version/s: 0.6.1
[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karl wang updated HUDI-1255: Fix Version/s: (was: 0.6.0)
[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karl wang updated HUDI-1255: Fix Version/s: (was: 0.6.1)
[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karl wang updated HUDI-1255: Fix Version/s: 0.6.0 0.6.1
[hudi] branch master updated (063a98f -> a1cff8a)
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 063a98f [HUDI-1254] TypedProperties can not get values by initializing an existing properties (#2059) add a1cff8a [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage (#2056) No new revisions were added by this update. Summary of changes: .../OverwriteNonDefaultsWithLatestAvroPayload.java | 72 ++ .../model/OverwriteWithLatestAvroPayload.java | 9 ++- ...OverwriteNonDefaultsWithLatestAvroPayload.java} | 54 ++-- 3 files changed, 116 insertions(+), 19 deletions(-) create mode 100644 hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java copy hudi-common/src/test/java/org/apache/hudi/common/model/{TestOverwriteWithLatestAvroPayload.java => TestOverwriteNonDefaultsWithLatestAvroPayload.java} (63%)
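The difference between the existing OverwriteWithLatestAvroPayload and the new OverwriteNonDefaultsWithLatestAvroPayload added in the commit above can be modeled in a few lines of Python (a rough sketch of the merge semantics, not the Java implementation; dicts stand in for Avro records, and null is assumed to be every field's default):

```python
def overwrite_with_latest(stored, incoming):
    """Default payload: the incoming record replaces the stored one wholesale."""
    return incoming

def overwrite_non_defaults_with_latest(stored, incoming, defaults):
    """New payload: per field, take the incoming value only when it differs
    from that field's default; otherwise keep the value already in storage."""
    merged = dict(stored)
    for field, value in incoming.items():
        if value != defaults.get(field):
            merged[field] = value
    return merged

stored = {"id": 1, "name": "alice", "city": "sf"}
incoming = {"id": 1, "name": "alice2", "city": None}  # only 'name' is really updated
defaults = {"id": None, "name": None, "city": None}

full = overwrite_with_latest(stored, incoming)  # whole-record overwrite: 'city' is lost
partial = overwrite_non_defaults_with_latest(stored, incoming, defaults)  # 'city' kept
```

In Hudi itself the payload class is chosen per write via the `hoodie.datasource.write.payload.class` option.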
[GitHub] [hudi] vinothchandar merged pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage
vinothchandar merged pull request #2056: URL: https://github.com/apache/hudi/pull/2056
[GitHub] [hudi] bvaradar closed issue #2076: [SUPPORT] load data partition wise
bvaradar closed issue #2076: URL: https://github.com/apache/hudi/issues/2076
[GitHub] [hudi] n3nash commented on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus
n3nash commented on pull request #1484: URL: https://github.com/apache/hudi/pull/1484#issuecomment-689961998 Yes, I will do it by Friday.
[jira] [Commented] (HUDI-1058) Make delete marker configurable
[ https://issues.apache.org/jira/browse/HUDI-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193288#comment-17193288 ] shenh062326 commented on HUDI-1058: --- [~rxu] sorry for the late reply. I am waiting for https://github.com/apache/hudi/pull/1704 to merge, because HUDI-1058 also needs similar modifications. It is better to work on HUDI-1058 after that merge request is merged, but if [this MR|https://github.com/apache/hudi/pull/1704] has not progressed, I can resolve HUDI-1058 first. > Make delete marker configurable > --- > > Key: HUDI-1058 > URL: https://issues.apache.org/jira/browse/HUDI-1058 > Project: Apache Hudi > Issue Type: Improvement > Components: Usability >Reporter: Raymond Xu >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > > users can specify any boolean field for delete marker and > `_hoodie_is_deleted` remains as default.
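For context on HUDI-1058: a delete marker is a boolean field on an incoming record that tells the writer to delete the matching stored record instead of upserting it; `_hoodie_is_deleted` is the default field name, and the issue makes the name configurable. A minimal Python sketch of that merge behavior under those assumptions (the record shapes are illustrative):

```python
def apply_upserts(stored, upserts, marker_field="_hoodie_is_deleted"):
    """Merge a batch into the table: a record whose marker_field is true
    is a delete; everything else is a plain upsert keyed by id."""
    table = {rec["id"]: rec for rec in stored}
    for rec in upserts:
        if rec.get(marker_field, False):
            table.pop(rec["id"], None)   # delete marker set: drop the row
        else:
            table[rec["id"]] = rec
    return list(table.values())

stored = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
batch = [{"id": 1, "_hoodie_is_deleted": True}, {"id": 3, "v": "c"}]
result = apply_upserts(stored, batch)   # id=1 deleted, id=3 inserted
```

Making `marker_field` a parameter is exactly the configurability the issue asks for; today only the default name is honored.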
[GitHub] [hudi] shenh062326 commented on pull request #1704: [HUDI-115] Enhance OverwriteWithLatestAvroPayload to also respect ordering value of record in storage
shenh062326 commented on pull request #1704: URL: https://github.com/apache/hudi/pull/1704#issuecomment-689917712 I am working on https://issues.apache.org/jira/browse/HUDI-1058 , it also needs similar modifications. It is better to work on HUDI-1058 after this merge request is merged, but if this MR has not progressed, I can resolve HUDI-1058 first.
[GitHub] [hudi] wangxianghu commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine
wangxianghu commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-689916910 > @wangxianghu the issue with the tests is that, now most of the tests are moved to hudi-spark-client. previously we had split tests into hudi-client and others. We need to edit `travis.yml` to adjust the splits again @vinothchandar could you please help me edit travis.yml to adjust the splits .. I am not familiar with that thanks :)
[GitHub] [hudi] yanghua commented on pull request #2058: [HUDI-1259] Cache some framework binaries to speed up the progress of building docker image in local env
yanghua commented on pull request #2058: URL: https://github.com/apache/hudi/pull/2058#issuecomment-689916526 > If you are referring to hudi, we don't have to rebuild docker images to pick up latest hudi code. Yes > The hudi codebase is mounted inside docker containers so that you can use the latest version. You mean if I change the code it would be reflected in the Hudi on Docker immediately? Where can I find the configuration of this mechanism in the project? Sorry, I am not familiar with Docker.
[GitHub] [hudi] vinothchandar commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table
vinothchandar commented on pull request #1929: URL: https://github.com/apache/hudi/pull/1929#issuecomment-689905849 @satishkotha Can you please help review?
[GitHub] [hudi] vinothchandar opened a new pull request #2082: hudi cluster write path poc
vinothchandar opened a new pull request #2082: URL: https://github.com/apache/hudi/pull/2082 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] vinothchandar commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage
vinothchandar commented on pull request #2056: URL: https://github.com/apache/hudi/pull/2056#issuecomment-689884262 sorry, long weekend here in the states. will take a look today
[jira] [Commented] (HUDI-1270) NoSuchMethod PartitionedFile on AWS EMR Spark 2.4.5
[ https://issues.apache.org/jira/browse/HUDI-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193245#comment-17193245 ] Vinoth Chandar commented on HUDI-1270: -- Not sure if we can do much about this in Hudi itself. may be leave it to aws folks? cc [~uditme] > NoSuchMethod PartitionedFile on AWS EMR Spark 2.4.5 > --- > > Key: HUDI-1270 > URL: https://issues.apache.org/jira/browse/HUDI-1270 > Project: Apache Hudi > Issue Type: Bug >Reporter: Gary Li >Priority: Major > > There are some AWS EMR users reporting: > java.lang.NoSuchMethodError: > org.apache.spark.sql.execution.datasources.PartitionedFile. > on EMR (Spark-2.4.5-amzn-0) when using the Spark Datasource to query MOR > table. > [https://github.com/apache/hudi/pull/1848#issuecomment-687392285] > [https://github.com/apache/hudi/issues/2057#issuecomment-685015564] > [~uditme] [~vbalaji] would you guys able to help? >
[hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 8afed69 Travis CI build asf-site 8afed69 is described below commit 8afed6902e3d955f9f3977a03f8eb8845198ae78 Author: CI AuthorDate: Wed Sep 9 21:54:43 2020 + Travis CI build asf-site --- content/docs/docker_demo.html | 1 - 1 file changed, 1 deletion(-) diff --git a/content/docs/docker_demo.html b/content/docs/docker_demo.html index 6f23ab8..0d36fae 100644 --- a/content/docs/docker_demo.html +++ b/content/docs/docker_demo.html @@ -484,7 +484,6 @@ This should pull the docker images from docker hub and setup docker cluster. Creating spark-worker-1... done Copying spark default config and setting up configs Copying spark default config and setting up configs -Copying spark default config and setting up configs $ docker ps
[hudi] branch asf-site updated: [MINOR]: removed redundant line from docker-demo page (#2081)
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new ca2f9a8 [MINOR]: removed redundant line from docker-demo page (#2081) ca2f9a8 is described below commit ca2f9a8945afa8bcbdda3f18bee89fcfb65dbf9b Author: Pratyaksh Sharma AuthorDate: Thu Sep 10 03:22:35 2020 +0530 [MINOR]: removed redundant line from docker-demo page (#2081) --- docs/_docs/0_4_docker_demo.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/_docs/0_4_docker_demo.md b/docs/_docs/0_4_docker_demo.md index 4193fd1..22efbe9 100644 --- a/docs/_docs/0_4_docker_demo.md +++ b/docs/_docs/0_4_docker_demo.md @@ -85,7 +85,6 @@ Creating adhoc-2 ... done Creating spark-worker-1... done Copying spark default config and setting up configs Copying spark default config and setting up configs -Copying spark default config and setting up configs $ docker ps ```
[GitHub] [hudi] vinothchandar merged pull request #2081: [MINOR]: removed redundant line from docker-demo page
vinothchandar merged pull request #2081: URL: https://github.com/apache/hudi/pull/2081
[jira] [Assigned] (HUDI-338) Reduce Hoodie commit/instant time granularity to millis from secs
[ https://issues.apache.org/jira/browse/HUDI-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyaksh Sharma reassigned HUDI-338: - Assignee: Pratyaksh Sharma (was: Nishith Agarwal) > Reduce Hoodie commit/instant time granularity to millis from secs > - > > Key: HUDI-338 > URL: https://issues.apache.org/jira/browse/HUDI-338 > Project: Apache Hudi > Issue Type: Task > Components: Common Core >Reporter: Nishith Agarwal >Assignee: Pratyaksh Sharma >Priority: Major > Fix For: 0.6.1 > >
[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable
pratyakshsharma commented on a change in pull request #1968: URL: https://github.com/apache/hudi/pull/1968#discussion_r485921982 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -71,6 +71,9 @@ @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url") public Boolean useJdbc = true; + @Parameter(names = {"--enable-create-database"}, description = "Enable create hive database") + public Boolean enableCreateDatabase = false; Review comment: https://lists.apache.org/thread.html/e1b7f97c774e1d7d7fc54fbb46db49aaf2e217303a50d9885150242d%40%3Cdev.hudi.apache.org%3E - this is what I am referring to. :)
[GitHub] [hudi] vinothchandar commented on pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action
vinothchandar commented on pull request #2048: URL: https://github.com/apache/hudi/pull/2048#issuecomment-689812755 >You suggested to remove HoodieReplaceStat I think the suggestion was to simplify HoodieReplaceMetadata such that it only contains the extra information about replaced file groups. and use the HoodieCommitMetadata and its HoodieWriteStat for tracking the new file groups written. We could have HoodieReplaceStat to be part of the WriteStatus itself for tracking the additional information about replaced file groups? On cleaning vs archival, it would be good if we can implement this in cleaning. But can that be a follow-on item? Practically speaking, typical deployments don't configure cleaning that low.
[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable
pratyakshsharma commented on a change in pull request #1968: URL: https://github.com/apache/hudi/pull/1968#discussion_r485910367 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java ## @@ -117,11 +117,13 @@ private void syncHoodieTable(String tableName, boolean useRealtimeInputFormat) { boolean tableExists = hoodieHiveClient.doesTableExist(tableName); // check if the database exists else create it -try { - hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName); -} catch (Exception e) { - // this is harmless since table creation will fail anyways, creation of DB is needed for in-memory testing - LOG.warn("Unable to create database", e); +if (cfg.enableCreateDatabase) { + try { +hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName); Review comment: +1 on throwing the error.
[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable
pratyakshsharma commented on a change in pull request #1968: URL: https://github.com/apache/hudi/pull/1968#discussion_r485909624 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -71,6 +71,9 @@ @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url") public Boolean useJdbc = true; + @Parameter(names = {"--enable-create-database"}, description = "Enable create hive database") + public Boolean enableCreateDatabase = false; Review comment: @vinothchandar We do not let hudi create databases by default. So false seems to be ok :) @bvaradar to chime in here.
[jira] [Commented] (HUDI-1053) Make ComplexKeyGenerator also support non partitioned Hudi dataset
[ https://issues.apache.org/jira/browse/HUDI-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193170#comment-17193170 ] Pratyaksh Sharma commented on HUDI-1053: [~bhavanisudha] Is this not handled now with CustomKeyGenerator? If I am not missing anything here, I guess we can close this. > Make ComplexKeyGenerator also support non partitioned Hudi dataset > -- > > Key: HUDI-1053 > URL: https://issues.apache.org/jira/browse/HUDI-1053 > Project: Apache Hudi > Issue Type: Improvement > Components: Storage Management, Writer Core >Reporter: Bhavani Sudha >Assignee: Bhavani Sudha >Priority: Minor > Fix For: 0.6.1 > > > Currently When using ComplexKeyGenerator a `default` partition is assumed. > Recently there has been interest in supporting non partitioned Hudi datasets > that uses ComplexKeyGenerator. This GitHub issue has context - > https://github.com/apache/hudi/issues/1747
[GitHub] [hudi] pratyakshsharma opened a new pull request #2081: [MINOR]: removed redundant line from docker-demo page
pratyakshsharma opened a new pull request #2081: URL: https://github.com/apache/hudi/pull/2081
[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1990: [HUDI-1199]: relocated jetty in hudi-utilities-bundle pom
pratyakshsharma commented on a change in pull request #1990: URL: https://github.com/apache/hudi/pull/1990#discussion_r485892352 ## File path: packaging/hudi-utilities-bundle/pom.xml ## @@ -172,6 +172,10 @@ org.apache.htrace. org.apache.hudi.org.apache.htrace. + + org.eclipse.jetty. + org.apache.hudi.org.apache.jetty. Review comment: @vinothchandar Yes this is how it was done for spark-bundle as well. Let me re-trigger the build for this.
[GitHub] [hudi] pratyakshsharma opened a new pull request #2080: [MINOR]: changed apache id for Pratyaksh
pratyakshsharma opened a new pull request #2080: URL: https://github.com/apache/hudi/pull/2080
[jira] [Commented] (HUDI-57) [UMBRELLA] Support ORC Storage
[ https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193144#comment-17193144 ] Vinoth Chandar commented on HUDI-57: [~manijndl77] assigned it to you. there is a fair bit of prior work that attempted this. you can search PRs and RFCs, there is probably an easier way to do this now, given the base file format etc have been abstracted out nicely now > [UMBRELLA] Support ORC Storage > -- > > Key: HUDI-57 > URL: https://issues.apache.org/jira/browse/HUDI-57 > Project: Apache Hudi > Issue Type: Improvement > Components: Hive Integration, Writer Core >Reporter: Vinoth Chandar >Assignee: Mani Jindal >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > [https://github.com/uber/hudi/issues/68] > https://github.com/uber/hudi/issues/155
[jira] [Commented] (HUDI-89) Clean up placement, naming, defaults of HoodieWriteConfig
[ https://issues.apache.org/jira/browse/HUDI-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193143#comment-17193143 ] Vinoth Chandar commented on HUDI-89: [~manijndl77] awesome. Not sure if the description matches what we have in mind atm though. [~shivnarayan] was thinking about this. Siva, can you please help mani ramp up on this JIRA? > Clean up placement, naming, defaults of HoodieWriteConfig > - > > Key: HUDI-89 > URL: https://issues.apache.org/jira/browse/HUDI-89 > Project: Apache Hudi > Issue Type: Improvement > Components: Code Cleanup, Usability, Writer Core >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > # Rename HoodieWriteConfig to HoodieClientConfig > # Move bunch of configs from CompactionConfig to StorageConfig > # Introduce new HoodieCleanConfig > # Should we consider lombok or something to automate the > defaults/getters/setters > # Consistent name of properties/defaults > # Enforce bounds more strictly
[jira] [Assigned] (HUDI-57) [UMBRELLA] Support ORC Storage
[ https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-57: -- Assignee: Mani Jindal (was: Vinoth Chandar)
[jira] [Commented] (HUDI-89) Clean up placement, naming, defaults of HoodieWriteConfig
[ https://issues.apache.org/jira/browse/HUDI-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193135#comment-17193135 ] Mani Jindal commented on HUDI-89: - Hi [~vinoth], I am new to the community. Can I pick this up?
[jira] [Commented] (HUDI-57) [UMBRELLA] Support ORC Storage
[ https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193130#comment-17193130 ] Mani Jindal commented on HUDI-57: - Hi [~vinoth], I am new to the community and would love to pick up any task here.
[GitHub] [hudi] vinothchandar commented on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus
vinothchandar commented on pull request #1484: URL: https://github.com/apache/hudi/pull/1484#issuecomment-689722005 will do. thanks ! @n3nash can take a pass as well This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1267) Additional Metadata Details for Hudi Transactions
[ https://issues.apache.org/jira/browse/HUDI-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193086#comment-17193086 ] Vinoth Chandar commented on HUDI-1267:

Ah, got it. There was a proposal for a UI on top that reads across tables; this is worth discussing again on the mailing list. The rough approach was:
1. Run a long-running instance of TimelineServer, have all the writers to each table report commits (or have the server pull), and materialize the table metadata in local RocksDB.
2. Build a REST layer on top of it and hook up a UI.

> Additional Metadata Details for Hudi Transactions
> Key: HUDI-1267
> URL: https://issues.apache.org/jira/browse/HUDI-1267
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Usability
> Reporter: Ashish M G
> Priority: Major
> Labels: features
> Fix For: 0.7.0
>
> Whenever any of the following scenarios happen:
> 1. Custom datasource (Kafka, for instance) -> Hudi table
> 2. Hudi -> Hudi table
> 3. S3 -> Hudi table
> the following metadata needs to be captured:
> 1. Table-level metadata: operation name (record level), e.g. Upsert, Insert, for the last operation performed on the row.
> 2. Transaction-level metadata (logged at the Hudi level, not the table level):
>    - Source (Kafka topic name / S3 URL for source data, etc.)
>    - Target Hudi table name
>    - Last transaction time (last commit time)
> Basically, point (1) collects all details at the table level and point (2) collects all the transactions that happened at the Hudi level. Point (1) would be just a column addition for operation type.
> Example for point (2): suppose we had an ingestion from Kafka topic 'A' to Hudi table 'ingest_kafka', and another ingestion from an RDBMS table ('tableA') through Sqoop to Hudi table 'RDBMSingest'. The metadata captured would be:
>
> | Source          | Timestamp | Transaction Type | Target       |
> | Kafka - 'A'     | XX        | UPSERT           | ingest_kafka |
> | RDBMS - 'tableA'| XX        | INSERT           | RDBMSingest  |
>
> The transaction-details table in point (2) should be available as a separate common table which can be queried as a Hudi table, or stored as Parquet which can be queried from Spark.
[jira] [Comment Edited] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193006#comment-17193006 ] Felix Kizhakkel Jose edited comment on HUDI-83 at 9/9/20, 4:05 PM:

[~uditme] We are using EMR 6.1.0, the most recent release, with Spark 3.0 and Hive 3.1.2. Could you please elaborate a little on your response, "Yes, in hive3 it is supported, and we can just replace timestamp column from long to timestamp"? What should be done, and can I make the change in my pyspark script, or does this change need to happen in the Hudi code?

was (Author: felixkjose): [~uditme] We are using EMR 6.1.0, the most recent release, with Spark 3.0 and Hive 3.1.2. Could you please elaborate a little on your response, "Yes, in hive3 it is supported, and we can just replace timestamp column from long to timestamp"? What should be done, and is it possible to make the change in the pyspark script?

> Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
> Issue Type: Bug
> Components: Hive Integration, Usability
> Reporter: Vinoth Chandar
> Assignee: cdmikechen
> Priority: Major
> Labels: bug-bash-0.6.0
> Fix For: 0.6.1
>
> [https://github.com/apache/incubator-hudi/issues/543] and related issues
[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193006#comment-17193006 ] Felix Kizhakkel Jose commented on HUDI-83:

[~uditme] We are using EMR 6.1.0, the most recent release, with Spark 3.0 and Hive 3.1.2. Could you please elaborate a little on your response, "Yes, in hive3 it is supported, and we can just replace timestamp column from long to timestamp"? What should be done, and is it possible to make the change in the pyspark script?

> Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
> Issue Type: Bug
> Components: Hive Integration, Usability
> Reporter: Vinoth Chandar
> Assignee: cdmikechen
> Priority: Major
> Labels: bug-bash-0.6.0
> Fix For: 0.6.1
>
> [https://github.com/apache/incubator-hudi/issues/543] and related issues
[hudi] branch master updated: [HUDI-1254] TypedProperties can not get values by initializing an existing properties (#2059)
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 063a98f  [HUDI-1254] TypedProperties can not get values by initializing an existing properties (#2059)

063a98f is described below

commit 063a98fc2b76beac28a4797884973abd2911c887
Author: linshan-ma
AuthorDate: Wed Sep 9 23:42:41 2020 +0800

    [HUDI-1254] TypedProperties can not get values by initializing an existing properties (#2059)
---
 .../apache/hudi/common/config/TypedProperties.java | 23 --
 .../common/properties/TestTypedProperties.java     | 84 ++
 2 files changed, 100 insertions(+), 7 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java b/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
index 295598c..c780ded 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
@@ -22,6 +22,7 @@
 import java.io.Serializable;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Properties;
+import java.util.Set;
 import java.util.stream.Collectors;

 /**
@@ -38,22 +39,30 @@ public class TypedProperties extends Properties implements Serializable {
   }

   private void checkKey(String property) {
-    if (!containsKey(property)) {
+    if (!keyExists(property)) {
       throw new IllegalArgumentException("Property " + property + " not found");
     }
   }

+  private boolean keyExists(String property) {
+    Set<String> keys = super.stringPropertyNames();
+    if (keys.contains(property)) {
+      return true;
+    }
+    return false;
+  }
+
   public String getString(String property) {
     checkKey(property);
     return getProperty(property);
   }

   public String getString(String property, String defaultValue) {
-    return containsKey(property) ? getProperty(property) : defaultValue;
+    return keyExists(property) ? getProperty(property) : defaultValue;
   }

   public List<String> getStringList(String property, String delimiter, List<String> defaultVal) {
-    if (!containsKey(property)) {
+    if (!keyExists(property)) {
       return defaultVal;
     }
     return Arrays.stream(getProperty(property).split(delimiter)).map(String::trim).collect(Collectors.toList());
@@ -65,7 +74,7 @@ public class TypedProperties extends Properties implements Serializable {
   }

   public int getInteger(String property, int defaultValue) {
-    return containsKey(property) ? Integer.parseInt(getProperty(property)) : defaultValue;
+    return keyExists(property) ? Integer.parseInt(getProperty(property)) : defaultValue;
   }

   public long getLong(String property) {
@@ -74,7 +83,7 @@ public class TypedProperties extends Properties implements Serializable {
   }

   public long getLong(String property, long defaultValue) {
-    return containsKey(property) ? Long.parseLong(getProperty(property)) : defaultValue;
+    return keyExists(property) ? Long.parseLong(getProperty(property)) : defaultValue;
   }

   public boolean getBoolean(String property) {
@@ -83,7 +92,7 @@ public class TypedProperties extends Properties implements Serializable {
   }

   public boolean getBoolean(String property, boolean defaultValue) {
-    return containsKey(property) ? Boolean.parseBoolean(getProperty(property)) : defaultValue;
+    return keyExists(property) ? Boolean.parseBoolean(getProperty(property)) : defaultValue;
   }

   public double getDouble(String property) {
@@ -92,6 +101,6 @@ public class TypedProperties extends Properties implements Serializable {
   }

   public double getDouble(String property, double defaultValue) {
-    return containsKey(property) ? Double.parseDouble(getProperty(property)) : defaultValue;
+    return keyExists(property) ? Double.parseDouble(getProperty(property)) : defaultValue;
   }
 }

diff --git a/hudi-common/src/test/java/org/apache/hudi/common/properties/TestTypedProperties.java b/hudi-common/src/test/java/org/apache/hudi/common/properties/TestTypedProperties.java
new file mode 100644
index 000..95955d4
--- /dev/null
+++ b/hudi-common/src/test/java/org/apache/hudi/common/properties/TestTypedProperties.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the Lic
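The commit above works because java.util.Properties stores constructor-supplied defaults in a separate internal table: containsKey (inherited from Hashtable) sees only the main table, while getProperty() and stringPropertyNames() also consult the defaults. A minimal, standalone demonstration of that behavior (the property name here is just an illustration, not a required Hudi config):

```java
import java.util.Properties;

public class Main {
    public static void main(String[] args) {
        // Defaults passed to the Properties constructor live in a separate table.
        Properties defaults = new Properties();
        defaults.setProperty("hoodie.upsert.shuffle.parallelism", "200");

        Properties props = new Properties(defaults);

        // Hashtable.containsKey ignores the defaults table:
        System.out.println(props.containsKey("hoodie.upsert.shuffle.parallelism"));  // false

        // getProperty and stringPropertyNames fall through to the defaults:
        System.out.println(props.getProperty("hoodie.upsert.shuffle.parallelism"));  // 200
        System.out.println(props.stringPropertyNames()
            .contains("hoodie.upsert.shuffle.parallelism"));                         // true
    }
}
```

This is why TypedProperties built from an existing Properties object could not "see" its values through containsKey, and why switching the lookups to stringPropertyNames() fixes it.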
[GitHub] [hudi] leesf merged pull request #2059: [HUDI-1254] TypedProperties can not get values by initializing an existing properties
leesf merged pull request #2059: URL: https://github.com/apache/hudi/pull/2059
[GitHub] [hudi] leesf commented on a change in pull request #2078: [MINOR]Add clinbrain to powered by page
leesf commented on a change in pull request #2078: URL: https://github.com/apache/hudi/pull/2078#discussion_r485708058

## File path: docs/_docs/1_4_powered_by.md ##

@@ -28,6 +29,9 @@ offering real-time analysis on hudi dataset. Amazon Web Services is the World's leading cloud services provider. Apache Hudi is [pre-installed](https://aws.amazon.com/emr/features/hudi/) with the AWS Elastic Map Reduce offering, providing means for AWS users to perform record-level updates/deletes and manage storage efficiently.
+### Clinbrain
+[Clinbrain](https://www.clinbrain.com/) is the leading of big data platform on medical industry, we have built 200 medical big data centers by integrating Hudi Data Lake solution in numerous hospitals,hudi provides the abablility to upsert and deletes on hdfs, at the same time, it can make the fresh data-stream up-to-date effcienctlly in hadoop system with the hudi incremental view.

Review comment: `hospitals,hudi` -> `hospitals, hudi`
[jira] [Commented] (HUDI-1058) Make delete marker configurable
[ https://issues.apache.org/jira/browse/HUDI-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192931#comment-17192931 ] Raymond Xu commented on HUDI-1058:

[~shenhong] any news on this ticket? :)

> Make delete marker configurable
> Key: HUDI-1058
> URL: https://issues.apache.org/jira/browse/HUDI-1058
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Usability
> Reporter: Raymond Xu
> Assignee: shenh062326
> Priority: Major
> Labels: pull-request-available
>
> Users can specify any boolean field as the delete marker; `_hoodie_is_deleted` remains the default.
[GitHub] [hudi] xushiyan opened a new pull request #2079: [HUDI-995] Use HoodieTestTable in more classes
xushiyan opened a new pull request #2079: URL: https://github.com/apache/hudi/pull/2079

Migrate test data prep logic in:
- TestStatsCommand
- TestHoodieROTablePathFilter

After changing to HoodieTestTable APIs, removed unused deprecated APIs in HoodieTestUtils.
[GitHub] [hudi] sam-wmt commented on issue #2042: org.apache.hudi.exception.HoodieIOException: IOException when reading log file
sam-wmt commented on issue #2042: URL: https://github.com/apache/hudi/issues/2042#issuecomment-689562068

This appears to have been caused by an internal change to our Hudi writer, which I found in the executor logs: java.lang.NoSuchMethodException: com.xxx..x.xx.(org.apache.hudi.common.util.Option). Closing ticket.
[GitHub] [hudi] sam-wmt closed issue #2042: org.apache.hudi.exception.HoodieIOException: IOException when reading log file
sam-wmt closed issue #2042: URL: https://github.com/apache/hudi/issues/2042
[GitHub] [hudi] Yogashri12 commented on issue #2076: [SUPPORT] load data partition wise
Yogashri12 commented on issue #2076: URL: https://github.com/apache/hudi/issues/2076#issuecomment-689553730

Okay, I will try. Thank you for the response.
[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r485591618

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ##

@@ -18,120 +18,195 @@ package org.apache.hudi.client;

+import com.codahale.metrics.Timer;
+import org.apache.hadoop.conf.Configuration;
 import org.apache.hudi.avro.model.HoodieCleanMetadata;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
 import org.apache.hudi.avro.model.HoodieRestoreMetadata;
 import org.apache.hudi.avro.model.HoodieRollbackMetadata;
-import org.apache.hudi.client.embedded.EmbeddedTimelineService;
+import org.apache.hudi.callback.HoodieWriteCommitCallback;
+import org.apache.hudi.callback.common.HoodieWriteCommitCallbackMessage;
+import org.apache.hudi.callback.util.HoodieCommitCallbackFactory;
+import org.apache.hudi.client.embebbed.BaseEmbeddedTimelineService;
+import org.apache.hudi.common.HoodieEngineContext;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
 import org.apache.hudi.common.model.HoodieKey;
-import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.WriteOperationType;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
-import org.apache.hudi.common.table.timeline.HoodieInstant.State;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ValidationUtils;
-import org.apache.hudi.config.HoodieCompactionConfig;
 import org.apache.hudi.config.HoodieWriteConfig;
+
 import org.apache.hudi.exception.HoodieCommitException;
 import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.exception.HoodieRestoreException;
 import org.apache.hudi.exception.HoodieRollbackException;
 import org.apache.hudi.exception.HoodieSavepointException;
 import org.apache.hudi.index.HoodieIndex;
 import org.apache.hudi.metrics.HoodieMetrics;
-import org.apache.hudi.table.HoodieTable;
-import org.apache.hudi.table.HoodieTimelineArchiveLog;
-import org.apache.hudi.table.MarkerFiles;
 import org.apache.hudi.table.BulkInsertPartitioner;
+import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
-import org.apache.hudi.table.action.compact.CompactHelpers;
 import org.apache.hudi.table.action.savepoint.SavepointHelpers;
-
-import com.codahale.metrics.Timer;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
-import org.apache.spark.SparkConf;
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.JavaSparkContext;

 import java.io.IOException;
+import java.nio.charset.StandardCharsets;
 import java.text.ParseException;
 import java.util.Collection;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;

 /**
- * Hoodie Write Client helps you build tables on HDFS [insert()] and then perform efficient mutations on an HDFS
- * table [upsert()]
- *
- * Note that, at any given time, there can only be one Spark job performing these operations on a Hoodie table.
+ * Abstract Write Client providing functionality for performing commit, index updates and rollback
+ * Reused for regular write operations like upsert/insert/bulk-insert.. as well as bootstrap
+ *
+ * @param Sub type of HoodieRecordPayload
+ * @param Type of inputs
+ * @param Type of keys
+ * @param Type of outputs
+ * @param Type of record position [Key, Option[partitionPath, fileID]] in hoodie table
  */
-public class HoodieWriteClient extends AbstractHoodieWriteClient {
-
+public abstract class AbstractHoodieWriteClient extends AbstractHoodieClient {
   private static final long serialVersionUID = 1L;
-  private static final Logger LOG = LogManager.getLogger(HoodieWriteClient.class);
-  private static final String LOOKUP_STR = "lookup";
-  private final boolean rollbackPending;
-  private final transient HoodieMetrics metrics;
-  private transient Timer.Context compactionTimer;
+  private static final Logger LOG = LogManager.getLogger(AbstractHoodieWriteClient.class);
+
+  protected final transient HoodieMetrics metrics;
+  private final transient HoodieIndex index;
+
+  protected transient Timer.Context writeContext = null;
+  private transient WriteOperationType operationType;
+  private transient HoodieWriteCommitCallback commitCallback;
+
+  protected static final String LOOKUP_STR = "lookup";
+  protected final boolean rollbackPending;
+  protected transient Timer.Context compactionTimer;
   private transient AsyncCleanerService asyncCleanerService;

+  public void setOperationType(WriteOperationType operationType) {
+    this.operationType = operationType;
+
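The core idea of the hudi-client refactor quoted above is to lift engine-specific collection types (such as Spark's JavaRDD) out of the write client and into type parameters on an abstract base class, so each engine supplies its own concrete subtype. A minimal sketch of that pattern, with purely illustrative names rather than Hudi's actual signatures:

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Engine-agnostic base: I is the input collection type, O the output type.
// (Hypothetical names; the real AbstractHoodieWriteClient is parameterized
// over payload, input, key, and output types.)
abstract class AbstractWriteClient<I, O> {
    abstract O upsert(I records);
}

// A trivial "local list" engine, standing in for an engine-specific client
// such as a Spark JavaRDD-based one.
class ListWriteClient extends AbstractWriteClient<List<String>, List<String>> {
    private final UnaryOperator<String> transform;

    ListWriteClient(UnaryOperator<String> transform) {
        this.transform = transform;
    }

    @Override
    List<String> upsert(List<String> records) {
        // Apply the per-record transform, as a stand-in for real write logic.
        return records.stream().map(transform).collect(Collectors.toList());
    }
}

public class Main {
    public static void main(String[] args) {
        AbstractWriteClient<List<String>, List<String>> client =
            new ListWriteClient(String::toUpperCase);
        System.out.println(client.upsert(List.of("a", "b"))); // [A, B]
    }
}
```

Shared logic (commits, rollback, metrics) lives once in the abstract class, while each engine module only implements the typed operations.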
[GitHub] [hudi] hj2016 opened a new pull request #2078: [MINOR]Add clinbrain to powered by page
hj2016 opened a new pull request #2078: URL: https://github.com/apache/hudi/pull/2078

## What is the purpose of the pull request

Add clinbrain to powered by page.
[jira] [Updated] (HUDI-1192) Make create hive database automatically configurable
[ https://issues.apache.org/jira/browse/HUDI-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujinhui updated HUDI-1192:

Priority: Minor (was: Major)

> Make create hive database automatically configurable
> Key: HUDI-1192
> URL: https://issues.apache.org/jira/browse/HUDI-1192
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: liujinhui
> Assignee: liujinhui
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.6.1
>
> {code:java}
> org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL create database if not exists data_lake
>     at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:352)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:121)
>     at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncMeta(DeltaSync.java:510)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:425)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:244)
>     at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:579)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException No valid privileges
>  User lingqu does not have privileges for CREATEDATABASE
>  The required privileges: Server=server1->action=create->grantOption=false;
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:266)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:252)
>     at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
>     at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:350)
>     ... 10 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException No valid privileges
>  User lingqu does not have privileges for CREATEDATABASE
>  The required privileges: Server=server1->action=create->grantOption=false;
>     at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329)
>     at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:207)
>     at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:260)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:505)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:491)
>     at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295)
>     at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:507)
>     at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>     at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>     ... 3 more
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: No valid privileges
>  User lingqu does not have privileges for CREATEDATABASE
>  The required privileges: Server=server1->action=create->grantOption=false;
>     at org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:371)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:600)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1425)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1398)
>     at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:205)
>     ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metada
[jira] [Updated] (HUDI-1269) Make whether the failure of sync hudi data to hive affects hudi ingest process configurable
[ https://issues.apache.org/jira/browse/HUDI-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujinhui updated HUDI-1269:

Priority: Minor (was: Major)

> Make whether the failure of sync hudi data to hive affects hudi ingest process configurable
> Key: HUDI-1269
> URL: https://issues.apache.org/jira/browse/HUDI-1269
> Project: Apache Hudi
> Issue Type: New Feature
> Components: Hive Integration
> Reporter: wangxianghu
> Assignee: liujinhui
> Priority: Minor
> Fix For: 0.6.1
>
> Currently, in an ETL pipeline (e.g., kafka -> hudi -> hive), if the hudi-to-hive step fails, the job keeps running. I think we can add a switch to control the job behavior (fail or keep running) when kafka-to-hudi succeeds but hudi-to-hive fails, leaving the choice to the user, since ingesting data to hudi and syncing to hive form a single task in some scenarios.
[GitHub] [hudi] pratyakshsharma commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution
pratyakshsharma commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-689476777

Lagging a bit, will circle back on this.
[GitHub] [hudi] wkhapy123 closed issue #2050: merge on read table so many small files
wkhapy123 closed issue #2050: URL: https://github.com/apache/hudi/issues/2050