[GitHub] [hudi] bvaradar commented on issue #2075: [SUPPORT] hoodie.datasource.write.precombine.field not working as expected

2020-09-09 Thread GitBox


bvaradar commented on issue #2075:
URL: https://github.com/apache/hudi/issues/2075#issuecomment-690024060


   @rajgowtham24 : This is a known issue in 0.5.x and was fixed in the 0.6.0 release.







[GitHub] [hudi] bvaradar commented on issue #2068: [SUPPORT]Deltastreamer Upsert Very Slow / Never Completes After Initial Data Load

2020-09-09 Thread GitBox


bvaradar commented on issue #2068:
URL: https://github.com/apache/hudi/issues/2068#issuecomment-690012667


   @bradleyhurley : The errors are due to shuffle fetch failures. Increasing 
executor memory and resources in general helps.
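   For a rough sense of what "more resources" means here, the sketch below shows the kind of Spark settings that help with shuffle fetch failures; the property names are standard Spark configs, the values are placeholders to tune per job:

```java
import org.apache.spark.SparkConf;

// Illustrative settings only; exact values depend on cluster size and data volume.
public class ShuffleTuningSketch {
  public static SparkConf tunedConf() {
    return new SparkConf()
        .set("spark.executor.memory", "8g")         // more heap per executor
        .set("spark.executor.memoryOverhead", "2g") // off-heap headroom for shuffle buffers
        .set("spark.shuffle.io.maxRetries", "10")   // tolerate transient fetch failures longer
        .set("spark.shuffle.io.retryWait", "30s");  // back off between fetch retries
  }
}
```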







[jira] [Resolved] (HUDI-1255) Combine and get updateValue in multiFields

2020-09-09 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang resolved HUDI-1255.
-
Resolution: Fixed

> Combine and get updateValue in multiFields
> --
>
> Key: HUDI-1255
> URL: https://issues.apache.org/jira/browse/HUDI-1255
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: karl wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> Update the current value for only the fields you want to change.
> The default payload OverwriteWithLatestAvroPayload overwrites the whole record
> when combining based on orderingVal. This doesn't meet our need when we just
> want to change specified fields.
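A minimal sketch of how a writer opts into a payload class like the one this issue led to (OverwriteNonDefaultsWithLatestAvroPayload, merged via PR #2056 below); the table name, path, and key/precombine fields are made-up examples:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class PartialFieldUpsertSketch {
  // "id" and "ts" are hypothetical record-key and precombine fields.
  static void upsert(Dataset<Row> df) {
    df.write().format("hudi")
        .option("hoodie.table.name", "demo_table")
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        // swap the payload so only non-default incoming fields overwrite storage
        .option("hoodie.datasource.write.payload.class",
            "org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/demo_table");
  }
}
```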





[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields

2020-09-09 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated HUDI-1255:

Fix Version/s: 0.6.1

> Combine and get updateValue in multiFields
> --
>
> Key: HUDI-1255
> URL: https://issues.apache.org/jira/browse/HUDI-1255
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: karl wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.6.1
>
>
> Update the current value for only the fields you want to change.
> The default payload OverwriteWithLatestAvroPayload overwrites the whole record
> when combining based on orderingVal. This doesn't meet our need when we just
> want to change specified fields.





[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields

2020-09-09 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated HUDI-1255:

Fix Version/s: (was: 0.6.0)

> Combine and get updateValue in multiFields
> --
>
> Key: HUDI-1255
> URL: https://issues.apache.org/jira/browse/HUDI-1255
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: karl wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> Update the current value for only the fields you want to change.
> The default payload OverwriteWithLatestAvroPayload overwrites the whole record
> when combining based on orderingVal. This doesn't meet our need when we just
> want to change specified fields.





[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields

2020-09-09 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated HUDI-1255:

Fix Version/s: (was: 0.6.1)

> Combine and get updateValue in multiFields
> --
>
> Key: HUDI-1255
> URL: https://issues.apache.org/jira/browse/HUDI-1255
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: karl wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Update the current value for only the fields you want to change.
> The default payload OverwriteWithLatestAvroPayload overwrites the whole record
> when combining based on orderingVal. This doesn't meet our need when we just
> want to change specified fields.





[jira] [Updated] (HUDI-1255) Combine and get updateValue in multiFields

2020-09-09 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated HUDI-1255:

Fix Version/s: 0.6.0
   0.6.1

> Combine and get updateValue in multiFields
> --
>
> Key: HUDI-1255
> URL: https://issues.apache.org/jira/browse/HUDI-1255
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: karl wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.6.1
>
>
> Update the current value for only the fields you want to change.
> The default payload OverwriteWithLatestAvroPayload overwrites the whole record
> when combining based on orderingVal. This doesn't meet our need when we just
> want to change specified fields.





[hudi] branch master updated (063a98f -> a1cff8a)

2020-09-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 063a98f  [HUDI-1254] TypedProperties can not get values by 
initializing an existing properties (#2059)
 add a1cff8a  [HUDI-1255] Add new 
Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified 
fields in storage (#2056)

No new revisions were added by this update.

Summary of changes:
 .../OverwriteNonDefaultsWithLatestAvroPayload.java | 72 ++
 .../model/OverwriteWithLatestAvroPayload.java  |  9 ++-
 ...OverwriteNonDefaultsWithLatestAvroPayload.java} | 54 ++--
 3 files changed, 116 insertions(+), 19 deletions(-)
 create mode 100644 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
 copy 
hudi-common/src/test/java/org/apache/hudi/common/model/{TestOverwriteWithLatestAvroPayload.java
 => TestOverwriteNonDefaultsWithLatestAvroPayload.java} (63%)
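To illustrate the semantics this commit adds, here is a sketch of the merge rule (an approximation, not the committed code): for each field, the incoming value wins unless it equals the field's schema default, in which case the value already in storage is kept.

```java
import java.util.Objects;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class NonDefaultsMergeSketch {
  // Field-by-field merge; fields without a declared schema default would need
  // extra handling in real code (getDefaultValue throws for them).
  static GenericRecord merge(GenericRecord stored, GenericRecord incoming, Schema schema) {
    GenericRecord result = new GenericData.Record(schema);
    for (Schema.Field f : schema.getFields()) {
      Object incomingVal = incoming.get(f.name());
      boolean isDefault = Objects.equals(incomingVal, GenericData.get().getDefaultValue(f));
      result.put(f.name(), isDefault ? stored.get(f.name()) : incomingVal);
    }
    return result;
  }
}
```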



[GitHub] [hudi] vinothchandar merged pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

2020-09-09 Thread GitBox


vinothchandar merged pull request #2056:
URL: https://github.com/apache/hudi/pull/2056


   







[GitHub] [hudi] bvaradar closed issue #2076: [SUPPORT] load data partition wise

2020-09-09 Thread GitBox


bvaradar closed issue #2076:
URL: https://github.com/apache/hudi/issues/2076


   







[GitHub] [hudi] n3nash commented on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-09-09 Thread GitBox


n3nash commented on pull request #1484:
URL: https://github.com/apache/hudi/pull/1484#issuecomment-689961998


   Yes, I will do by Friday.







[jira] [Commented] (HUDI-1058) Make delete marker configurable

2020-09-09 Thread shenh062326 (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193288#comment-17193288
 ] 

shenh062326 commented on HUDI-1058:
---

[~rxu] sorry for the late reply. I am waiting for 
https://github.com/apache/hudi/pull/1704 to merge, because HUDI-1058 also 
needs similar modifications. It is better to work on HUDI-1058 after this merge 
request is merged, but if [this MR|https://github.com/apache/hudi/pull/1704] 
has not progressed, I can resolve HUDI-1058 first.
 
 

> Make delete marker configurable
> ---
>
> Key: HUDI-1058
> URL: https://issues.apache.org/jira/browse/HUDI-1058
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Raymond Xu
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available
>
> users can specify any boolean field for delete marker and 
> `_hoodie_is_deleted` remains as default.
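For reference, the hard-coded behavior this ticket generalizes: upserting records with a boolean `_hoodie_is_deleted` column set to true deletes them. A sketch, with illustrative table options:

```java
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class DeleteMarkerSketch {
  // HUDI-1058 would let users point Hudi at any boolean field instead of
  // the fixed _hoodie_is_deleted name used here.
  static void softDelete(Dataset<Row> toDelete) {
    toDelete.withColumn("_hoodie_is_deleted", lit(true))
        .write().format("hudi")
        .option("hoodie.table.name", "demo_table")               // illustrative
        .option("hoodie.datasource.write.recordkey.field", "id") // illustrative
        .mode(SaveMode.Append)
        .save("/tmp/hudi/demo_table");
  }
}
```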





[GitHub] [hudi] shenh062326 commented on pull request #1704: [HUDI-115] Enhance OverwriteWithLatestAvroPayload to also respect ordering value of record in storage

2020-09-09 Thread GitBox


shenh062326 commented on pull request #1704:
URL: https://github.com/apache/hudi/pull/1704#issuecomment-689917712


   I am working on https://issues.apache.org/jira/browse/HUDI-1058, which also 
needs similar modifications. It is better to work on HUDI-1058 after this merge 
request is merged, but if this MR has not progressed, I can resolve HUDI-1058 
first.







[GitHub] [hudi] wangxianghu commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-09 Thread GitBox


wangxianghu commented on pull request #1827:
URL: https://github.com/apache/hudi/pull/1827#issuecomment-689916910


   > @wangxianghu the issue with the tests is that most of the tests are now 
moved to hudi-spark-client. Previously we had split the tests into hudi-client and 
others. We need to edit `travis.yml` to adjust the splits again.
   
   @vinothchandar could you please help me edit travis.yml to adjust the splits? 
I am not familiar with that.
   thanks :)







[GitHub] [hudi] yanghua commented on pull request #2058: [HUDI-1259] Cache some framework binaries to speed up the progress of building docker image in local env

2020-09-09 Thread GitBox


yanghua commented on pull request #2058:
URL: https://github.com/apache/hudi/pull/2058#issuecomment-689916526


   > If you are referring to hudi, we don't have to rebuild docker images to 
pick up the latest hudi code.
   
   Yes
   
   > The hudi codebase is mounted inside docker containers so that you can use 
the latest version.
   
   You mean that if I change the code, it is reflected in the Hudi inside Docker 
immediately? Where can I find the configuration for this mechanism in the 
project? Sorry, I am not familiar with Docker.
   







[GitHub] [hudi] vinothchandar commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2020-09-09 Thread GitBox


vinothchandar commented on pull request #1929:
URL: https://github.com/apache/hudi/pull/1929#issuecomment-689905849


   @satishkotha Can you please help review? 







[GitHub] [hudi] vinothchandar opened a new pull request #2082: hudi cluster write path poc

2020-09-09 Thread GitBox


vinothchandar opened a new pull request #2082:
URL: https://github.com/apache/hudi/pull/2082









[GitHub] [hudi] vinothchandar commented on pull request #2056: [HUDI-1255] Add new Payload(OverwriteNonDefaultsWithLatestAvroPayload) for updating specified fields in storage

2020-09-09 Thread GitBox


vinothchandar commented on pull request #2056:
URL: https://github.com/apache/hudi/pull/2056#issuecomment-689884262


   sorry, long weekend here in the states. will take a look today 







[jira] [Commented] (HUDI-1270) NoSuchMethod PartitionedFile on AWS EMR Spark 2.4.5

2020-09-09 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193245#comment-17193245
 ] 

Vinoth Chandar commented on HUDI-1270:
--

Not sure if we can do much about this in Hudi itself. Maybe leave it to the AWS 
folks? 

cc [~uditme]

> NoSuchMethod PartitionedFile on AWS EMR Spark 2.4.5
> ---
>
> Key: HUDI-1270
> URL: https://issues.apache.org/jira/browse/HUDI-1270
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Gary Li
>Priority: Major
>
> There are some AWS EMR users reporting:
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.execution.datasources.PartitionedFile.<init>
> on EMR (Spark-2.4.5-amzn-0) when using the Spark Datasource to query MOR 
> tables.
> [https://github.com/apache/hudi/pull/1848#issuecomment-687392285]
> [https://github.com/apache/hudi/issues/2057#issuecomment-685015564]
> [~uditme] [~vbalaji] would you guys be able to help?
>  





[hudi] branch asf-site updated: Travis CI build asf-site

2020-09-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 8afed69  Travis CI build asf-site
8afed69 is described below

commit 8afed6902e3d955f9f3977a03f8eb8845198ae78
Author: CI 
AuthorDate: Wed Sep 9 21:54:43 2020 +

Travis CI build asf-site
---
 content/docs/docker_demo.html | 1 -
 1 file changed, 1 deletion(-)

diff --git a/content/docs/docker_demo.html b/content/docs/docker_demo.html
index 6f23ab8..0d36fae 100644
--- a/content/docs/docker_demo.html
+++ b/content/docs/docker_demo.html
@@ -484,7 +484,6 @@ This should pull the docker images from docker hub and 
setup docker cluster.
 Creating spark-worker-1... done
 Copying spark default config and setting up 
configs
 Copying spark default config and setting up 
configs
-Copying spark default config and setting up 
configs
 $ docker ps
 
 



[hudi] branch asf-site updated: [MINOR]: removed redundant line from docker-demo page (#2081)

2020-09-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new ca2f9a8  [MINOR]: removed redundant line from docker-demo page (#2081)
ca2f9a8 is described below

commit ca2f9a8945afa8bcbdda3f18bee89fcfb65dbf9b
Author: Pratyaksh Sharma 
AuthorDate: Thu Sep 10 03:22:35 2020 +0530

[MINOR]: removed redundant line from docker-demo page (#2081)
---
 docs/_docs/0_4_docker_demo.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/_docs/0_4_docker_demo.md b/docs/_docs/0_4_docker_demo.md
index 4193fd1..22efbe9 100644
--- a/docs/_docs/0_4_docker_demo.md
+++ b/docs/_docs/0_4_docker_demo.md
@@ -85,7 +85,6 @@ Creating adhoc-2   ... done
 Creating spark-worker-1... done
 Copying spark default config and setting up configs
 Copying spark default config and setting up configs
-Copying spark default config and setting up configs
 $ docker ps
 ```
 



[GitHub] [hudi] vinothchandar merged pull request #2081: [MINOR]: removed redundant line from docker-demo page

2020-09-09 Thread GitBox


vinothchandar merged pull request #2081:
URL: https://github.com/apache/hudi/pull/2081


   







[jira] [Assigned] (HUDI-338) Reduce Hoodie commit/instant time granularity to millis from secs

2020-09-09 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma reassigned HUDI-338:
-

Assignee: Pratyaksh Sharma  (was: Nishith Agarwal)

> Reduce Hoodie commit/instant time granularity to millis from secs
> -
>
> Key: HUDI-338
> URL: https://issues.apache.org/jira/browse/HUDI-338
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Pratyaksh Sharma
>Priority: Major
> Fix For: 0.6.1
>
>






[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

2020-09-09 Thread GitBox


pratyakshsharma commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r485921982



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Enable 
create hive database")
+  public Boolean enableCreateDatabase = false;

Review comment:
   
https://lists.apache.org/thread.html/e1b7f97c774e1d7d7fc54fbb46db49aaf2e217303a50d9885150242d%40%3Cdev.hudi.apache.org%3E
 - this is what I am referring to. :) 









[GitHub] [hudi] vinothchandar commented on pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-09 Thread GitBox


vinothchandar commented on pull request #2048:
URL: https://github.com/apache/hudi/pull/2048#issuecomment-689812755


   > You suggested to remove HoodieReplaceStat
   
   I think the suggestion was to simplify HoodieReplaceMetadata such that it 
only contains the extra information about replaced file groups, and to use the 
HoodieCommitMetadata and its HoodieWriteStat for tracking the new file groups 
written.
   We could have HoodieReplaceStat be part of the WriteStatus itself for 
tracking the additional information about replaced file groups? (See the sketch 
below.)
   
   On cleaning vs archival, it would be good if we can implement this in 
cleaning. But can that be a follow-on item? Practically speaking, typical 
deployments don't configure cleaning that low.
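   Roughly, the shape being discussed looks like this (a sketch; the field names are illustrative, not the PR's actual schema):

```java
import java.util.List;
import java.util.Map;

// Replace metadata would carry only the replaced file groups; the new file
// groups stay in the usual HoodieCommitMetadata/HoodieWriteStat.
class ReplaceMetadataSketch {
  Map<String, List<String>> partitionToReplacedFileIds; // extra REPLACE-only info
}
```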







[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

2020-09-09 Thread GitBox


pratyakshsharma commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r485910367



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##
@@ -117,11 +117,13 @@ private void syncHoodieTable(String tableName, boolean 
useRealtimeInputFormat) {
 boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
 
 // check if the database exists else create it
-try {
-  hoodieHiveClient.updateHiveSQL("create database if not exists " + 
cfg.databaseName);
-} catch (Exception e) {
-  // this is harmless since table creation will fail anyways, creation of 
DB is needed for in-memory testing
-  LOG.warn("Unable to create database", e);
+if (cfg.enableCreateDatabase) {
+  try {
+hoodieHiveClient.updateHiveSQL("create database if not exists " + 
cfg.databaseName);

Review comment:
   +1 on throwing the error. 
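   For concreteness, a sketch of the throwing variant of the guarded block above (whether to wrap this way is still the open question; HoodieHiveSyncException is the module's existing exception type):

```java
// Fail fast when auto-create is enabled but the DDL cannot run, instead of
// logging a warning and letting table creation fail obscurely later.
if (cfg.enableCreateDatabase) {
  try {
    hoodieHiveClient.updateHiveSQL("create database if not exists " + cfg.databaseName);
  } catch (Exception e) {
    throw new HoodieHiveSyncException("Failed to create database " + cfg.databaseName, e);
  }
}
```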









[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

2020-09-09 Thread GitBox


pratyakshsharma commented on a change in pull request #1968:
URL: https://github.com/apache/hudi/pull/1968#discussion_r485909624



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -71,6 +71,9 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = {"--enable-create-database"}, description = "Enable 
create hive database")
+  public Boolean enableCreateDatabase = false;

Review comment:
   @vinothchandar We do not let hudi create databases by default. So false 
seems to be ok :) @bvaradar to chime in here. 









[jira] [Commented] (HUDI-1053) Make ComplexKeyGenerator also support non partitioned Hudi dataset

2020-09-09 Thread Pratyaksh Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193170#comment-17193170
 ] 

Pratyaksh Sharma commented on HUDI-1053:


[~bhavanisudha] Is this not handled now with CustomKeyGenerator? If I am not 
missing anything here, I guess we can close this. 

> Make ComplexKeyGenerator also support non partitioned Hudi dataset
> --
>
> Key: HUDI-1053
> URL: https://issues.apache.org/jira/browse/HUDI-1053
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Storage Management, Writer Core
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
> Fix For: 0.6.1
>
>
> Currently, when using ComplexKeyGenerator, a `default` partition is assumed. 
> Recently there has been interest in supporting non-partitioned Hudi datasets 
> that use ComplexKeyGenerator. This GitHub issue has context - 
> https://github.com/apache/hudi/issues/1747





[GitHub] [hudi] pratyakshsharma opened a new pull request #2081: [MINOR]: removed redundant line from docker-demo page

2020-09-09 Thread GitBox


pratyakshsharma opened a new pull request #2081:
URL: https://github.com/apache/hudi/pull/2081









[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1990: [HUDI-1199]: relocated jetty in hudi-utilities-bundle pom

2020-09-09 Thread GitBox


pratyakshsharma commented on a change in pull request #1990:
URL: https://github.com/apache/hudi/pull/1990#discussion_r485892352



##
File path: packaging/hudi-utilities-bundle/pom.xml
##
@@ -172,6 +172,10 @@
org.apache.htrace.
   
org.apache.hudi.org.apache.htrace.
 
+
+  org.eclipse.jetty.
+  
org.apache.hudi.org.apache.jetty.

Review comment:
   @vinothchandar Yes this is how it was done for spark-bundle as well. Let 
me re-trigger the build for this. 









[GitHub] [hudi] pratyakshsharma opened a new pull request #2080: [MINOR]: changed apache id for Pratyaksh

2020-09-09 Thread GitBox


pratyakshsharma opened a new pull request #2080:
URL: https://github.com/apache/hudi/pull/2080









[jira] [Commented] (HUDI-57) [UMBRELLA] Support ORC Storage

2020-09-09 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193144#comment-17193144
 ] 

Vinoth Chandar commented on HUDI-57:


[~manijndl77] assigned it to you. There is a fair bit of prior work that 
attempted this; you can search PRs and RFCs. There is probably an easier way to 
do this now, given that the base file format etc. have been abstracted out nicely.

> [UMBRELLA] Support ORC Storage
> --
>
> Key: HUDI-57
> URL: https://issues.apache.org/jira/browse/HUDI-57
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Mani Jindal
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/uber/hudi/issues/68]
> https://github.com/uber/hudi/issues/155





[jira] [Commented] (HUDI-89) Clean up placement, naming, defaults of HoodieWriteConfig

2020-09-09 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193143#comment-17193143
 ] 

Vinoth Chandar commented on HUDI-89:


[~manijndl77] awesome. Not sure if the description matches what we have in mind 
atm though. 

 

[~shivnarayan] was thinking about this. Siva, can you please help mani ramp up 
on this JIRA? 

> Clean up placement, naming, defaults of HoodieWriteConfig
> -
>
> Key: HUDI-89
> URL: https://issues.apache.org/jira/browse/HUDI-89
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup, Usability, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> # Rename HoodieWriteConfig to HoodieClientConfig 
>  # Move bunch of configs from  CompactionConfig to StorageConfig 
>  # Introduce new HoodieCleanConfig
>  # Should we consider lombok or something to automate the 
> defaults/getters/setters
>  # Consistent name of properties/defaults 
>  # Enforce bounds more strictly 





[jira] [Assigned] (HUDI-57) [UMBRELLA] Support ORC Storage

2020-09-09 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-57:
--

Assignee: Mani Jindal  (was: Vinoth Chandar)

> [UMBRELLA] Support ORC Storage
> --
>
> Key: HUDI-57
> URL: https://issues.apache.org/jira/browse/HUDI-57
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Mani Jindal
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/uber/hudi/issues/68]
> https://github.com/uber/hudi/issues/155





[jira] [Commented] (HUDI-89) Clean up placement, naming, defaults of HoodieWriteConfig

2020-09-09 Thread Mani Jindal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193135#comment-17193135
 ] 

Mani Jindal commented on HUDI-89:
-

Hi [~vinoth], I am new to the community. Can I pick this up?

> Clean up placement, naming, defaults of HoodieWriteConfig
> -
>
> Key: HUDI-89
> URL: https://issues.apache.org/jira/browse/HUDI-89
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup, Usability, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> # Rename HoodieWriteConfig to HoodieClientConfig 
>  # Move bunch of configs from  CompactionConfig to StorageConfig 
>  # Introduce new HoodieCleanConfig
>  # Should we consider lombok or something to automate the 
> defaults/getters/setters
>  # Consistent name of properties/defaults 
>  # Enforce bounds more strictly 





[jira] [Commented] (HUDI-57) [UMBRELLA] Support ORC Storage

2020-09-09 Thread Mani Jindal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193130#comment-17193130
 ] 

Mani Jindal commented on HUDI-57:
-

Hi [~vinoth], I am new to the community and would love to pick up any task here.

> [UMBRELLA] Support ORC Storage
> --
>
> Key: HUDI-57
> URL: https://issues.apache.org/jira/browse/HUDI-57
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/uber/hudi/issues/68]
> https://github.com/uber/hudi/issues/155





[GitHub] [hudi] vinothchandar commented on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-09-09 Thread GitBox


vinothchandar commented on pull request #1484:
URL: https://github.com/apache/hudi/pull/1484#issuecomment-689722005


   will do. thanks ! @n3nash can take a pass as well 







[jira] [Commented] (HUDI-1267) Additional Metadata Details for Hudi Transactions

2020-09-09 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193086#comment-17193086
 ] 

Vinoth Chandar commented on HUDI-1267:
--

Ah, got it. There was a proposal for a UI on top that reads across tables. This 
is worth discussing again on the mailing list.

 

This was the rough approach:
 # We run a long-running instance of TimelineServer and have all the writers 
to each table report commits (or have the server pull), and materialize the 
table metadata in a local RocksDB.
 # We can then build a REST layer on top of it and hook up a UI.

 

> Additional Metadata Details for Hudi Transactions
> -
>
> Key: HUDI-1267
> URL: https://issues.apache.org/jira/browse/HUDI-1267
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Ashish M G
>Priority: Major
>  Labels: features
> Fix For: 0.7.0
>
>
> Whenever the following scenarios happen:
>  # Custom Datasource ( Kafka for instance ) -> Hudi Table
>  # Hudi -> Hudi Table
>  # s3 -> Hudi Table
> The following metadata needs to be captured:
>  # Table Level Metadata
>  ** Operation name ( record level ) like Upsert, Insert etc. for the last 
> operation performed on the row
>  # Transaction Level Metadata ( This will be logged at the Hudi level and not 
> the table level )
>  ** Source ( Kafka topic name / S3 URL for source data in case of s3 etc. )
>  ** Target Hudi Table Name
>  ** Last transaction time ( last commit time )
> Basically, point (1) collects all details at the table level and point (2) 
> collects all the transactions that happened at the Hudi level.
> Point (1) would just be a column addition for operation type.
> E.g. for point (2): Suppose we had an ingestion from Kafka topic 'A' to Hudi 
> table 'ingest_kafka' and another ingestion from RDBMS table ( 'tableA' ) 
> through Sqoop to Hudi table 'RDBMSingest'; then the metadata captured would be:
>  
> |Source|Timestamp|Transaction Type|Target|
> |Kafka - 'A'|XX|UPSERT|ingest_kafka|
> |RDBMS - 'tableA'|XX|INSERT|RDBMSingest|
>  
> The transaction details table in point (2) should be available as a separate 
> common table which can be queried as a Hudi table, or stored as parquet which 
> can be queried from Spark
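As a sketch, one row of the proposed Hudi-level transaction log could be modeled like this (field names inferred from the example table above, purely illustrative):

```java
// Mirrors the |Source|Timestamp|Transaction Type|Target| table in the description.
class TransactionRecordSketch {
  String source;          // e.g. "Kafka - 'A'" or "RDBMS - 'tableA'"
  long commitTimestamp;   // last commit time
  String transactionType; // UPSERT, INSERT, ...
  String target;          // target Hudi table, e.g. "ingest_kafka"
}
```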





[jira] [Comment Edited] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-09-09 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193006#comment-17193006
 ] 

Felix Kizhakkel Jose edited comment on HUDI-83 at 9/9/20, 4:05 PM:
---

[~uditme] we are using EMR 6.1.0, the very recent one, with Spark 3.0 and Hive 
3.1.2. So could you please elaborate a little on your response - "Yes, in hive3 
it is supported, and we can just replace timestamp column from long to 
timestamp."? What should be done, and is it possible to do this in my pyspark 
script, or does this change need to happen in the HUDI code?


was (Author: felixkjose):
[~uditme] we are using EMR 6.1.0, the very recent one. Where Spark 3.0, Hive 
3.1.2. So could you please elaborate a little on your response - "Yes, in hive3 
it is supported, and we can just replace timestamp column from long to 
timestamp.". What should be done  and is it possible to make in the pyspark 
script?

> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: bug-bash-0.6.0
> Fix For: 0.6.1
>
>
> [https://github.com/apache/incubator-hudi/issues/543] &; related issues 





[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-09-09 Thread Felix Kizhakkel Jose (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193006#comment-17193006
 ] 

Felix Kizhakkel Jose commented on HUDI-83:
--

[~uditme] we are using EMR 6.1.0, the very recent one, with Spark 3.0 and Hive 
3.1.2. So could you please elaborate a little on your response - "Yes, in hive3 
it is supported, and we can just replace timestamp column from long to 
timestamp."? What should be done, and is it possible to do this in the pyspark 
script?

> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: bug-bash-0.6.0
> Fix For: 0.6.1
>
>
> [https://github.com/apache/incubator-hudi/issues/543] &; related issues 





[hudi] branch master updated: [HUDI-1254] TypedProperties can not get values by initializing an existing properties (#2059)

2020-09-09 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 063a98f  [HUDI-1254] TypedProperties can not get values by 
initializing an existing properties (#2059)
063a98f is described below

commit 063a98fc2b76beac28a4797884973abd2911c887
Author: linshan-ma 
AuthorDate: Wed Sep 9 23:42:41 2020 +0800

[HUDI-1254] TypedProperties can not get values by initializing an existing 
properties (#2059)
---
 .../apache/hudi/common/config/TypedProperties.java | 23 --
 .../common/properties/TestTypedProperties.java | 84 ++
 2 files changed, 100 insertions(+), 7 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java 
b/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
index 295598c..c780ded 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/config/TypedProperties.java
@@ -22,6 +22,7 @@ import java.io.Serializable;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Properties;
+import java.util.Set;
 import java.util.stream.Collectors;
 
 /**
@@ -38,22 +39,30 @@ public class TypedProperties extends Properties implements 
Serializable {
   }
 
   private void checkKey(String property) {
-if (!containsKey(property)) {
+if (!keyExists(property)) {
   throw new IllegalArgumentException("Property " + property + " not 
found");
 }
   }
 
+  private boolean keyExists(String property) {
+    Set<String> keys = super.stringPropertyNames();
+    if (keys.contains(property)) {
+      return true;
+    }
+    return false;
+  }
+
   public String getString(String property) {
 checkKey(property);
 return getProperty(property);
   }
 
   public String getString(String property, String defaultValue) {
-return containsKey(property) ? getProperty(property) : defaultValue;
+return keyExists(property) ? getProperty(property) : defaultValue;
   }
 
  public List<String> getStringList(String property, String delimiter, List<String> defaultVal) {
-if (!containsKey(property)) {
+if (!keyExists(property)) {
   return defaultVal;
 }
 return 
Arrays.stream(getProperty(property).split(delimiter)).map(String::trim).collect(Collectors.toList());
@@ -65,7 +74,7 @@ public class TypedProperties extends Properties implements 
Serializable {
   }
 
   public int getInteger(String property, int defaultValue) {
-return containsKey(property) ? Integer.parseInt(getProperty(property)) : 
defaultValue;
+return keyExists(property) ? Integer.parseInt(getProperty(property)) : 
defaultValue;
   }
 
   public long getLong(String property) {
@@ -74,7 +83,7 @@ public class TypedProperties extends Properties implements 
Serializable {
   }
 
   public long getLong(String property, long defaultValue) {
-return containsKey(property) ? Long.parseLong(getProperty(property)) : 
defaultValue;
+return keyExists(property) ? Long.parseLong(getProperty(property)) : 
defaultValue;
   }
 
   public boolean getBoolean(String property) {
@@ -83,7 +92,7 @@ public class TypedProperties extends Properties implements 
Serializable {
   }
 
   public boolean getBoolean(String property, boolean defaultValue) {
-return containsKey(property) ? Boolean.parseBoolean(getProperty(property)) 
: defaultValue;
+return keyExists(property) ? Boolean.parseBoolean(getProperty(property)) : 
defaultValue;
   }
 
   public double getDouble(String property) {
@@ -92,6 +101,6 @@ public class TypedProperties extends Properties implements 
Serializable {
   }
 
   public double getDouble(String property, double defaultValue) {
-return containsKey(property) ? Double.parseDouble(getProperty(property)) : 
defaultValue;
+return keyExists(property) ? Double.parseDouble(getProperty(property)) : 
defaultValue;
   }
 }
diff --git 
a/hudi-common/src/test/java/org/apache/hudi/common/properties/TestTypedProperties.java
 
b/hudi-common/src/test/java/org/apache/hudi/common/properties/TestTypedProperties.java
new file mode 100644
index 000..95955d4
--- /dev/null
+++ 
b/hudi-common/src/test/java/org/apache/hudi/common/properties/TestTypedProperties.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the Lic
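The root cause this commit works around, as a standalone sketch: `java.util.Properties` treats a Properties object passed to its constructor as defaults, which `getProperty()` and `stringPropertyNames()` consult but `containsKey()` does not, so the old `containsKey`-based checks missed inherited keys.

```java
import java.util.Properties;

public class DefaultsLookupDemo {
  public static void main(String[] args) {
    Properties base = new Properties();
    base.setProperty("hoodie.table.name", "demo");

    Properties wrapped = new Properties(base); // `base` becomes the defaults table

    System.out.println(wrapped.containsKey("hoodie.table.name")); // false (Hashtable view)
    System.out.println(wrapped.getProperty("hoodie.table.name")); // "demo" (checks defaults)
    System.out.println(wrapped.stringPropertyNames()
        .contains("hoodie.table.name"));                          // true (hence keyExists())
  }
}
```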

[GitHub] [hudi] leesf merged pull request #2059: [HUDI-1254] TypedProperties can not get values by initializing an existing properties

2020-09-09 Thread GitBox


leesf merged pull request #2059:
URL: https://github.com/apache/hudi/pull/2059


   







[GitHub] [hudi] leesf commented on a change in pull request #2078: [MINOR]Add clinbrain to powered by page

2020-09-09 Thread GitBox


leesf commented on a change in pull request #2078:
URL: https://github.com/apache/hudi/pull/2078#discussion_r485708058



##
File path: docs/_docs/1_4_powered_by.md
##
@@ -28,6 +29,9 @@ offering real-time analysis on hudi dataset.
 Amazon Web Services is the World's leading cloud services provider. Apache 
Hudi is [pre-installed](https://aws.amazon.com/emr/features/hudi/) with the AWS 
Elastic Map Reduce 
 offering, providing means for AWS users to perform record-level 
updates/deletes and manage storage efficiently.
 
+### Clinbrain
+[Clinbrain](https://www.clinbrain.com/) is the leading of big data platform on 
medical industry, we have built 200 medical big data centers by integrating 
Hudi Data Lake solution in numerous hospitals,hudi provides the abablility to 
upsert and deletes on hdfs, at the same time, it can make the fresh data-stream 
up-to-date effcienctlly in hadoop system with the hudi incremental view.

Review comment:
   `hospitals,hudi` -> `hospitals, hudi`









[jira] [Commented] (HUDI-1058) Make delete marker configurable

2020-09-09 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192931#comment-17192931
 ] 

Raymond Xu commented on HUDI-1058:
--

[~shenhong] any news on this ticket? :)

> Make delete marker configurable
> ---
>
> Key: HUDI-1058
> URL: https://issues.apache.org/jira/browse/HUDI-1058
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Raymond Xu
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available
>
> users can specify any boolean field for delete marker and 
> `_hoodie_is_deleted` remains as default.





[GitHub] [hudi] xushiyan opened a new pull request #2079: [HUDI-995] Use HoodieTestTable in more classes

2020-09-09 Thread GitBox


xushiyan opened a new pull request #2079:
URL: https://github.com/apache/hudi/pull/2079


   Migrate test data prep logic in
   - TestStatsCommand
   - TestHoodieROTablePathFilter
   
   After changing to HoodieTestTable APIs, removed unused deprecated APIs in 
HoodieTestUtils
   







[GitHub] [hudi] sam-wmt commented on issue #2042: org.apache.hudi.exception.HoodieIOException: IOException when reading log file

2020-09-09 Thread GitBox


sam-wmt commented on issue #2042:
URL: https://github.com/apache/hudi/issues/2042#issuecomment-689562068


   This appears to have been caused by an internal change to our Hudi writer, 
which I found in the executor logs:
   java.lang.NoSuchMethodException: 
com.xxx..x.xx.(org.apache.hudi.common.util.Option)
   Closing ticket.







[GitHub] [hudi] sam-wmt closed issue #2042: org.apache.hudi.exception.HoodieIOException: IOException when reading log file

2020-09-09 Thread GitBox


sam-wmt closed issue #2042:
URL: https://github.com/apache/hudi/issues/2042


   







[GitHub] [hudi] Yogashri12 commented on issue #2076: [SUPPORT] load data partition wise

2020-09-09 Thread GitBox


Yogashri12 commented on issue #2076:
URL: https://github.com/apache/hudi/issues/2076#issuecomment-689553730


   Okie, I will try. Thank you for the response.







[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-09 Thread GitBox


wangxianghu commented on a change in pull request #1827:
URL: https://github.com/apache/hudi/pull/1827#discussion_r485591618



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -18,120 +18,195 @@
 
 package org.apache.hudi.client;
 
+import com.codahale.metrics.Timer;
+import org.apache.hadoop.conf.Configuration;
 import org.apache.hudi.avro.model.HoodieCleanMetadata;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
 import org.apache.hudi.avro.model.HoodieRestoreMetadata;
 import org.apache.hudi.avro.model.HoodieRollbackMetadata;
-import org.apache.hudi.client.embedded.EmbeddedTimelineService;
+import org.apache.hudi.callback.HoodieWriteCommitCallback;
+import org.apache.hudi.callback.common.HoodieWriteCommitCallbackMessage;
+import org.apache.hudi.callback.util.HoodieCommitCallbackFactory;
+import org.apache.hudi.client.embebbed.BaseEmbeddedTimelineService;
+import org.apache.hudi.common.HoodieEngineContext;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
 import org.apache.hudi.common.model.HoodieKey;
-import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.model.WriteOperationType;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
-import org.apache.hudi.common.table.timeline.HoodieInstant.State;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ValidationUtils;
-import org.apache.hudi.config.HoodieCompactionConfig;
 import org.apache.hudi.config.HoodieWriteConfig;
+
 import org.apache.hudi.exception.HoodieCommitException;
 import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.exception.HoodieRestoreException;
 import org.apache.hudi.exception.HoodieRollbackException;
 import org.apache.hudi.exception.HoodieSavepointException;
 import org.apache.hudi.index.HoodieIndex;
 import org.apache.hudi.metrics.HoodieMetrics;
-import org.apache.hudi.table.HoodieTable;
-import org.apache.hudi.table.HoodieTimelineArchiveLog;
-import org.apache.hudi.table.MarkerFiles;
 import org.apache.hudi.table.BulkInsertPartitioner;
+import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
-import org.apache.hudi.table.action.compact.CompactHelpers;
 import org.apache.hudi.table.action.savepoint.SavepointHelpers;
-
-import com.codahale.metrics.Timer;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
-import org.apache.spark.SparkConf;
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.JavaSparkContext;
 
 import java.io.IOException;
+import java.nio.charset.StandardCharsets;
 import java.text.ParseException;
 import java.util.Collection;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;
 
 /**
- * Hoodie Write Client helps you build tables on HDFS [insert()] and then 
perform efficient mutations on an HDFS
- * table [upsert()]
- * 
- * Note that, at any given time, there can only be one Spark job performing 
these operations on a Hoodie table.
+ * Abstract Write Client providing functionality for performing commit, index 
updates and rollback
+ * Reused for regular write operations like upsert/insert/bulk-insert.. as 
well as bootstrap
+ *
+ * @param <T> Sub type of HoodieRecordPayload
+ * @param <I> Type of inputs
+ * @param <K> Type of keys
+ * @param <O> Type of outputs
+ * @param <P> Type of record position [Key, Option[partitionPath, fileID]] in hoodie table
  */
-public class HoodieWriteClient<T extends HoodieRecordPayload> extends AbstractHoodieWriteClient<T> {
-
+public abstract class AbstractHoodieWriteClient<T extends HoodieRecordPayload, I, K, O, P> extends AbstractHoodieClient {
   private static final long serialVersionUID = 1L;
-  private static final Logger LOG = 
LogManager.getLogger(HoodieWriteClient.class);
-  private static final String LOOKUP_STR = "lookup";
-  private final boolean rollbackPending;
-  private final transient HoodieMetrics metrics;
-  private transient Timer.Context compactionTimer;
+  private static final Logger LOG = 
LogManager.getLogger(AbstractHoodieWriteClient.class);
+
+  protected final transient HoodieMetrics metrics;
+  private final transient HoodieIndex index;
+
+  protected transient Timer.Context writeContext = null;
+  private transient WriteOperationType operationType;
+  private transient HoodieWriteCommitCallback commitCallback;
+
+  protected static final String LOOKUP_STR = "lookup";
+  protected final boolean rollbackPending;
+  protected transient Timer.Context compactionTimer;
   private transient AsyncCleanerService asyncCleanerService;
 
+  public void setOperationType(WriteOperationType operationType) {
+    this.operationType = operationType;
+ 
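
For context, here is a self-contained toy that mirrors the engine-abstraction pattern this diff introduces: the abstract client fixes the shared write workflow while subclasses bind the generic input/key/output types to engine-specific collections. All names below are illustrative assumptions for exposition, not the actual Hudi API.

{code:java}
import java.util.List;

// Toy version of the pattern: shared orchestration lives in the abstract
// class, while the engine-specific write step is supplied by the subclass.
abstract class ToyAbstractWriteClient<I, K, O> {
  final O upsert(I records, String instantTime) {
    O result = doWrite(records, instantTime); // engine-specific write
    commit(instantTime);                      // shared commit/rollback path
    return result;
  }

  protected abstract O doWrite(I records, String instantTime);

  void commit(String instantTime) {
    // shared commit logic would live here
  }
}

// A hypothetical engine binds I/K/O to its own collection types, the way a
// Spark client would bind them to JavaRDDs.
class ToyListWriteClient extends ToyAbstractWriteClient<List<String>, String, Integer> {
  @Override
  protected Integer doWrite(List<String> records, String instantTime) {
    return records.size(); // pretend every record was written successfully
  }
}
{code}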

[GitHub] [hudi] hj2016 opened a new pull request #2078: [MINOR]Add clinbrain to powered by page

2020-09-09 Thread GitBox


hj2016 opened a new pull request #2078:
URL: https://github.com/apache/hudi/pull/2078


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   Add clinbrain to powered by page
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1192) Make create hive database automatically configurable

2020-09-09 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui updated HUDI-1192:

Priority: Minor  (was: Major)

> Make create hive database automatically configurable
> 
>
> Key: HUDI-1192
> URL: https://issues.apache.org/jira/browse/HUDI-1192
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> {code:java}
> org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL create 
> database if not exists data_lake
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:352)
>   at 
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:121)
>   at 
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncMeta(DeltaSync.java:510)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:425)
>   at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:244)
>   at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:579)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException No valid privileges
>  User lingqu does not have privileges for CREATEDATABASE
>  The required privileges: Server=server1->action=create->grantOption=false;
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:266)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:252)
>   at 
> org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:350)
>   ... 10 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException No valid privileges
>  User lingqu does not have privileges for CREATEDATABASE
>  The required privileges: Server=server1->action=create->grantOption=false;
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:207)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:260)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:505)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:491)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:507)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   ... 3 more
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: No valid 
> privileges
>  User lingqu does not have privileges for CREATEDATABASE
>  The required privileges: Server=server1->action=create->grantOption=false;
>   at 
> org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:371)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:600)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1425)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1398)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:205)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metada
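
The truncated trace above shows the sync failing inside HoodieHiveClient.updateHiveSQL on the unconditional "create database if not exists" statement. A minimal sketch of the proposed toggle, assuming a hypothetical property key rather than a confirmed Hudi config name:

{code:java}
import java.util.Properties;

import org.apache.hudi.hive.HoodieHiveClient;

// Sketch only: gate automatic database creation behind a user-facing flag so
// deployments without the CREATEDATABASE privilege can pre-create the
// database out of band. The property key below is an assumption.
static void createDatabaseIfEnabled(Properties props, HoodieHiveClient hiveClient,
                                    String databaseName) {
  boolean autoCreate = Boolean.parseBoolean(
      props.getProperty("hoodie.datasource.hive_sync.auto_create_database", "true"));
  if (autoCreate) {
    // This is the statement that fails with "No valid privileges" above.
    hiveClient.updateHiveSQL("create database if not exists " + databaseName);
  }
}
{code}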

[jira] [Updated] (HUDI-1269) Make whether the failure of sync hudi data to hive affects hudi ingest process configurable

2020-09-09 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui updated HUDI-1269:

Priority: Minor  (was: Major)

> Make whether the failure of sync hudi data to hive affects hudi ingest 
> process configurable
> ---
>
> Key: HUDI-1269
> URL: https://issues.apache.org/jira/browse/HUDI-1269
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: wangxianghu
>Assignee: liujinhui
>Priority: Minor
> Fix For: 0.6.1
>
>
> Currently, in an ETL pipeline (e.g., kafka -> hudi -> hive), if the sync from 
> hudi to hive fails, the job keeps running.
> I think we can add a switch to control the job behavior (fail or keep 
> running) when kafka to hudi succeeds but hudi to hive fails, leaving the 
> choice to the user, since ingesting data into hudi and syncing it to hive 
> form a single end-to-end task in some scenarios (see the sketch below).
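
A minimal sketch of such a switch, assuming a hypothetical flag name; the real wiring would live around DeltaSync's Hive sync step:

{code:java}
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

// Sketch only: rethrow to fail the ingest job, or log and keep ingesting,
// depending on a user-supplied flag (the flag name is illustrative).
class HiveSyncGuard {
  private static final Logger LOG = LogManager.getLogger(HiveSyncGuard.class);

  static void syncMetaGuarded(Runnable hiveSync, boolean failJobOnHiveSyncError) {
    try {
      hiveSync.run();
    } catch (RuntimeException e) {
      if (failJobOnHiveSyncError) {
        throw e; // propagate: the whole ingest job fails
      }
      LOG.warn("Hive sync failed; continuing ingestion as configured", e);
    }
  }
}
{code}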



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pratyakshsharma commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

2020-09-09 Thread GitBox


pratyakshsharma commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-689476777


   Lagging a bit, will circle back on this. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wkhapy123 closed issue #2050: merge on read table so many small files

2020-09-09 Thread GitBox


wkhapy123 closed issue #2050:
URL: https://github.com/apache/hudi/issues/2050


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org