[GitHub] [incubator-hudi] codecov-io commented on issue #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-03-26 Thread GitBox
codecov-io commented on issue #1452: [HUDI-740]Fix can not specify the 
sparkMaster of cleans run command
URL: https://github.com/apache/incubator-hudi/pull/1452#issuecomment-604806506
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1452?src=pr=h1) 
Report
   > Merging 
[#1452](https://codecov.io/gh/apache/incubator-hudi/pull/1452?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/cafc87041baf4055c39244e7cde0187437bb03d4=desc)
 will **increase** coverage by `0.06%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1452/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1452?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1452      +/-   ##
   ============================================
   + Coverage     67.61%   67.67%   +0.06%
   - Complexity      254      259       +5
   ============================================
     Files           340      342       +2
     Lines         16504    16510       +6
     Branches       1689     1684       -5
   ============================================
   + Hits          11159    11173      +14
   + Misses         4606     4599       -7
   + Partials        739      738       -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1452?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...ava/org/apache/hudi/config/HoodieMemoryConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZU1lbW9yeUNvbmZpZy5qYXZh)
 | `60.00% <0.00%> (-16.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...ain/java/org/apache/hudi/io/HoodieMergeHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllTWVyZ2VIYW5kbGUuamF2YQ==)
 | `79.31% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...n/java/org/apache/hudi/index/hbase/HBaseIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvaGJhc2UvSEJhc2VJbmRleC5qYXZh)
 | `84.21% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/client/HoodieWriteClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVdyaXRlQ2xpZW50LmphdmE=)
 | `69.77% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../org/apache/hudi/index/bloom/HoodieBloomIndex.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vSG9vZGllQmxvb21JbmRleC5qYXZh)
 | `94.73% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/client/AbstractHoodieWriteClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0Fic3RyYWN0SG9vZGllV3JpdGVDbGllbnQuamF2YQ==)
 | `74.44% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...org/apache/hudi/client/utils/SparkConfigUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L3V0aWxzL1NwYXJrQ29uZmlnVXRpbHMuamF2YQ==)
 | `85.71% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `100.00% <0.00%> (ø)` | `1.00% <0.00%> (?%)` | |
   | 
[...table/compact/HoodieMergeOnReadTableCompactor.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvY29tcGFjdC9Ib29kaWVNZXJnZU9uUmVhZFRhYmxlQ29tcGFjdG9yLmphdmE=)
 | `90.21% <0.00%> (+0.10%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `84.37% <0.00%> (+0.46%)` | `0.00% <0.00%> (ø%)` | |
   | ... and [3 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1452/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1452?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 

[GitHub] [incubator-hudi] yanghua commented on issue #1449: [HUDI-698]Add unit test for CleansCommand

2020-03-26 Thread GitBox
yanghua commented on issue #1449: [HUDI-698]Add unit test for CleansCommand
URL: https://github.com/apache/incubator-hudi/pull/1449#issuecomment-604802503
 
 
   @hddong The Travis build is red. Please recheck it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #229

2020-03-26 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.36 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Updated] (HUDI-740) Fix can not specify the sparkMaster of cleans run command

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-740:

Labels: pull-request-available  (was: )

> Fix can not specify the sparkMaster of cleans run command
> -
>
> Key: HUDI-740
> URL: https://issues.apache.org/jira/browse/HUDI-740
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> Now, we can specify the sparkMaster for the cleans run command, but it does not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] hddong opened a new pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-03-26 Thread GitBox
hddong opened a new pull request #1452: [HUDI-740]Fix can not specify the 
sparkMaster of cleans run command
URL: https://github.com/apache/incubator-hudi/pull/1452
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Now, we can specify the sparkMaster for the cleans run command, but it does not work.* *(see the sketch after the committer checklist below)*
   
   ## Brief change log
   
   *(for example:)*
 - *Fix specifying the sparkMaster for the cleans run command*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
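
   For context, a minimal sketch (illustrative names only, not the actual Hudi CLI code) of the wiring such a fix needs: the value parsed from the `--sparkMaster` option has to reach the Spark launcher instead of being dropped in favor of a hard-coded default.

```java
import org.apache.spark.launcher.SparkLauncher;

// Hypothetical sketch: CleansRunSketch and buildLauncher are illustrative names,
// not part of the Hudi CLI. The point is only the --sparkMaster pass-through.
public class CleansRunSketch {
  public static SparkLauncher buildLauncher(String sparkMaster, String sparkMemory) {
    return new SparkLauncher()
        .setAppName("hudi-cli-cleans-run")
        .setMaster(sparkMaster) // use the value from the CLI option rather than a fixed master
        .setConf("spark.executor.memory", sparkMemory);
  }
}
```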


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-740) Fix can not specify the sparkMaster of cleans run command

2020-03-26 Thread hong dongdong (Jira)
hong dongdong created HUDI-740:
--

 Summary: Fix can not specify the sparkMaster of cleans run command
 Key: HUDI-740
 URL: https://issues.apache.org/jira/browse/HUDI-740
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: CLI
Reporter: hong dongdong
Assignee: hong dongdong


Now, we can specify the sparkMaster for the cleans run command, but it does not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] xiangtao commented on issue #143: Tracking ticket for folks to be added to slack group

2020-03-26 Thread GitBox
xiangtao commented on issue #143: Tracking ticket for folks to be added to 
slack group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-604786981
 
 
   Please add:  490548...@qq.com  
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf merged pull request #1451: [MINOR] Add error message when check arguments

2020-03-26 Thread GitBox
leesf merged pull request #1451: [MINOR] Add error message when check arguments
URL: https://github.com/apache/incubator-hudi/pull/1451
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated: [MINOR] Add error message when check arguments (#1451)

2020-03-26 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1713f68  [MINOR] Add error message when check arguments (#1451)
1713f68 is described below

commit 1713f686f86e8c2f0a908c313cca9b595c6aed33
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Thu Mar 26 19:21:38 2020 -0700

[MINOR] Add error message when check arguments (#1451)
---
 .../main/java/org/apache/hudi/config/HoodieCompactionConfig.java| 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java b/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
index 074ea78..a7275c8 100644
--- a/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
+++ b/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
@@ -288,7 +288,11 @@ public class HoodieCompactionConfig extends DefaultHoodieConfig {
       int maxInstantsToKeep = Integer.parseInt(props.getProperty(HoodieCompactionConfig.MAX_COMMITS_TO_KEEP_PROP));
       int cleanerCommitsRetained =
           Integer.parseInt(props.getProperty(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP));
-      ValidationUtils.checkArgument(maxInstantsToKeep > minInstantsToKeep);
+      ValidationUtils.checkArgument(maxInstantsToKeep > minInstantsToKeep,
+          String.format(
+              "Increase %s=%d to be greater than %s=%d.",
+              HoodieCompactionConfig.MAX_COMMITS_TO_KEEP_PROP, maxInstantsToKeep,
+              HoodieCompactionConfig.MIN_COMMITS_TO_KEEP_PROP, minInstantsToKeep));
       ValidationUtils.checkArgument(minInstantsToKeep > cleanerCommitsRetained,
           String.format(
               "Increase %s=%d to be greater than %s=%d. Otherwise, there is risk of incremental pull "
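
Below is a minimal, self-contained sketch of the pattern this commit introduces: the argument check fails with a descriptive, formatted message instead of a bare exception. The checkArgument helper and the property keys shown are stand-ins that mirror the diff, not Hudi's own ValidationUtils or config constants.

```java
// Self-contained sketch of the validation pattern from the diff above.
// checkArgument stands in for org.apache.hudi.common.util.ValidationUtils;
// the property keys are assumptions about what the *_COMMITS_TO_KEEP_PROP
// constants resolve to.
public final class ValidationSketch {

  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  public static void main(String[] args) {
    int minInstantsToKeep = 20;
    int maxInstantsToKeep = 10; // misconfigured on purpose: must exceed the minimum
    // Fails with a message that names both properties and their values.
    checkArgument(maxInstantsToKeep > minInstantsToKeep,
        String.format("Increase %s=%d to be greater than %s=%d.",
            "hoodie.keep.max.commits", maxInstantsToKeep,
            "hoodie.keep.min.commits", minInstantsToKeep));
  }
}
```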



[GitHub] [incubator-hudi] xushiyan opened a new pull request #1451: [MINOR] Add error message when check arguments

2020-03-26 Thread GitBox
xushiyan opened a new pull request #1451: [MINOR] Add error message when check 
arguments
URL: https://github.com/apache/incubator-hudi/pull/1451
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] umehrot2 commented on issue #1421: [HUDI-724] Parallelize getSmallFiles for partitions

2020-03-26 Thread GitBox
umehrot2 commented on issue #1421: [HUDI-724] Parallelize getSmallFiles for 
partitions
URL: https://github.com/apache/incubator-hudi/pull/1421#issuecomment-604748097
 
 
   @vinothchandar @bvaradar does this look good to be merged ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] prashantwason opened a new pull request #1450: [MINOR] Adding .codecov.yml to set exclusions for code coverage reports.

2020-03-26 Thread GitBox
prashantwason opened a new pull request #1450: [MINOR] Adding .codecov.yml to 
set exclusions for code coverage reports.
URL: https://github.com/apache/incubator-hudi/pull/1450
 
 
   ## What is the purpose of the pull request
   
   Adding the configuration file for codecov which allows us to exclude 
specific java files from code coverage. 
   
   ## Brief change log
   
   ## Verify this pull request
   Does not change any code or logic. Results can be seen on codecov.io website.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-26 Thread GitBox
smarthi commented on a change in pull request #1159: [HUDI-479] Eliminate or 
Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398910413
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator<?> iterator1, Iterator<?> iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
+    return Stream.of(elements).collect(Collectors.toSet());
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final K key, final V value) {
+    return Collections.unmodifiableMap(Collections.singletonMap(key, value));
+  }
+
+  @SafeVarargs
+  public static <T> List<T> createImmutableList(final T... elements) {
+    return Collections.unmodifiableList(Stream.of(elements).collect(Collectors.toList()));
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final Map<K, V> map) {
+    return Collections.unmodifiableMap(map);
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createImmutableSet(final T... elements) {
+    return Collections.unmodifiableSet(Stream.of(elements).collect(Collectors.toSet()));
+  }
+
+  public static <T> Set<T> createImmutableSet(final Set<T> set) {
+    return Collections.unmodifiableSet(set);
+  }
+
+  public static <T> List<T> createImmutableList(final List<T> list) {
+    return Collections.unmodifiableList(list);
+  }
+
+  private static Object[] checkElementsNotNull(Object... array) {
+    return checkElementsNotNull(array, array.length);
+  }
+
+  private static Object[] checkElementsNotNull(Object[] array, int length) {
+    for (int i = 0; i < length; i++) {
+      checkElementNotNull(array[i], i);
+    }
+    return array;
+  }
+
+  private static Object checkElementNotNull(Object element, int index) {
+    if (element == null) {
+      throw new NullPointerException("at index " + index);
+    }
+    return element;
+  }
+
+  public static class Maps {
 
 Review comment:
   Fixed it as part of this PR including removing CollectionUtils.Maps - 
HUDI-737 can be closed.
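
   For readers skimming the diff, a small hypothetical usage sketch of the helpers quoted above (assuming they live at org.apache.hudi.common.util.CollectionUtils as in the PR):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

import org.apache.hudi.common.util.CollectionUtils;

public class CollectionUtilsDemo {
  public static void main(String[] args) {
    // Immutable wrapper around varargs, mutable set built from varargs.
    List<String> fields = CollectionUtils.createImmutableList("ts", "uuid", "rider");
    Set<Integer> ids = CollectionUtils.createSetFromElements(1, 2, 3);

    // elementsEqual compares element by element and consumes both iterators.
    boolean same = CollectionUtils.elementsEqual(
        fields.iterator(), Arrays.asList("ts", "uuid", "rider").iterator());
    System.out.println(same); // true

    ids.add(4); // fine: createSetFromElements returns a mutable set
    // fields.add("fare"); // would throw UnsupportedOperationException (immutable list)
  }
}
```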


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-26 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068035#comment-17068035
 ] 

Vinoth Chandar commented on HUDI-677:
-

[~hongdongdong] We cannot break any existing APIs on the HoodieWriteClient, the 
change has to be internal to this class.. 

How about I write up a concrete proposal, given I have context into the various 
usage patterns/dependencies and you can add your thoughts on top and drive the 
implementation? 

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: hong dongdong
>Priority: Major
> Fix For: 0.6.0
>
>
> Over time a lot of the core transaction management code has been  split 
> across various files in hudi-client.. We want to clean this up and present a 
> nice interface.. 
> Some notes and thoughts and suggestions..  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-739) HoodieIOException: Could not delete in-flight instant

2020-03-26 Thread Catalin Alexandru Zamfir (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Catalin Alexandru Zamfir updated HUDI-739:
--
Description: 
We are evaluating Hudi to use for our near real-time ingestion needs, compared 
to other solutions (Delta/Iceberg). We've picked Hudi because pre-installed 
with Amazon EMR by AWS. However, adopting it is blocking on this issue with 
concurrent small batch (of 256 files) write jobs (to the same S3 path).

Using Livy we're triggering Spark jobs writing Hudi tables over S3, on EMR with 
EMRFS active. Paths are using the "s3://" prefix and EMRFS is active. We're 
writing Spark SQL datasets promoted up from RDDs. The 
"hoodie.consistency.check.enabled" is set to true. Spark serializer is Kryo. 
Hoodie version is 0.5.0-incubating.

Both on COW and MOR tables some of the submitted jobs are failing with the 
below exception:
{code:java}
org.apache.hudi.exception.HoodieIOException: Could not delete in-flight instant 
[==>20200326175252__deltacommit__INFLIGHT]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInstantFile(HoodieActiveTimeline.java:239)
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInflight(HoodieActiveTimeline.java:222)
at 
org.apache.hudi.table.HoodieCopyOnWriteTable.deleteInflightInstant(HoodieCopyOnWriteTable.java:380)
at 
org.apache.hudi.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:327)
at 
org.apache.hudi.HoodieWriteClient.doRollbackAndGetStats(HoodieWriteClient.java:834)
at 
org.apache.hudi.HoodieWriteClient.rollbackInternal(HoodieWriteClient.java:907)
at 
org.apache.hudi.HoodieWriteClient.rollback(HoodieWriteClient.java:733)
at 
org.apache.hudi.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1121)
at 
org.apache.hudi.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:994)
at 
org.apache.hudi.HoodieWriteClient.startCommit(HoodieWriteClient.java:987)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:141)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
{code}
The jobs are sent in concurrent batches of 256 files, over the same S3 path, in 
total some 8k files for 6 hours of our data.

Writing happens with the following code (basePath is an S3 bucket):
{code:java}
// Configs (edited)
String databaseName = "nrt";
String assumeYmdPartitions = "false";
String extractorClass = MultiPartKeysValueExtractor.class.getName ();
String tableType = DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL ();
String tableOperation = DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL ();
String hiveJdbcUri = "jdbc:hive2://ip-x-y-z-q.eu-west-1.compute.internal:1";
String basePath = "s3://some_path_to_hudi"; // or "s3a://" does not seem to 

[jira] [Updated] (HUDI-739) HoodieIOException: Could not delete in-flight instant

2020-03-26 Thread Catalin Alexandru Zamfir (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Catalin Alexandru Zamfir updated HUDI-739:
--
Description: 
We are evaluating Hudi to use for our near real-time ingestion needs, compared 
to other solutions (Delta/Iceberg). We've picked Hudi because pre-installed 
with Amazon EMR by AWS. However, adopting it is blocking on this issue with 
concurrent small batch (of 256 files) write jobs (to the same S3 path).

Using Livy we're triggering Spark jobs writing Hudi tables over S3, on EMR with 
EMRFS active. Paths are using the "s3://" prefix and EMRFS is active. We're 
writing Spark SQL datasets promoted up from RDDs. The 
"hoodie.consistency.check.enabled" is set to true. Spark serializer is Kryo. 
Hoodie version is 0.5.0-incubating.

Both on COW and MOR tables some of the submitted jobs are failing with the 
below exception:
{code:java}
org.apache.hudi.exception.HoodieIOException: Could not delete in-flight instant 
[==>20200326175252__deltacommit__INFLIGHT]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInstantFile(HoodieActiveTimeline.java:239)
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInflight(HoodieActiveTimeline.java:222)
at 
org.apache.hudi.table.HoodieCopyOnWriteTable.deleteInflightInstant(HoodieCopyOnWriteTable.java:380)
at 
org.apache.hudi.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:327)
at 
org.apache.hudi.HoodieWriteClient.doRollbackAndGetStats(HoodieWriteClient.java:834)
at 
org.apache.hudi.HoodieWriteClient.rollbackInternal(HoodieWriteClient.java:907)
at 
org.apache.hudi.HoodieWriteClient.rollback(HoodieWriteClient.java:733)
at 
org.apache.hudi.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1121)
at 
org.apache.hudi.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:994)
at 
org.apache.hudi.HoodieWriteClient.startCommit(HoodieWriteClient.java:987)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:141)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
{code}
The jobs are sent in concurrent batches of 256 files, over the same S3 path, in 
total some 8k files for 6 hours of our data.

Writing happens with the following code (basePath is an S3 bucket):
{code:java}
// Configs (edited)
String databaseName = "nrt";
String assumeYmdPartitions = "false";
String extractorClass = MultiPartKeysValueExtractor.class.getName ();
String tableType = DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL ();
String tableOperation = DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL ();
String hiveJdbcUri = "jdbc:hive2://ip-x-y-z-q.eu-west-1.compute.internal:1";
String basePath = "s3a://some_path_to_hudi";
String avroSchemaAsString = 

[jira] [Updated] (HUDI-739) HoodieIOException: Could not delete in-flight instant

2020-03-26 Thread Catalin Alexandru Zamfir (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Catalin Alexandru Zamfir updated HUDI-739:
--
Description: 
Using Livy we're triggering Spark jobs writing Hudi tables over S3, on EMR with 
EMRFS active. Paths are using the "s3://" prefix and EMRFS is active. We're 
writing Spark SQL datasets promoted up from RDDs. The 
"hoodie.consistency.check.enabled" is set to true. Spark serializer is Kryo. 
Hoodie version is 0.5.0-incubating.

Both on COW and MOR tables some of the submitted jobs are failing with the 
below exception:
{code:java}
org.apache.hudi.exception.HoodieIOException: Could not delete in-flight instant 
[==>20200326175252__deltacommit__INFLIGHT]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInstantFile(HoodieActiveTimeline.java:239)
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInflight(HoodieActiveTimeline.java:222)
at 
org.apache.hudi.table.HoodieCopyOnWriteTable.deleteInflightInstant(HoodieCopyOnWriteTable.java:380)
at 
org.apache.hudi.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:327)
at 
org.apache.hudi.HoodieWriteClient.doRollbackAndGetStats(HoodieWriteClient.java:834)
at 
org.apache.hudi.HoodieWriteClient.rollbackInternal(HoodieWriteClient.java:907)
at 
org.apache.hudi.HoodieWriteClient.rollback(HoodieWriteClient.java:733)
at 
org.apache.hudi.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1121)
at 
org.apache.hudi.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:994)
at 
org.apache.hudi.HoodieWriteClient.startCommit(HoodieWriteClient.java:987)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:141)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
{code}
The jobs are sent in concurrent batches of 256 files, over the same S3 path, in 
total some 8k files for 6 hours of our data.

Writing happens with the following code (basePath is an S3 bucket):
{code:java}
// Configs (edited)
String databaseName = "nrt";
String assumeYmdPartitions = "false";
String extractorClass = MultiPartKeysValueExtractor.class.getName ();
String tableType = DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL ();
String tableOperation = DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL ();
String hiveJdbcUri = "jdbc:hive2://ip-x-y-z-q.eu-west-1.compute.internal:1";
String basePath = "s3a://some_path_to_hudi";
String avroSchemaAsString = avroSchema.toString ();
String tableName = avroSchema.getName ().toLowerCase ().replace ("avro", "");

eventsDataset.write ()
.format ("org.apache.hudi")
.option (HoodieWriteConfig.TABLE_NAME, tableName)
.option (DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY (), tableType)
.option 

[jira] [Created] (HUDI-739) HoodieIOException: Could not delete in-flight instant

2020-03-26 Thread Catalin Alexandru Zamfir (Jira)
Catalin Alexandru Zamfir created HUDI-739:
-

 Summary: HoodieIOException: Could not delete in-flight instant
 Key: HUDI-739
 URL: https://issues.apache.org/jira/browse/HUDI-739
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Common Core
Affects Versions: 0.5.0
Reporter: Catalin Alexandru Zamfir


Using Livy we're triggering Spark jobs writing Hudi tables over S3, on EMR with 
EMRFS active. Paths are using the "s3://" prefix and EMRFS is active. We're 
writing Spark SQL datasets promoted up from RDDs. The 
"hoodie.consistency.check.enabled" is set to true. Spark serializer is Kryo. 
Hoodie version is 0.5.0-incubating.

Both on COW and MOR tables some of the submitted jobs are failing with the 
below exception:
{code:java}
org.apache.hudi.exception.HoodieIOException: Could not delete in-flight instant 
[==>20200326175252__deltacommit__INFLIGHT]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInstantFile(HoodieActiveTimeline.java:239)
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.deleteInflight(HoodieActiveTimeline.java:222)
at 
org.apache.hudi.table.HoodieCopyOnWriteTable.deleteInflightInstant(HoodieCopyOnWriteTable.java:380)
at 
org.apache.hudi.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:327)
at 
org.apache.hudi.HoodieWriteClient.doRollbackAndGetStats(HoodieWriteClient.java:834)
at 
org.apache.hudi.HoodieWriteClient.rollbackInternal(HoodieWriteClient.java:907)
at 
org.apache.hudi.HoodieWriteClient.rollback(HoodieWriteClient.java:733)
at 
org.apache.hudi.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1121)
at 
org.apache.hudi.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:994)
at 
org.apache.hudi.HoodieWriteClient.startCommit(HoodieWriteClient.java:987)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:141)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:84)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
{code}
The jobs are sent in concurrent batches of 256 files, over the same S3 path, in 
total some 8k files for 6 hours of our data.

Writing happens with the following code (basePath is an S3 bucket):
{code:java}
eventsDataset.write ()
.format ("org.apache.hudi")
.option (HoodieWriteConfig.TABLE_NAME, tableName)
.option (DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY (), tableType)
.option (DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY (), "id")
.option (DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY (), 
"partition_path")
.option (DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY (), "timestamp")
.option (DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY (), "true")
.option (DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY (), 

[GitHub] [incubator-hudi] smarthi commented on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-26 Thread GitBox
smarthi commented on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava 
if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-604547864
 
 
   > @smarthi The CollectionUtils IMO does not seem to reuse any code from 
guava actually.. it just wraps Java Streams/Collection apis in most cases.. SO 
we need not attribute anything in LICENSE/NOTICE.. Please clarify
   
   That's correct. No attribution needed in LICENSE or NOTICE whatsoever.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (HUDI-737) Simplify/Eliminate need for CollectionUtils#Maps/MapsBuilder

2020-03-26 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned HUDI-737:
--

Assignee: Suneel Marthi

> Simplify/Eliminate need for CollectionUtils#Maps/MapsBuilder
> 
>
> Key: HUDI-737
> URL: https://issues.apache.org/jira/browse/HUDI-737
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-738) Add error msg in DeltaStreamer if `filterDupes=true` is enabled for `operation=UPSERT`.

2020-03-26 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-738:
--

 Summary: Add error msg in DeltaStreamer if  `filterDupes=true` is 
enabled for `operation=UPSERT`. 
 Key: HUDI-738
 URL: https://issues.apache.org/jira/browse/HUDI-738
 Project: Apache Hudi (incubating)
  Issue Type: Task
  Components: DeltaStreamer, newbie, Usability
Reporter: Bhavani Sudha
Assignee: Bhavani Sudha


With filterDupes=true, incoming records whose keys already exist in the table are dropped as duplicates before the write, so an UPSERT operation silently loses legitimate updates; DeltaStreamer should fail fast with an error message for this combination.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-26 Thread GitBox
vinothchandar commented on issue #1159: [HUDI-479] Eliminate or Minimize use of 
Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-604532141
 
 
   @smarthi The CollectionUtils IMO does not seem to reuse any code from guava 
actually.. it just wraps Java Streams/Collection apis in most cases.. SO we 
need not attribute anything in LICENSE/NOTICE.. Please clarify 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-26 Thread GitBox
vinothchandar commented on a change in pull request #1159: [HUDI-479] Eliminate 
or Minimize use of Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#discussion_r398713100
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java
 ##
 @@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class CollectionUtils {
+  /**
+   * Determines whether two iterators contain equal elements in the same order. More specifically,
+   * this method returns {@code true} if {@code iterator1} and {@code iterator2} contain the same
+   * number of elements and every element of {@code iterator1} is equal to the corresponding element
+   * of {@code iterator2}.
+   *
+   * Note that this will modify the supplied iterators, since they will have been advanced some
+   * number of elements forward.
+   */
+  public static boolean elementsEqual(Iterator<?> iterator1, Iterator<?> iterator2) {
+    while (iterator1.hasNext()) {
+      if (!iterator2.hasNext()) {
+        return false;
+      }
+      Object o1 = iterator1.next();
+      Object o2 = iterator2.next();
+      if (!Objects.equals(o1, o2)) {
+        return false;
+      }
+    }
+    return !iterator2.hasNext();
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createSetFromElements(final T... elements) {
+    return Stream.of(elements).collect(Collectors.toSet());
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final K key, final V value) {
+    return Collections.unmodifiableMap(Collections.singletonMap(key, value));
+  }
+
+  @SafeVarargs
+  public static <T> List<T> createImmutableList(final T... elements) {
+    return Collections.unmodifiableList(Stream.of(elements).collect(Collectors.toList()));
+  }
+
+  public static <K, V> Map<K, V> createImmutableMap(final Map<K, V> map) {
+    return Collections.unmodifiableMap(map);
+  }
+
+  @SafeVarargs
+  public static <T> Set<T> createImmutableSet(final T... elements) {
+    return Collections.unmodifiableSet(Stream.of(elements).collect(Collectors.toSet()));
+  }
+
+  public static <T> Set<T> createImmutableSet(final Set<T> set) {
+    return Collections.unmodifiableSet(set);
+  }
+
+  public static <T> List<T> createImmutableList(final List<T> list) {
+    return Collections.unmodifiableList(list);
+  }
+
+  private static Object[] checkElementsNotNull(Object... array) {
+    return checkElementsNotNull(array, array.length);
+  }
+
+  private static Object[] checkElementsNotNull(Object[] array, int length) {
+    for (int i = 0; i < length; i++) {
+      checkElementNotNull(array[i], i);
+    }
+    return array;
+  }
+
+  private static Object checkElementNotNull(Object element, int index) {
+    if (element == null) {
+      throw new NullPointerException("at index " + index);
+    }
+    return element;
+  }
+
+  public static class Maps {
 
 Review comment:
   https://issues.apache.org/jira/browse/HUDI-737 filed this 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-737) Simplify/Eliminate need for CollectionUtils#Maps/MapsBuilder

2020-03-26 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-737:
---

 Summary: Simplify/Eliminate need for 
CollectionUtils#Maps/MapsBuilder
 Key: HUDI-737
 URL: https://issues.apache.org/jira/browse/HUDI-737
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Code Cleanup
Reporter: Vinoth Chandar






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-736) Simplify ReflectionUtils#getTopLevelClasses

2020-03-26 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-736:
---

 Summary: Simplify ReflectionUtils#getTopLevelClasses 
 Key: HUDI-736
 URL: https://issues.apache.org/jira/browse/HUDI-736
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Code Cleanup
Reporter: Vinoth Chandar






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[incubator-hudi] branch master updated: [HUDI-678] Make config package spark free (#1418)

2020-03-26 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8b0a400  [HUDI-678] Make config package spark free (#1418)
8b0a400 is described below

commit 8b0a4009a9c662f45a0b76f01c2b07c36809
Author: leesf <490081...@qq.com>
AuthorDate: Thu Mar 26 23:30:27 2020 +0800

[HUDI-678] Make config package spark free (#1418)
---
 .../hudi/client/AbstractHoodieWriteClient.java |  3 +-
 .../org/apache/hudi/client/HoodieWriteClient.java  |  5 +-
 .../apache/hudi/client/utils/SparkConfigUtils.java | 94 ++
 .../org/apache/hudi/config/HoodieMemoryConfig.java | 47 ---
 .../org/apache/hudi/config/HoodieWriteConfig.java  | 27 +--
 .../apache/hudi/index/bloom/HoodieBloomIndex.java  |  3 +-
 .../org/apache/hudi/index/hbase/HBaseIndex.java|  3 +-
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  3 +-
 .../compact/HoodieMergeOnReadTableCompactor.java   |  6 +-
 9 files changed, 110 insertions(+), 81 deletions(-)
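
As a rough illustration of the refactoring direction (class, method, property names and the default value below are stand-ins, not the actual Hudi code): Spark-specific interpretation of write-config properties moves out of the config package into a small utility, so config classes no longer import Spark types.

```java
import java.util.Properties;

// Hypothetical sketch of the pattern; names and the default are assumptions.
// The real utility would map the resolved name to a Spark StorageLevel before
// handing it to RDD.persist(); that Spark dependency now lives here rather
// than in the config package.
final class SparkConfigUtilsSketch {
  private SparkConfigUtilsSketch() {
  }

  // The config object only exposes raw Properties; the Spark-aware lookup happens here.
  static String getWriteStatusStorageLevel(Properties props) {
    return props.getProperty("hoodie.write.status.storage.level", "MEMORY_AND_DISK_SER");
  }
}
```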

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
 
b/hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
index 275e5d9..d1319b3 100644
--- 
a/hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
+++ 
b/hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
@@ -21,6 +21,7 @@ package org.apache.hudi.client;
 import java.util.Collections;
 
 import org.apache.hudi.avro.model.HoodieRollbackMetadata;
+import org.apache.hudi.client.utils.SparkConfigUtils;
 import org.apache.hudi.client.embedded.EmbeddedTimelineService;
 import org.apache.hudi.common.HoodieRollbackStat;
 import org.apache.hudi.common.model.HoodieCommitMetadata;
@@ -115,7 +116,7 @@ public abstract class AbstractHoodieWriteClient e
   String instantTime) {
 // cache writeStatusRDD before updating index, so that all actions before 
this are not triggered again for future
 // RDD actions that are performed after updating the index.
-writeStatusRDD = 
writeStatusRDD.persist(config.getWriteStatusStorageLevel());
+writeStatusRDD = 
writeStatusRDD.persist(SparkConfigUtils.getWriteStatusStorageLevel(config.getProps()));
 Timer.Context indexTimer = metrics.getIndexCtx();
 // Update the index back
 JavaRDD statuses = index.updateLocation(writeStatusRDD, jsc, 
table);
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java 
b/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java
index 7ccb2a4..5f269a8 100644
--- a/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java
+++ b/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java
@@ -22,6 +22,7 @@ import org.apache.hudi.avro.model.HoodieCleanMetadata;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
 import org.apache.hudi.avro.model.HoodieRestoreMetadata;
 import org.apache.hudi.avro.model.HoodieSavepointMetadata;
+import org.apache.hudi.client.utils.SparkConfigUtils;
 import org.apache.hudi.client.embedded.EmbeddedTimelineService;
 import org.apache.hudi.common.HoodieRollbackStat;
 import org.apache.hudi.common.model.EmptyHoodieRecordPayload;
@@ -1077,7 +1078,7 @@ public class HoodieWriteClient extends AbstractHo
     HoodieTable table = HoodieTable.getHoodieTable(metaClient, config, jsc);
     JavaRDD statuses = table.compact(jsc, compactionInstant.getTimestamp(), compactionPlan);
     // Force compaction action
-    statuses.persist(config.getWriteStatusStorageLevel());
+    statuses.persist(SparkConfigUtils.getWriteStatusStorageLevel(config.getProps()));
     // pass extra-metada so that it gets stored in commit file automatically
     commitCompaction(statuses, table, compactionInstant.getTimestamp(), autoCommit,
         Option.ofNullable(compactionPlan.getExtraMetadata()));
@@ -1172,4 +1173,4 @@ public class HoodieWriteClient extends AbstractHo
 });
 return compactionInstantTimeOpt;
   }
-}
\ No newline at end of file
+}
diff --git a/hudi-client/src/main/java/org/apache/hudi/client/utils/SparkConfigUtils.java b/hudi-client/src/main/java/org/apache/hudi/client/utils/SparkConfigUtils.java
new file mode 100644
index 000..f6b8549
--- /dev/null
+++ b/hudi-client/src/main/java/org/apache/hudi/client/utils/SparkConfigUtils.java
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  
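The commit body above is truncated by the archive before the actual contents of SparkConfigUtils.java. As a rough, hedged sketch of the pattern this refactor follows (the property key and default value below are assumptions, not text copied from the committed file), the storage-level lookup that AbstractHoodieWriteClient and HoodieWriteClient now delegate to could look roughly like this:

```java
// Hedged sketch only: the property key and default are assumptions standing in
// for the real constants, which live in HoodieWriteConfig/HoodieMemoryConfig.
import java.util.Properties;

import org.apache.spark.storage.StorageLevel;

public class SparkConfigUtils {

  // Assumed key and fallback; not copied from the (truncated) committed file above.
  private static final String WRITE_STATUS_STORAGE_LEVEL = "hoodie.write.status.storage.level";
  private static final String DEFAULT_WRITE_STATUS_STORAGE_LEVEL = "MEMORY_AND_DISK_SER";

  /**
   * Resolves the Spark StorageLevel used to persist the write-status RDD from plain
   * java.util.Properties, so the config package itself never needs Spark imports.
   */
  public static StorageLevel getWriteStatusStorageLevel(Properties properties) {
    String level = properties.getProperty(WRITE_STATUS_STORAGE_LEVEL, DEFAULT_WRITE_STATUS_STORAGE_LEVEL);
    return StorageLevel.fromString(level);
  }
}
```

The point of the indirection is that the config classes keep handing out plain string properties, while only this client-side utility touches Spark's StorageLevel type, which is what "make config package spark free" refers to.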

[GitHub] [incubator-hudi] vinothchandar merged pull request #1418: [HUDI-678] Make config package spark free

2020-03-26 Thread GitBox
vinothchandar merged pull request #1418: [HUDI-678] Make config package spark 
free
URL: https://github.com/apache/incubator-hudi/pull/1418
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-26 Thread GitBox
leesf commented on a change in pull request #1418: [HUDI-678] Make config 
package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398467934
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java
 ##
 @@ -113,52 +112,20 @@ public Builder withWriteStatusFailureFraction(double failureFraction) {
       return this;
     }
 
-    /**
-     * Dynamic calculation of max memory to use for for spillable map. user.available.memory = spark.executor.memory *
-     * (1 - spark.memory.fraction) spillable.available.memory = user.available.memory * hoodie.memory.fraction. Anytime
-     * the spark.executor.memory or the spark.memory.fraction is changed, the memory used for spillable map changes
-     * accordingly
-     */
-    private long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
-      final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
-      final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = "spark.memory.fraction";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/memory/UnifiedMemoryManager.scala#L231} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/SparkContext.scala#L471} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024"; // in MB
-
-      if (SparkEnv.get() != null) {
-        // 1 GB is the default conf used by Spark, look at SparkContext.scala
-        long executorMemoryInBytes = Utils.memoryStringToMb(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
-        // 0.6 is the default value used by Spark,
-        // look at {@link
-        // https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L507}
-        double memoryFraction = Double.parseDouble(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
-        double maxMemoryFractionForMerge = Double.parseDouble(maxMemoryFraction);
-        double userAvailableMemory = executorMemoryInBytes * (1 - memoryFraction);
-        long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * maxMemoryFractionForMerge);
-        return Math.max(DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, maxMemoryForMerge);
-      } else {
-        return DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
-      }
-    }
-
     public HoodieMemoryConfig build() {
       HoodieMemoryConfig config = new HoodieMemoryConfig(props);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP),
           MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_COMPACTION);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_MERGE_PROP),
           MAX_MEMORY_FRACTION_FOR_MERGE_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_MERGE);
+      long maxMemoryAllowedForMerge =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP));
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FOR_MERGE_PROP), MAX_MEMORY_FOR_MERGE_PROP,
-          String.valueOf(getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP))));
+          String.valueOf(maxMemoryAllowedForMerge));
+      long maxMemoryAllowedForCompaction =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP));
 
 Review comment:
   > if the classes in `config` call the SparkConfigUtils, then we cannot claim it's spark free, right?
   > 
   > cc @yanghua as well
   
   Got the point; I have updated the PR to make the config package totally Spark-free.
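
   For readers who want the other half of the move, here is a hedged sketch of what the relocated getMaxMemoryAllowedForMerge could look like on the SparkConfigUtils side. It is reconstructed from the removed builder method quoted above; the two spillable-map fallback constants are assumptions (in Hudi they live as defaults in HoodieMemoryConfig), not values copied from this PR:

```java
// Sketch of the relocated helper, reconstructed from the removed builder method above.
// The fallback constants below are assumed stand-ins for HoodieMemoryConfig defaults.
import org.apache.spark.SparkEnv;
import org.apache.spark.util.Utils;

public class SparkConfigUtils {

  // Assumed fallbacks when no SparkEnv is available (e.g. in unit tests).
  private static final long MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES = 50 * 1024 * 1024L;
  private static final long MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES = 1024 * 1024 * 1024L;

  /**
   * Dynamic calculation of the max memory for the spillable map:
   * spark.executor.memory * (1 - spark.memory.fraction) * hoodie fraction,
   * falling back to a fixed default when Spark is not running.
   */
  public static long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
    final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
    final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = "spark.memory.fraction";
    final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
    final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024";

    if (SparkEnv.get() != null) {
      long executorMemoryInBytes = Utils.memoryStringToMb(
          SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
      double memoryFraction = Double.parseDouble(
          SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
      double userAvailableMemory = executorMemoryInBytes * (1 - memoryFraction);
      long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * Double.parseDouble(maxMemoryFraction));
      return Math.max(MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, maxMemoryForMerge);
    } else {
      return MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
    }
  }
}
```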




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-26 Thread GitBox
leesf commented on a change in pull request #1418: [HUDI-678] Make config 
package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398467190
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -503,22 +494,6 @@ public int getJmxPort() {
   /**
    * memory configs.
    */
-  public Double getMaxMemoryFractionPerPartitionMerge() {
-    return Double.valueOf(props.getProperty(HoodieMemoryConfig.MAX_MEMORY_FRACTION_FOR_MERGE_PROP));
-  }
-
-  public Double getMaxMemoryFractionPerCompaction() {
-    return Double.valueOf(props.getProperty(HoodieMemoryConfig.MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP));
-  }
 
 Review comment:
   These methods are unused, so they are removed.




[jira] [Commented] (HUDI-677) Abstract/Refactor all transaction management logic into a set of classes from HoodieWriteClient

2020-03-26 Thread hong dongdong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067538#comment-17067538
 ] 

hong dongdong commented on HUDI-677:


[~vinoth]

As the summary says, all transaction management logic needs to move out of HoodieWriteClient. A brief description of what I'm going to do: [https://docs.google.com/document/d/1-hXvcpQz42zORDlrDhJ9xEf33cmynw9hwtIb2K5DeL0/edit?usp=sharing]. Do you have any other suggestions or reminders?

> Abstract/Refactor all transaction management logic into a set of classes from 
> HoodieWriteClient
> ---
>
> Key: HUDI-677
> URL: https://issues.apache.org/jira/browse/HUDI-677
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: hong dongdong
>Priority: Major
> Fix For: 0.6.0
>
>
> Over time a lot of the core transaction management code has been  split 
> across various files in hudi-client.. We want to clean this up and present a 
> nice interface.. 
> Some notes and thoughts and suggestions..  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] loagosad commented on issue #1438: How to get the file name corresponding to HoodieKey through the GlobalBloomIndex

2020-03-26 Thread GitBox
loagosad commented on issue #1438: How to get the file name corresponding to 
HoodieKey through the GlobalBloomIndex 
URL: https://github.com/apache/incubator-hudi/issues/1438#issuecomment-604288934
 
 
   The Hudi version I tested is 0.5.1.




[GitHub] [incubator-hudi] hddong opened a new pull request #1449: [HUDI-698]Add unit test for CleansCommand

2020-03-26 Thread GitBox
hddong opened a new pull request #1449: [HUDI-698]Add unit test for 
CleansCommand
URL: https://github.com/apache/incubator-hudi/pull/1449
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *Add unit test for CleansCommand in hudi-cli module*
   
   ## Brief change log
   
 - *Add unit test for CleansCommand*
   
   ## Verify this pull request
   
 - *Add unit test for CleansCommand.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-698) Add unit test for CleansCommand

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-698:

Labels: pull-request-available  (was: )

> Add unit test for CleansCommand
> ---
>
> Key: HUDI-698
> URL: https://issues.apache.org/jira/browse/HUDI-698
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI, Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> Add unit test for CleansCommand in hudi-cli module





[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1418: [HUDI-678] Make config package spark free

2020-03-26 Thread GitBox
yanghua commented on a change in pull request #1418: [HUDI-678] Make config 
package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398345709
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java
 ##
 @@ -113,52 +112,20 @@ public Builder withWriteStatusFailureFraction(double failureFraction) {
       return this;
     }
 
-    /**
-     * Dynamic calculation of max memory to use for for spillable map. user.available.memory = spark.executor.memory *
-     * (1 - spark.memory.fraction) spillable.available.memory = user.available.memory * hoodie.memory.fraction. Anytime
-     * the spark.executor.memory or the spark.memory.fraction is changed, the memory used for spillable map changes
-     * accordingly
-     */
-    private long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
-      final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
-      final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = "spark.memory.fraction";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/memory/UnifiedMemoryManager.scala#L231} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/SparkContext.scala#L471} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024"; // in MB
-
-      if (SparkEnv.get() != null) {
-        // 1 GB is the default conf used by Spark, look at SparkContext.scala
-        long executorMemoryInBytes = Utils.memoryStringToMb(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
-        // 0.6 is the default value used by Spark,
-        // look at {@link
-        // https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L507}
-        double memoryFraction = Double.parseDouble(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
-        double maxMemoryFractionForMerge = Double.parseDouble(maxMemoryFraction);
-        double userAvailableMemory = executorMemoryInBytes * (1 - memoryFraction);
-        long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * maxMemoryFractionForMerge);
-        return Math.max(DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, maxMemoryForMerge);
-      } else {
-        return DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
-      }
-    }
-
     public HoodieMemoryConfig build() {
       HoodieMemoryConfig config = new HoodieMemoryConfig(props);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP),
           MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_COMPACTION);
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FRACTION_FOR_MERGE_PROP),
           MAX_MEMORY_FRACTION_FOR_MERGE_PROP, DEFAULT_MAX_MEMORY_FRACTION_FOR_MERGE);
+      long maxMemoryAllowedForMerge =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP));
       setDefaultOnCondition(props, !props.containsKey(MAX_MEMORY_FOR_MERGE_PROP), MAX_MEMORY_FOR_MERGE_PROP,
-          String.valueOf(getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_MERGE_PROP))));
+          String.valueOf(maxMemoryAllowedForMerge));
+      long maxMemoryAllowedForCompaction =
+          SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP));
 
 Review comment:
   Yes, you are right. Maybe we should extract these two lines into a separate method.
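
   A possible shape for that extraction, as a hedged fragment meant to sit inside HoodieMemoryConfig.Builder (the helper name setDefaultMaxMemorySize is invented for illustration; props and setDefaultOnCondition are the Builder's existing members):

```java
// Hypothetical fragment for HoodieMemoryConfig.Builder, not taken from the PR itself:
// it collapses the duplicated "compute via SparkConfigUtils + setDefaultOnCondition" pair.
private void setDefaultMaxMemorySize(String fractionProp, String maxMemoryProp) {
  long maxMemoryAllowed =
      SparkConfigUtils.getMaxMemoryAllowedForMerge(props.getProperty(fractionProp));
  setDefaultOnCondition(props, !props.containsKey(maxMemoryProp), maxMemoryProp,
      String.valueOf(maxMemoryAllowed));
}

// In build(), the two duplicated blocks would then reduce to something like:
// setDefaultMaxMemorySize(MAX_MEMORY_FRACTION_FOR_MERGE_PROP, MAX_MEMORY_FOR_MERGE_PROP);
// setDefaultMaxMemorySize(MAX_MEMORY_FRACTION_FOR_COMPACTION_PROP, MAX_MEMORY_FOR_COMPACTION_PROP);
```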




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 0.5.2 Release

2020-03-26 Thread GitBox
yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 
0.5.2 Release
URL: https://github.com/apache/incubator-hudi/pull/1448#discussion_r398342379
 
 

 ##
 File path: doap_HUDI.rdf
 ##
 @@ -46,6 +46,11 @@
 2020-01-31
 0.5.1
   
+  
 
 Review comment:
   I have added a step to the "Finalize the release" section:
   
   ```
   Update the DOAP file in the root of the project by sending a PR like this one.
   ```




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 0.5.2 Release

2020-03-26 Thread GitBox
yanghua commented on a change in pull request #1448: [MINOR] Update DOAP with 
0.5.2 Release
URL: https://github.com/apache/incubator-hudi/pull/1448#discussion_r398339368
 
 

 ##
 File path: doap_HUDI.rdf
 ##
 @@ -46,6 +46,11 @@
 2020-01-31
 0.5.1
   
+  
 
 Review comment:
   OK 




[jira] [Updated] (HUDI-731) Implement a chained transformer for deltastreamer that can chain other transformer implementations

2020-03-26 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-731:

Fix Version/s: 0.6.0

> Implement a chained transformer for deltastreamer that can chain other 
> transformer implementations
> --
>
> Key: HUDI-731
> URL: https://issues.apache.org/jira/browse/HUDI-731
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: DeltaStreamer, Utilities
>Reporter: Vinoth Chandar
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



