[hudi] branch hudi_test_suite_refactor updated (80add00 -> cc7c314)

2020-07-18 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/hudi.git.


 discard 80add00  [HUDI-394] Provide a basic implementation of test suite
 add cc7c314  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (80add00)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (cc7c314)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 hudi-integ-test/src/test/resources/log4j-surefire-quiet.properties | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)



[GitHub] [hudi] shenh062326 commented on a change in pull request #1769: [DOC] Add document for the use of metrics system in Hudi.

2020-07-18 Thread GitBox


shenh062326 commented on a change in pull request #1769:
URL: https://github.com/apache/hudi/pull/1769#discussion_r456859789



##
File path: docs/_docs/2_8_metrics.md
##
@@ -0,0 +1,108 @@
+---
+title: Metrics Guide
+keywords: hudi, administration, operation, devops, metrics
+permalink: /docs/metrics.html
+summary: This section offers an overview of metrics in Hudi
+toc: true
+last_modified_at: 2020-06-20T15:59:57-04:00
+---
+
+In this section, We will introduce the metrics and metricsReporter in Hudi. 
You can view the metrics configuration 
[here](configurations.html#metrics-configs).
+
+## Metrics

Review comment:
   Yes, I will use HoodieMetrics instead of Metrics.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] shenh062326 commented on a change in pull request #1769: [DOC] Add document for the use of metrics system in Hudi.

2020-07-18 Thread GitBox


shenh062326 commented on a change in pull request #1769:
URL: https://github.com/apache/hudi/pull/1769#discussion_r456859658



##
File path: docs/_docs/2_8_metrics.md
##
@@ -0,0 +1,108 @@
+---
+title: Metrics Guide
+keywords: hudi, administration, operation, devops, metrics
+permalink: /docs/metrics.html
+summary: This section offers an overview of metrics in Hudi
+toc: true
+last_modified_at: 2020-06-20T15:59:57-04:00
+---
+
+In this section, We will introduce the metrics and metricsReporter in Hudi. 
You can view the metrics configuration 
[here](configurations.html#metrics-configs).
+
+## Metrics
+
+Once the Hudi writer is configured with the right table and environment for 
metrics, it produces the following graphite metrics, that aid in debugging hudi 
tables

Review comment:
   > > configured with the right table and environment for metrics
   > 
   > Do you think it'll be clearer to illustrate how to achieve this before 
moving on to the followings?
   
   I will move this part to the end.









Build failed in Jenkins: hudi-snapshot-deployment-0.5 #343

2020-07-18 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.36 KB...]

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [hudi] shenh062326 commented on a change in pull request #1769: [DOC] Add document for the use of metrics system in Hudi.

2020-07-18 Thread GitBox


shenh062326 commented on a change in pull request #1769:
URL: https://github.com/apache/hudi/pull/1769#discussion_r456852020



##
File path: docs/_docs/2_8_metrics.md
##
@@ -0,0 +1,108 @@
+---
+title: Metrics Guide
+keywords: hudi, administration, operation, devops, metrics
+permalink: /docs/metrics.html
+summary: This section offers an overview of metrics in Hudi
+toc: true
+last_modified_at: 2020-06-20T15:59:57-04:00
+---
+
+In this section, We will introduce the metrics and metricsReporter in Hudi. 
You can view the metrics configuration 
[here](configurations.html#metrics-configs).

Review comment:
   Yes, thanks for your comments, I will fix it.









[GitHub] [hudi] shenh062326 commented on pull request #1838: [HUDI-1082] Fix minor bug in deciding the insert buckets

2020-07-18 Thread GitBox


shenh062326 commented on pull request #1838:
URL: https://github.com/apache/hudi/pull/1838#issuecomment-660575757


   Added a test in TestUpsertPartitioner that sets newInsertBucket0.weight = 0.3 and newInsertBucket1.weight = 0.7.
   
   ```
   @Test
   public void testGetPartitioner2() throws Exception {
     String testPartitionPath1 = "2016/09/26";
     HoodieWriteConfig config = makeHoodieClientConfigBuilder()
         .withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(0)
             .insertSplitSize(100).autoTuneInsertSplits(false).build())
         .withStorageConfig(HoodieStorageConfig.newBuilder().limitFileSize(1000 * 1024).build()).build();

     HoodieClientTestUtils.fakeCommitFile(basePath, "001");
     metaClient = HoodieTableMetaClient.reload(metaClient);
     HoodieCopyOnWriteTable table = (HoodieCopyOnWriteTable) HoodieTable.create(metaClient, config, hadoopConf);
     HoodieTestDataGenerator dataGenerator1 = new HoodieTestDataGenerator(new String[] {testPartitionPath1});
     List<HoodieRecord> insertRecords1 = dataGenerator1.generateInserts("001", 200);
     List<HoodieRecord> records1 = new ArrayList<>();
     records1.addAll(insertRecords1);

     WorkloadProfile profile = new WorkloadProfile(jsc.parallelize(records1));
     UpsertPartitioner partitioner = new UpsertPartitioner(profile, jsc, table, config);
     List<InsertBucket> insertBuckets = partitioner.getInsertBuckets(testPartitionPath1);
     InsertBucket newInsertBucket0 = new InsertBucket();
     newInsertBucket0.bucketNumber = 0;
     newInsertBucket0.weight = 0.3;
     insertBuckets.remove(0);
     insertBuckets.add(0, newInsertBucket0);

     InsertBucket newInsertBucket1 = new InsertBucket();
     newInsertBucket1.bucketNumber = 1;
     newInsertBucket1.weight = 0.7;
     insertBuckets.remove(1);
     insertBuckets.add(1, newInsertBucket1);

     Map<Integer, Integer> partition2numRecords = new HashMap<>();
     for (HoodieRecord hoodieRecord : insertRecords1) {
       int partition = partitioner.getPartition(new Tuple2<>(
           hoodieRecord.getKey(), Option.ofNullable(hoodieRecord.getCurrentLocation())));
       if (!partition2numRecords.containsKey(partition)) {
         partition2numRecords.put(partition, 0);
       }
       int num = partition2numRecords.get(partition);
       partition2numRecords.put(partition, num + 1);
     }
     System.out.println(partition2numRecords);
   }
   ```
   
   Running the test five times, the results show that the number of records inserted into each bucket approximately matches the bucket weight.
   ```
   {0=66, 1=134}
   {0=63, 1=137}
   {0=64, 1=136}
   {0=67, 1=133}
   {0=68, 1=132}
   ```
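The roughly 30/70 split seen above can be reproduced with a small standalone simulation of weighted bucket selection. This is an illustrative Python sketch, not Hudi's actual Java implementation; the bucket-picking loop mirrors the cumulative-weight logic that `UpsertPartitioner.getPartition` applies to insert buckets, and all names below are hypothetical:

```python
import random

def pick_bucket(buckets, r):
    """Walk the cumulative weights and return the first bucket whose
    cumulative weight covers the random draw r in [0, 1)."""
    total = 0.0
    for bucket_number, weight in buckets:
        total += weight
        if r <= total:
            return bucket_number
    # Guard against floating-point rounding at the top end.
    return buckets[-1][0]

# Buckets matching the test: weight 0.3 for bucket 0, 0.7 for bucket 1.
buckets = [(0, 0.3), (1, 0.7)]
random.seed(42)
counts = {0: 0, 1: 0}
for _ in range(200):  # same record count as the test (200 inserts)
    counts[pick_bucket(buckets, random.random())] += 1
print(counts)  # roughly a 60/140 split, matching the 0.3/0.7 weights
```

Repeated runs with different seeds give the same qualitative result as the five runs quoted above: counts track the weights, not an even split.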







[hudi] branch hudi_test_suite_refactor updated (88d8929 -> 80add00)

2020-07-18 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/hudi.git.


 discard 88d8929  [HUDI-394] Provide a basic implementation of test suite
 add 80add00  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (88d8929)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (80add00)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 hudi-integ-test/src/test/resources/log4j-surefire-quiet.properties | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)



[GitHub] [hudi] vinothchandar commented on pull request #1765: [HUDI-1049] 0.5.3 Patch - In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-18 Thread GitBox


vinothchandar commented on pull request #1765:
URL: https://github.com/apache/hudi/pull/1765#issuecomment-660563449


   @zuyanton. I am going to rebase this onto master and then land it for 0.6.0. 







[jira] [Created] (HUDI-1112) Blog on Tracking Hudi Data along transaction time and business time

2020-07-18 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1112:


 Summary: Blog on Tracking Hudi Data along transaction time and business time
 Key: HUDI-1112
 URL: https://issues.apache.org/jira/browse/HUDI-1112
 Project: Apache Hudi
  Issue Type: Task
  Components: Docs
Reporter: Vinoth Chandar
 Fix For: 0.6.0


https://github.com/apache/hudi/issues/1705



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on issue #1705: Tracking Hudi Data along transaction time and business time

2020-07-18 Thread GitBox


vinothchandar commented on issue #1705:
URL: https://github.com/apache/hudi/issues/1705#issuecomment-660556777


   Thanks for the persistence. https://issues.apache.org/jira/browse/HUDI-1112 is all yours; please grab it when ready.







[hudi] branch master updated: [HUDI-1102] Add common useful Spark related and Table path detection utilities (#1841)

2020-07-18 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1aae437  [HUDI-1102] Add common useful Spark related and Table path 
detection utilities (#1841)
1aae437 is described below

commit 1aae437257cfd94fd277cf667257f6abffcc0c21
Author: Udit Mehrotra 
AuthorDate: Sat Jul 18 16:16:32 2020 -0700

[HUDI-1102] Add common useful Spark related and Table path detection 
utilities (#1841)

Co-authored-by: Mehrotra 
---
 .../hudi/common/table/HoodieTableMetaClient.java   |   1 +
 .../apache/hudi/common/util/TablePathUtils.java| 110 ++
 .../hudi/common/util/TestTablePathUtils.java   | 126 +
 .../main/java/org/apache/hudi/DataSourceUtils.java |  23 
 .../scala/org/apache/hudi/HudiSparkUtils.scala |  50 
 .../scala/org/apache/hudi/TestHudiSparkUtils.scala | 105 +
 6 files changed, 415 insertions(+)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
index 9675b77..b047595 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
@@ -73,6 +73,7 @@ public class HoodieTableMetaClient implements Serializable {
   public static final String METAFOLDER_NAME = ".hoodie";
   public static final String TEMPFOLDER_NAME = METAFOLDER_NAME + 
File.separator + ".temp";
   public static final String AUXILIARYFOLDER_NAME = METAFOLDER_NAME + 
File.separator + ".aux";
+  public static final String BOOTSTRAP_INDEX_ROOT_FOLDER_PATH = 
AUXILIARYFOLDER_NAME + File.separator + ".bootstrap";
   public static final String MARKER_EXTN = ".marker";
 
   private String basePath;
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/util/TablePathUtils.java 
b/hudi-common/src/main/java/org/apache/hudi/common/util/TablePathUtils.java
new file mode 100644
index 000..6982fdb
--- /dev/null
+++ b/hudi-common/src/main/java/org/apache/hudi/common/util/TablePathUtils.java
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+
+public class TablePathUtils {
+
+  private static final Logger LOG = LogManager.getLogger(TablePathUtils.class);
+
+  private static boolean hasTableMetadataFolder(FileSystem fs, Path path) {
+    if (path == null) {
+      return false;
+    }
+
+    try {
+      return fs.exists(new Path(path, HoodieTableMetaClient.METAFOLDER_NAME));
+    } catch (IOException ioe) {
+      throw new HoodieException("Error checking Hoodie metadata folder for " + path, ioe);
+    }
+  }
+
+  public static Option<Path> getTablePath(FileSystem fs, Path path) throws HoodieException, IOException {
+    LOG.info("Getting table path from path : " + path);
+
+    FileStatus fileStatus = fs.getFileStatus(path);
+    Path directory = fileStatus.isFile() ? fileStatus.getPath().getParent() : fileStatus.getPath();
+
+    if (TablePathUtils.hasTableMetadataFolder(fs, directory)) {
+      // Handle table folder itself
+      return Option.of(directory);
+    }
+
+    // Handle metadata folder or metadata sub folder path
+    Option<Path> tablePath = getTablePathFromTableMetadataPath(fs, directory);
+    if (tablePath.isPresent()) {
+      return tablePath;
+    }
+
+    // Handle partition folder
+    return getTablePathFromPartitionPath(fs, directory);
+  }
+
+  private static boolean isTableMetadataFolder(String path) {
+    return path != null && path.endsWith("/" + HoodieTableMetaClient.METAFOLDER_NAME);
+  }
+
+  private static 
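The table-path detection in the diff above (start from a file or partition directory and walk upward until a directory containing the `.hoodie` metadata folder is found) can be sketched in plain Python. This is a rough, hedged sketch, not Hudi's actual Java API; it uses `pathlib` instead of Hadoop's `FileSystem`, and all names besides `.hoodie` are illustrative:

```python
import tempfile
from pathlib import Path

METAFOLDER_NAME = ".hoodie"  # HoodieTableMetaClient.METAFOLDER_NAME

def get_table_path(path: Path):
    """Return the table base path containing the .hoodie folder,
    or None if no ancestor of `path` is a Hudi table. The real
    TablePathUtils also handles paths inside the metadata folder."""
    directory = path.parent if path.is_file() else path
    for candidate in [directory, *directory.parents]:
        if (candidate / METAFOLDER_NAME).is_dir():
            return candidate
    return None

# Usage against a throwaway layout: table/.hoodie plus a partition dir.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "table"
    (base / METAFOLDER_NAME).mkdir(parents=True)
    partition = base / "2020" / "07" / "18"
    partition.mkdir(parents=True)
    print(get_table_path(partition) == base)  # True
```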

[GitHub] [hudi] vinothchandar merged pull request #1841: [HUDI-1102] Add common useful Spark related and Table path detection utilities

2020-07-18 Thread GitBox


vinothchandar merged pull request #1841:
URL: https://github.com/apache/hudi/pull/1841


   







[GitHub] [hudi] reenarosid closed issue #1840: HUDI DELETE

2020-07-18 Thread GitBox


reenarosid closed issue #1840:
URL: https://github.com/apache/hudi/issues/1840


   







[hudi] branch hudi_test_suite_refactor updated (786b36e -> 88d8929)

2020-07-18 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/hudi.git.


 discard 786b36e  [HUDI-394] Provide a basic implementation of test suite
 add 88d8929  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (786b36e)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (88d8929)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/integ/testsuite/job/TestHoodieTestSuiteJob.java | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)



[GitHub] [hudi] RajasekarSribalan commented on issue #1823: [SUPPORT] MOR trigger compaction from Hudi CLI

2020-07-18 Thread GitBox


RajasekarSribalan commented on issue #1823:
URL: https://github.com/apache/hudi/issues/1823#issuecomment-660503863


   @bvaradar @bhasudha 
   
   Please find the CLI output for a MOR table's clean info:
   
   
   20/07/18 16:05:31 INFO timeline.HoodieActiveTimeline: Loaded instants 
[[20200716082419__clean__COMPLETED], [20200716102509__clean__COMPLETED], 
[20200716103921__clean__COMPLETED], [20200716134933__clean__COMPLETED], 
[20200716135749__clean__COMPLETED], [20200716163408__clean__COMPLETED], 
[20200716164519__clean__COMPLETED], [20200716192304__clean__COMPLETED], 
[20200716192304__deltacommit__COMPLETED], 
[20200716193103__deltacommit__COMPLETED], [20200717034005__commit__COMPLETED], 
[20200717080741__clean__COMPLETED], [20200717080741__deltacommit__COMPLETED], 
[20200717100758__clean__COMPLETED], [20200717100758__deltacommit__COMPLETED], 
[20200717101709__clean__COMPLETED], [20200717101709__deltacommit__COMPLETED], 
[20200717120702__clean__COMPLETED], [20200717120702__deltacommit__COMPLETED], 
[20200717121648__clean__COMPLETED], [20200717121648__deltacommit__COMPLETED], 
[20200717141621__clean__COMPLETED], [20200717141621__deltacommit__COMPLETED], 
[20200717142837__clean__COMPLETED], [20200717142837__deltacommit__COMPLETED], [20200717161843__clean__COMPLETED], 
[20200717161843__deltacommit__COMPLETED], [20200717162524__clean__COMPLETED], 
[20200717162524__deltacommit__COMPLETED], [20200717180202__clean__COMPLETED], 
[20200717180202__deltacommit__COMPLETED], 
[20200717182211__deltacommit__COMPLETED], 
[20200717203440__deltacommit__COMPLETED], [20200718040640__clean__COMPLETED], 
[20200718040640__deltacommit__COMPLETED], [20200718055600__commit__COMPLETED], 
[20200718062014__clean__COMPLETED], [20200718062014__deltacommit__COMPLETED], 
[20200718062721__clean__COMPLETED], [20200718062721__deltacommit__COMPLETED], 
[20200718082117__clean__COMPLETED], [20200718082117__deltacommit__COMPLETED], 
[20200718082800__clean__COMPLETED], [20200718082800__deltacommit__COMPLETED], 
[20200718102800__clean__COMPLETED], [20200718102800__deltacommit__COMPLETED], 
[20200718104348__deltacommit__COMPLETED]]
   Clean null not found in metadata 
org.apache.hudi.common.table.timeline.HoodieDefaultTimeline: 
[20200716082419__clean__COMPLETED],[20200716102509__clean__COMPLETED],[20200716103921__clean__COMPLETED],[20200716134933__clean__COMPLETED],[20200716135749__clean__COMPLETED],[20200716163408__clean__COMPLETED],[20200716164519__clean__COMPLETED],[20200716192304__clean__COMPLETED],[20200717080741__clean__COMPLETED],[20200717100758__clean__COMPLETED],[20200717101709__clean__COMPLETED],[20200717120702__clean__COMPLETED],[20200717121648__clean__COMPLETED],[20200717141621__clean__COMPLETED],[20200717142837__clean__COMPLETED],[20200717161843__clean__COMPLETED],[20200717162524__clean__COMPLETED],[20200717180202__clean__COMPLETED],[20200718040640__clean__COMPLETED],[20200718062014__clean__COMPLETED],[20200718062721__clean__COMPLETED],[20200718082117__clean__COMPLETED],[20200718082800__clean__COMPLETED],[20200718102800__clean__COMPLETED]
   hudi:XX->cleans show
   
   ╔════════════════╤═════════════════════════╤═════════════════════╤══════════════════╗
   ║ CleanTime      │ EarliestCommandRetained │ Total Files Deleted │ Total Time Taken ║
   ╠════════════════╪═════════════════════════╪═════════════════════╪══════════════════╣
   ║ 20200718102800 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200718082800 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200718082117 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200718062721 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200718062014 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200718040640 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200717180202 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200717162524 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200717161843 │                         │ 0                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 

[GitHub] [hudi] RajasekarSribalan commented on issue #1823: [SUPPORT] MOR trigger compaction from Hudi CLI

2020-07-18 Thread GitBox


RajasekarSribalan commented on issue #1823:
URL: https://github.com/apache/hudi/issues/1823#issuecomment-660502873


   Thanks @bvaradar @bhasudha. One more problem I can see is: how should compaction and the cleaner be configured? Should both have the same values? What if I configure clean commits as 3, so that I reclaim more space, and compaction to happen after 24 commits? Since I am running the cleaner frequently, will the delta commits be cleaned/deleted before compaction? Please shed some light on this, because I can see tons of files in HDFS for a single table.
   
   For example, in my case, when I ran a bulk insert for a table to store it in Hudi, 7000+ parquet files were created, which was fine. After running a streaming pipeline doing upserts on the same table for 2 days, I could see 90,000+ files in HDFS. I haven't changed the default cleaner configuration, so I believe cleaning happens after 24 commits? That would explain why I have this many files. Please correct me if I am wrong.







[GitHub] [hudi] nandini57 commented on issue #1705: Tracking Hudi Data along transaction time and business time

2020-07-18 Thread GitBox


nandini57 commented on issue #1705:
URL: https://github.com/apache/hudi/issues/1705#issuecomment-660494259


   Still waiting :(. Can you please create a JIRA and assign it to me? It will possibly take 2 more weeks to get clearance.







[GitHub] [hudi] xushiyan commented on a change in pull request #1769: [DOC] Add document for the use of metrics system in Hudi.

2020-07-18 Thread GitBox


xushiyan commented on a change in pull request #1769:
URL: https://github.com/apache/hudi/pull/1769#discussion_r456772276



##
File path: docs/_docs/2_8_metrics.md
##
@@ -0,0 +1,108 @@
+---
+title: Metrics Guide
+keywords: hudi, administration, operation, devops, metrics
+permalink: /docs/metrics.html
+summary: This section offers an overview of metrics in Hudi
+toc: true
+last_modified_at: 2020-06-20T15:59:57-04:00
+---
+
+In this section, We will introduce the metrics and metricsReporter in Hudi. 
You can view the metrics configuration 
[here](configurations.html#metrics-configs).

Review comment:
   ```suggestion
   In this section, we will introduce `HoodieMetrics` and `MetricsReporter` in 
Hudi. You can view the metrics-related configurations 
[here](configurations.html#metrics-configs).
   ```

##
File path: docs/_docs/2_8_metrics.md
##
@@ -0,0 +1,108 @@
+---
+title: Metrics Guide
+keywords: hudi, administration, operation, devops, metrics
+permalink: /docs/metrics.html
+summary: This section offers an overview of metrics in Hudi
+toc: true
+last_modified_at: 2020-06-20T15:59:57-04:00
+---
+
+In this section, We will introduce the metrics and metricsReporter in Hudi. 
You can view the metrics configuration 
[here](configurations.html#metrics-configs).
+
+## Metrics
+
+Once the Hudi writer is configured with the right table and environment for 
metrics, it produces the following graphite metrics, that aid in debugging hudi 
tables

Review comment:
   > configured with the right table and environment for metrics
   
   Do you think it'll be clearer to illustrate how to achieve this before 
moving on to the followings?

##
File path: docs/_docs/2_8_metrics.md
##
@@ -0,0 +1,108 @@
+---
+title: Metrics Guide
+keywords: hudi, administration, operation, devops, metrics
+permalink: /docs/metrics.html
+summary: This section offers an overview of metrics in Hudi
+toc: true
+last_modified_at: 2020-06-20T15:59:57-04:00
+---
+
+In this section, We will introduce the metrics and metricsReporter in Hudi. 
You can view the metrics configuration 
[here](configurations.html#metrics-configs).
+
+## Metrics

Review comment:
   Did you mean to introduce `HoodieMetrics`? If so, please align on the class name for the section.

##
File path: docs/_docs/2_8_metrics.md
##
@@ -0,0 +1,108 @@
+---
+title: Metrics Guide
+keywords: hudi, administration, operation, devops, metrics
+permalink: /docs/metrics.html
+summary: This section offers an overview of metrics in Hudi
+toc: true
+last_modified_at: 2020-06-20T15:59:57-04:00
+---
+
+In this section, We will introduce the metrics and metricsReporter in Hudi. 
You can view the metrics configuration 
[here](configurations.html#metrics-configs).
+
+## Metrics
+
+Once the Hudi writer is configured with the right table and environment for 
metrics, it produces the following graphite metrics, that aid in debugging hudi 
tables
+
+ - **Commit Duration** - This is amount of time it took to successfully commit 
a batch of records
+ - **Rollback Duration** - Similarly, the amount of time taken to undo 
partial data left over by a failed commit (happens automatically every time 
after a failed write)
+ - **File Level metrics** - Shows the number of new files added, versions, 
and files deleted (cleaned) in each commit
+ - **Record Level Metrics** - Total records inserted/updated, etc., per commit
+ - **Partition Level metrics** - Number of partitions upserted (super useful 
for understanding sudden spikes in commit duration)
+
+These metrics can then be plotted on a standard tool like Grafana. Below is a 
sample commit duration chart.
+
+
+
+
+
+## MetricsReporter
+
+MetricsReporter is an interface for reporting metrics to a user-specified 
destination. Its current implementations are InMemoryMetricsReporter, 
JmxMetricsReporter, MetricsGraphiteReporter, and DatadogMetricsReporter. Since 
InMemoryMetricsReporter is only used for testing, we will introduce the other 
three implementations.
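
The pluggable reporter design described above can be sketched as follows. This 
is an illustrative Python sketch, not Hudi's actual Java interface; the method 
names and signatures are hypothetical.

```python
from abc import ABC, abstractmethod


class MetricsReporter(ABC):
    """Illustrative sketch of a pluggable metrics reporter (hypothetical API)."""

    @abstractmethod
    def report(self, name: str, value: float) -> None:
        """Send a single metric to the configured destination."""


class InMemoryMetricsReporter(MetricsReporter):
    """Keeps metrics in a dict; useful only for testing, as noted above."""

    def __init__(self):
        self.metrics = {}

    def report(self, name, value):
        self.metrics[name] = value


reporter = InMemoryMetricsReporter()
reporter.report("commit.duration", 1250.0)
```

A JMX or Graphite implementation would keep the same `report` contract but 
forward each metric to an external endpoint instead of a dict.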
+
+### JmxMetricsReporter
+
+JmxMetricsReporter is an implementation of a JMX reporter, which is used to 
report metrics via JMX.
+
+#### Configurations
+The following is an example of JMX configuration. The detailed 
configuration can be found [here](configurations.html#jmx).
+
+  ```properties
+  hoodie.metrics.on=true
+  hoodie.metrics.reporter.type=JMX
+  hoodie.metrics.jmx.host=192.168.0.106
+  hoodie.metrics.jmx.port=4001
+  ```
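
A quick way to sanity-check such a properties snippet is to parse it into 
key/value pairs. The helper below is our own illustration (not part of Hudi); 
it simply splits `key=value` lines:

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value property lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


conf = parse_properties("""
hoodie.metrics.on=true
hoodie.metrics.reporter.type=JMX
hoodie.metrics.jmx.host=192.168.0.106
hoodie.metrics.jmx.port=4001
""")
assert conf["hoodie.metrics.reporter.type"] == "JMX"
```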
+
+#### Demo
+With the configuration above, Hudi will start a JMX server on port 4001. We 
can then start JConsole and connect to 192.168.0.106:4001. Below is a sample 
of monitoring Hudi JMX metrics through JConsole.
+
+
+
+
+### MetricsGraphiteReporter
+
+MetricsGraphiteReporter is an implementation of a Graphite reporter, which 
connects to a Graphite server and sends metrics to it.
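
For background, Graphite's plaintext protocol is a newline-terminated 
`<path> <value> <timestamp>` string sent over TCP (port 2003 by default). The 
sketch below only formats such a line; the metric path is a made-up example, 
not a name Hudi necessarily emits:

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


line = graphite_line("hudi.trips_table.commit.duration", 1250, timestamp=1595030400)
assert line == "hudi.trips_table.commit.duration 1250 1595030400\n"
# A real reporter would open a TCP connection to the Graphite server and send
# this line, e.g. socket.create_connection((host, 2003)).sendall(line.encode()).
```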
+
+#### Configurations
+The following is an example of configuration with reporter type GRAPHITE. The 
detailed configuration can be found 

[GitHub] [hudi] shenh062326 edited a comment on pull request #1838: [HUDI-1082] Fix minor bug in deciding the insert buckets

2020-07-18 Thread GitBox


shenh062326 edited a comment on pull request #1838:
URL: https://github.com/apache/hudi/pull/1838#issuecomment-660458165


   > Can you add a simple test to ensure the no of records inserted to a bucket 
approx matches the bucket weight. If we already have one, do refer it here.
   
   Sure, I will add a test.
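
For context, the behavior under test can be sketched as weighted assignment of 
insert records to buckets, where each bucket should receive a share of records 
roughly proportional to its weight. This Python sketch is our illustration of 
the idea, not Hudi's actual implementation:

```python
import random


def assign_buckets(num_records, weights, seed=42):
    """Assign records to buckets with probability proportional to bucket weight."""
    rng = random.Random(seed)
    buckets = list(range(len(weights)))
    counts = [0] * len(weights)
    for _ in range(num_records):
        chosen = rng.choices(buckets, weights=weights)[0]
        counts[chosen] += 1
    return counts


counts = assign_buckets(10_000, [0.7, 0.3])
# Each bucket's share should approximately match its weight.
assert abs(counts[0] / 10_000 - 0.7) < 0.05
```

A test like the one requested above would assert exactly this: that the number 
of records landing in each bucket approximately matches the bucket weight.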



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] shenh062326 commented on pull request #1838: [HUDI-1082] Fix minor bug in deciding the insert buckets

2020-07-18 Thread GitBox


shenh062326 commented on pull request #1838:
URL: https://github.com/apache/hudi/pull/1838#issuecomment-660458165


   > Can you add a simple test to ensure the no of records inserted to a bucket 
approx matches the bucket weight. If we already have one, do refer it here.
   
   Sure, I will add a testcase.







[GitHub] [hudi] shenh062326 commented on a change in pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-18 Thread GitBox


shenh062326 commented on a change in pull request #1819:
URL: https://github.com/apache/hudi/pull/1819#discussion_r456771979



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
##
@@ -36,6 +36,9 @@
 public class OverwriteWithLatestAvroPayload extends BaseAvroPayload
 implements HoodieRecordPayload {
 
+  public static final String DEFAULT_DELETE_FIELD = "_hoodie_is_deleted";

Review comment:
   Remove DEFAULT_DELETE_FIELD.
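
The "configurable delete marker" idea under review can be sketched as follows. 
This is a minimal Python illustration of the concept, not Hudi's Java payload 
implementation; the helper name is ours:

```python
def is_delete_record(record: dict, delete_field: str = "_hoodie_is_deleted") -> bool:
    """Treat a record as a delete when its marker field is truthy.

    The default field name matches the constant discussed above; making the
    field name a parameter is what "configurable delete marker" means here.
    """
    return bool(record.get(delete_field, False))


assert is_delete_record({"id": 1, "_hoodie_is_deleted": True})
assert not is_delete_record({"id": 2})
# With a custom marker field configured:
assert is_delete_record({"id": 3, "op_delete": True}, delete_field="op_delete")
```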




