[jira] [Commented] (HUDI-1007) When earliestOffsets is greater than checkpoint, Hudi will not be able to successfully consume data

2020-06-08 Thread liujinhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128395#comment-17128395
 ] 

liujinhui commented on HUDI-1007:
-

1. This test case is really special and requires a production Kafka topic. The 
test environment may not reproduce it because its data volume is relatively 
small; it is better to have data written to the topic at every moment, because 
we need to reproduce the scenario where data keeps expiring.
2. Once the first point is met, you can use deltastreamer for consumption.
[~vinoth]

> When earliestOffsets is greater than checkpoint, Hudi will not be able to 
> successfully consume data
> ---
>
> Key: HUDI-1007
> URL: https://issues.apache.org/jira/browse/HUDI-1007
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
> Fix For: 0.6.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Use deltastreamer to consume Kafka.
>  When earliestOffsets is greater than the checkpoint, Hudi will not be able to 
> successfully consume data.
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets
> boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
>  .anyMatch(offset -> offset.getValue() < 
> earliestOffsets.get(offset.getKey()));
> return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;
> Kafka data is continuously generated, which means that some data will 
> continue to expire.
>  When earliestOffsets is greater than the checkpoint, earliestOffsets will be 
> used. But by that moment some of the data has already expired, so consumption 
> fails in the end, and the process becomes an endless cycle. I understand that 
> this design is probably meant to avoid data loss, but it leads to this 
> situation. I want to fix this problem and would like to hear your opinion.
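
For context, here is the check quoted above reassembled as a compilable sketch 
(a minimal reconstruction of just this method from the snippet in the issue, 
assuming Map<TopicPartition, Long> inputs; it is not the full KafkaOffsetGen class):

import org.apache.kafka.common.TopicPartition;

import java.util.Map;

public class OffsetResetSketch {
  // If ANY partition's checkpointed offset lags behind Kafka's earliest
  // retained offset, the whole read falls back to earliestOffsets. When
  // retention keeps expiring data between planning and reading, this
  // fallback repeats on every run; that is the endless cycle described above.
  public static Map<TopicPartition, Long> checkupValidOffsets(
      Map<TopicPartition, Long> checkpointOffsets,
      Map<TopicPartition, Long> earliestOffsets) {
    boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
        .anyMatch(offset -> offset.getValue() < earliestOffsets.get(offset.getKey()));
    return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;
  }
}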



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


svn commit: r39983 - in /dev/hudi/hudi-0.5.3-rc2: ./ hudi-0.5.3-rc2.src.tgz hudi-0.5.3-rc2.src.tgz.asc hudi-0.5.3-rc2.src.tgz.sha512

2020-06-08 Thread sivabalan
Author: sivabalan
Date: Mon Jun  8 13:35:18 2020
New Revision: 39983

Log:
Staging source releases for hudi release-0.5.3-rc2

Added:
dev/hudi/hudi-0.5.3-rc2/
dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz   (with props)
dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.asc
dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.sha512

Added: dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz
==
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz
--
svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.asc
==
--- dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.asc (added)
+++ dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.asc Mon Jun  8 13:35:18 2020
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCAAdFiEEABtm+islQ8FRhyzMKaT9gvFQiDMFAl7eNO8ACgkQKaT9gvFQ
+iDPNwA/+LTxjoo1o9hSY4NMy9xc1Y2duyCButhofVJO4lAVeG2CssY+eY/A6l9D4
+zLy7XIHN0fg4Ox7TH2maaIsO4Dtds+oVFZCXQSgZAGVw7VLOlwfXu4ELZ1X10K3E
+fdDBauRzXX6lAAxMb1zcNgLp5vZAjTRFNPIRo91rW7OU3MOQ5OQfpOl+3aHUr3xv
+W4HnC9jGhNsFMLAKaoyf3svyjV8596sacXmor2DYPe4IQPvIqR6GqMInNF9AG1nd
+NgrxvBL5Ad1yVcPQGD8MG7EPV74lOGcydLpp/SCxDRgOIZBIXGiYH1pGfxD9XXPt
+iP8PUtPa1gIDM3DgeoZzRIZCJ6RjY2u5KkxSMGHu4DR7bbztMetI7DPSiqbf6R52
+5oPiLbIFnk8cFjXRJo0wBeG6O0CmKnnfMcbQhjlI4FBwn+d6cAHwzINT73aCX4DA
+1r+ho2KoPrEnqI+B0F6ivbxqpyxAwTSp8spHn8k/K0D1PXZIGaRraZ0NI6odWw2Q
+eQg61gatg7KOmH4OIfYsOt3YK9zZ+vHeL6spz7FXSQumTpTCjKE0vJjX28H+qgrH
+cODWszW38FpE7dY7YgGrXOz5TevydHebtVASQ80oe0HnM6vNeN7p/F0A+zPx9yns
+F5/BlHVp+MoqShSd3GNAeUe+mjCzcgbKmgJBtiurLfL34m4HSBk=
+=f9Z9
+-END PGP SIGNATURE-

Added: dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.sha512
==
--- dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.5.3-rc2/hudi-0.5.3-rc2.src.tgz.sha512 Mon Jun  8 13:35:18 
2020
@@ -0,0 +1 @@
+6f83b46858ca8eefcae0c340861aaa88336d81d2a1d9ed058456e469e256827ba16d5ce3c5348ae13baafa1a213cd6ea09c0e11dadc2c52138a9ae53bf9d
  hudi-0.5.3-rc2.src.tgz




[jira] [Commented] (HUDI-875) Introduce a new pom module named hudi-common-sync

2020-06-08 Thread liwei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128297#comment-17128297
 ] 

liwei commented on HUDI-875:


[https://github.com/apache/hudi/pull/1716]

> Introduce a new pom module named hudi-common-sync
> -
>
> Key: HUDI-875
> URL: https://issues.apache.org/jira/browse/HUDI-875
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Hive Integration
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] annotated tag release-0.5.3-rc2 updated (41fb6c2 -> bca79c5)

2020-06-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to annotated tag release-0.5.3-rc2
in repository https://gitbox.apache.org/repos/asf/hudi.git.


*** WARNING: tag release-0.5.3-rc2 was modified! ***

from 41fb6c2  (commit)
  to bca79c5  (tag)
 tagging 41fb6c268c6e84949ee70f305df93680c431a9da (commit)
 replaces release-0.5.2-incubating
  by Sivabalan Narayanan
  on Mon Jun 8 09:01:11 2020 -0400

- Log -
0.5.3
-BEGIN PGP SIGNATURE-

iQIzBAABCAAdFiEEABtm+islQ8FRhyzMKaT9gvFQiDMFAl7eNpgACgkQKaT9gvFQ
iDNBbRAAuCLfCNe9oySovidkkrUAPxiDV3tbV30mIj0josiEfy6qvrNVYuOozvCz
M9WSMNI4P+xlGuZbGMXjh5KpYfN84QCXLeoEc3u+1YQZn1Rmgh3QjNZZ5m6bDsEZ
ihE2jouP3zAaH7vCG9Tdw1dfO5t67SeGyNhCIRw9DXlre+xnLRmcY+jcGoaWs+1q
COWVIGu1/rAGdiTImV78EfD2g+fpBQSUj7Thutp4Yi+Rcjy1tqh6EraIqbWIv5gg
o59WU/4h1W2sjm6gecyEVucqUAWh+aw+IGY9+3aZjyoEq0iQ6P2MUZ/iTSc8g0ff
TPlk8cuvac2hohrH81xj665lZSNMcb24tlaHZOaQlMjzGKH+OX7Bn4uXG0Ub3u+g
XYYJ+zwXhIHBLLjvHZ7S4ggMkyoTp4ZF4666G8og/FQ69tJO3pdoLCWTAdrl8sA7
QEnQNpQskzYIYoEh7GEku578COK6K0aDkjOziBPxv1rs7rT1sDeMcL9kikoUXtxt
1V5Hos1iYXPfRB6DC9ZjKmii2hZSNpwMUxj/79hhFNCNkRSPtr2SowHTRkLElr+0
V7i38VQJzs4TLEnvylxcsr78+Z1SgsrEXUl7rzFKZLpoNRqv6mNccBpbomHuxyEG
1OIhhZTRU1rI0KwJEBZesCxk4qXRzWl/wDjibx+zBUD7O1tO9Sw=
=Pe5e
-END PGP SIGNATURE-
---


No new revisions were added by this update.

Summary of changes:



[hudi] 01/04: [HUDI-988] Fix Unit Test Flakiness : Ensure all instantiations of HoodieWriteClient are closed properly. Fix bug in TestRollbacks. Make CLI unit tests for Hudi CLI check skip rendering strings

2020-06-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch release-0.5.3
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 6dcd0a3524fe7be0bbbd3e673ed7e1d4b035e0cb
Author: Balaji Varadarajan 
AuthorDate: Tue Jun 2 01:49:37 2020 -0700

[HUDI-988] Fix Unit Test Flakiness : Ensure all instantiations of 
HoodieWriteClient are closed properly. Fix bug in TestRollbacks. Make CLI unit 
tests for Hudi CLI check skip rendering strings
---
 .../apache/hudi/cli/HoodieTableHeaderFields.java   |  16 +
 .../org/apache/hudi/cli/commands/StatsCommand.java |   4 +-
 .../cli/commands/AbstractShellIntegrationTest.java |   2 +-
 .../hudi/cli/commands/TestRepairsCommand.java  | 206 -
 .../org/apache/hudi/client/HoodieWriteClient.java  |   2 +-
 .../apache/hudi/client/TestHoodieClientBase.java   | 938 ++---
 .../java/org/apache/hudi/client/TestMultiFS.java   |   4 -
 .../hudi/client/TestUpdateSchemaEvolution.java |   4 +-
 .../hudi/common/HoodieClientTestHarness.java   | 426 +-
 .../hudi/index/TestHBaseQPSResourceAllocator.java  |   2 +-
 .../java/org/apache/hudi/index/TestHbaseIndex.java |  17 +-
 .../org/apache/hudi/index/TestHoodieIndex.java |   2 +-
 .../hudi/index/bloom/TestHoodieBloomIndex.java |   2 +-
 .../index/bloom/TestHoodieGlobalBloomIndex.java|   2 +-
 .../org/apache/hudi/io/TestHoodieMergeHandle.java  |  12 +-
 .../apache/hudi/table/TestCopyOnWriteTable.java|   5 +-
 .../apache/hudi/table/TestMergeOnReadTable.java|  38 +-
 .../hudi/table/compact/TestHoodieCompactor.java|  12 +-
 pom.xml|   1 +
 19 files changed, 745 insertions(+), 950 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieTableHeaderFields.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieTableHeaderFields.java
index 2e3bc01..708ae29 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieTableHeaderFields.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieTableHeaderFields.java
@@ -33,4 +33,20 @@ public class HoodieTableHeaderFields {
   public static final String HEADER_HOODIE_PROPERTY = "Property";
   public static final String HEADER_OLD_VALUE = "Old Value";
   public static final String HEADER_NEW_VALUE = "New Value";
+
+  /**
+   * Fields of Stats.
+   */
+  public static final String HEADER_COMMIT_TIME = "CommitTime";
+  public static final String HEADER_TOTAL_UPSERTED = "Total Upserted";
+  public static final String HEADER_TOTAL_WRITTEN = "Total Written";
+  public static final String HEADER_WRITE_AMPLIFICATION_FACTOR = "Write 
Amplification Factor";
+  public static final String HEADER_HISTOGRAM_MIN = "Min";
+  public static final String HEADER_HISTOGRAM_10TH = "10th";
+  public static final String HEADER_HISTOGRAM_50TH = "50th";
+  public static final String HEADER_HISTOGRAM_AVG = "avg";
+  public static final String HEADER_HISTOGRAM_95TH = "95th";
+  public static final String HEADER_HISTOGRAM_MAX = "Max";
+  public static final String HEADER_HISTOGRAM_NUM_FILES = "NumFiles";
+  public static final String HEADER_HISTOGRAM_STD_DEV = "StdDev";
 }
diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java
index b05aee2..4874777 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/StatsCommand.java
@@ -54,7 +54,7 @@ import java.util.stream.Collectors;
 @Component
 public class StatsCommand implements CommandMarker {
 
-  private static final int MAX_FILES = 100;
+  public static final int MAX_FILES = 100;
 
   @CliCommand(value = "stats wa", help = "Write Amplification. Ratio of how 
many records were upserted to how many "
   + "records were actually written")
@@ -97,7 +97,7 @@ public class StatsCommand implements CommandMarker {
 return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, 
descending, limit, headerOnly, rows);
   }
 
-  private Comparable[] printFileSizeHistogram(String commitTime, Snapshot s) {
+  public Comparable[] printFileSizeHistogram(String commitTime, Snapshot s) {
 return new Comparable[] {commitTime, s.getMin(), s.getValue(0.1), 
s.getMedian(), s.getMean(), s.get95thPercentile(),
 s.getMax(), s.size(), s.getStdDev()};
   }
diff --git 
a/hudi-cli/src/test/java/org/apache/hudi/cli/commands/AbstractShellIntegrationTest.java
 
b/hudi-cli/src/test/java/org/apache/hudi/cli/commands/AbstractShellIntegrationTest.java
index ad81af5..d9f1688 100644
--- 
a/hudi-cli/src/test/java/org/apache/hudi/cli/commands/AbstractShellIntegrationTest.java
+++ 
b/hudi-cli/src/test/java/org/apache/hudi/cli/commands/AbstractShellIntegrationTest.java
@@ -58,4 +58,4 @@ public abstract class AbstractShellIntegrationTest extends 
HoodieClientTestHarne
   protected static JLineShellComponent getShell() {
 

[hudi] 03/04: Making few fixes after cherry picking

2020-06-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch release-0.5.3
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit d3afcbac3a6e5362d57570a2a5807abbf65c69d8
Author: Sivabalan Narayanan 
AuthorDate: Sun Jun 7 16:23:40 2020 -0400

Making few fixes after cherry picking
---
 .../apache/hudi/client/TestHoodieClientBase.java   | 917 +++--
 .../hudi/common/HoodieClientTestHarness.java   | 426 +-
 .../apache/hudi/table/TestMergeOnReadTable.java|   2 +
 .../hudi/table/compact/TestHoodieCompactor.java|   6 +-
 .../table/string/TestHoodieActiveTimeline.java |   2 +-
 5 files changed, 678 insertions(+), 675 deletions(-)

diff --git 
a/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientBase.java 
b/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientBase.java
index 6e6458b..6856489 100644
--- a/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientBase.java
+++ b/hudi-client/src/test/java/org/apache/hudi/client/TestHoodieClientBase.java
@@ -72,477 +72,478 @@ import static org.junit.Assert.assertTrue;
  */
 public class TestHoodieClientBase extends HoodieClientTestHarness {
 
-private static final Logger LOG = 
LogManager.getLogger(TestHoodieClientBase.class);
-
-@Before
-public void setUp() throws Exception {
-initResources();
+  private static final Logger LOG = 
LogManager.getLogger(TestHoodieClientBase.class);
+
+  @Before
+  public void setUp() throws Exception {
+initResources();
+  }
+
+  @After
+  public void tearDown() throws Exception {
+cleanupResources();
+  }
+
+  protected HoodieCleanClient getHoodieCleanClient(HoodieWriteConfig cfg) {
+return new HoodieCleanClient(jsc, cfg, new HoodieMetrics(cfg, 
cfg.getTableName()));
+  }
+
+  /**
+   * Get Default HoodieWriteConfig for tests.
+   *
+   * @return Default Hoodie Write Config for tests
+   */
+  protected HoodieWriteConfig getConfig() {
+return getConfigBuilder().build();
+  }
+
+  protected HoodieWriteConfig getConfig(IndexType indexType) {
+return getConfigBuilder(indexType).build();
+  }
+
+  /**
+   * Get Config builder with default configs set.
+   *
+   * @return Config Builder
+   */
+  protected HoodieWriteConfig.Builder getConfigBuilder() {
+return getConfigBuilder(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA);
+  }
+
+  /**
+   * Get Config builder with default configs set.
+   *
+   * @return Config Builder
+   */
+  HoodieWriteConfig.Builder getConfigBuilder(IndexType indexType) {
+return getConfigBuilder(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA, 
indexType);
+  }
+
+  HoodieWriteConfig.Builder getConfigBuilder(String schemaStr) {
+return getConfigBuilder(schemaStr, IndexType.BLOOM);
+  }
+
+  /**
+   * Get Config builder with default configs set.
+   *
+   * @return Config Builder
+   */
+  HoodieWriteConfig.Builder getConfigBuilder(String schemaStr, IndexType 
indexType) {
+return 
HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(schemaStr)
+.withParallelism(2, 
2).withBulkInsertParallelism(2).withFinalizeWriteParallelism(2)
+.withTimelineLayoutVersion(TimelineLayoutVersion.CURR_VERSION)
+.withWriteStatusClass(MetadataMergeWriteStatus.class)
+
.withConsistencyGuardConfig(ConsistencyGuardConfig.newBuilder().withConsistencyCheckEnabled(true).build())
+
.withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(1024
 * 1024).build())
+.withStorageConfig(HoodieStorageConfig.newBuilder().limitFileSize(1024 
* 1024).build())
+.forTable("test-trip-table")
+
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(indexType).build())
+
.withEmbeddedTimelineServerEnabled(true).withFileSystemViewConfig(FileSystemViewStorageConfig.newBuilder()
+.withEnableBackupForRemoteFileSystemView(false) // Fail test if 
problem connecting to timeline-server
+
.withStorageType(FileSystemViewStorageType.EMBEDDED_KV_STORE).build());
+  }
+
+  protected HoodieTable getHoodieTable(HoodieTableMetaClient metaClient, 
HoodieWriteConfig config) {
+HoodieTable table = HoodieTable.getHoodieTable(metaClient, config, jsc);
+((SyncableFileSystemView) (table.getSliceView())).reset();
+return table;
+  }
+
+  /**
+   * Assert no failures in writing hoodie files.
+   *
+   * @param statuses List of Write Status
+   */
+  public static void assertNoWriteErrors(List statuses) {
+// Verify there are no errors
+for (WriteStatus status : statuses) {
+  assertFalse("Errors found in write of " + status.getFileId(), 
status.hasErrors());
 }
-
-@After
-public void tearDown() throws Exception {
-cleanupResources();
+  }
+
+  void assertPartitionMetadataForRecords(List inputRecords, 
FileSystem fs) throws IOException {
+Set partitionPathSet = inputRecords.stream()
+

[hudi] branch release-0.5.3 updated (5fcc461 -> 41fb6c2)

2020-06-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch release-0.5.3
in repository https://gitbox.apache.org/repos/asf/hudi.git.


omit 5fcc461  Bumping release candidate number 1
 new 6dcd0a3  [HUDI-988] Fix Unit Test Flakiness : Ensure all 
instantiations of HoodieWriteClient are closed properly. Fix bug in 
TestRollbacks. Make CLI unit tests for Hudi CLI check skip rendering strings
 new ae48ecb  [HUDI-990] Timeline API : 
filterCompletedAndCompactionInstants needs to handle requested state correctly. 
Also ensure timeline gets reloaded after we revert committed transactions
 new d3afcba  Making few fixes after cherry picking
 new 41fb6c2  Bumping release candidate number 2

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (5fcc461)
\
 N -- N -- N   refs/heads/release-0.5.3 (41fb6c2)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docker/hoodie/hadoop/base/pom.xml  |   2 +-
 docker/hoodie/hadoop/datanode/pom.xml  |   2 +-
 docker/hoodie/hadoop/historyserver/pom.xml |   2 +-
 docker/hoodie/hadoop/hive_base/pom.xml |   2 +-
 docker/hoodie/hadoop/namenode/pom.xml  |   2 +-
 docker/hoodie/hadoop/pom.xml   |   2 +-
 docker/hoodie/hadoop/prestobase/pom.xml|   2 +-
 docker/hoodie/hadoop/spark_base/pom.xml|   2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml|   2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml   |   2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml   |   2 +-
 hudi-cli/pom.xml   |   2 +-
 .../apache/hudi/cli/HoodieTableHeaderFields.java   |  16 ++
 .../org/apache/hudi/cli/commands/StatsCommand.java |   4 +-
 .../cli/commands/AbstractShellIntegrationTest.java |   2 +-
 .../hudi/cli/commands/TestRepairsCommand.java  | 206 -
 hudi-client/pom.xml|   2 +-
 .../org/apache/hudi/client/HoodieWriteClient.java  |   2 +-
 .../client/embedded/EmbeddedTimelineService.java   |   4 +-
 .../apache/hudi/table/HoodieCopyOnWriteTable.java  |   2 +
 .../apache/hudi/table/HoodieMergeOnReadTable.java  |   2 +
 .../apache/hudi/client/TestHoodieClientBase.java   | 187 +--
 .../java/org/apache/hudi/client/TestMultiFS.java   |   4 -
 .../hudi/client/TestUpdateSchemaEvolution.java |   4 +-
 .../hudi/common/HoodieClientTestHarness.java   |  54 --
 .../hudi/index/TestHBaseQPSResourceAllocator.java  |   2 +-
 .../java/org/apache/hudi/index/TestHbaseIndex.java |  17 +-
 .../org/apache/hudi/index/TestHoodieIndex.java |   2 +-
 .../hudi/index/bloom/TestHoodieBloomIndex.java |   2 +-
 .../index/bloom/TestHoodieGlobalBloomIndex.java|   2 +-
 .../org/apache/hudi/io/TestHoodieMergeHandle.java  |  12 +-
 .../apache/hudi/table/TestCopyOnWriteTable.java|   5 +-
 .../apache/hudi/table/TestMergeOnReadTable.java|  43 ++---
 .../hudi/table/compact/TestHoodieCompactor.java|  14 +-
 hudi-common/pom.xml|   2 +-
 .../table/timeline/HoodieDefaultTimeline.java  |   2 +-
 .../table/view/FileSystemViewStorageConfig.java|  21 +++
 .../table/string/TestHoodieActiveTimeline.java |   2 +-
 hudi-hadoop-mr/pom.xml |   2 +-
 hudi-hive/pom.xml  |   2 +-
 hudi-integ-test/pom.xml|   2 +-
 hudi-spark/pom.xml |   2 +-
 hudi-timeline-service/pom.xml  |   2 +-
 hudi-utilities/pom.xml |   2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml|   2 +-
 packaging/hudi-hive-bundle/pom.xml |   2 +-
 packaging/hudi-presto-bundle/pom.xml   |   2 +-
 packaging/hudi-spark-bundle/pom.xml|   2 +-
 packaging/hudi-timeline-server-bundle/pom.xml  |   2 +-
 packaging/hudi-utilities-bundle/pom.xml|   2 +-
 pom.xml|   3 +-
 51 files changed, 247 insertions(+), 419 deletions(-)
 delete mode 100644 

[hudi] 04/04: Bumping release candidate number 2

2020-06-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch release-0.5.3
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 41fb6c268c6e84949ee70f305df93680c431a9da
Author: Sivabalan Narayanan 
AuthorDate: Mon Jun 8 08:41:25 2020 -0400

Bumping release candidate number 2
---
 docker/hoodie/hadoop/base/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml| 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml| 2 +-
 docker/hoodie/hadoop/namenode/pom.xml | 2 +-
 docker/hoodie/hadoop/pom.xml  | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml   | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml   | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml  | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml  | 2 +-
 hudi-cli/pom.xml  | 2 +-
 hudi-client/pom.xml   | 2 +-
 hudi-common/pom.xml   | 2 +-
 hudi-hadoop-mr/pom.xml| 2 +-
 hudi-hive/pom.xml | 2 +-
 hudi-integ-test/pom.xml   | 2 +-
 hudi-spark/pom.xml| 2 +-
 hudi-timeline-service/pom.xml | 2 +-
 hudi-utilities/pom.xml| 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml   | 2 +-
 packaging/hudi-hive-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml  | 2 +-
 packaging/hudi-spark-bundle/pom.xml   | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml | 2 +-
 packaging/hudi-utilities-bundle/pom.xml   | 2 +-
 pom.xml   | 2 +-
 27 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml 
b/docker/hoodie/hadoop/base/pom.xml
index 2f271c7..c95f610 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ b/docker/hoodie/hadoop/base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/datanode/pom.xml 
b/docker/hoodie/hadoop/datanode/pom.xml
index c27fa29..3715b89 100644
--- a/docker/hoodie/hadoop/datanode/pom.xml
+++ b/docker/hoodie/hadoop/datanode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/historyserver/pom.xml 
b/docker/hoodie/hadoop/historyserver/pom.xml
index 302d453..63defe3 100644
--- a/docker/hoodie/hadoop/historyserver/pom.xml
+++ b/docker/hoodie/hadoop/historyserver/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/hive_base/pom.xml 
b/docker/hoodie/hadoop/hive_base/pom.xml
index 087c60c..494f3c0 100644
--- a/docker/hoodie/hadoop/hive_base/pom.xml
+++ b/docker/hoodie/hadoop/hive_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml
index 5e93c41..e68ddeb 100644
--- a/docker/hoodie/hadoop/namenode/pom.xml
+++ b/docker/hoodie/hadoop/namenode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml
index 4c1..23419c5 100644
--- a/docker/hoodie/hadoop/pom.xml
+++ b/docker/hoodie/hadoop/pom.xml
@@ -19,7 +19,7 @@
   
 hudi
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
 ../../../pom.xml
   
   4.0.0
diff --git a/docker/hoodie/hadoop/prestobase/pom.xml 
b/docker/hoodie/hadoop/prestobase/pom.xml
index c8ac0c9..e8b6e1c 100644
--- a/docker/hoodie/hadoop/prestobase/pom.xml
+++ b/docker/hoodie/hadoop/prestobase/pom.xml
@@ -22,7 +22,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/spark_base/pom.xml 
b/docker/hoodie/hadoop/spark_base/pom.xml
index ae80714..4aa0e86 100644
--- a/docker/hoodie/hadoop/spark_base/pom.xml
+++ b/docker/hoodie/hadoop/spark_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/sparkadhoc/pom.xml 
b/docker/hoodie/hadoop/sparkadhoc/pom.xml
index 0ad98af..08a9cb7 100644
--- a/docker/hoodie/hadoop/sparkadhoc/pom.xml
+++ b/docker/hoodie/hadoop/sparkadhoc/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.5.3-SNAPSHOT
+0.5.3-rc2
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/sparkmaster/pom.xml 
b/docker/hoodie/hadoop/sparkmaster/pom.xml
index 1df5edd..e8306dd 100644
--- 

[hudi] 02/04: [HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly. Also ensure timeline gets reloaded after we revert committed transactions

2020-06-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch release-0.5.3
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit ae48ecbe232eb55267d1a138baeec13baa1fb249
Author: Balaji Varadarajan 
AuthorDate: Wed Jun 3 00:35:14 2020 -0700

[HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to 
handle requested state correctly. Also ensure timeline gets reloaded after we 
revert committed transactions
---
 .../client/embedded/EmbeddedTimelineService.java|  4 +++-
 .../apache/hudi/table/HoodieCopyOnWriteTable.java   |  2 ++
 .../apache/hudi/table/HoodieMergeOnReadTable.java   |  2 ++
 .../org/apache/hudi/table/TestMergeOnReadTable.java |  3 +++
 .../table/timeline/HoodieDefaultTimeline.java   |  2 +-
 .../table/view/FileSystemViewStorageConfig.java | 21 +
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java
 
b/hudi-client/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java
index 5afee3f..c7c4f7b 100644
--- 
a/hudi-client/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java
+++ 
b/hudi-client/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java
@@ -89,7 +89,9 @@ public class EmbeddedTimelineService {
* Retrieves proper view storage configs for remote clients to access this 
service.
*/
   public FileSystemViewStorageConfig getRemoteFileSystemViewConfig() {
-return 
FileSystemViewStorageConfig.newBuilder().withStorageType(FileSystemViewStorageType.REMOTE_FIRST)
+FileSystemViewStorageType viewStorageType = 
config.shouldEnableBackupForRemoteFileSystemView()
+? FileSystemViewStorageType.REMOTE_FIRST : 
FileSystemViewStorageType.REMOTE_ONLY;
+return 
FileSystemViewStorageConfig.newBuilder().withStorageType(viewStorageType)
 
.withRemoteServerHost(hostAddr).withRemoteServerPort(serverPort).build();
   }
 
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java 
b/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
index 4c91c77..c74af2d 100644
--- 
a/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
+++ 
b/hudi-client/src/main/java/org/apache/hudi/table/HoodieCopyOnWriteTable.java
@@ -359,6 +359,8 @@ public class HoodieCopyOnWriteTable extends Hoodi
 if (instant.isCompleted()) {
   LOG.info("Unpublishing instant " + instant);
   instant = activeTimeline.revertToInflight(instant);
+  // reload meta-client to reflect latest timeline status
+  metaClient.reloadActiveTimeline();
 }
 
 // For Requested State (like failure during index lookup), there is 
nothing to do rollback other than
diff --git 
a/hudi-client/src/main/java/org/apache/hudi/table/HoodieMergeOnReadTable.java 
b/hudi-client/src/main/java/org/apache/hudi/table/HoodieMergeOnReadTable.java
index 938a5fd..5f56369 100644
--- 
a/hudi-client/src/main/java/org/apache/hudi/table/HoodieMergeOnReadTable.java
+++ 
b/hudi-client/src/main/java/org/apache/hudi/table/HoodieMergeOnReadTable.java
@@ -179,6 +179,8 @@ public class HoodieMergeOnReadTable extends Hoodi
 if (instant.isCompleted()) {
   LOG.error("Un-publishing instant " + instant + ", deleteInstants=" + 
deleteInstants);
   instant = this.getActiveTimeline().revertToInflight(instant);
+  // reload meta-client to reflect latest timeline status
+  metaClient.reloadActiveTimeline();
 }
 
 List allRollbackStats = new ArrayList<>();
diff --git 
a/hudi-client/src/test/java/org/apache/hudi/table/TestMergeOnReadTable.java 
b/hudi-client/src/test/java/org/apache/hudi/table/TestMergeOnReadTable.java
index fdc968d..9f3eaea 100644
--- a/hudi-client/src/test/java/org/apache/hudi/table/TestMergeOnReadTable.java
+++ b/hudi-client/src/test/java/org/apache/hudi/table/TestMergeOnReadTable.java
@@ -44,6 +44,7 @@ import 
org.apache.hudi.common.table.TableFileSystemView.SliceView;
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieInstant.State;
+import org.apache.hudi.common.table.view.FileSystemViewStorageConfig;
 import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.config.HoodieCompactionConfig;
@@ -1219,6 +1220,8 @@ public class TestMergeOnReadTable extends 
HoodieClientTestHarness {
 
.withInlineCompaction(false).withMaxNumDeltaCommitsBeforeCompaction(1).build())
 .withStorageConfig(HoodieStorageConfig.newBuilder().limitFileSize(1024 
* 1024 * 1024).build())
 .withEmbeddedTimelineServerEnabled(true).forTable("test-trip-table")
+.withFileSystemViewConfig(new FileSystemViewStorageConfig.Builder()
+

[hudi] branch master updated: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug (#1652)

2020-06-08 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 97ab97b  [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug 
(#1652)
97ab97b is described below

commit 97ab97b72635164db5ac2a4f93e72e224603ffe0
Author: liujinhui <965147...@qq.com>
AuthorDate: Mon Jun 8 20:46:47 2020 +0800

[HUDI-918] Fix kafkaOffsetGen can not read kafka data bug (#1652)
---
 .../org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java   | 6 ++
 1 file changed, 6 insertions(+)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
index 39c47a2..9331274 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
@@ -21,6 +21,7 @@ package org.apache.hudi.utilities.sources.helpers;
 import org.apache.hudi.DataSourceUtils;
 import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.common.util.Option;
+import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.exception.HoodieNotSupportedException;
 
 import org.apache.kafka.clients.consumer.KafkaConsumer;
@@ -207,6 +208,11 @@ public class KafkaOffsetGen {
 maxEventsToReadFromKafka = (maxEventsToReadFromKafka == Long.MAX_VALUE || 
maxEventsToReadFromKafka == Integer.MAX_VALUE)
 ? Config.maxEventsFromKafkaSource : maxEventsToReadFromKafka;
 long numEvents = sourceLimit == Long.MAX_VALUE ? maxEventsToReadFromKafka 
: sourceLimit;
+
+if (numEvents < toOffsets.size()) {
+  throw new HoodieException("sourceLimit should not be less than the 
number of kafka partitions");
+}
+
 return CheckpointUtils.computeOffsetRanges(fromOffsets, toOffsets, 
numEvents);
   }
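
For intuition on the guard added above (an illustrative sketch, not part of the 
patch; it assumes, as I read CheckpointUtils.computeOffsetRanges, that numEvents 
gets spread across partitions by integer division, so fewer events than 
partitions would leave every partition's range empty):

public class SourceLimitGuardSketch {
  public static void main(String[] args) {
    long numEvents = 5L;      // e.g. a small sourceLimit
    int numPartitions = 10;   // toOffsets.size()
    long perPartition = numEvents / numPartitions;
    // Prints 0: no partition would ever advance, hence the HoodieException.
    System.out.println(perPartition);
  }
}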
 



[jira] [Updated] (HUDI-1002) Ignore case when setting incremental mode in hive query

2020-06-08 Thread Hong Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated HUDI-1002:

Status: Open  (was: New)

> Ignore case when setting incremental mode in hive query
> ---
>
> Key: HUDI-1002
> URL: https://issues.apache.org/jira/browse/HUDI-1002
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: leesf
>Assignee: Hong Shen
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.6.0
>
>
> When using Hive to query a Hudi dataset in incremental mode, we need to run 
> `set hoodie.hudi_table.consume.mode=INCREMENTAL`; here INCREMENTAL must be 
> uppercase, and 
> `set hoodie.hudi_table.consume.mode=incremental` would not work. 
> IMO, `incremental` should also work, so we should ignore the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1002) Ignore case when setting incremental mode in hive query

2020-06-08 Thread Hong Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen resolved HUDI-1002.
-
Resolution: Fixed

> Ignore case when setting incremental mode in hive query
> ---
>
> Key: HUDI-1002
> URL: https://issues.apache.org/jira/browse/HUDI-1002
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: leesf
>Assignee: Hong Shen
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.6.0
>
>
> When using Hive to query a Hudi dataset in incremental mode, we need to run 
> `set hoodie.hudi_table.consume.mode=INCREMENTAL`; here INCREMENTAL must be 
> uppercase, and 
> `set hoodie.hudi_table.consume.mode=incremental` would not work. 
> IMO, `incremental` should also work, so we should ignore the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1007) When earliestOffsets is greater than checkpoint, Hudi will not be able to successfully consume data

2020-06-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128238#comment-17128238
 ] 

Vinoth Chandar commented on HUDI-1007:
--

[~liujinhui] My understanding is that this must already be handled today, since 
we obtain the offsets on each run. If not, we should indeed fix this. Can you 
share a failing test case?

> When earliestOffsets is greater than checkpoint, Hudi will not be able to 
> successfully consume data
> ---
>
> Key: HUDI-1007
> URL: https://issues.apache.org/jira/browse/HUDI-1007
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
> Fix For: 0.6.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Use deltastreamer to consume Kafka.
>  When earliestOffsets is greater than the checkpoint, Hudi will not be able to 
> successfully consume data.
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets
> boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
>  .anyMatch(offset -> offset.getValue() < 
> earliestOffsets.get(offset.getKey()));
> return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;
> Kafka data is continuously generated, which means that some data will 
> continue to expire.
>  When earliestOffsets is greater than the checkpoint, earliestOffsets will be 
> used. But by that moment some of the data has already expired, so consumption 
> fails in the end, and the process becomes an endless cycle. I understand that 
> this design is probably meant to avoid data loss, but it leads to this 
> situation. I want to fix this problem and would like to hear your opinion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-944) Support more complete concurrency control when writing data

2020-06-08 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128222#comment-17128222
 ] 

Vinoth Chandar commented on HUDI-944:
-

Hi [~309637554], please go ahead with the HUDI-839 tests if that's a good change 
to get started on. I am also happy to finish it up, so let me know :) 

 

On b, it's actually exciting to see that we have some similar ideas again :)

> We also run into this scenario. Some databases use bucketing or sharding to 
> solve this problem. With bucketing, users first need to bucket their data by 
> key using a hash-partition algorithm (Kafka has such an algorithm built in); 
> then different Hudi clients write data with different keys and will not 
> conflict when writing concurrently.

We need to introduce a set of bucketed logs to place the inserts in and merge 
them with the other base file groups. Anyway, once you are ramped up, we can 
continue this on a doc :) 

HUDI-55, I feel, is very different: it is more about supporting point-lookup 
style queries (we can just leverage RFC-08 to do a much better job of this).

 

 

 

> Support more complete  concurrency control when writing data
> 
>
> Key: HUDI-944
> URL: https://issues.apache.org/jira/browse/HUDI-944
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: liwei
>Assignee: liwei
>Priority: Major
> Fix For: 0.6.0
>
>
> Right now Hudi only supports concurrency control between writing and 
> compaction, but some scenarios need concurrent-write control, such as two 
> Spark jobs with different data sources that need to write to the same Hudi 
> table.
> I have two proposals:
> 1. First step: support write concurrency control across different partitions.
>  Currently, when two clients write data to different partitions, they will 
> hit these errors:
> a. Rolling back commits fails
> b. Instant versions already exist
> {code:java}
>  [2020-05-25 21:20:34,732] INFO Checking for file exists 
> ?/tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight 
> (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
>  Exception in thread "main" org.apache.hudi.exception.HoodieIOException: 
> Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
>  at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
>  at 
> org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
>  at 
> org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
>  at 
> org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>  at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>  {code}
> c. The two clients' archiving conflicts
> d. The read client hits "Unable to infer schema for Parquet. It must be 
> specified manually.;"
> 2. Second step: support insert, upsert, and compaction concurrency control at 
> different isolation levels such as Serializable and WriteSerializable.
> Hudi can design a mechanism to check for conflicts in 
> AbstractHoodieWriteClient.commit(); a sketch of that idea follows below.
>  
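
A hand-wavy sketch of such a commit-time conflict check (all names here are 
hypothetical, not an actual Hudi API; it is shown only to make the 
optimistic-concurrency idea concrete):

import java.util.Collections;
import java.util.Set;

public class CommitConflictSketch {
  // Two concurrent writers conflict iff the sets of file groups they touched
  // overlap; writers to disjoint file groups (e.g. different partitions)
  // could then both be allowed to commit, avoiding errors a-d above.
  public static boolean hasConflict(Set<String> myTouchedFileIds,
                                    Set<String> otherTouchedFileIds) {
    return !Collections.disjoint(myTouchedFileIds, otherTouchedFileIds);
  }
}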



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1007) When earliestOffsets is greater than checkpoint, Hudi will not be able to successfully consume data

2020-06-08 Thread liujinhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128220#comment-17128220
 ] 

liujinhui commented on HUDI-1007:
-

*[~vinoth] What are your thoughts?*

> When earliestOffsets is greater than checkpoint, Hudi will not be able to 
> successfully consume data
> ---
>
> Key: HUDI-1007
> URL: https://issues.apache.org/jira/browse/HUDI-1007
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
> Fix For: 0.6.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Use deltastreamer to consume Kafka.
>  When earliestOffsets is greater than the checkpoint, Hudi will not be able to 
> successfully consume data.
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets
> boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
>  .anyMatch(offset -> offset.getValue() < 
> earliestOffsets.get(offset.getKey()));
> return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;
> Kafka data is continuously generated, which means that some data will 
> continue to expire.
>  When earliestOffsets is greater than the checkpoint, earliestOffsets will be 
> used. But by that moment some of the data has already expired, so consumption 
> fails in the end, and the process becomes an endless cycle. I understand that 
> this design is probably meant to avoid data loss, but it leads to this 
> situation. I want to fix this problem and would like to hear your opinion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1007) When earliestOffsets is greater than checkpoint, Hudi will not be able to successfully consume data

2020-06-08 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui updated HUDI-1007:

Description: 
Use deltastreamer to consume Kafka.
 When earliestOffsets is greater than the checkpoint, Hudi will not be able to 
successfully consume data.

org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets

boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
 .anyMatch(offset -> offset.getValue() < earliestOffsets.get(offset.getKey()));

return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;

Kafka data is continuously generated, which means that some data will continue 
to expire.
 When earliestOffsets is greater than the checkpoint, earliestOffsets will be 
used. But by that moment some of the data has already expired, so consumption 
fails in the end, and the process becomes an endless cycle. I understand that 
this design is probably meant to avoid data loss, but it leads to this 
situation. I want to fix this problem and would like to hear your opinion.

  was:
Use deltastreamer to consume kafka,
 When earliestOffsets is greater than checkpoint, Hudi will not be able to 
successfully consume data



org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets

boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
 .anyMatch(offset -> offset.getValue() < earliestOffsets.get(offset.getKey()));

return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;


Kafka data is continuously generated, which means that some data will continue 
to expire.
When earliestOffsets is greater than checkpoint, earliestOffsets will be taken. 
But at this moment, some data expired. In the end, consumption fails. This 
process is an endless cycle. I can understand that this design may be to avoid 
the loss of data, but it will lead to such a situation, I want to fix this 
problem, I want to hear your opinion


> When earliestOffsets is greater than checkpoint, Hudi will not be able to 
> successfully consume data
> ---
>
> Key: HUDI-1007
> URL: https://issues.apache.org/jira/browse/HUDI-1007
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
> Fix For: 0.6.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Use deltastreamer to consume Kafka.
>  When earliestOffsets is greater than the checkpoint, Hudi will not be able to 
> successfully consume data.
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets
> boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
>  .anyMatch(offset -> offset.getValue() < 
> earliestOffsets.get(offset.getKey()));
> return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;
> Kafka data is continuously generated, which means that some data will 
> continue to expire.
>  When earliestOffsets is greater than the checkpoint, earliestOffsets will be 
> used. But by that moment some of the data has already expired, so consumption 
> fails in the end, and the process becomes an endless cycle. I understand that 
> this design is probably meant to avoid data loss, but it leads to this 
> situation. I want to fix this problem and would like to hear your opinion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1007) When earliestOffsets is greater than checkpoint, Hudi will not be able to successfully consume data

2020-06-08 Thread liujinhui (Jira)
liujinhui created HUDI-1007:
---

 Summary: When earliestOffsets is greater than checkpoint, Hudi 
will not be able to successfully consume data
 Key: HUDI-1007
 URL: https://issues.apache.org/jira/browse/HUDI-1007
 Project: Apache Hudi
  Issue Type: Bug
Reporter: liujinhui
 Fix For: 0.6.0


Use deltastreamer to consume Kafka.
 When earliestOffsets is greater than the checkpoint, Hudi will not be able to 
successfully consume data.

org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets

boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
 .anyMatch(offset -> offset.getValue() < earliestOffsets.get(offset.getKey()));

return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;

Kafka data is continuously generated, which means that some data will continue 
to expire.
 When earliestOffsets is greater than the checkpoint, earliestOffsets will be 
used. But by that moment some of the data has already expired, so consumption 
fails in the end, and the process becomes an endless cycle. I understand that 
this design is probably meant to avoid data loss, but it leads to this 
situation. I want to fix this problem and would like to hear your opinion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1007) When earliestOffsets is greater than checkpoint, Hudi will not be able to successfully consume data

2020-06-08 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui reassigned HUDI-1007:
---

Assignee: liujinhui

> When earliestOffsets is greater than checkpoint, Hudi will not be able to 
> successfully consume data
> ---
>
> Key: HUDI-1007
> URL: https://issues.apache.org/jira/browse/HUDI-1007
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
> Fix For: 0.6.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Use deltastreamer to consume Kafka.
>  When earliestOffsets is greater than the checkpoint, Hudi will not be able to 
> successfully consume data.
> org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen#checkupValidOffsets
> boolean checkpointOffsetReseter = checkpointOffsets.entrySet().stream()
>  .anyMatch(offset -> offset.getValue() < 
> earliestOffsets.get(offset.getKey()));
> return checkpointOffsetReseter ? earliestOffsets : checkpointOffsets;
> Kafka data is continuously generated, which means that some data will 
> continue to expire.
>  When earliestOffsets is greater than the checkpoint, earliestOffsets will be 
> used. But by that moment some of the data has already expired, so consumption 
> fails in the end, and the process becomes an endless cycle. I understand that 
> this design is probably meant to avoid data loss, but it leads to this 
> situation. I want to fix this problem and would like to hear your opinion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-1002] Ignore case when setting incremental mode in hive query (#1715)

2020-06-08 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2901f54  [HUDI-1002] Ignore case when setting incremental mode in hive 
query (#1715)
2901f54 is described below

commit 2901f5423a08c8f0c5fe60c8a3f23acbd36c0aed
Author: Shen Hong 
AuthorDate: Mon Jun 8 19:38:32 2020 +0800

[HUDI-1002] Ignore case when setting incremental mode in hive query (#1715)
---
 .../src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java  | 2 +-
 .../java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git 
a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java 
b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java
index 5266391..0537cfa 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java
@@ -98,7 +98,7 @@ public class HoodieHiveUtil {
 Map tablesModeMap = job.getConfiguration()
 .getValByRegex(HOODIE_CONSUME_MODE_PATTERN_STRING.pattern());
 List result = tablesModeMap.entrySet().stream().map(s -> {
-  if (s.getValue().trim().equals(INCREMENTAL_SCAN_MODE)) {
+  if (s.getValue().trim().toUpperCase().equals(INCREMENTAL_SCAN_MODE)) {
 Matcher matcher = 
HOODIE_CONSUME_MODE_PATTERN_STRING.matcher(s.getKey());
 return (!matcher.find() ? null : matcher.group(1));
   }
diff --git 
a/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
 
b/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
index 51a6524..ad38d33 100644
--- 
a/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
+++ 
b/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
@@ -310,12 +310,14 @@ public class TestHoodieParquetInputFormat {
 
   @Test
   public void testGetIncrementalTableNames() throws IOException {
-String[] expectedincrTables = {"db1.raw_trips", "db2.model_trips"};
+String[] expectedincrTables = {"db1.raw_trips", "db2.model_trips", 
"db3.model_trips"};
 JobConf conf = new JobConf();
 String incrementalMode1 = 
String.format(HoodieHiveUtil.HOODIE_CONSUME_MODE_PATTERN, 
expectedincrTables[0]);
 conf.set(incrementalMode1, HoodieHiveUtil.INCREMENTAL_SCAN_MODE);
 String incrementalMode2 = 
String.format(HoodieHiveUtil.HOODIE_CONSUME_MODE_PATTERN, 
expectedincrTables[1]);
 conf.set(incrementalMode2,HoodieHiveUtil.INCREMENTAL_SCAN_MODE);
+String incrementalMode3 = 
String.format(HoodieHiveUtil.HOODIE_CONSUME_MODE_PATTERN, "db3.model_trips");
+conf.set(incrementalMode3, 
HoodieHiveUtil.INCREMENTAL_SCAN_MODE.toLowerCase());
 String defaultmode = 
String.format(HoodieHiveUtil.HOODIE_CONSUME_MODE_PATTERN, "db3.first_trips");
 conf.set(defaultmode, HoodieHiveUtil.DEFAULT_SCAN_MODE);
 List actualincrTables = 
HoodieHiveUtil.getIncrementalTableNames(Job.getInstance(conf));
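
A side note on the comparison style (a hypothetical alternative, not part of the 
merged change): String#equalsIgnoreCase expresses the same intent in a single 
call, and toUpperCase().equals(...) is equally correct here since 
INCREMENTAL_SCAN_MODE is plain ASCII.

// Hypothetical alternative to the patched line in HoodieHiveUtil:
if (s.getValue().trim().equalsIgnoreCase(INCREMENTAL_SCAN_MODE)) {
  // ... resolve the table name as before
}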



[jira] [Assigned] (HUDI-1006) deltastreamer set auto.offset.reset=latest can't consume data

2020-06-08 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui reassigned HUDI-1006:
---

Assignee: Tianye Li  (was: liujinhui)

> deltastreamer set auto.offset.reset=latest can't consume data
> -
>
> Key: HUDI-1006
> URL: https://issues.apache.org/jira/browse/HUDI-1006
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: liujinhui
>Assignee: Tianye Li
>Priority: Major
> Fix For: 0.6.0
>
>
> org.apache.hudi.utilities.sources.JsonKafkaSource#fetchNewData
> if (totalNewMsgs <= 0) {
>  return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
> lastCheckpointStr.get() : "");
> }
> I think the checkpoint should not be empty here; it should be: 
> if (totalNewMsgs <= 0) {
>  return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
> lastCheckpointStr.get() : CheckpointUtils.offsetsToStr(offsetRanges));
> }
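
For illustration (the exact format is my assumption from reading 
CheckpointUtils, not something stated in this issue): offsetsToStr serializes 
the topic name followed by partition:offset pairs, so returning it instead of 
"" preserves the consumer position across empty fetches.

// Hypothetical shape of the preserved checkpoint string:
// "test-topic,0:100,1:150"   -> topic, then <partition>:<untilOffset> pairs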



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1006) deltastreamer set auto.offset.reset=latest can't consume data

2020-06-08 Thread liujinhui (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128197#comment-17128197
 ] 

liujinhui commented on HUDI-1006:
-

[~Litianye]  I'll leave this problem to you to fix.

> deltastreamer set auto.offset.reset=latest can't consume data
> -
>
> Key: HUDI-1006
> URL: https://issues.apache.org/jira/browse/HUDI-1006
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: liujinhui
>Assignee: Tianye Li
>Priority: Major
> Fix For: 0.6.0
>
>
> org.apache.hudi.utilities.sources.JsonKafkaSource#fetchNewData
> if (totalNewMsgs <= 0) {
>  return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
> lastCheckpointStr.get() : "");
> }
> I think the checkpoint should not be empty here; it should be: 
> if (totalNewMsgs <= 0) {
>  return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
> lastCheckpointStr.get() : CheckpointUtils.offsetsToStr(offsetRanges));
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] bvaradar commented on pull request #1712: Cherry picking HUDI-988 and HUDI-990 to release-0.5.3

2020-06-08 Thread GitBox


bvaradar commented on pull request #1712:
URL: https://github.com/apache/hudi/pull/1712#issuecomment-640265188


   @nsivabalan : Let's wait for the tests to pass before you merge to the 
0.5.3 branch



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #1664: HUDI-942 Increase default value number of delta commits for inline compaction

2020-06-08 Thread GitBox


codecov-commenter edited a comment on pull request #1664:
URL: https://github.com/apache/hudi/pull/1664#issuecomment-640292029


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1664?src=pr=h1) Report
   > Merging 
[#1664](https://codecov.io/gh/apache/hudi/pull/1664?src=pr=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/fb283934a33a0bc7b11f80e4149f7922fa4f0af5=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1664/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1664?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master    #1664    +/-   ##
   =========================================
     Coverage     18.19%   18.19%          
     Complexity      857      857          
   =========================================
     Files           348      348          
     Lines         15358    15358          
     Branches       1525     1525          
   =========================================
     Hits           2794     2794          
     Misses        12206    12206          
     Partials        358      358          
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1664?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...org/apache/hudi/config/HoodieCompactionConfig.java](https://codecov.io/gh/apache/hudi/pull/1664/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZUNvbXBhY3Rpb25Db25maWcuamF2YQ==)
 | `56.00% <ø> (ø)` | `3.00 <0.00> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1664?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/1664?src=pr=footer). Last 
update 
[fb28393...2e4dab9](https://codecov.io/gh/apache/hudi/pull/1664?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] RocMarshal commented on issue #143: Tracking ticket for folks to be added to slack group

2020-06-08 Thread GitBox


RocMarshal commented on issue #143:
URL: https://github.com/apache/hudi/issues/143#issuecomment-640085114


   Could you add me to the Slack channel? flin...@126.com
   Thank you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #1710: [MINOR] Fix delta streamer write config

2020-06-08 Thread GitBox


vinothchandar commented on a change in pull request #1710:
URL: https://github.com/apache/hudi/pull/1710#discussion_r436611592



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -497,25 +499,30 @@ private void setupWriteClient() {
* @param schemaProvider Schema Provider
*/
   private HoodieWriteConfig getHoodieClientConfig(SchemaProvider 
schemaProvider) {
+final boolean combineBeforeUpsert = true;
+final boolean autoCommit = false;
 HoodieWriteConfig.Builder builder =
-
HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes,
 true)
+
HoodieWriteConfig.newBuilder().withPath(cfg.targetBasePath).combineInput(cfg.filterDupes,
 combineBeforeUpsert)
 
.withCompactionConfig(HoodieCompactionConfig.newBuilder().withPayloadClass(cfg.payloadClassName)
 // Inline compaction is disabled for continuous mode. 
otherwise enabled for MOR
 .withInlineCompaction(cfg.isInlineCompactionEnabled()).build())
 .forTable(cfg.targetTableName)
-
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
-.withAutoCommit(false).withProps(props);
+.withAutoCommit(autoCommit).withProps(props);

Review comment:
   but `withProps()` will override this, no? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
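
The question above is about call order on the write-config builder: if `withProps(props)` copies user-supplied properties into the same store that `withAutoCommit(false)` already wrote to, the user's value can silently win. A minimal, self-contained sketch of that hazard, using `java.util.Properties` directly and an illustrative property key rather than a verified Hudi one:

```java
import java.util.Properties;

public class BuilderOrderingDemo {
  public static void main(String[] args) {
    Properties merged = new Properties();
    merged.setProperty("hoodie.auto.commit", "false"); // like withAutoCommit(false)

    Properties userProps = new Properties();           // user-supplied props file
    userProps.setProperty("hoodie.auto.commit", "true");

    merged.putAll(userProps);                          // like a later withProps(props)
    System.out.println(merged.getProperty("hoodie.auto.commit")); // prints "true"
  }
}
```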




[GitHub] [hudi] codecov-commenter edited a comment on pull request #1715: [HUDI-1002] Ignore case when setting incremental mode in hive query

2020-06-08 Thread GitBox


codecov-commenter edited a comment on pull request #1715:
URL: https://github.com/apache/hudi/pull/1715#issuecomment-640451170


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1715?src=pr&el=h1) Report
   > Merging [#1715](https://codecov.io/gh/apache/hudi/pull/1715?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/fb283934a33a0bc7b11f80e4149f7922fa4f0af5&el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1715/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1715?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1715  +/-   ##
   
   - Coverage 18.19%   18.18%   -0.01% 
 Complexity  857  857  
   
 Files   348  348  
 Lines 1535815361   +3 
 Branches   1525 1525  
   
 Hits   2794 2794  
   - Misses1220612209   +3 
 Partials358  358  
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1715?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...in/java/org/apache/hudi/hadoop/HoodieHiveUtil.java](https://codecov.io/gh/apache/hudi/pull/1715/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZUhpdmVVdGlsLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/1715/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `36.06% <0.00%> (-1.87%)` | `9.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1715?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1715?src=pr&el=footer). Last update [fb28393...3bbf939](https://codecov.io/gh/apache/hudi/pull/1715?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1002) Ignore case when setting incremental mode in hive query

2020-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1002:
-
Labels: newbie pull-request-available  (was: newbie)

> Ignore case when setting incremental mode in hive query
> ---
>
> Key: HUDI-1002
> URL: https://issues.apache.org/jira/browse/HUDI-1002
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: leesf
>Assignee: Hong Shen
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.6.0
>
>
> When using Hive to query a Hudi dataset in incremental mode, we need to set 
> `set hoodie.hudi_table.consume.mode=INCREMENTAL`. Here INCREMENTAL must be 
> uppercase; `set hoodie.hudi_table.consume.mode=incremental` does not work. 
> IMO, `incremental` should also work, so we should ignore the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
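
The fix amounts to comparing the consume-mode value case-insensitively. A minimal sketch of the idea against a Hadoop `Configuration`; the property name follows the pattern in the description above, while the helper name and signature are assumptions rather than the actual `HoodieHiveUtil` API:

```java
import org.apache.hadoop.conf.Configuration;

public final class ConsumeModeCheck {
  private ConsumeModeCheck() {}

  // Accepts INCREMENTAL, incremental, Incremental, ... (sketch, not Hudi code).
  public static boolean isIncrementalMode(Configuration conf, String tableName) {
    String mode = conf.get("hoodie." + tableName + ".consume.mode", "");
    return "INCREMENTAL".equalsIgnoreCase(mode.trim());
  }
}
```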


[GitHub] [hudi] shenh062326 opened a new pull request #1715: [HUDI-1002] Ignore case when setting incremental mode in hive query

2020-06-08 Thread GitBox


shenh062326 opened a new pull request #1715:
URL: https://github.com/apache/hudi/pull/1715


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Ignore case when setting incremental mode in hive query*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Litianye commented on issue #143: Tracking ticket for folks to be added to slack group

2020-06-08 Thread GitBox


Litianye commented on issue #143:
URL: https://github.com/apache/hudi/issues/143#issuecomment-640426343


   Please add litiany...@outlook.com



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-1000] Fix incremental query for COW non-partitioned table with no data (#1708)

2020-06-08 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e0a5e0d  [HUDI-1000] Fix incremental query for COW non-partitioned 
table with no data (#1708)
e0a5e0d is described below

commit e0a5e0d3435acc5f01812aa46d96cc7d5eb65860
Author: hj2016 
AuthorDate: Mon Jun 8 15:34:42 2020 +0800

[HUDI-1000] Fix incremental query for COW non-partitioned table with no 
data (#1708)
---
 .../src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
 
b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
index 6b0ecb9..5a10f3c 100644
--- 
a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
+++ 
b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
@@ -183,7 +183,7 @@ public class HoodieParquetInputFormat extends 
MapredParquetInputFormat implement
   return null;
 }
 String incrementalInputPaths = partitionsToList.stream()
-.map(s -> tableMetaClient.getBasePath() + Path.SEPARATOR + s)
+.map(s -> StringUtils.isNullOrEmpty(s) ? tableMetaClient.getBasePath() 
: tableMetaClient.getBasePath() + Path.SEPARATOR + s)
 .filter(s -> {
   /*
* Ensure to return only results from the original input path that 
has incremental changes
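
The one-line change above matters for non-partitioned tables, where the relative partition path is an empty string: unconditionally appending `Path.SEPARATOR + s` would produce a base path with a trailing slash that matches no files. A standalone illustration of the guarded path building (simplified names, not the Hudi code itself):

```java
public class PathGuardDemo {
  // Sketch of the guard added in the commit above; names are illustrative.
  static String toInputPath(String basePath, String relativePartitionPath) {
    return (relativePartitionPath == null || relativePartitionPath.isEmpty())
        ? basePath                                // non-partitioned table
        : basePath + "/" + relativePartitionPath; // partitioned table
  }

  public static void main(String[] args) {
    System.out.println(toInputPath("/data/trips", ""));        // /data/trips
    System.out.println(toInputPath("/data/trips", "2020/06")); // /data/trips/2020/06
  }
}
```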



[GitHub] [hudi] leesf merged pull request #1708: [HUDI-1000] Fix incremental query for COW non-partitioned table with no data

2020-06-08 Thread GitBox


leesf merged pull request #1708:
URL: https://github.com/apache/hudi/pull/1708


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1006) deltastreamer set auto.offset.reset=latest can't consume data

2020-06-08 Thread liujinhui (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liujinhui updated HUDI-1006:

Status: Open  (was: New)

> deltastreamer set auto.offset.reset=latest can't consume data
> -
>
> Key: HUDI-1006
> URL: https://issues.apache.org/jira/browse/HUDI-1006
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
> Fix For: 0.6.0
>
>
> org.apache.hudi.utilities.sources.JsonKafkaSource#fetchNewData
> if (totalNewMsgs <= 0) {
>  return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
> lastCheckpointStr.get() : "");
> }
> I think the checkpoint should not be an empty string here; it should be:
> if (totalNewMsgs <= 0) {
>  return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
> lastCheckpointStr.get() : CheckpointUtils.offsetsToStr(offsetRanges));
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1006) deltastreamer set auto.offset.reset=latest can't consume data

2020-06-08 Thread liujinhui (Jira)
liujinhui created HUDI-1006:
---

 Summary: deltastreamer set auto.offset.reset=latest can't consume 
data
 Key: HUDI-1006
 URL: https://issues.apache.org/jira/browse/HUDI-1006
 Project: Apache Hudi
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: liujinhui
Assignee: liujinhui
 Fix For: 0.6.0


org.apache.hudi.utilities.sources.JsonKafkaSource#fetchNewData


if (totalNewMsgs <= 0) {
 return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
lastCheckpointStr.get() : "");
}

I think the checkpoint should not be an empty string here; it should be:

if (totalNewMsgs <= 0) {
 return new InputBatch<>(Option.empty(), lastCheckpointStr.isPresent() ? 
lastCheckpointStr.get() : CheckpointUtils.offsetsToStr(offsetRanges));
}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
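
Put differently, when a poll returns no new messages the source would still hand back a concrete checkpoint string, derived from the offset ranges it just computed, rather than "", so the next run does not fall back to auto.offset.reset again. A sketch of the proposed branch in context; variable names follow the snippet above, and this is the reporter's proposal, not necessarily the merged fix:

```java
// Inside JsonKafkaSource#fetchNewData, once offsetRanges have been computed.
if (totalNewMsgs <= 0) {
  // Preserve a concrete checkpoint even for an empty batch, falling back to
  // the freshly computed offsets instead of an empty string.
  String checkpoint = lastCheckpointStr.isPresent()
      ? lastCheckpointStr.get()
      : CheckpointUtils.offsetsToStr(offsetRanges);
  return new InputBatch<>(Option.empty(), checkpoint);
}
```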


[GitHub] [hudi] codecov-commenter commented on pull request #1706: [HUDI-998] Introduce a robot to build testing website automatically

2020-06-08 Thread GitBox


codecov-commenter commented on pull request #1706:
URL: https://github.com/apache/hudi/pull/1706#issuecomment-640409433


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1706?src=pr&el=h1) Report
   > Merging [#1706](https://codecov.io/gh/apache/hudi/pull/1706?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/fb283934a33a0bc7b11f80e4149f7922fa4f0af5&el=desc) will **increase** coverage by `0.05%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1706/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1706?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1706  +/-   ##
   
   + Coverage 18.19%   18.24%   +0.05% 
   - Complexity  857  863   +6 
   
 Files   348  348  
 Lines 1535815391  +33 
 Branches   1525 1525  
   
   + Hits   2794 2808  +14 
   - Misses1220612225  +19 
 Partials358  358  
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1706?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/1706/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `39.56% <0.00%> (+1.62%)` | `15.00% <0.00%> (+6.00%)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1706?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1706?src=pr&el=footer). Last update [fb28393...ea5e06b](https://codecov.io/gh/apache/hudi/pull/1706?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-08 Thread GitBox


codecov-commenter commented on pull request #1714:
URL: https://github.com/apache/hudi/pull/1714#issuecomment-640408285


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1714?src=pr&el=h1) Report
   > Merging [#1714](https://codecov.io/gh/apache/hudi/pull/1714?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/fb283934a33a0bc7b11f80e4149f7922fa4f0af5&el=desc) will **increase** coverage by `0.05%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1714/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1714?src=pr&el=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master#1714  +/-   ##
   
   + Coverage 18.19%   18.24%   +0.05% 
   - Complexity  857  863   +6 
   
 Files   348  348  
 Lines 1535815391  +33 
 Branches   1525 1525  
   
   + Hits   2794 2808  +14 
   - Misses1220612225  +19 
 Partials358  358  
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1714?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...java/org/apache/hudi/client/HoodieWriteClient.java](https://codecov.io/gh/apache/hudi/pull/1714/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVdyaXRlQ2xpZW50LmphdmE=) | `20.37% <0.00%> (ø)` | `12.00 <0.00> (ø)` | |
   | [...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/1714/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `39.56% <0.00%> (+1.62%)` | `15.00% <0.00%> (+6.00%)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1714?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1714?src=pr&el=footer). Last update [fb28393...5239c9e](https://codecov.io/gh/apache/hudi/pull/1714?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #1710: [MINOR] Fix delta streamer write config

2020-06-08 Thread GitBox


codecov-commenter commented on pull request #1710:
URL: https://github.com/apache/hudi/pull/1710#issuecomment-640407664


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1710?src=pr&el=h1) Report
   > Merging [#1710](https://codecov.io/gh/apache/hudi/pull/1710?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/fb283934a33a0bc7b11f80e4149f7922fa4f0af5&el=desc) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1710/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1710?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master#1710   +/-   ##
   =
 Coverage 18.19%   18.20%   
   - Complexity  857  858+1 
   =
 Files   348  348   
 Lines 1535815357-1 
 Branches   1525 1525   
   =
   + Hits   2794 2795+1 
   + Misses1220612204-2 
 Partials358  358   
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1710?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/1710/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=) | `39.84% <100.00%> (-0.24%)` | `48.00 <1.00> (ø)` | |
   | [...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/hudi/pull/1710/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=) | `22.69% <0.00%> (+0.70%)` | `29.00% <0.00%> (+1.00%)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1710?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1710?src=pr&el=footer). Last update [fb28393...6f98540](https://codecov.io/gh/apache/hudi/pull/1710?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1002) Ignore case when setting incremental mode in hive query

2020-06-08 Thread Hong Shen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127926#comment-17127926
 ] 

Hong Shen commented on HUDI-1002:
-

[~xleesf] I'm going to work on this. :D

> Ignore case when setting incremental mode in hive query
> ---
>
> Key: HUDI-1002
> URL: https://issues.apache.org/jira/browse/HUDI-1002
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: leesf
>Assignee: Hong Shen
>Priority: Major
>  Labels: newbie
> Fix For: 0.6.0
>
>
> When using Hive to query a Hudi dataset in incremental mode, we need to set 
> `set hoodie.hudi_table.consume.mode=INCREMENTAL`. Here INCREMENTAL must be 
> uppercase; `set hoodie.hudi_table.consume.mode=incremental` does not work. 
> IMO, `incremental` should also work, so we should ignore the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1002) Ignore case when setting incremental mode in hive query

2020-06-08 Thread Hong Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen reassigned HUDI-1002:
---

Assignee: Hong Shen

> Ignore case when setting incremental mode in hive query
> ---
>
> Key: HUDI-1002
> URL: https://issues.apache.org/jira/browse/HUDI-1002
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: leesf
>Assignee: Hong Shen
>Priority: Major
>  Labels: newbie
> Fix For: 0.6.0
>
>
> When using Hive to query a Hudi dataset in incremental mode, we need to set 
> `set hoodie.hudi_table.consume.mode=INCREMENTAL`. Here INCREMENTAL must be 
> uppercase; `set hoodie.hudi_table.consume.mode=incremental` does not work. 
> IMO, `incremental` should also work, so we should ignore the case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] bvaradar commented on pull request #1650: [HUDI-541]: replaced dataFile/df with baseFile/bf throughout code base

2020-06-08 Thread GitBox


bvaradar commented on pull request #1650:
URL: https://github.com/apache/hudi/pull/1650#issuecomment-640395176


   @pratyakshsharma : Can you resolve conflicts and rebase when you get a chance?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar merged pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-08 Thread GitBox


bvaradar merged pull request #1707:
URL: https://github.com/apache/hudi/pull/1707


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-988] Fix More Unit Test Flakiness

2020-06-08 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e9cab67  [HUDI-988] Fix More Unit Test Flakiness
e9cab67 is described below

commit e9cab67b8095b30205af27498dc0b279d188a454
Author: garyli1019 
AuthorDate: Fri Jun 5 17:25:59 2020 -0700

[HUDI-988] Fix More Unit Test Flakiness
---
 .../hudi/client/TestCompactionAdminClient.java |   8 --
 .../java/org/apache/hudi/client/TestMultiFS.java   |   4 +-
 .../hudi/client/TestTableSchemaEvolution.java  |  12 --
 .../hudi/client/TestUpdateSchemaEvolution.java |   3 +-
 .../hudi/execution/TestBoundedInMemoryQueue.java   |   3 +-
 .../TestSparkBoundedInMemoryExecutor.java  |   2 +-
 .../org/apache/hudi/index/TestHoodieIndex.java |  13 +--
 .../hudi/index/bloom/TestHoodieBloomIndex.java |   4 +-
 .../index/bloom/TestHoodieGlobalBloomIndex.java|   5 +-
 .../apache/hudi/io/TestHoodieCommitArchiveLog.java |   3 +-
 .../hudi/io/TestHoodieKeyLocationFetchHandle.java  |   4 +-
 .../org/apache/hudi/io/TestHoodieMergeHandle.java  |   6 +-
 .../apache/hudi/table/TestConsistencyGuard.java|   2 +-
 .../hudi/table/TestHoodieMergeOnReadTable.java | 123 ++---
 .../commit/TestCopyOnWriteActionExecutor.java  |  32 +-
 .../table/action/commit/TestUpsertPartitioner.java |  23 +---
 .../table/action/compact/TestAsyncCompaction.java  |   2 +-
 .../table/action/compact/TestHoodieCompactor.java  |   5 +-
 .../hudi/testutils/HoodieClientTestHarness.java|  67 ---
 .../table/view/HoodieTableFileSystemView.java  |   6 +
 .../timeline/service/FileSystemViewHandler.java|   2 +-
 pom.xml|   2 +-
 22 files changed, 138 insertions(+), 193 deletions(-)

diff --git 
a/hudi-client/src/test/java/org/apache/hudi/client/TestCompactionAdminClient.java
 
b/hudi-client/src/test/java/org/apache/hudi/client/TestCompactionAdminClient.java
index 2d69156..1200f67 100644
--- 
a/hudi-client/src/test/java/org/apache/hudi/client/TestCompactionAdminClient.java
+++ 
b/hudi-client/src/test/java/org/apache/hudi/client/TestCompactionAdminClient.java
@@ -37,7 +37,6 @@ import org.apache.hudi.testutils.HoodieClientTestBase;
 
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
-import org.junit.jupiter.api.AfterEach;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 
@@ -71,13 +70,6 @@ public class TestCompactionAdminClient extends 
HoodieClientTestBase {
 client = new CompactionAdminClient(jsc, basePath);
   }
 
-  @AfterEach
-  public void tearDown() {
-client.close();
-metaClient = null;
-cleanupSparkContexts();
-  }
-
   @Test
   public void testUnscheduleCompactionPlan() throws Exception {
 int numEntriesPerInstant = 10;
diff --git a/hudi-client/src/test/java/org/apache/hudi/client/TestMultiFS.java 
b/hudi-client/src/test/java/org/apache/hudi/client/TestMultiFS.java
index 02efe8e..6a78bc5 100644
--- a/hudi-client/src/test/java/org/apache/hudi/client/TestMultiFS.java
+++ b/hudi-client/src/test/java/org/apache/hudi/client/TestMultiFS.java
@@ -63,9 +63,7 @@ public class TestMultiFS extends HoodieClientTestHarness {
 
   @AfterEach
   public void tearDown() throws Exception {
-cleanupSparkContexts();
-cleanupDFS();
-cleanupTestDataGenerator();
+cleanupResources();
   }
 
   protected HoodieWriteConfig getHoodieWriteConfig(String basePath) {
diff --git 
a/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
 
b/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
index 0148bca..25e97c9 100644
--- 
a/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
+++ 
b/hudi-client/src/test/java/org/apache/hudi/client/TestTableSchemaEvolution.java
@@ -38,8 +38,6 @@ import org.apache.hudi.testutils.TestRawTripPayload;
 
 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericRecord;
-import org.junit.jupiter.api.AfterEach;
-import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 
 import java.io.IOException;
@@ -76,16 +74,6 @@ public class TestTableSchemaEvolution extends 
HoodieClientTestBase {
   public static final String TRIP_EXAMPLE_SCHEMA_DEVOLVED = TRIP_SCHEMA_PREFIX 
+ MAP_TYPE_SCHEMA + FARE_NESTED_SCHEMA
   + TRIP_SCHEMA_SUFFIX;
 
-  @BeforeEach
-  public void setUp() throws IOException {
-initResources();
-  }
-
-  @AfterEach
-  public void tearDown() throws IOException {
-cleanupResources();
-  }
-
   @Test
   public void testSchemaCompatibilityBasic() throws Exception {
 assertTrue(TableSchemaResolver.isSchemaCompatible(TRIP_EXAMPLE_SCHEMA, 
TRIP_EXAMPLE_SCHEMA),
diff --git 
a/hudi-client/src/test/java/org/apache/hudi/client/TestUpdateSchemaEvolution.java
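
A recurring pattern in this commit is collapsing per-resource teardown calls (`cleanupSparkContexts()`, `cleanupDFS()`, `cleanupTestDataGenerator()`, ...) into a single `cleanupResources()` on the test harness. A sketch of what such a consolidated helper plausibly looks like; the individual method names appear in the diff, while the body and ordering here are assumptions:

```java
// Hypothetical consolidated teardown on HoodieClientTestHarness; the cleanup
// methods called here are the ones visible in the diff above.
public void cleanupResources() throws Exception {
  cleanupTestDataGenerator();
  cleanupDFS();
  cleanupSparkContexts();
}
```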
 

[GitHub] [hudi] bvaradar commented on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-08 Thread GitBox


bvaradar commented on pull request #1707:
URL: https://github.com/apache/hudi/pull/1707#issuecomment-640391981


   Will merge this change and re-trigger other PRs



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on a change in pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-08 Thread GitBox


bvaradar commented on a change in pull request #1707:
URL: https://github.com/apache/hudi/pull/1707#discussion_r436475281



##
File path: hudi-client/src/test/java/org/apache/hudi/client/TestMultiFS.java
##
@@ -63,9 +63,7 @@ public void setUp() throws Exception {
 
   @AfterEach
   public void tearDown() throws Exception {
-cleanupSparkContexts();
-cleanupDFS();
-cleanupTestDataGenerator();
+cleanupResources();

Review comment:
   @garyli1019 : Can you open a Jira with information about the leak you found 
in HoodieLogFileReader and HoodieWrapperFileSystem? We need to get to the 
bottom of this ASAP. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-08 Thread GitBox


codecov-commenter edited a comment on pull request #1707:
URL: https://github.com/apache/hudi/pull/1707#issuecomment-639967452


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1707?src=pr&el=h1) Report
   > Merging [#1707](https://codecov.io/gh/apache/hudi/pull/1707?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/fb283934a33a0bc7b11f80e4149f7922fa4f0af5&el=desc) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1707/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1707?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master#1707   +/-   ##
   =
 Coverage 18.19%   18.20%   
   - Complexity  857  858+1 
   =
 Files   348  348   
 Lines 1535815361+3 
 Branches   1525 1525   
   =
   + Hits   2794 2796+2 
   - Misses1220612207+1 
 Partials358  358   
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1707?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/1707/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | `36.06% <0.00%> (-1.87%)` | `9.00 <0.00> (ø)` | |
   | [...e/hudi/timeline/service/FileSystemViewHandler.java](https://codecov.io/gh/apache/hudi/pull/1707/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvRmlsZVN5c3RlbVZpZXdIYW5kbGVyLmphdmE=) | `39.90% <0.00%> (ø)` | `11.00 <0.00> (ø)` | |
   | [...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/hudi/pull/1707/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=) | `22.69% <0.00%> (+0.70%)` | `29.00% <0.00%> (+1.00%)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hudi/pull/1707?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hudi/pull/1707?src=pr&el=footer). Last update [fb28393...54ee824](https://codecov.io/gh/apache/hudi/pull/1707?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




<    1   2