[GitHub] [incubator-hudi] jaimin-shah closed pull request #728: adding support for complex keys

2019-06-13 Thread GitBox
jaimin-shah closed pull request #728: adding support for complex keys
URL: https://github.com/apache/incubator-hudi/pull/728
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] jaimin-shah opened a new pull request #728: adding support for complex keys

2019-06-13 Thread GitBox
jaimin-shah opened a new pull request #728: adding support for complex keys
URL: https://github.com/apache/incubator-hudi/pull/728
 
 
   Resolves the ambiguity in recordKey by serializing the key fields as a JSON 
object string and parsing that string back.
   
   Now HoodieKey looks like this:
   HoodieKey { recordKey={"_row_key":"16bf0b32-7557-42ac-b367-9fe32ae4795e","timestamp":"0.0"} partitionPath=rider-002/driver-002}
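
A minimal sketch of the idea described above, not the PR's actual implementation. The class `ComplexKeySketch` and its methods are hypothetical, and a real implementation would more likely use a JSON library than the hand-rolled escaping shown here (which handles only backslashes and double quotes). It illustrates why a JSON-encoded key avoids the ambiguity of a delimiter-joined key.

```java
import java.util.Map;

// Hypothetical helper, not part of Hudi: encodes several key fields into one recordKey string.
final class ComplexKeySketch {

  // Escapes the two characters that would break a JSON string literal in this sketch.
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  // Serializes the fields, in the map's iteration order, as a flat JSON object of string values.
  static String toRecordKey(Map<String, String> fields) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, String> e : fields.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      sb.append('"').append(escape(e.getKey())).append("\":\"")
        .append(escape(e.getValue())).append('"');
      first = false;
    }
    return sb.append('}').toString();
  }

  // Naive alternative for comparison: joining values with a delimiter can be ambiguous.
  static String joinWithDelimiter(String delimiter, String... values) {
    return String.join(delimiter, values);
  }
}
```

With a plain delimiter, the value pairs ("a_b", "c") and ("a", "b_c") both join to `a_b_c`, whereas the JSON form keeps field names and boundaries explicit. Passing a `LinkedHashMap` keeps the field order stable across calls.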




[GitHub] [incubator-hudi] rbhartia commented on issue #705: hadoop 2.8.x miss RecoveryInProgressException class

2019-06-13 Thread GitBox
rbhartia commented on issue #705: hadoop 2.8.x miss RecoveryInProgressException 
class
URL: https://github.com/apache/incubator-hudi/issues/705#issuecomment-501948380
 
 
   This seems to be due to the hoodie-common POM using the hadoop-common and 
hadoop-hdfs modules with the tests classifier. I was able to get hoodie to build 
against 2.8.4 by simply adding these modules to the hoodie-common pom.xml:
   
   
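   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-hdfs</artifactId>
   </dependency>
   <dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-common</artifactId>
   </dependency>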
   




[GitHub] [incubator-hudi] cdmikechen opened a new issue #734: How to upsert data just with memory

2019-06-13 Thread GitBox
cdmikechen opened a new issue #734: How to upsert data just with memory
URL: https://github.com/apache/incubator-hudi/issues/734
 
 
   I found that there is a `hoodie.write.status.storage.level` configuration in 
`HoodieWriteConfig`, so I tried updating a row in a hoodie table (750 rows, 
400KB). But when using Spark to update a single row, hoodie still shuffles data 
to disk instead of using only memory, which takes more time.
   ![image](https://github.com/cdmikechen/image/blob/master/20190614.png)
   I think that if the data is small, hoodie should process it in memory without 
shuffling. Is there a way to stop hoodie from shuffling data?
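
For reference, a small sketch of how the property named above could be set. The class `WriteStatusStorageLevelSketch` is hypothetical, and `MEMORY_ONLY` is assumed here because it is a standard Spark storage-level name. Nothing in this thread establishes that the setting removes the shuffle: the property name suggests it governs how write status results are cached, which is separate from Spark's shuffle, so this illustrates the configuration only.

```java
import java.util.Properties;

// Hypothetical helper: builds properties that one might hand to a Hudi write config.
final class WriteStatusStorageLevelSketch {
  // Property name as quoted in the issue text.
  static final String KEY = "hoodie.write.status.storage.level";

  // Returns properties requesting an in-memory storage level (assumed value).
  static Properties memoryOnly() {
    Properties props = new Properties();
    props.setProperty(KEY, "MEMORY_ONLY");
    return props;
  }
}
```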




[GitHub] [incubator-hudi] vinothchandar opened a new pull request #733: Ci fixes

2019-06-13 Thread GitBox
vinothchandar opened a new pull request #733: Ci fixes
URL: https://github.com/apache/incubator-hudi/pull/733
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #722: HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly

2019-06-13 Thread GitBox
bvaradar commented on issue #722: HUDI-148 Small File selection logic for MOR 
must skip fileIds selected for pending compaction correctly
URL: https://github.com/apache/incubator-hudi/pull/722#issuecomment-501864692
 
 
   @n3nash : Please merge once you have reviewed this PR and are OK with it.




[GitHub] [incubator-hudi] vinothchandar merged pull request #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement

2019-06-13 Thread GitBox
vinothchandar merged pull request #674: Upgrade to Hive 2.x, MOR read query 
fixes and performance improvement
URL: https://github.com/apache/incubator-hudi/pull/674
 
 
   




[incubator-hudi] branch master updated: - Ugrading to Hive 2.x - Eliminating in-memory deltaRecordsMap - Use writerSchema to generate generic record needed by custom payloads - changes to make tests w

2019-06-13 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 129e433  - Ugrading to Hive 2.x - Eliminating in-memory 
deltaRecordsMap - Use writerSchema to generate generic record needed by custom 
payloads - changes to make tests work with hive 2.x
129e433 is described below

commit 129e4336413fd2290e137804cf16c515c502c2f7
Author: Nishith Agarwal 
AuthorDate: Fri May 10 13:09:09 2019 -0700

- Ugrading to Hive 2.x
- Eliminating in-memory deltaRecordsMap
- Use writerSchema to generate generic record needed by custom payloads
- changes to make tests work with hive 2.x
---
 .../table/log/AbstractHoodieLogRecordScanner.java  |   2 -
 .../common/table/log/HoodieLogFileReader.java  |   2 +
 .../hoodie/common/table/log/HoodieLogFormat.java   |  20 +++
 .../common/table/log/HoodieLogFormatReader.java|  10 ++
 .../uber/hoodie/common/util/HoodieAvroUtils.java   |  22 ++-
 .../uber/hoodie/common/util/LogReaderUtils.java|  81 +++
 .../com/uber/hoodie/hadoop/HoodieInputFormat.java  |   3 +-
 .../hadoop/SafeParquetRecordReaderWrapper.java |  11 +-
 .../realtime/AbstractRealtimeRecordReader.java | 132 -
 .../hadoop/realtime/HoodieRealtimeInputFormat.java |  23 ++-
 .../realtime/HoodieRealtimeRecordReader.java   |  15 +-
 .../realtime/RealtimeCompactedRecordReader.java|  83 ++-
 .../realtime/RealtimeUnmergedRecordReader.java |  13 +-
 .../uber/hoodie/hadoop/HoodieInputFormatTest.java  |   5 +-
 .../uber/hoodie/hadoop/InputFormatTestUtil.java|  44 ++
 .../realtime/HoodieRealtimeRecordReaderTest.java   | 159 +
 hoodie-hive/pom.xml|  15 ++
 .../com/uber/hoodie/hive/HoodieHiveClient.java |   9 +-
 .../com/uber/hoodie/hive/util/HiveTestService.java |   3 +
 hoodie-utilities/pom.xml   |  51 +--
 pom.xml|   2 +-
 release/config/license-mappings.xml|  40 --
 22 files changed, 554 insertions(+), 191 deletions(-)

diff --git a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java
index b0010b4..c2fe730 100644
--- a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java
+++ b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java
@@ -310,8 +310,6 @@ public abstract class AbstractHoodieLogRecordScanner {
   processAvroDataBlock((HoodieAvroDataBlock) lastBlock);
   break;
 case DELETE_BLOCK:
-  // TODO : If delete is the only block written and/or records are present in parquet file
-  // TODO : Mark as tombstone (optional.empty()) for data instead of deleting the entry
   Arrays.stream(((HoodieDeleteBlock) lastBlock).getKeysToDelete()).forEach(this::processNextDeletedKey);
   break;
 case CORRUPT_BLOCK:
diff --git a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java
index 8c2dea4..d062cc1 100644
--- a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java
+++ b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java
@@ -331,6 +331,7 @@ class HoodieLogFileReader implements HoodieLogFormat.Reader {
   /**
* hasPrev is not idempotent
*/
+  @Override
   public boolean hasPrev() {
 try {
   if (!this.reverseReader) {
@@ -352,6 +353,7 @@ class HoodieLogFileReader implements HoodieLogFormat.Reader {
* iterate reverse (prev) or forward (next). Doing both in the same instance is not supported
* WARNING : Every call to prev() should be preceded with hasPrev()
*/
+  @Override
   public HoodieLogBlock prev() throws IOException {
 
 if (!this.reverseReader) {
diff --git a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java
index 3f01179..650700a 100644
--- a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java
+++ b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java
@@ -81,6 +81,19 @@ public interface HoodieLogFormat {
  * @return the path to this {@link HoodieLogFormat}
  */
 HoodieLogFile getLogFile();
+
+/**
+ * Read log file in reverse order and check if prev block is present
+ * @return
+ */
+public boolean hasPrev();
+
+/**
+ * Read log file in reverse order and return prev block if presen
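
The calling convention stated in the diff above ("Every call to prev() should be preceded with hasPrev()") can be sketched as follows. This is an illustration under assumptions, not Hudi's reader: `ReverseReaderSketch` is a hypothetical class that iterates an in-memory list of strings, and unlike the real `hasPrev()`, which the diff documents as not idempotent, this version only enforces the call order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a log reader: "blocks" are plain strings here.
final class ReverseReaderSketch {
  private final List<String> blocks;
  private int position;        // index one past the block that prev() would return
  private boolean prevChecked; // set by hasPrev(), consumed by prev()

  ReverseReaderSketch(List<String> blocks) {
    this.blocks = blocks;
    this.position = blocks.size();
  }

  // Reports whether an earlier block exists and arms the next prev() call.
  boolean hasPrev() {
    prevChecked = position > 0;
    return prevChecked;
  }

  // Returns the previous block; rejects calls not preceded by a successful hasPrev().
  String prev() {
    if (!prevChecked) {
      throw new IllegalStateException("call hasPrev() before prev()");
    }
    prevChecked = false;
    return blocks.get(--position);
  }

  // Typical usage: drain the reader from the last block to the first.
  static List<String> readAllReversed(List<String> blocks) {
    ReverseReaderSketch reader = new ReverseReaderSketch(blocks);
    List<String> out = new ArrayList<>();
    while (reader.hasPrev()) {
      out.add(reader.prev());
    }
    return out;
  }
}
```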

[GitHub] [incubator-hudi] bvaradar opened a new pull request #732: Explicitly set jvm max memory setting to 4G

2019-06-13 Thread GitBox
bvaradar opened a new pull request #732: Explicitly set jvm max memory setting 
to 4G
URL: https://github.com/apache/incubator-hudi/pull/732
 
 
   




[incubator-hudi] branch master updated: All Opened hoodie clients in tests needs to be closed TestMergeOnReadTable must use embedded timeline server

2019-06-13 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new cd7623e  All Opened hoodie clients in tests needs to be closed 
TestMergeOnReadTable must use embedded timeline server
cd7623e is described below

commit cd7623e2160abd43ea44f48a9640ea8fa0bb6db3
Author: Balaji Varadarajan 
AuthorDate: Wed Jun 12 18:28:49 2019 -0700

All Opened hoodie clients in tests needs to be closed
TestMergeOnReadTable must use embedded timeline server
---
 .../java/com/uber/hoodie/TestAsyncCompaction.java  |  2 +-
 .../java/com/uber/hoodie/TestClientRollback.java   |  6 +--
 .../java/com/uber/hoodie/TestHoodieClientBase.java | 37 -
 .../TestHoodieClientOnCopyOnWriteStorage.java  | 14 +++
 .../java/com/uber/hoodie/TestHoodieReadClient.java | 12 +++---
 .../src/test/java/com/uber/hoodie/TestMultiFS.java | 23 ++-
 .../java/com/uber/hoodie/index/TestHbaseIndex.java | 22 --
 .../com/uber/hoodie/io/TestHoodieCompactor.java| 18 +++-
 .../com/uber/hoodie/io/TestHoodieMergeHandle.java  | 18 +++-
 .../uber/hoodie/table/TestMergeOnReadTable.java| 48 ++
 10 files changed, 146 insertions(+), 54 deletions(-)

diff --git a/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java b/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java
index a86edc4..4fcc32a 100644
--- a/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java
+++ b/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java
@@ -92,7 +92,7 @@ public class TestAsyncCompaction extends TestHoodieClientBase {
   public void testRollbackForInflightCompaction() throws Exception {
 // Rollback inflight compaction
 HoodieWriteConfig cfg = getConfig(false);
-HoodieWriteClient client = new HoodieWriteClient(jsc, cfg, true);
+HoodieWriteClient client = getHoodieWriteClient(cfg, true);
 
 String firstInstantTime = "001";
 String secondInstantTime = "004";
diff --git a/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java b/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java
index f08c343..8a6f18d 100644
--- a/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java
+++ b/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java
@@ -204,7 +204,7 @@ public class TestClientRollback extends TestHoodieClientBase {
 HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath).withIndexConfig(
 HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.INMEMORY).build()).build();
 
-HoodieWriteClient client = new HoodieWriteClient(jsc, config, false);
+HoodieWriteClient client = getHoodieWriteClient(config, false);
 
 // Rollback commit 1 (this should fail, since commit2 is still around)
 try {
@@ -294,7 +294,7 @@ public class TestClientRollback extends TestHoodieClientBase {
 HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath).withIndexConfig(
 HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.INMEMORY).build()).build();
 
-new HoodieWriteClient(jsc, config, false);
+getHoodieWriteClient(config, false);
 
 // Check results, nothing changed
 assertTrue(HoodieTestUtils.doesCommitExist(basePath, commitTime1));
@@ -311,7 +311,7 @@ public class TestClientRollback extends TestHoodieClientBase {
 && HoodieTestUtils.doesDataFileExist(basePath, "2016/05/06", commitTime1, file13));
 
 // Turn auto rollback on
-new HoodieWriteClient(jsc, config, true).startCommit();
+getHoodieWriteClient(config, true).startCommit();
 assertTrue(HoodieTestUtils.doesCommitExist(basePath, commitTime1));
 assertFalse(HoodieTestUtils.doesInflightExist(basePath, commitTime2));
 assertFalse(HoodieTestUtils.doesInflightExist(basePath, commitTime3));
diff --git a/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java b/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java
index 850f45f..a668d3b 100644
--- a/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java
+++ b/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java
@@ -81,27 +81,43 @@ public class TestHoodieClientBase implements Serializable {
   protected transient HoodieTestDataGenerator dataGen = null;
 
   private HoodieWriteClient writeClient;
+  private HoodieReadClient readClient;
 
-  protected HoodieWriteClient getHoodieWriteClient(HoodieWriteConfig cfg) throws Exception {
-closeClient();
-writeClient = new HoodieWriteClient(jsc, cfg);
-return writeClient;
+  protected HoodieWriteClient getHoodieWriteClient(HoodieWriteConfig cfg) {
+return getHoodieWriteClient(cfg, false);
   }
 
-  protected HoodieWriteClient getHoodieWrit
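
The removed lines in the diff above show the pattern this commit builds on: the test base class remembers the client it handed out and closes it before creating another. A rough sketch of that pattern follows, assuming nothing about the (truncated) new code. `ClientTrackingSketch` and `FakeClient` are hypothetical names standing in for the test base class and `HoodieWriteClient`.

```java
// Hypothetical test base: tracks the client it created so that no client is left open.
class ClientTrackingSketch {

  // Stand-in for a real client; records whether close() was called.
  static final class FakeClient implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  private FakeClient current;

  // Closes any previously issued client, then creates and remembers a new one.
  FakeClient getClient() {
    closeClient();
    current = new FakeClient();
    return current;
  }

  // Intended to be called from test teardown as well.
  void closeClient() {
    if (current != null) {
      current.close();
      current = null;
    }
  }
}
```

Routing every test through a single factory method, rather than calling the constructor directly, is what lets teardown close whatever client the test opened.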

[GitHub] [incubator-hudi] bvaradar merged pull request #731: All Opened hoodie clients in tests needs to be closed

2019-06-13 Thread GitBox
bvaradar merged pull request #731: All Opened hoodie clients in tests needs to 
be closed
URL: https://github.com/apache/incubator-hudi/pull/731
 
 
   




[GitHub] [incubator-hudi] n3nash edited a comment on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement

2019-06-13 Thread GitBox
n3nash edited a comment on issue #674: Upgrade to Hive 2.x, MOR read query 
fixes and performance improvement
URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501810417
 
 
   Yes, I rebased and pushed last night. This should not increase 
instability, since not many tests were added.




[GitHub] [incubator-hudi] n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement

2019-06-13 Thread GitBox
n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and 
performance improvement
URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501810417
 
 
   Yes, I rebased and pushed last night.




[GitHub] [incubator-hudi] vinothchandar commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement

2019-06-13 Thread GitBox
vinothchandar commented on issue #674: Upgrade to Hive 2.x, MOR read query 
fixes and performance improvement
URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501789301
 
 
   @n3nash can you rebase against the latest master and push again? We are 
waiting for the CI to stabilize before merging.




[GitHub] [incubator-hudi] n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement

2019-06-13 Thread GitBox
n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and 
performance improvement
URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501788485
 
 
   @vinothchandar Waiting for you to accept; I can merge this then.




[GitHub] [incubator-hudi] bvaradar opened a new pull request #731: All Opened hoodie clients in tests needs to be closed

2019-06-13 Thread GitBox
bvaradar opened a new pull request #731: All Opened hoodie clients in tests 
needs to be closed
URL: https://github.com/apache/incubator-hudi/pull/731
 
 
   Also, TestMergeOnReadTable must use embedded timeline server

