[GitHub] [incubator-hudi] jaimin-shah closed pull request #728: adding support for complex keys
jaimin-shah closed pull request #728: adding support for complex keys URL: https://github.com/apache/incubator-hudi/pull/728 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] jaimin-shah opened a new pull request #728: adding support for complex keys
jaimin-shah opened a new pull request #728: adding support for complex keys URL: https://github.com/apache/incubator-hudi/pull/728 Resolving the issue related to ambiguity in recordKey by creating and parsing a JSON object as a string. Now HoodieKey looks like this: HoodieKey { recordKey={"_row_key":"16bf0b32-7557-42ac-b367-9fe32ae4795e","timestamp":"0.0"} partitionPath=rider-002/driver-002}
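The record key shown in the pull request description is a JSON object that maps each key field name to its value. The following is a minimal sketch of how such a string could be built, not the code from the pull request: the class name `ComplexKeySketch` and its helper methods are hypothetical, and only backslashes and double quotes are escaped, which is less than a full JSON encoder would do. Naming every field inside the key is what removes the ambiguity that plain concatenation with a delimiter can introduce when a value itself contains the delimiter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: builds a JSON-style record key from several key fields.
public class ComplexKeySketch {

    // Escape backslashes and double quotes so they cannot end the string literal early.
    // Assumption: control characters do not occur in key values.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Serialize field name -> value pairs, in insertion order, as one JSON object string.
    static String toRecordKey(Map<String, String> keyFields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : keyFields.entrySet()) {
            json.add("\"" + escape(e.getKey()) + "\":\"" + escape(e.getValue()) + "\"");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("_row_key", "16bf0b32-7557-42ac-b367-9fe32ae4795e");
        fields.put("timestamp", "0.0");
        // prints {"_row_key":"16bf0b32-7557-42ac-b367-9fe32ae4795e","timestamp":"0.0"}
        System.out.println(toRecordKey(fields));
    }
}
```

A `LinkedHashMap` is used so that the field order in the key is deterministic; two keys built from the same fields in a different order would otherwise compare as different strings.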
[GitHub] [incubator-hudi] rbhartia commented on issue #705: hadoop 2.8.x miss RecoveryInProgressException class
rbhartia commented on issue #705: hadoop 2.8.x miss RecoveryInProgressException class URL: https://github.com/apache/incubator-hudi/issues/705#issuecomment-501948380 This seems to be caused by the hoodie-common POM using the hadoop-common and hadoop-hdfs modules with the tests classifier. I was able to get hoodie to build against 2.8.4 by simply adding these modules to the hoodie-common pom.xml:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
</dependency>
[GitHub] [incubator-hudi] cdmikechen opened a new issue #734: How to upsert data just with memory
cdmikechen opened a new issue #734: How to upsert data just with memory URL: https://github.com/apache/incubator-hudi/issues/734 I found that there is a `hoodie.write.status.storage.level` configuration in `HoodieWriteConfig`, so I tried to update a row in a hoodie table (750 rows, 400KB). But when using Spark to update a row, hoodie still shuffles data to disk instead of using only memory, which takes more time. ![image](https://github.com/cdmikechen/image/blob/master/20190614.png) I think that if the data is small, hoodie should process it in memory and not shuffle. Is there a way to make hoodie not shuffle data?
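For reference, here is a minimal sketch of how the property named in the issue might be set when it is passed as a plain key/value option. The class name `WriteStatusStorageLevelSketch` is hypothetical, and the value `MEMORY_ONLY` assumes the property accepts a Spark storage level name. Judging by its name, the property governs how write statuses are persisted, so setting it would not necessarily stop Spark from shuffling during an upsert.

```java
import java.util.Properties;

// Hypothetical sketch: collects the write option mentioned in the issue.
public class WriteStatusStorageLevelSketch {

    // Property name as quoted in the issue text.
    static final String KEY = "hoodie.write.status.storage.level";

    // Build a properties object carrying the requested storage level.
    // Assumption: the value is a Spark storage level name such as MEMORY_ONLY.
    static Properties writeProps(String level) {
        Properties props = new Properties();
        props.setProperty(KEY, level);
        return props;
    }

    public static void main(String[] args) {
        Properties props = writeProps("MEMORY_ONLY");
        System.out.println(KEY + "=" + props.getProperty(KEY));
    }
}
```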
[GitHub] [incubator-hudi] vinothchandar opened a new pull request #733: Ci fixes
vinothchandar opened a new pull request #733: Ci fixes URL: https://github.com/apache/incubator-hudi/pull/733
[GitHub] [incubator-hudi] bvaradar commented on issue #722: HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly
bvaradar commented on issue #722: HUDI-148 Small File selection logic for MOR must skip fileIds selected for pending compaction correctly URL: https://github.com/apache/incubator-hudi/pull/722#issuecomment-501864692 @n3nash : Please merge once you have reviewed this PR and are OK with it.
[GitHub] [incubator-hudi] vinothchandar merged pull request #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement
vinothchandar merged pull request #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement URL: https://github.com/apache/incubator-hudi/pull/674
[incubator-hudi] branch master updated: - Ugrading to Hive 2.x - Eliminating in-memory deltaRecordsMap - Use writerSchema to generate generic record needed by custom payloads - changes to make tests w
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 129e433  - Ugrading to Hive 2.x - Eliminating in-memory deltaRecordsMap - Use writerSchema to generate generic record needed by custom payloads - changes to make tests work with hive 2.x
129e433 is described below

commit 129e4336413fd2290e137804cf16c515c502c2f7
Author: Nishith Agarwal
AuthorDate: Fri May 10 13:09:09 2019 -0700

    - Ugrading to Hive 2.x
    - Eliminating in-memory deltaRecordsMap
    - Use writerSchema to generate generic record needed by custom payloads
    - changes to make tests work with hive 2.x
---
 .../table/log/AbstractHoodieLogRecordScanner.java  |   2 -
 .../common/table/log/HoodieLogFileReader.java      |   2 +
 .../hoodie/common/table/log/HoodieLogFormat.java   |  20 +++
 .../common/table/log/HoodieLogFormatReader.java    |  10 ++
 .../uber/hoodie/common/util/HoodieAvroUtils.java   |  22 ++-
 .../uber/hoodie/common/util/LogReaderUtils.java    |  81 +++
 .../com/uber/hoodie/hadoop/HoodieInputFormat.java  |   3 +-
 .../hadoop/SafeParquetRecordReaderWrapper.java     |  11 +-
 .../realtime/AbstractRealtimeRecordReader.java     | 132 -
 .../hadoop/realtime/HoodieRealtimeInputFormat.java |  23 ++-
 .../realtime/HoodieRealtimeRecordReader.java       |  15 +-
 .../realtime/RealtimeCompactedRecordReader.java    |  83 ++-
 .../realtime/RealtimeUnmergedRecordReader.java     |  13 +-
 .../uber/hoodie/hadoop/HoodieInputFormatTest.java  |   5 +-
 .../uber/hoodie/hadoop/InputFormatTestUtil.java    |  44 ++
 .../realtime/HoodieRealtimeRecordReaderTest.java   | 159 +
 hoodie-hive/pom.xml                                |  15 ++
 .../com/uber/hoodie/hive/HoodieHiveClient.java     |   9 +-
 .../com/uber/hoodie/hive/util/HiveTestService.java |   3 +
 hoodie-utilities/pom.xml                           |  51 +--
 pom.xml                                            |   2 +-
 release/config/license-mappings.xml                |  40 --
 22 files changed, 554 insertions(+), 191 deletions(-)

diff --git a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java
index b0010b4..c2fe730 100644
--- a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java
+++ b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/AbstractHoodieLogRecordScanner.java
@@ -310,8 +310,6 @@ public abstract class AbstractHoodieLogRecordScanner {
             processAvroDataBlock((HoodieAvroDataBlock) lastBlock);
             break;
           case DELETE_BLOCK:
-            // TODO : If delete is the only block written and/or records are present in parquet file
-            // TODO : Mark as tombstone (optional.empty()) for data instead of deleting the entry
             Arrays.stream(((HoodieDeleteBlock) lastBlock).getKeysToDelete()).forEach(this::processNextDeletedKey);
             break;
           case CORRUPT_BLOCK:
diff --git a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java
index 8c2dea4..d062cc1 100644
--- a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java
+++ b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFileReader.java
@@ -331,6 +331,7 @@ class HoodieLogFileReader implements HoodieLogFormat.Reader {
   /**
    * hasPrev is not idempotent
    */
+  @Override
   public boolean hasPrev() {
     try {
       if (!this.reverseReader) {
@@ -352,6 +353,7 @@ class HoodieLogFileReader implements HoodieLogFormat.Reader {
    * iterate reverse (prev) or forward (next). Doing both in the same instance is not supported
    * WARNING : Every call to prev() should be preceded with hasPrev()
    */
+  @Override
   public HoodieLogBlock prev() throws IOException {
     if (!this.reverseReader) {
diff --git a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java
index 3f01179..650700a 100644
--- a/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java
+++ b/hoodie-common/src/main/java/com/uber/hoodie/common/table/log/HoodieLogFormat.java
@@ -81,6 +81,19 @@ public interface HoodieLogFormat {
      * @return the path to this {@link HoodieLogFormat}
      */
     HoodieLogFile getLogFile();
+
+    /**
+     * Read log file in reverse order and check if prev block is present
+     * @return
+     */
+    public boolean hasPrev();
+
+    /**
+     * Read log file in reverse order and return prev block if presen
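The diff above moves `hasPrev()` and `prev()` onto the reader interface and keeps the warning that every call to `prev()` should be preceded by `hasPrev()`. The calling pattern can be sketched as follows. This is an illustration only: `ReverseReaderSketch`, its `Reader` interface, and `ListReverseReader` are hypothetical stand-ins backed by an in-memory list, not the Hudi log reader, and unlike the real `hasPrev()` (documented as not idempotent) the one here has no side effects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the hasPrev()/prev() calling contract for reverse iteration.
public class ReverseReaderSketch {

    // Minimal stand-in for a reader that can be walked backwards.
    interface Reader<T> {
        boolean hasPrev();
        T prev();
    }

    // Walks an in-memory list from the last element to the first.
    static final class ListReverseReader<T> implements Reader<T> {
        private final List<T> blocks;
        private int cursor; // index one past the next element to return

        ListReverseReader(List<T> blocks) {
            this.blocks = blocks;
            this.cursor = blocks.size();
        }

        @Override
        public boolean hasPrev() {
            return cursor > 0;
        }

        @Override
        public T prev() {
            if (cursor <= 0) {
                throw new IllegalStateException("prev() called without a successful hasPrev()");
            }
            return blocks.get(--cursor);
        }
    }

    // Guard every prev() with hasPrev(), as the warning in the diff asks callers to do.
    static <T> List<T> readAllReverse(Reader<T> reader) {
        List<T> out = new ArrayList<>();
        while (reader.hasPrev()) {
            out.add(reader.prev());
        }
        return out;
    }

    public static void main(String[] args) {
        Reader<String> reader = new ListReverseReader<>(Arrays.asList("block1", "block2", "block3"));
        System.out.println(readAllReverse(reader)); // prints [block3, block2, block1]
    }
}
```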
[GitHub] [incubator-hudi] bvaradar opened a new pull request #732: Explicitly set jvm max memory setting to 4G
bvaradar opened a new pull request #732: Explicitly set jvm max memory setting to 4G URL: https://github.com/apache/incubator-hudi/pull/732
[incubator-hudi] branch master updated: All Opened hoodie clients in tests needs to be closed TestMergeOnReadTable must use embedded timeline server
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new cd7623e  All Opened hoodie clients in tests needs to be closed TestMergeOnReadTable must use embedded timeline server
cd7623e is described below

commit cd7623e2160abd43ea44f48a9640ea8fa0bb6db3
Author: Balaji Varadarajan
AuthorDate: Wed Jun 12 18:28:49 2019 -0700

    All Opened hoodie clients in tests needs to be closed
    TestMergeOnReadTable must use embedded timeline server
---
 .../java/com/uber/hoodie/TestAsyncCompaction.java  |  2 +-
 .../java/com/uber/hoodie/TestClientRollback.java   |  6 +--
 .../java/com/uber/hoodie/TestHoodieClientBase.java | 37 -
 .../TestHoodieClientOnCopyOnWriteStorage.java      | 14 +++
 .../java/com/uber/hoodie/TestHoodieReadClient.java | 12 +++---
 .../src/test/java/com/uber/hoodie/TestMultiFS.java | 23 ++-
 .../java/com/uber/hoodie/index/TestHbaseIndex.java | 22 --
 .../com/uber/hoodie/io/TestHoodieCompactor.java    | 18 +++-
 .../com/uber/hoodie/io/TestHoodieMergeHandle.java  | 18 +++-
 .../uber/hoodie/table/TestMergeOnReadTable.java    | 48 ++
 10 files changed, 146 insertions(+), 54 deletions(-)

diff --git a/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java b/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java
index a86edc4..4fcc32a 100644
--- a/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java
+++ b/hoodie-client/src/test/java/com/uber/hoodie/TestAsyncCompaction.java
@@ -92,7 +92,7 @@ public class TestAsyncCompaction extends TestHoodieClientBase {
   public void testRollbackForInflightCompaction() throws Exception {
     // Rollback inflight compaction
     HoodieWriteConfig cfg = getConfig(false);
-    HoodieWriteClient client = new HoodieWriteClient(jsc, cfg, true);
+    HoodieWriteClient client = getHoodieWriteClient(cfg, true);
     String firstInstantTime = "001";
     String secondInstantTime = "004";
diff --git a/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java b/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java
index f08c343..8a6f18d 100644
--- a/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java
+++ b/hoodie-client/src/test/java/com/uber/hoodie/TestClientRollback.java
@@ -204,7 +204,7 @@ public class TestClientRollback extends TestHoodieClientBase {
     HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath).withIndexConfig(
         HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.INMEMORY).build()).build();
-    HoodieWriteClient client = new HoodieWriteClient(jsc, config, false);
+    HoodieWriteClient client = getHoodieWriteClient(config, false);
     // Rollback commit 1 (this should fail, since commit2 is still around)
     try {
@@ -294,7 +294,7 @@ public class TestClientRollback extends TestHoodieClientBase {
     HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath).withIndexConfig(
         HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.INMEMORY).build()).build();
-    new HoodieWriteClient(jsc, config, false);
+    getHoodieWriteClient(config, false);
     // Check results, nothing changed
     assertTrue(HoodieTestUtils.doesCommitExist(basePath, commitTime1));
@@ -311,7 +311,7 @@ public class TestClientRollback extends TestHoodieClientBase {
         && HoodieTestUtils.doesDataFileExist(basePath, "2016/05/06", commitTime1, file13));
     // Turn auto rollback on
-    new HoodieWriteClient(jsc, config, true).startCommit();
+    getHoodieWriteClient(config, true).startCommit();
     assertTrue(HoodieTestUtils.doesCommitExist(basePath, commitTime1));
     assertFalse(HoodieTestUtils.doesInflightExist(basePath, commitTime2));
     assertFalse(HoodieTestUtils.doesInflightExist(basePath, commitTime3));
diff --git a/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java b/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java
index 850f45f..a668d3b 100644
--- a/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java
+++ b/hoodie-client/src/test/java/com/uber/hoodie/TestHoodieClientBase.java
@@ -81,27 +81,43 @@ public class TestHoodieClientBase implements Serializable {
   protected transient HoodieTestDataGenerator dataGen = null;
   private HoodieWriteClient writeClient;
+  private HoodieReadClient readClient;
-  protected HoodieWriteClient getHoodieWriteClient(HoodieWriteConfig cfg) throws Exception {
-    closeClient();
-    writeClient = new HoodieWriteClient(jsc, cfg);
-    return writeClient;
+  protected HoodieWriteClient getHoodieWriteClient(HoodieWriteConfig cfg) {
+    return getHoodieWriteClient(cfg, false);
   }
-  protected HoodieWriteClient getHoodieWrit
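The tests in this commit stop constructing `HoodieWriteClient` directly and go through a factory method on the test base class; the removed lines show that method closing any previously opened client before creating a new one. A minimal sketch of that lifecycle pattern follows. All names here (`ClientLifecycleSketch`, `FakeClient`, `getClient`) are hypothetical, and the sketch assumes the truncated new implementation keeps the close-before-create behaviour, which the visible part of the diff does not confirm.

```java
// Hypothetical sketch: a test base that tracks the one open client and closes it
// before handing out a replacement, so no client is left open between tests.
public class ClientLifecycleSketch {

    // Stand-in for a client that holds resources and must be closed.
    static final class FakeClient implements AutoCloseable {
        final String name;
        boolean closed = false;

        FakeClient(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    private FakeClient current;

    // Factory used by tests instead of calling the constructor directly.
    FakeClient getClient(String name) {
        closeClient();
        current = new FakeClient(name);
        return current;
    }

    // Close the tracked client, if any; intended to also run in test teardown.
    void closeClient() {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    public static void main(String[] args) {
        ClientLifecycleSketch base = new ClientLifecycleSketch();
        FakeClient first = base.getClient("first");
        FakeClient second = base.getClient("second");
        System.out.println(first.name + " closed: " + first.closed);
        System.out.println(second.name + " closed: " + second.closed);
        base.closeClient();
        System.out.println(second.name + " closed after teardown: " + second.closed);
    }
}
```

Routing every construction through one method means the teardown only has to close a single tracked instance, at the cost of allowing only one open client per test at a time.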
[GitHub] [incubator-hudi] bvaradar merged pull request #731: All Opened hoodie clients in tests needs to be closed
bvaradar merged pull request #731: All Opened hoodie clients in tests needs to be closed URL: https://github.com/apache/incubator-hudi/pull/731
[GitHub] [incubator-hudi] n3nash edited a comment on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement
n3nash edited a comment on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501810417 Yes, I rebased and pushed last night. This should not increase the instability, since not many tests were added.
[GitHub] [incubator-hudi] n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement
n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501810417 Yes, I rebased and pushed last night.
[GitHub] [incubator-hudi] vinothchandar commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement
vinothchandar commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501789301 @n3nash can you rebase against the latest master and push again? We are waiting for the CI to stabilize before merging.
[GitHub] [incubator-hudi] n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement
n3nash commented on issue #674: Upgrade to Hive 2.x, MOR read query fixes and performance improvement URL: https://github.com/apache/incubator-hudi/pull/674#issuecomment-501788485 @vinothchandar Waiting for you to accept; we can merge this then.
[GitHub] [incubator-hudi] bvaradar opened a new pull request #731: All Opened hoodie clients in tests needs to be closed
bvaradar opened a new pull request #731: All Opened hoodie clients in tests needs to be closed URL: https://github.com/apache/incubator-hudi/pull/731 Also, TestMergeOnReadTable must use embedded timeline server