[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
codecov-io edited a comment on issue #1165: [HUDI-76] Add CSV Source support 
for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-597470147
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr&el=h1) Report
   > Merging [#1165](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806&el=desc) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1165/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1165      +/-   ##
   ============================================
   + Coverage     67.40%   67.41%    +0.01%
   - Complexity      230      240       +10
   ============================================
     Files           336      337        +1
     Lines         16366    16391       +25
     Branches       1672     1676        +4
   ============================================
   + Hits          11031    11050       +19
   - Misses         4602     4605        +3
   - Partials        733      736        +3
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `100.00% <100.00%> (ø)` | `10.00 <10.00> (?)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `25.00% <0.00%> (-50.00%)` | `0.00% <0.00%> (ø%)` | |
   | [...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh) | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1165/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | `54.38% <0.00%> (-0.88%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr&el=footer). Last update [77d5b92...fb6bc0b](https://codecov.io/gh/apache/incubator-hudi/pull/1165?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Updated] (HUDI-692) Add delete savepoint for cli

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-692:

Labels: pull-request-available  (was: )

> Add delete savepoint for cli
> 
>
> Key: HUDI-692
> URL: https://issues.apache.org/jira/browse/HUDI-692
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> Now, deleteSavepoint is already provided in HoodieWriteClient, but it is not exposed to the user; add it to the CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] hddong opened a new pull request #1397: [HUDI-692] Add delete savepoint for cli

2020-03-10 Thread GitBox
hddong opened a new pull request #1397: [HUDI-692] Add delete savepoint for cli
URL: https://github.com/apache/incubator-hudi/pull/1397
 
 
   ## What is the purpose of the pull request
   
   *Now, deleteSavepoint is already provided in HoodieWriteClient, but it is not exposed to the user; add it to the CLI.*
   
   ## Brief change log
   
 - *Add delete savepoint for cli*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


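The change described above follows a common pattern: the write client already implements the operation, and the CLI only needs a thin command that validates input, delegates, and reports the outcome. A minimal sketch of that pattern, with hypothetical names (this is not Hudi's actual API):

```python
# Illustrative sketch: exposing an existing client operation as a CLI command.
# The class and method names below are hypothetical, not Hudi's actual API.

class WriteClient:
    """Stand-in for a client that already supports deleting savepoints."""

    def __init__(self):
        self.savepoints = {"20200310120000", "20200311120000"}

    def delete_savepoint(self, instant_time):
        if instant_time not in self.savepoints:
            raise ValueError("Savepoint %s not found" % instant_time)
        self.savepoints.remove(instant_time)


def savepoint_delete_command(client, instant_time):
    """Thin CLI wrapper: delegate to the client and turn errors into messages."""
    try:
        client.delete_savepoint(instant_time)
        return "Savepoint %s deleted" % instant_time
    except ValueError as e:
        return "Failed: %s" % e


client = WriteClient()
print(savepoint_delete_command(client, "20200310120000"))  # prints "Savepoint 20200310120000 deleted"
```

The CLI layer stays stateless; all bookkeeping remains in the client, which is why the PR can be small.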


[jira] [Assigned] (HUDI-692) Add delete savepoint for cli

2020-03-10 Thread hong dongdong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong reassigned HUDI-692:
--

Assignee: hong dongdong

> Add delete savepoint for cli
> 
>
> Key: HUDI-692
> URL: https://issues.apache.org/jira/browse/HUDI-692
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>
> Now, deleteSavepoint is already provided in HoodieWriteClient, but it is not exposed to the user; add it to the CLI.





[jira] [Created] (HUDI-692) Add delete savepoint for cli

2020-03-10 Thread hong dongdong (Jira)
hong dongdong created HUDI-692:
--

 Summary: Add delete savepoint for cli
 Key: HUDI-692
 URL: https://issues.apache.org/jira/browse/HUDI-692
 Project: Apache Hudi (incubating)
  Issue Type: New Feature
  Components: CLI
Reporter: hong dongdong


Now, deleteSavepoint is already provided in HoodieWriteClient, but it is not exposed to the user; add it to the CLI.





[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390761706
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -693,6 +699,146 @@ public void 
testParquetDFSSourceWithSchemaFilesAndTransformer() throws Exception
 testParquetDFSSource(true, TripsWithDistanceTransformer.class.getName());
   }
 
+  private void prepareCsvDFSSource(
+  boolean hasHeader, char sep, boolean useSchemaProvider, boolean 
hasTransformer) throws IOException {
+String sourceRoot = dfsBasePath + "/csvFiles";
+String recordKeyField = (hasHeader || useSchemaProvider) ? "_row_key" : 
"_c0";
+
+// Properties used for testing delta-streamer with CSV source
+TypedProperties csvProps = new TypedProperties();
+csvProps.setProperty("include", "base.properties");
+csvProps.setProperty("hoodie.datasource.write.recordkey.field", 
recordKeyField);
+csvProps.setProperty("hoodie.datasource.write.partitionpath.field", 
"not_there");
+if (useSchemaProvider) {
+  
csvProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file", 
dfsBasePath + "/source-flattened.avsc");
+  if (hasTransformer) {
+
csvProps.setProperty("hoodie.deltastreamer.schemaprovider.target.schema.file", 
dfsBasePath + "/target-flattened.avsc");
+  }
+}
+csvProps.setProperty("hoodie.deltastreamer.source.dfs.root", sourceRoot);
+
+if (sep != ',') {
+  if (sep == '\t') {
+csvProps.setProperty("hoodie.deltastreamer.csv.sep", "\\t");
+  } else {
+csvProps.setProperty("hoodie.deltastreamer.csv.sep", 
Character.toString(sep));
+  }
+}
+if (hasHeader) {
+  csvProps.setProperty("hoodie.deltastreamer.csv.header", 
Boolean.toString(hasHeader));
+}
+
+UtilitiesTestBase.Helpers.savePropsToDFS(csvProps, dfs, dfsBasePath + "/" 
+ PROPS_FILENAME_TEST_CSV);
+
+String path = sourceRoot + "/1.csv";
+HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
+UtilitiesTestBase.Helpers.saveCsvToDFS(
+hasHeader, sep,
+Helpers.jsonifyRecords(dataGenerator.generateInserts("000", 
CSV_NUM_RECORDS, true)),
+dfs, path);
+  }
+
+  private void testCsvDFSSource(
+  boolean hasHeader, char sep, boolean useSchemaProvider, String 
transformerClassName) throws Exception {
+prepareCsvDFSSource(hasHeader, sep, useSchemaProvider, 
transformerClassName != null);
+String tableBasePath = dfsBasePath + "/test_csv_table" + testNum;
+String sourceOrderingField = (hasHeader || useSchemaProvider) ? 
"timestamp" : "_c0";
+HoodieDeltaStreamer deltaStreamer =
+new HoodieDeltaStreamer(TestHelpers.makeConfig(
+tableBasePath, Operation.INSERT, CsvDFSSource.class.getName(),
+transformerClassName, PROPS_FILENAME_TEST_CSV, false,
+useSchemaProvider, 1000, false, null, null, sourceOrderingField), 
jsc);
+deltaStreamer.sync();
+TestHelpers.assertRecordCount(CSV_NUM_RECORDS, tableBasePath + 
"/*/*.parquet", sqlContext);
+testNum++;
+  }
+
+  @Test
+  public void 
testCsvDFSSourceWithHeaderWithoutSchemaProviderAndNoTransformer() throws 
Exception {
+// The CSV files have header, the columns are separated by ',', the 
default separator
+// No schema provider is specified, no transformer is applied
+// In this case, the source schema comes from the inferred schema of the 
CSV files
+testCsvDFSSource(true, ',', false, null);
+  }
+
+  @Test
+  public void 
testCsvDFSSourceWithHeaderAndSepWithoutSchemaProviderAndNoTransformer() throws 
Exception {
+// The CSV files have header, the columns are separated by '\t',
+// which is passed in through the Hudi CSV properties
+// No schema provider is specified, no transformer is applied
+// In this case, the source schema comes from the inferred schema of the 
CSV files
+testCsvDFSSource(true, '\t', false, null);
+  }
+
+  @Test
+  public void 
testCsvDFSSourceWithHeaderAndSepWithSchemaProviderAndNoTransformer() throws 
Exception {
+// The CSV files have header, the columns are separated by '\t'
+// File schema provider is used, no transformer is applied
+// In this case, the source schema comes from the source Avro schema file
+testCsvDFSSource(true, '\t', true, null);
+  }
+
+  @Test
+  public void 
testCsvDFSSourceWithHeaderAndSepWithoutSchemaProviderAndWithTransformer() 
throws Exception {
+// The CSV files have header, the columns are separated by '\t'
+// No schema provider is specified, transformer is applied
+// In this case, the source schema comes from the inferred schema of the 
CSV files.
+// Target schema is determined based on the Dataframe after transformation
+testCsvDFSSource(true, '\t', false, 
TripsWithDistanceTransformer.class.getName());
+  }
+
+  @Test
+ 

[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390763221
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestCsvDFSSource.java
 ##
 @@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.utilities.UtilitiesTestBase;
+import org.apache.hudi.utilities.schema.FilebasedSchemaProvider;
+
+import org.apache.hadoop.fs.Path;
+import org.junit.Before;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * Basic tests for {@link CsvDFSSource}.
 
 Review comment:
   Actually, this class runs the tests defined in `AbstractDFSSourceTestBase`, with the CSV-specific logic implemented in this class.


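The structure described in this review, a shared abstract test base whose tests are run by format-specific subclasses, is the template-method pattern. A rough Python sketch of the idea (the real `AbstractDFSSourceTestBase` is Java; all names and logic below are illustrative):

```python
# Illustrative sketch of the template-method test pattern discussed above:
# the base class owns the shared test flow, and each source subclass
# supplies only the format-specific pieces. Names are hypothetical.

class AbstractSourceTestBase:
    def write_test_file(self, records):
        raise NotImplementedError

    def read_back(self):
        raise NotImplementedError

    def run_round_trip_test(self, records):
        # Shared test logic: write with the concrete source, read, compare.
        self.write_test_file(records)
        assert self.read_back() == records


class CsvSourceTest(AbstractSourceTestBase):
    """CSV-specific subclass: serializes records as comma-separated text."""

    def __init__(self):
        self.storage = None

    def write_test_file(self, records):
        header = ",".join(records[0].keys())
        rows = [",".join(str(v) for v in r.values()) for r in records]
        self.storage = "\n".join([header] + rows)

    def read_back(self):
        lines = self.storage.split("\n")
        keys = lines[0].split(",")
        return [dict(zip(keys, line.split(","))) for line in lines[1:]]


test = CsvSourceTest()
test.run_round_trip_test([{"_row_key": "k1", "timestamp": "1"}])
```

Adding another source format then only requires a new subclass; the shared round-trip tests come for free.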


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390762093
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/UtilitiesTestBase.java
 ##
 @@ -193,19 +204,60 @@ public static void saveStringsToDFS(String[] lines, 
FileSystem fs, String target
   os.close();
 }
 
+/**
+ * Converts the json records into CSV format and writes to a file.
+ *
+ * @param hasHeader  whether the CSV file should have a header line.
+ * @param sep  the column separator to use.
+ * @param lines  the records in JSON format.
+ * @param fs  {@link FileSystem} instance.
+ * @param targetPath  File path.
+ * @throws IOException
+ */
+public static void saveCsvToDFS(
+boolean hasHeader, char sep,
+String[] lines, FileSystem fs, String targetPath) throws IOException {
+  Builder csvSchemaBuilder = CsvSchema.builder();
+
+  ArrayNode arrayNode = mapper.createArrayNode();
+  Arrays.stream(lines).forEachOrdered(
+  line -> {
+try {
+  arrayNode.add(mapper.readValue(line, ObjectNode.class));
+} catch (IOException e) {
+  e.printStackTrace();
 
 Review comment:
   This should not happen, but I agree that we can throw an exception here to catch any conversion issues.  Note that this is only used in test code.


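The helper under discussion (`saveCsvToDFS`) converts JSON records to CSV text with a configurable header and separator, using Jackson's `CsvSchema`. A rough Python equivalent of that conversion using only the standard library, raising on malformed input rather than swallowing the error as the review suggests (the function name is illustrative):

```python
import csv
import io
import json

def save_csv(has_header, sep, json_lines):
    """Convert JSON records to CSV text, roughly mirroring the helper above.

    Raises on malformed JSON instead of swallowing the error, as the
    review comment recommends.
    """
    records = [json.loads(line) for line in json_lines]  # raises on bad input
    fieldnames = list(records[0].keys())
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=sep)
    if has_header:
        writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

print(save_csv(True, "\t", ['{"_row_key": "k1", "timestamp": 1}']))
```

Passing `has_header=False` and a different `sep` exercises the same parameter combinations the tests in this PR iterate over.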


[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390760597
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/CsvDFSSource.java
 ##
 @@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.sources;
+
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.utilities.schema.SchemaProvider;
+import org.apache.hudi.utilities.sources.helpers.DFSPathSelector;
+
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrameReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.avro.SchemaConverters;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Reads data from CSV files on DFS as the data source.
+ *
+ * Internally, we use Spark to read CSV files thus any limitation of Spark CSV 
also applies here
+ * (e.g., limited support for nested schema).
+ *
+ * You can set the CSV-specific configs in the format of 
hoodie.deltastreamer.csv.*
+ * that are Spark compatible to deal with CSV files in Hudi.  The supported 
options are:
+ *
+ *   "sep", "encoding", "quote", "escape", "charToEscapeQuoteEscaping", 
"comment",
+ *   "header", "enforceSchema", "inferSchema", "samplingRatio", 
"ignoreLeadingWhiteSpace",
+ *   "ignoreTrailingWhiteSpace", "nullValue", "emptyValue", "nanValue", 
"positiveInf",
+ *   "negativeInf", "dateFormat", "timestampFormat", "maxColumns", 
"maxCharsPerColumn",
+ *   "mode", "columnNameOfCorruptRecord", "multiLine"
+ *
+ * Detailed information of these CSV options can be found at:
+ * 
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-scala.collection.Seq-
+ *
+ * If the source Avro schema is provided through the {@link 
org.apache.hudi.utilities.schema.FilebasedSchemaProvider}
+ * using "hoodie.deltastreamer.schemaprovider.source.schema.file" config, the 
schema is
+ * passed to the CSV reader without inferring the schema from the CSV file.
+ */
+public class CsvDFSSource extends RowSource {
+  // CsvSource config prefix
+  public static final String CSV_SRC_CONFIG_PREFIX = 
"hoodie.deltastreamer.csv.";
+  // CSV-specific configurations to pass in from Hudi to Spark
+  public static final List CSV_CONFIG_KEYS = Arrays.asList(
+  "sep", "encoding", "quote", "escape", "charToEscapeQuoteEscaping", 
"comment",
+  "header", "enforceSchema", "inferSchema", "samplingRatio", 
"ignoreLeadingWhiteSpace",
+  "ignoreTrailingWhiteSpace", "nullValue", "emptyValue", "nanValue", 
"positiveInf",
+  "negativeInf", "dateFormat", "timestampFormat", "maxColumns", 
"maxCharsPerColumn",
+  "mode", "columnNameOfCorruptRecord", "multiLine"
+  );
+
+  private final DFSPathSelector pathSelector;
+  private final StructType sourceSchema;
+
+  public CsvDFSSource(TypedProperties props,
+  JavaSparkContext sparkContext,
+  SparkSession sparkSession,
+  SchemaProvider schemaProvider) {
+super(props, sparkContext, sparkSession, schemaProvider);
+this.pathSelector = new DFSPathSelector(props, 
sparkContext.hadoopConfiguration());
+if (overriddenSchemaProvider != null) {
+  sourceSchema = (StructType) 
SchemaConverters.toSqlType(overriddenSchemaProvider.getSourceSchema()).dataType();
 
 Review comment:
   Good point.  Done.




[GitHub] [incubator-hudi] yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on a change in pull request #1165: [HUDI-76] Add CSV Source 
support for Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r390761144
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -653,7 +659,7 @@ private void prepareParquetDFSSource(boolean 
useSchemaProvider, boolean hasTrans
 if (useSchemaProvider) {
   
parquetProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file",
 dfsBasePath + "/source.avsc");
   if (hasTransformer) {
-
parquetProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file",
 dfsBasePath + "/target.avsc");
+
parquetProps.setProperty("hoodie.deltastreamer.schemaprovider.target.schema.file",
 dfsBasePath + "/target.avsc");
 
 Review comment:
   I don't remember fixing the unit tests.  Since this config is optional, it is possible that the data written differs from the designated schema.  However, I think the integration tests should catch any issues due to schema mismatch.




[GitHub] [incubator-hudi] yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta 
Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-597457998
 
 
   > One question about using nested schema. Can you remind me what happens if someone passes in a nested schema for CsvDeltaStreamer?
   
   I used the code below to test a nested schema with Spark's CSV reader.  It throws the following exception, which means the Spark CSV source does not currently support nested schemas.
   
   In most cases, CSV schemas should be flattened.  Whether nested schemas are supported for the CSV source depends on Spark's behavior (they may be supported in the future), so we don't enforce the check in the Hudi code.
   
   ```
   org.apache.spark.sql.AnalysisException: CSV data source does not support 
struct data type.;
   
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:69)
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:67)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifySchema(DataSourceUtils.scala:67)
at 
org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifyReadSchema(DataSourceUtils.scala:41)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:400)
at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
at 
org.apache.hudi.utilities.sources.CsvDFSSource.fromFiles(CsvDFSSource.java:120)
at 
org.apache.hudi.utilities.sources.CsvDFSSource.fetchNextBatch(CsvDFSSource.java:93)
at 
org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
at 
org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:66)
at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:317)
at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
at 
org.apache.hudi.utilities.TestHoodieDeltaStreamer.testCsvDFSSourceWithNestedSchema(TestHoodieDeltaStreamer.java:812)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at 
com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStar
   ```
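As the comment above notes, CSV schemas should generally be flattened, since Spark's CSV source rejects struct columns (the `AnalysisException` shown). A small illustrative sketch of flattening nested records into CSV-friendly dotted column names (this helper is hypothetical, not part of Hudi):

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. fare.amount.

    Hypothetical helper illustrating the "flatten before CSV" idea above;
    not part of the Hudi codebase.
    """
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

trip = {"_row_key": "k1", "fare": {"amount": 19.7, "currency": "USD"}}
print(flatten(trip))  # prints {'_row_key': 'k1', 'fare.amount': 19.7, 'fare.currency': 'USD'}
```

This is why the test fixtures in the PR use `source-flattened.avsc` and `target-flattened.avsc` rather than the nested trip schema.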

[GitHub] [incubator-hudi] bvaradar commented on issue #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-10 Thread GitBox
bvaradar commented on issue #1392: [HUDI-689] Change CLI command names to not 
have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392#issuecomment-597434910
 
 
   @nsivabalan : Please review this PR




[GitHub] [incubator-hudi] bvaradar commented on issue #1395: [HUDI-667] Fixing delete tests for DeltaStreamer

2020-03-10 Thread GitBox
bvaradar commented on issue #1395: [HUDI-667] Fixing delete tests for 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1395#issuecomment-597434678
 
 
   @lamber-ken : Can you review this PR?




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597431353
 
 
   Sorry, I reverted it; Hudi already handles the empty checkpoint via 
`KafkaOffsetGen.KafkaResetOffsetStrategies`.
   
   Users can choose `EARLIEST` or `LATEST` via the `auto.offset.reset` property.
   
   
![image](https://user-images.githubusercontent.com/20113411/76381177-b10b4380-638f-11ea-8eb4-34542b6a06f3.png)
   
   
![image](https://user-images.githubusercontent.com/20113411/76380883-faa75e80-638e-11ea-8b6e-6eed6ff5aaa2.png)
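The fallback described above can be sketched as follows. This is a minimal illustration of picking a starting-offset strategy from the consumer's `auto.offset.reset` property when no checkpoint exists; the class and method names here (`OffsetResetSketch`, `resolveResetStrategy`) are hypothetical and do not mirror Hudi's actual `KafkaOffsetGen` internals.

```java
import java.util.Locale;
import java.util.Properties;

public class OffsetResetSketch {
  public enum ResetStrategy { EARLIEST, LATEST }

  // Resolve the starting-offset strategy from Kafka consumer properties.
  public static ResetStrategy resolveResetStrategy(Properties props) {
    // Kafka's auto.offset.reset defaults to "latest" when unset.
    String value = props.getProperty("auto.offset.reset", "latest");
    return ResetStrategy.valueOf(value.toUpperCase(Locale.ROOT));
  }
}
```

With `auto.offset.reset=earliest` in the source properties, an empty checkpoint would start consumption from the beginning of the topic; with no property set, it would start from the latest offsets.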
   




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597431353
 
 
   Sorry, I reverted it; Hudi already handles the empty checkpoint via 
`KafkaOffsetGen.KafkaResetOffsetStrategies`.
   
   So, IMO, we 
   
   
![image](https://user-images.githubusercontent.com/20113411/76380883-faa75e80-638e-11ea-8b6e-6eed6ff5aaa2.png)
   




[GitHub] [incubator-hudi] lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597431353
 
 
   Sorry, I reverted it; the empty checkpoint is already handled by 
`KafkaOffsetGen.KafkaResetOffsetStrategies`.
   
   
![image](https://user-images.githubusercontent.com/20113411/76380883-faa75e80-638e-11ea-8b6e-6eed6ff5aaa2.png)
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-10 Thread GitBox
codecov-io commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-597430876
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr&el=h1) 
Report
   > Merging 
[#1396](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806?src=pr&el=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1396/graphs/tree.svg?width=650&token=VTTXabwbs2&height=150&src=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #1396      +/-   ##
   =============================================
   + Coverage      67.4%     67.4%     +<.01%
     Complexity      230       230
   =============================================
     Files           336       336
     Lines         16366     16379        +13
     Branches       1672      1673         +1
   =============================================
   + Hits          11031     11041        +10
   - Misses         4602      4603         +1
   - Partials        733       735         +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...a/org/apache/hudi/common/table/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=)
 | `73.4% <100%> (+0.28%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `93.24% <100%> (+1.05%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh)
 | `81.3% <100%> (+1.96%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `25% <0%> (-50%)` | `0% <0%> (ø)` | |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0%> (-5.56%)` | `0% <0%> (ø)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr&el=footer).
 Last update 
[77d5b92...a023c31](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597428982
 
 
   Thanks for reviewing, @garyli1019 @vinothchandar. I updated the PR to 
double-check for an empty checkpoint in `KafkaOffsetGen#checkupValidOffsets`.
   
   > I'd appreciate it if we took into consideration how checkpoint is handled 
in a general source agnostic way and also fix this issue..
   
   That is a good idea, as bvaradar suggested, but it seems impossible because 
different data sources handle empty checkpoints in different ways.




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #213

2020-03-10 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.33 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 'HUDI_home=0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
or

[GitHub] [incubator-hudi] codecov-io commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
codecov-io commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597426610
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr&el=h1) 
Report
   > Merging 
[#1377](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/5f85c267040fd51c186794fdae900162ab176b14?src=pr&el=desc)
 will **decrease** coverage by `66.32%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1377/graphs/tree.svg?width=650&token=VTTXabwbs2&height=150&src=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff               @@
   ##             master    #1377       +/-   ##
   ==============================================
   - Coverage     66.96%     0.64%     -66.33%
   + Complexity      223         2        -221
   ==============================================
     Files           334       289         -45
     Lines         16276     14375       -1901
     Branches       1661      1467        -194
   ==============================================
   - Hits          10900        92      -10808
   - Misses        4639      14280       +9641
   + Partials        737         3        -734
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/model/HoodieDeltaWriteStat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZURlbHRhV3JpdGVTdGF0LmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...org/apache/hudi/common/model/HoodieFileFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVGb3JtYXQuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...g/apache/hudi/execution/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0J1bGtJbnNlcnRNYXBGdW5jdGlvbi5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../common/util/queue/IteratorBasedQueueProducer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvSXRlcmF0b3JCYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...rg/apache/hudi/index/bloom/KeyRangeLookupTree.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vS2V5UmFuZ2VMb29rdXBUcmVlLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...apache/hudi/timeline/service/handlers/Handler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvSGFuZGxlci5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../common/util/queue/FunctionBasedQueueProducer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvRnVuY3Rpb25CYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...che/hudi/index/bloom/ListBasedIndexFileFilter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vTGlzdEJhc2VkSW5kZXhGaWxlRmlsdGVyLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | ... and [299 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https:

[GitHub] [incubator-hudi] satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-10 Thread GitBox
satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-597421941
 
 
   @bvaradar Sorry, I messed up the rebase on 
https://github.com/apache/incubator-hudi/pull/1389; please take a look at this 
one instead. As discussed in the other PR, I updated the RO and RT views. Spark 
DataSource does not seem to support MOR tables, so I'm skipping that part for 
now.




[GitHub] [incubator-hudi] satishkotha opened a new pull request #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-10 Thread GitBox
satishkotha opened a new pull request #1396: [HUDI-687] Stop incremental reader 
on RO table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396
 
 
   ## What is the purpose of the pull request
   example timeline:
   
   t0 -> create bucket1.parquet
   t1 -> create and append updates bucket1.log
   t2 -> request compaction
   t3 -> create bucket2.parquet
   
   If the compaction at t2 takes a long time, incremental reads using 
HoodieParquetInputFormat may progress to the commits at t3 and skip the data 
ingested at t1, leading to 'data loss'. (The data will still be on disk, but 
incremental readers won't see it because it is in a log file and the readers 
move on to t3.)
   
   To work around this problem, we stop returning data belonging to 
commits > compaction_requested/inprogress_instant. After the compaction is 
complete, the incremental reader sees the updates at t2, t3, and so on. The 
disadvantage is that a long-running compaction can make the reader look 
'stuck', but that is better than skipping updates.
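The filtering rule described above can be sketched as follows: an incremental reader only consumes commit instants strictly before the earliest pending compaction instant. This is an illustrative sketch, not Hudi's actual `HoodieParquetInputFormat` code; the class and method names (`IncrementalReadGuard`, `readableInstants`) are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class IncrementalReadGuard {
  // Commits the incremental reader may safely consume: everything strictly
  // before the earliest pending compaction instant (if any).
  public static List<String> readableInstants(List<String> completedCommits,
                                              Optional<String> earliestPendingCompaction) {
    if (!earliestPendingCompaction.isPresent()) {
      return completedCommits;
    }
    String bound = earliestPendingCompaction.get();
    // Hudi instant timestamps are strings that sort lexicographically.
    return completedCommits.stream()
        .filter(instant -> instant.compareTo(bound) < 0)
        .collect(Collectors.toList());
  }
}
```

Applied to the example timeline, with completed commits [t0, t1, t3] and a pending compaction at t2, only t0 and t1 would be returned until the compaction completes.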
   
   ## Brief change log
   
   - Change HoodieParquetInputFormat to read commits prior to compaction instant
   - Added unit tests to validate behavior
   - Fix broken test utils for reading records
   
   ## Verify this pull request
   This change added tests and can be verified as follows:
   mvn test (TestMergeOnReadTable and TestHoodieActiveTimeline)
   
   Some discussion is on https://github.com/apache/incubator-hudi/pull/1389, 
sorry I messed up rebase, so resending as a new PR to avoid confusion
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] satishkotha closed pull request #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
satishkotha closed pull request #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389
 
 
   




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
codecov-io edited a comment on issue #1389: [HUDI-687] Stop incremental reader 
when there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-596951199
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=h1) 
Report
   > Merging 
[#1389](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/415882f9023795994e9cc8a8294909bbec7ab191?src=pr&el=desc)
 will **increase** coverage by `0.21%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1389/graphs/tree.svg?width=650&token=VTTXabwbs2&height=150&src=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1389      +/-   ##
   ============================================
   + Coverage     67.19%     67.4%     +0.21%
   - Complexity      223       230         +7
   ============================================
     Files           335       336         +1
     Lines         16279     16376        +97
     Branches       1661      1673        +12
   ============================================
   + Hits          10939     11039       +100
   + Misses         4604      4602         -2
   + Partials        736       735         -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...a/org/apache/hudi/common/table/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/client/HoodieReadClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVJlYWRDbGllbnQuamF2YQ==)
 | `100% <100%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtTcWxXcml0ZXIuc2NhbGE=)
 | `52.79% <100%> (-0.87%)` | `0 <0> (ø)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `93.24% <100%> (+1.05%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==)
 | `50.56% <100%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh)
 | `80.99% <100%> (+1.65%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `72.41% <100%> (ø)` | `38 <0> (ø)` | :arrow_down: |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0%> (-5.56%)` | `0% <0%> (ø)` | |
   | 
[...a/org/apache/hudi/client/AbstractHoodieClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0Fic3RyYWN0SG9vZGllQ2xpZW50LmphdmE=)
 | `76.31% <0%> (-2.64%)` | `0% <0%> (ø)` | |
   | 
[...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=)
 | `54.38% <0%> (-0.88%)` | `0% <0%> (ø)` | |
   | ... and [4 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.i

[GitHub] [incubator-hudi] codecov-io commented on issue #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

2020-03-10 Thread GitBox
codecov-io commented on issue #1394: [HUDI-656][Performance] Return a dummy 
Spark relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394#issuecomment-597393691
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr&el=h1) 
Report
   > Merging 
[#1394](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806&el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1394/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #1394      +/-   ##
   =============================================
   - Coverage     67.40%    67.38%     -0.02%
     Complexity      230       230
   =============================================
     Files           336       337         +1
     Lines         16366     16369         +3
     Branches       1672      1672
   =============================================
   - Hits          11031     11030         -1
   - Misses        4602      4603          +1
   - Partials       733       736          +3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...main/scala/org/apache/hudi/HudiEmptyRelation.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSHVkaUVtcHR5UmVsYXRpb24uc2NhbGE=)
 | `66.66% <66.66%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvRGVmYXVsdFNvdXJjZS5zY2FsYQ==)
 | `70.58% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr&el=footer).
 Last update 
[77d5b92...4e9198c](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r390695516
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), 
x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) 
{
 
 Review comment:
   Hi @bvaradar, regarding adding `!commitMetadata.getMetadata(CHECKPOINT_KEY).isEmpty()`: 
if we do that, the application will always throw 
`HoodieDeltaStreamerException`.
   
![image](https://user-images.githubusercontent.com/20113411/76372301-ee63d700-6377-11ea-863a-21a99028dc5d.png)
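The one-line change in the diff above treats an empty checkpoint string the same as an absent one. A minimal standalone sketch of that guard (the class name `CheckpointGuard` is illustrative; the condition itself mirrors the diff):

```java
import java.util.Optional;

public class CheckpointGuard {
  // True only when a checkpoint exists and is non-empty,
  // matching the condition in the diff above.
  public static boolean hasValidCheckpoint(Optional<String> lastCheckpointStr) {
    return lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty();
  }
}
```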
   
   
   




[GitHub] [incubator-hudi] satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-597385302
 
 
   Looks like I messed up the merge. I'm going to close this one and open a new 
PR; sorry about the noise.




[GitHub] [incubator-hudi] nsivabalan closed pull request #1393: [WIP] Fixing delta streamer tests.

2020-03-10 Thread GitBox
nsivabalan closed pull request #1393: [WIP] Fixing delta streamer tests. 
URL: https://github.com/apache/incubator-hudi/pull/1393
 
 
   




[jira] [Updated] (HUDI-667) HoodieTestDataGenerator does not delete keys correctly

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-667:

Labels: pull-request-available  (was: )

> HoodieTestDataGenerator does not delete keys correctly
> --
>
> Key: HUDI-667
> URL: https://issues.apache.org/jira/browse/HUDI-667
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HoodieTestDataGenerator is used to generate sample data for unit-tests. It 
> allows generating HoodieRecords for insert/update/delete. It maintains the 
> record keys in a HashMap.
> private final Map existingKeys;
> There are two issues in the implementation:
>  # Delete from existingKeys uses KeyPartition rather than Integer keys
>  # Inserting records after deletes is not correctly handled
> The implementation uses the Integer key so that values can be looked up 
> randomly. Assume three values were inserted, then the HashMap will hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2  (generate a random record for deletion), the 
> HashMap will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
>  
> Now if we issue an insertBatch(), the insert is 
> existingKeys.put(existingKeys.size(), newKeyPartition), i.e. a put at key 2, 
> which overwrites the KeyPartition3 entry already in the map rather than 
> actually inserting a new entry.
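The overwrite described above can be reproduced with a minimal, self-contained sketch (KeyPartition is simplified to a plain String; the names are illustrative, not the actual HoodieTestDataGenerator code):

```java
import java.util.HashMap;
import java.util.Map;

public class KeyMapBugDemo {
    public static void main(String[] args) {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(0, "KeyPartition1");
        existingKeys.put(1, "KeyPartition2");
        existingKeys.put(2, "KeyPartition3");

        // Delete KeyPartition2; the map now holds keys {0, 2} and size() == 2.
        existingKeys.remove(1);

        // Insert a new record at existingKeys.size(), i.e. at key 2,
        // which overwrites KeyPartition3 instead of adding a new entry.
        existingKeys.put(existingKeys.size(), "KeyPartition4");

        System.out.println(existingKeys.size()); // still 2 (KeyPartition3 lost)
    }
}
```

A fix needs either a key scheme that never reuses live keys (e.g. a monotonically increasing counter) or a separate structure for random lookup.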



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1395: [HUDI-667] Fixing delete tests for DeltaStreamer

2020-03-10 Thread GitBox
nsivabalan opened a new pull request #1395: [HUDI-667] Fixing delete tests for 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1395
 
 
   This PR fixes a bug in delete record generation for the tests in the Hoodie 
delta streamer.
   
   ## Brief change log
   - Fixing a bug in delete record generation for tests in hoodie delta streamer
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests. Most tests in 
TestHoodieDeltaStreamer exercise deletes. Some fixes to the continuous-mode 
tests were made as part of the bug fix.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-656) Write Performance - Driver spends too much time creating Parquet DataSource after writes

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-656:

Labels: pull-request-available  (was: )

> Write Performance - Driver spends too much time creating Parquet DataSource 
> after writes
> 
>
> Key: HUDI-656
> URL: https://issues.apache.org/jira/browse/HUDI-656
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Performance, Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> h2. Problem Statement
> We have noticed this performance bottleneck at EMR, and it has been reported 
> here as well [https://github.com/apache/incubator-hudi/issues/1371]
> For writes through the DataSource API, Hudi uses 
> [this|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L85]
>  to create the Spark relation. Here it uses HoodieSparkSqlWriter to write the 
> dataframe, and afterwards it tries to 
> [return|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L92]
>  a relation by creating it through the Parquet data source 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L72]
> In the process of creating this parquet data source, Spark creates an 
> *InMemoryFileIndex* 
> [here|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L371]
>  as part of which it performs file listing of the base path. While the 
> listing itself is 
> [parallelized|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L289],
>  the filter that we pass, which is *HoodieROTablePathFilter*, is applied 
> [sequentially|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L294]
>  on the driver side on all of the thousands of files returned during listing. 
> This part is not parallelized by Spark, and it takes a lot of time, probably 
> because of the filter logic, so the driver just spends time filtering. We have 
> seen it take 10-12 minutes to do this for just 50 partitions in S3, and this 
> time is spent after the writing has finished.
> Solving this will significantly reduce the writing time across all sorts of 
> writes. This time is essentially wasted, because we do not really have to 
> return a relation after the write. The relation is never really used by Spark 
> anyway 
> [here|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala#L45]
>  and the writing process returns an empty set of rows.
> h2. Proposed Solution
> The proposal is to return an empty Spark relation after the write, which cuts 
> out all of this unnecessary time spent creating a Parquet relation that never 
> gets used.
>  
>  
>  





[GitHub] [incubator-hudi] umehrot2 opened a new pull request #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

2020-03-10 Thread GitBox
umehrot2 opened a new pull request #1394: [HUDI-656][Performance] Return a 
dummy Spark relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394
 
 
   ## What is the purpose of the pull request
   
   This PR fixes the performance issue mentioned in 
https://issues.apache.org/jira/browse/HUDI-656 by returning a dummy Spark 
relation after the write, instead of creating a Parquet data source relation.
   
   ## Brief change log
   
   - Update `DefaultSource.scala` to return a dummy relation after writing the 
data frame
   - Added a dummy relation `HudiEmptyRelation`
   
   ## Verify this pull request
   
   - Manual verification on EMR cluster
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] bhasudha closed issue #910: hoodie.*.consume.* should be set whitelist in hive-site.xml

2020-03-10 Thread GitBox
bhasudha closed issue #910: hoodie.*.consume.* should be set whitelist in 
hive-site.xml
URL: https://github.com/apache/incubator-hudi/issues/910
 
 
   




[GitHub] [incubator-hudi] bhasudha commented on issue #910: hoodie.*.consume.* should be set whitelist in hive-site.xml

2020-03-10 Thread GitBox
bhasudha commented on issue #910: hoodie.*.consume.* should be set whitelist in 
hive-site.xml
URL: https://github.com/apache/incubator-hudi/issues/910#issuecomment-597340894
 
 
   > @bhasudha : Can you kindly let me know if this is documented so that we 
can close this ticket.
   
   This is not documented yet. I created a Jira issue to track this: 
https://issues.apache.org/jira/browse/HUDI-691. I will close the GitHub issue 
so that it can be tracked further in Jira.




[jira] [Created] (HUDI-691) hoodie.*.consume.* should be set whitelist in hive-site.xml

2020-03-10 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-691:
--

 Summary: hoodie.*.consume.* should be set whitelist in 
hive-site.xml
 Key: HUDI-691
 URL: https://issues.apache.org/jira/browse/HUDI-691
 Project: Apache Hudi (incubating)
  Issue Type: Task
  Components: Docs, newbie
Reporter: Bhavani Sudha
 Fix For: 0.6.0


More details in this GH issue - 
https://github.com/apache/incubator-hudi/issues/910





[jira] [Comment Edited] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056437#comment-17056437
 ] 

Jasmine Omeke edited comment on HUDI-690 at 3/10/20, 9:42 PM:
--

pinging to triage 

[~vbalaji]


was (Author: jomeke):
 

[~vbalaji]

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR 
> tables
> 
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Jasmine Omeke
>Priority: Major
>
> Hi. I encountered an error while using the HudiSnapshotCopier class to make a 
> Backup of merge on read tables: 
> [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]
>  
> The error:
>  
> {code:java}
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from 
> /.hoodie/hoodie.properties
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
> web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type 
> MERGE_ON_READ from 
> 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants 
> java.util.stream.ReferencePipeline$Head@77f7352a
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) 
> with ID 2
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has 
> registered (new total is 1)
> 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager 
> ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, 
> BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283
> 1, None)
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
> executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) 
> with ID 4
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has 
> registered (new total is 2)Exception in thread "main" 
> java.lang.IllegalStateException: Hudi File Id 
> (HoodieFileGroupId{partitionPath='created_at_month=2020-03', 
> fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending
> compactions. Instants: (20200309011643,{"baseInstantTime": "20200308213934", 
> "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496",
>  ".7104bb0b-20f6-4dec-981b-c11
> bf20ade4a-0_20200308213934.log.2_3-761601-172985464", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-
> 177872977", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"],
>  "dataFilePath": 
> "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet",
>  "fileId": "7
> 104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": 
> "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, 
> "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, 
> "TOTAL_IO_WRITE_MB": 512.0,
>  "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), 
> (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": 
> [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423",
>  
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814",
>  ".7104bb0b-20f6-4dec-981b-c11bf20ad
> e4a-0_20200308180755.log.4_3-727192-165430450", 
> ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"],
>  "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2
> 0200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", 
> "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 
> 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE":
> 44197.0, "TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, 
> "TOTAL_LOG_FILE_SIZE": 44197.0}})at 
> org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator
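The "more than 1 pending compactions" failure quoted above comes from a consistency check that refuses to proceed when one file group id appears in more than one pending compaction plan. A rough, self-contained sketch of that kind of check (the names and data shapes are ours, not Hudi's actual CompactionUtils API):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingCompactionCheck {
    // Illustrative check: map each file group id to the instant of its pending
    // compaction; a second pending compaction for the same file id is an error.
    static Map<String, String> collectPendingOperations(List<String[]> ops) {
        Map<String, String> fileIdToInstant = new HashMap<>();
        for (String[] op : ops) {              // op = {instantTime, fileGroupId}
            String previous = fileIdToInstant.put(op[1], op[0]);
            if (previous != null) {
                throw new IllegalStateException("Hudi File Id (" + op[1]
                        + ") has more than 1 pending compactions. Instants: "
                        + previous + ", " + op[0]);
            }
        }
        return fileIdToInstant;
    }

    public static void main(String[] args) {
        List<String[]> ops = Arrays.asList(
                new String[]{"20200308213934", "fileGroup-A"},
                new String[]{"20200309011643", "fileGroup-A"}); // duplicate id
        try {
            collectPendingOperations(ops);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the log above, two instants (20200309011643 and 20200308213934) both carry a pending compaction for the same file id, which is exactly the state this kind of check rejects.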

[jira] [Commented] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056437#comment-17056437
 ] 

Jasmine Omeke commented on HUDI-690:


 

[~vbalaji]


[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390527605
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
 
 Review comment:
   We can move this up




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390471157
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
 
 Review comment:
   Link seems to point to 0.5.1




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390505002
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
 
 Review comment:
   Move this down? Also, maybe add a line describing an example of how to use 
it?




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390622214
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
+ * Client allows to overwrite the payload implementation in 
`hoodie.properties`. Previously, once the payload class is set once in 
`hoodie.properties`, it cannot be changed. In some cases, if a code refactor is 
done and the jar updated, one may need to pass the new payload class name.
 
 Review comment:
   nit: change "Client allows to overwrite the payload implementation in 
`hoodie.properties`" to "Support for overwriting the payload implementation in 
`hoodie.properties`."
   
   Also, could you specify how to do this?




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390622859
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
+ * Client allows to overwrite the payload implementation in 
`hoodie.properties`. Previously, once the payload class is set once in 
`hoodie.properties`, it cannot be changed. In some cases, if a code refactor is 
done and the jar updated, one may need to pass the new payload class name.
+ * With 0.5.2, the community has supported to published the coverage to 
codecov.io on every build. With this feature, the community will know the 
change of test coverage more clearly.
 
 Review comment:
   Wondering if this should be part of the release notes? It is not user-facing, 
right? It is only interesting to PR submitters.




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390621530
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
 
 Review comment:
   +1




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390528487
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
 
 Review comment:
   You can group all the CLI-related changes together and add sub-bullet points.




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390527272
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
 
 Review comment:
   nit: load -> loading 




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390527950
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
 
 Review comment:
   Link to any config that needs to be set up?
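   (For context, a hedged sketch of the knobs typically involved in pulling 
incrementally from chosen partitions. The option keys below are assumptions 
drawn from Hudi's `DataSourceOptions`; verify them against the release in use — 
some releases of this era spelled the first one `hoodie.datasource.view.type`.)

   ```properties
   # Incremental pull restricted to chosen partitions (assumed keys; verify
   # against the Hudi release you run).
   hoodie.datasource.query.type=incremental
   hoodie.datasource.read.begin.instanttime=20200301000000
   # Glob relative to the table base path, limiting which partitions' parquet
   # files are loaded:
   hoodie.datasource.read.incr.path.glob=/year=2020/month=03/*
   ```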




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390623457
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
+ * Client allows to overwrite the payload implementation in 
`hoodie.properties`. Previously, once the payload class is set once in 
`hoodie.properties`, it cannot be changed. In some cases, if a code refactor is 
done and the jar updated, one may need to pass the new payload class name.
+ * With 0.5.2, the community has supported to published the coverage to 
codecov.io on every build. With this feature, the community will know the 
change of test coverage more clearly.
+ * A `JdbcbasedSchemaProvider` schema provider has been provided to get 
metadata through JDBC. For the use case that users want to synchronize data 
from MySQL, and at the same time, want to get the schema from the database, 
it's very helpful.
+ * Simplify `HoodieBloomIndex` without the need for 2GB limit handling. Prior 
to spark 2.4.0, each spark partition has a limit of 2GB. In Hudi 0.5.1, after 
we upgraded to spark 2.4.4, we don't have the limitation anymore. Hence 
removing the safe parallelism constraint we had in` HoodieBloomIndex`.
+ * Write Client restructuring has moved classes around 
([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554))
 
 Review comment:
   I think we can skip the refactoring part unless it is user-facing. 




[GitHub] [incubator-hudi] yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta 
Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-597292426
 
 
   Sorry for the delay.  I'll get to this PR this week.




[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1393: [WIP] Fixing delta streamer tests.

2020-03-10 Thread GitBox
nsivabalan opened a new pull request #1393: [WIP] Fixing delta streamer tests. 
URL: https://github.com/apache/incubator-hudi/pull/1393
 
 
   WIP. Draft PR.
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
prashantwason commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390562273
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -394,8 +394,8 @@ private void testUpsertsContinuousMode(HoodieTableType 
tableType, String tempDir
   } else {
 TestHelpers.assertAtleastNCompactionCommits(5, datasetBasePath, dfs);
   }
-  TestHelpers.assertRecordCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
-  TestHelpers.assertDistanceCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
+  TestHelpers.assertRecordCount(totalRecords + 200, datasetBasePath + 
"/*/*.parquet", sqlContext);
 
 Review comment:
   Thanks Sivabalan for such a quick reply.
   
   I had filed https://issues.apache.org/jira/browse/HUDI-667 to work on this 
fix. You may wish to use it to submit your PR.




[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390557620
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -394,8 +394,8 @@ private void testUpsertsContinuousMode(HoodieTableType 
tableType, String tempDir
   } else {
 TestHelpers.assertAtleastNCompactionCommits(5, datasetBasePath, dfs);
   }
-  TestHelpers.assertRecordCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
-  TestHelpers.assertDistanceCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
+  TestHelpers.assertRecordCount(totalRecords + 200, datasetBasePath + 
"/*/*.parquet", sqlContext);
 
 Review comment:
   You might be right. As mentioned in the other thread, I am working on the 
fix. For some reason, my continuous tests time out without hitting the expected 
number of commits.




[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390557169
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -435,11 +439,46 @@ public HoodieRecord generateUpdateRecord(HoodieKey key, 
String commitTime) throw
 index = (index + 1) % numExistingKeys;
 kp = existingKeys.get(index);
   }
+  existingKeys.remove(kp);
 
 Review comment:
   Yes, you are right. I figured this out recently; I am working on the fix.




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
prashantwason commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390555211
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -394,8 +394,8 @@ private void testUpsertsContinuousMode(HoodieTableType 
tableType, String tempDir
   } else {
 TestHelpers.assertAtleastNCompactionCommits(5, datasetBasePath, dfs);
   }
-  TestHelpers.assertRecordCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
-  TestHelpers.assertDistanceCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
+  TestHelpers.assertRecordCount(totalRecords + 200, datasetBasePath + 
"/*/*.parquet", sqlContext);
 
 Review comment:
   I could not see why there is +200 here. The inserts/deletes/updates are 
calculated to keep the number of records equal to totalRecords.
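   (A minimal sketch of the expected-count bookkeeping under discussion — the 
batch sizes and method name here are hypothetical, not the actual test's. 
Updates rewrite existing records in place, so only inserts and deletes move 
the total; two uncompensated insert rounds of 100 would drift it by +200.)

   ```java
   public class ExpectedCountSketch {
       // Hypothetical bookkeeping: per round, inserts add records, deletes
       // remove them, and updates leave the total unchanged.
       static long expectedCount(long initial, long[] inserts, long[] deletes) {
           long count = initial;
           for (int i = 0; i < inserts.length; i++) {
               count += inserts[i] - deletes[i];
           }
           return count;
       }

       public static void main(String[] args) {
           // Two extra rounds of 100 uncompensated inserts: 1000 -> 1200.
           System.out.println(expectedCount(1000,
                   new long[] {100, 100}, new long[] {0, 0}));
       }
   }
   ```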




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
prashantwason commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390553417
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -435,11 +439,46 @@ public HoodieRecord generateUpdateRecord(HoodieKey key, 
String commitTime) throw
 index = (index + 1) % numExistingKeys;
 kp = existingKeys.get(index);
   }
+  existingKeys.remove(kp);
 
 Review comment:
   Shouldn't the remove be with the key rather than the value?
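   (To illustrate the pitfall — this is a stand-in sketch, not the actual 
`HoodieTestDataGenerator` types; `existingKeys` is assumed to map an integer 
index to a key object. `Map.remove(Object)` compares its argument against the 
*keys*, so passing the looked-up value is a silent no-op.)

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class RemoveByKeyDemo {
       // Hypothetical stand-in for existingKeys: index -> key object.
       static int[] demo() {
           Map<Integer, String> existingKeys = new HashMap<>();
           existingKeys.put(0, "key-a");
           existingKeys.put(1, "key-b");

           String kp = existingKeys.get(1); // kp is the VALUE at index 1

           // remove(Object) matches against the keys; "key-b" is not a key,
           // so this silently removes nothing.
           existingKeys.remove(kp);
           int afterValueRemove = existingKeys.size(); // still 2

           // Removing by the index (the actual key) shrinks the map.
           existingKeys.remove(1);
           int afterKeyRemove = existingKeys.size(); // now 1

           return new int[] {afterValueRemove, afterKeyRemove};
       }

       public static void main(String[] args) {
           int[] sizes = demo();
           System.out.println(sizes[0] + " -> " + sizes[1]); // prints "2 -> 1"
       }
   }
   ```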




[jira] [Created] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)
Jasmine Omeke created HUDI-690:
--

 Summary: filtercompletedInstants in HudiSnapshotCopier not working 
as expected for MOR tables
 Key: HUDI-690
 URL: https://issues.apache.org/jira/browse/HUDI-690
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: Jasmine Omeke


Hi. I encountered an error while using the HudiSnapshotCopier class to make a 
backup of merge-on-read tables: 
[https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]

 

The error:

 
{code:java}
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from 
/.hoodie/hoodie.properties
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: 
web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type 
MERGE_ON_READ from 
20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants 
java.util.stream.ReferencePipeline$Head@77f7352a
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) with 
ID 2
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has registered 
(new total is 1)
20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager 
ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, 
BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 3283
1, None)
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered 
executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) with 
ID 4
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has registered 
(new total is 2)Exception in thread "main" java.lang.IllegalStateException: 
Hudi File Id (HoodieFileGroupId{partitionPath='created_at_month=2020-03', 
fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending
compactions. Instants: (20200309011643,{"baseInstantTime": "20200308213934", 
"deltaFilePaths": 
[".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496",
 ".7104bb0b-20f6-4dec-981b-c11
bf20ade4a-0_20200308213934.log.2_3-761601-172985464", 
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657",
 ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-
177872977", 
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"],
 "dataFilePath": 
"7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet",
 "fileId": "7
104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": 
"created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, 
"TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, 
"TOTAL_IO_WRITE_MB": 512.0,
 "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), 
(20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": 
[".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865",
 
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423",
 
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814",
 ".7104bb0b-20f6-4dec-981b-c11bf20ad
e4a-0_20200308180755.log.4_3-727192-165430450", 
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"],
 "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_2
0200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", 
"partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 
5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE":
44197.0, "TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, 
"TOTAL_LOG_FILE_SIZE": 44197.0}})at 
org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Write 
release blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390530073
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
 
 Review comment:
   Maybe expand this a bit more? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Write 
release blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390531386
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,31 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
 
 Review comment:
   Can we call out user-facing changes for upgrading in a `### Migration Guide 
for this release` section right before the highlights? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Write 
release blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390529875
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi (incubating) 0.5.2-incubating Source Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz) ([asc](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc), [sha512](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp views. These commands create a temp table, against which users can write HiveQL queries to filter for the desired rows.
+ * `TimestampBasedKeyGenerator` now supports any data type convertible to String for the partition key. Previously it supported only four data types (`Double`, `Long`, `Float` and `String`); now, for example, users can convert a date-typed field to a string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulls from specified partitions. For use cases that only need the incremental part of certain partitions, pulls run faster by loading only the relevant parquet files.
+ * CLI allows users to specify an option to print additional commit metadata, e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records Compacted* and so on.
+ * With 0.5.2, Hudi allows the partition path to be updated when using the `GLOBAL_BLOOM` index.
+ * The client now allows the payload implementation set in `hoodie.properties` to be overwritten. Previously, once the payload class was set in `hoodie.properties`, it could not be changed. In some cases, e.g. after a code refactor and a jar update, one may need to pass a new payload class name.
+ * With 0.5.2, the community publishes test coverage to codecov.io on every build, making changes in test coverage clearly visible.
+ * A `JdbcbasedSchemaProvider` has been added to fetch metadata through JDBC. It is helpful for use cases where users synchronize data from a database such as MySQL and want to obtain the schema from that same database.
+ * Simplified `HoodieBloomIndex` by removing the 2GB-limit handling. Prior to Spark 2.4.0, each Spark partition had a 2GB limit; since Hudi 0.5.1 upgraded to Spark 2.4.4, this limitation no longer applies, so the safe-parallelism constraint in `HoodieBloomIndex` has been removed.
+ * Write Client restructuring has moved classes around ([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554))
+   - `client` now has all the various client classes that do the transaction management
 
 Review comment:
   can we remove the bullets and summarize it further?




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-10 Thread GitBox
codecov-io edited a comment on issue #1392: [HUDI-689] Change CLI command names 
to not have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392#issuecomment-597241101
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=h1) 
Report
   > Merging 
[#1392](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806&el=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1392/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1392      +/-   ##
   ============================================
   - Coverage     67.40%   67.37%    -0.03%
     Complexity      230      230
   ============================================
     Files           336      336
     Lines         16366    16366
     Branches       1672     1672
   ============================================
   - Hits          11031    11027        -4
     Misses         4602     4602
   - Partials        733      737        +4
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `72.00% <0.00%> (-4.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=footer).
 Last update 
[77d5b92...c3f11b8](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-10 Thread GitBox
codecov-io commented on issue #1392: [HUDI-689] Change CLI command names to not 
have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392#issuecomment-597241101
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=h1) 
Report
   > Merging 
[#1392](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806&el=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1392/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1392      +/-   ##
   ============================================
   - Coverage     67.40%   67.37%    -0.03%
     Complexity      230      230
   ============================================
     Files           336      336
     Lines         16366    16366
     Branches       1672     1672
   ============================================
   - Hits          11031    11027        -4
     Misses         4602     4602
   - Partials        733      737        +4
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `72.00% <0.00%> (-4.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=footer).
 Last update 
[77d5b92...c3f11b8](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-597231598
 
 
   > @satishkotha : I did a quick look. Will do a more comprehensive review 
later.
   > 
   > One quick comment :
   > The code change in HoodieParquetInputFormat also affects 
HoodieRealtimeParquetInputFormat which should not be the case. 
HoodieRealtimeParquetInputFormat should be allowed to read past earliest 
pending compaction instants.
   > 
   > Balaji.V
   
   Ah, I didn't see the inheritance. Thanks for the context. I'll work on fixing it and adding unit tests.




[jira] [Updated] (HUDI-687) incremental reads on MOR tables using RO view can lead to missing updates

2020-03-10 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-687:

Summary: incremental reads on MOR tables using RO view can lead to missing 
updates  (was: incremental reads on MOR RO tables can lead to data loss)

> incremental reads on MOR tables using RO view can lead to missing updates
> -
>
> Key: HUDI-687
> URL: https://issues.apache.org/jira/browse/HUDI-687
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> if compaction at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss'.
> (Data will still be on disk, but incremental readers won't see it because it's
> in a log file and readers move to t3.)
> To work around this problem, we want to stop returning data belonging to
> commits > t1. After compaction is complete, incremental readers would see
> updates in t2, t3, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
bvaradar commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-597229288
 
 
   @satishkotha : Also, the Spark DataSource incremental read path needs to employ a similar mechanism.




[jira] [Updated] (HUDI-687) incremental reads on MOR RO tables can lead to data loss

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-687:

Summary: incremental reads on MOR RO tables can lead to data loss  (was: 
incremental reads on MOR tables can lead to data loss)

> incremental reads on MOR RO tables can lead to data loss
> 
>
> Key: HUDI-687
> URL: https://issues.apache.org/jira/browse/HUDI-687
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> if compaction at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss'.
> (Data will still be on disk, but incremental readers won't see it because it's
> in a log file and readers move to t3.)
> To work around this problem, we want to stop returning data belonging to
> commits > t1. After compaction is complete, incremental readers would see
> updates in t2, t3, and so on.





[jira] [Commented] (HUDI-687) incremental reads on MOR tables can lead to data loss

2020-03-10 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056186#comment-17056186
 ] 

Balaji Varadarajan commented on HUDI-687:
-

cc [~vinothchandar] 

Just to be really clear, the potential race condition happens only when doing an
incremental read using the RO view (not RT) against a MOR table. In this case,
the incremental read will not make progress past the earliest pending compaction
time, to avoid any data loss.
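The boundary rule described above can be sketched as follows. This is an illustrative model only — the `(instant, action, state)` tuples and the helper name are hypothetical, not Hudi's actual timeline API:

```java
import java.util.*;

public class IncrementalReadBoundary {

    // Returns the earliest pending (not yet completed) compaction instant, if any.
    // Incremental RO readers must not consume commits at or after this instant.
    static Optional<String> earliestPendingCompaction(List<String[]> timeline) {
        return timeline.stream()
                .filter(i -> i[1].equals("compaction") && !i[2].equals("completed"))
                .map(i -> i[0])
                .min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<String[]> timeline = Arrays.asList(
                new String[]{"t0", "commit", "completed"},      // bucket1.parquet created
                new String[]{"t1", "deltacommit", "completed"}, // updates appended to bucket1.log
                new String[]{"t2", "compaction", "requested"},  // long-running pending compaction
                new String[]{"t3", "commit", "completed"});     // bucket2.parquet created
        // Readers consume t0 and t1, then wait at t2; t3 becomes visible only
        // after the compaction completes, so the t1 log updates are not skipped.
        System.out.println(earliestPendingCompaction(timeline).orElse("none"));
    }
}
```

With the example timeline from HUDI-687, the reader stops at t2 rather than jumping to t3 past the uncompacted t1 updates.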

 

> incremental reads on MOR tables can lead to data loss
> -
>
> Key: HUDI-687
> URL: https://issues.apache.org/jira/browse/HUDI-687
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> if compaction at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss'.
> (Data will still be on disk, but incremental readers won't see it because it's
> in a log file and readers move to t3.)
> To work around this problem, we want to stop returning data belonging to
> commits > t1. After compaction is complete, incremental readers would see
> updates in t2, t3, and so on.





[GitHub] [incubator-hudi] satishkotha opened a new pull request #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-10 Thread GitBox
satishkotha opened a new pull request #1392: [HUDI-689] Change CLI command 
names to not have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392
 
 
   ## What is the purpose of the pull request
   I broke the CLI when I added the `compactions show archived` command.
I still don't understand Spring Shell well enough to explain why the
existing commands won't work, but the alternative command names I picked seem to
work. Please let me know if any of you have seen similar issues with Spring
Shell and how its command parsing works.
   
   CLI 'commits show archived' fails with
   
   ->commits show archived
   Option '' is not available for this command. Use tab assist or the "help" 
command to see the legal options
   This seems to be caused by the overlapping 'compactions show archived' command.
If I remove the @CliCommand annotation from compactions, 'commits show archived' works.
   
   ## Brief change log
   
   - Chose different command names to make all commands 'work'
   - Changed method names so they do not overlap with each other
   
   
   ## Verify this pull request
   
   Manually verified the change by running CLI
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-689:

Labels: pull-request-available  (was: )

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by the overlapping 'compactions show archived' command.
> If I remove the @CliCommand annotation from compactions, 'commits show archived' works.





[jira] [Commented] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056166#comment-17056166
 ] 

Balaji Varadarajan commented on HUDI-689:
-

[~satishkotha] : Can you add more context to this ticket so that everybody can
understand it?

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by the overlapping 'compactions show archived' command.
> If I remove the @CliCommand annotation from compactions, 'commits show archived' works.





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-689:

Status: Open  (was: New)

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by the overlapping 'compactions show archived' command.
> If I remove the @CliCommand annotation from compactions, 'commits show archived' works.





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Description: 
CLI 'commits show archived' fails with
{code}
->commits show archived
Option '' is not available for this command. Use tab assist or the "help" 
command to see the legal options
{code}

This seems to be caused by the overlapping 'compactions show archived' command.
If I remove the @CliCommand annotation from compactions, 'commits show archived' works.

  was:




> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by the overlapping 'compactions show archived' command.
> If I remove the @CliCommand annotation from compactions, 'commits show archived' works.





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Labels:   (was: pull-request-available)

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by the overlapping 'compactions show archived' command.
> If I remove the @CliCommand annotation from compactions, 'commits show archived' works.





[jira] [Created] (HUDI-689) Fix hudi cli commands with overlap

2020-03-10 Thread satish (Jira)
satish created HUDI-689:
---

 Summary: Fix hudi cli commands with overlap
 Key: HUDI-689
 URL: https://issues.apache.org/jira/browse/HUDI-689
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: satish
Assignee: satish


example timeline:

t0 -> create bucket1.parquet
t1 -> create and append updates bucket1.log
t2 -> request compaction 
t3 -> create bucket2.parquet

if compaction at t2 takes a long time, incremental reads using
HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss'.
(Data will still be on disk, but incremental readers won't see it because it's in
a log file and readers move to t3.)

To work around this problem, we want to stop returning data belonging to commits
> t1. After compaction is complete, incremental readers would see updates in t2,
t3, and so on.






[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Summary: Fix hudi cli commands with overlapping words  (was: Fix hudi cli 
commands with overlap)

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> if compaction at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss'.
> (Data will still be on disk, but incremental readers won't see it because it's
> in a log file and readers move to t3.)
> To work around this problem, we want to stop returning data belonging to
> commits > t1. After compaction is complete, incremental readers would see
> updates in t2, t3, and so on.





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Description: 



  was:
example timeline:

t0 -> create bucket1.parquet
t1 -> create and append updates bucket1.log
t2 -> request compaction 
t3 -> create bucket2.parquet

if compaction at t2 takes a long time, incremental reads using
HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss'.
(Data will still be on disk, but incremental readers won't see it because it's in
a log file and readers move to t3.)

To work around this problem, we want to stop returning data belonging to commits
> t1. After compaction is complete, incremental readers would see updates in t2,
t3, and so on.



> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>






[jira] [Resolved] (HUDI-669) HoodieDeltaStreamer offset not handled correctly when using LATEST offset reset strategy

2020-03-10 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan resolved HUDI-669.
-
Resolution: Duplicate

> HoodieDeltaStreamer offset not handled correctly when using LATEST offset 
> reset strategy
> 
>
> Key: HUDI-669
> URL: https://issues.apache.org/jira/browse/HUDI-669
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Context : [https://github.com/apache/incubator-hudi/issues/1375]





[jira] [Commented] (HUDI-669) HoodieDeltaStreamer offset not handled correctly when using LATEST offset reset strategy

2020-03-10 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056140#comment-17056140
 ] 

Balaji Varadarajan commented on HUDI-669:
-

Thanks [~lamber-ken]. Closing this as a duplicate.

> HoodieDeltaStreamer offset not handled correctly when using LATEST offset 
> reset strategy
> 
>
> Key: HUDI-669
> URL: https://issues.apache.org/jira/browse/HUDI-669
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Context : [https://github.com/apache/incubator-hudi/issues/1375]





[GitHub] [incubator-hudi] codecov-io commented on issue #1391: [HUDI-688] Paring down the NOTICE file to minimum required notices

2020-03-10 Thread GitBox
codecov-io commented on issue #1391: [HUDI-688] Paring down the NOTICE file to 
minimum required notices
URL: https://github.com/apache/incubator-hudi/pull/1391#issuecomment-597203280
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr&el=h1) 
Report
   > Merging 
[#1391](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806&el=desc)
 will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1391/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1391      +/-   ##
   ============================================
   - Coverage     67.40%   67.39%    -0.01%
     Complexity      230      230
   ============================================
     Files           336      336
     Lines         16366    16366
     Branches       1672     1672
   ============================================
   - Hits          11031    11030        -1
     Misses         4602     4602
   - Partials        733      734        +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1391/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr&el=footer).
 Last update 
[77d5b92...87bb2da](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[jira] [Updated] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-662:

Fix Version/s: (was: 0.5.2)

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
>  [http://www.apache.org/dev/licensing-howto.html] is also a comprehensive guide.
> [http://www.apache.org/legal/src-headers.html] is relevant as well.
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461





[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056136#comment-17056136
 ] 

Vinoth Chandar commented on HUDI-662:
-

We can revisit this again if needed... untagging the fix version.

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
>  [http://www.apache.org/dev/licensing-howto.html] is also a comprehensive guide.
> [http://www.apache.org/legal/src-headers.html] is relevant as well.
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461





[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r390466238
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), 
x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) 
{
 
 Review comment:
   @lamber-ken : Also, instead of handling empty checkpoints only for 
Kafka, can we handle this generically in DeltaSync 
(https://github.com/apache/incubator-hudi/blob/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L262)
 so that we have uniform handling of checkpoints across sources?
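The guard being discussed (treating an empty checkpoint string the same as an absent one) can be expressed as a small normalization helper. A minimal sketch, assuming a hypothetical `CheckpointUtils` class (not Hudi's actual API), of what uniform handling in DeltaSync might look like:

```java
import java.util.Optional;

// Hypothetical helper, not part of Hudi: normalizes checkpoint strings so
// that absent and empty checkpoints are treated identically, mirroring the
// `isPresent() && !get().isEmpty()` guard from the review.
public class CheckpointUtils {

    // Collapse Optional.of("") into Optional.empty() so downstream code
    // only needs a single isPresent() check.
    public static Optional<String> normalize(Optional<String> lastCheckpointStr) {
        return lastCheckpointStr.filter(s -> !s.isEmpty());
    }

    public static void main(String[] args) {
        System.out.println(normalize(Optional.of("")));            // prints Optional.empty
        System.out.println(normalize(Optional.of("topic,0:42")));  // prints Optional[topic,0:42]
    }
}
```

Normalizing once this way, as the review suggests doing in DeltaSync, would spare each Source implementation from re-checking for empty strings.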




[GitHub] [incubator-hudi] bvaradar closed issue #1359: [SUPPORT] handle partition value containing colon ?

2020-03-10 Thread GitBox
bvaradar closed issue #1359: [SUPPORT] handle partition value containing colon ?
URL: https://github.com/apache/incubator-hudi/issues/1359
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #1359: [SUPPORT] handle partition value containing colon ?

2020-03-10 Thread GitBox
bvaradar commented on issue #1359: [SUPPORT] handle partition value containing 
colon ?
URL: https://github.com/apache/incubator-hudi/issues/1359#issuecomment-597184490
 
 
   Closing this ticket due to inactivity.




[jira] [Updated] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-688:

Labels: pull-request-available  (was: )

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>






[GitHub] [incubator-hudi] vinothchandar opened a new pull request #1391: [HUDI-688] Paring down the NOTICE file to minimum required notices

2020-03-10 Thread GitBox
vinothchandar opened a new pull request #1391: [HUDI-688] Paring down the 
NOTICE file to minimum required notices
URL: https://github.com/apache/incubator-hudi/pull/1391
 
 
- Based on analysis, we don't need to call out anything
- We only do source releases at this time
- Fix typo in LICENSE
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-680) Update Jackson databind to 2.6.7.3

2020-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-680:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Update Jackson databind to 2.6.7.3
> --
>
> Key: HUDI-680
> URL: https://issues.apache.org/jira/browse/HUDI-680
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Aki Tanaka
>Assignee: Aki Tanaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I would like to update Jackson databind to 2.6.7.3. Because this version is 
> the latest jackson-databind of 2.6.7.x line and it has all CVE fixes up to 
> 2.9.10.
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.6.7.x
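Such a version bump is typically a one-line change in the parent pom's dependency management. A sketch with standard Maven coordinates (the section layout and placement in Hudi's actual pom are assumptions):

```xml
<!-- Sketch only; the dependencyManagement placement is assumed,
     not copied from Hudi's actual pom. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <!-- 2.6.7.3: last 2.6.7.x release, carries CVE fixes up to 2.9.10 -->
      <version>2.6.7.3</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```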





[GitHub] [incubator-hudi] smarthi commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
smarthi commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390284780
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,33 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * Dependency Version Upgrades
+   - Upgrade from Jackson-databind 2.6.7.1 to 2.6.7.3
 
 Review comment:
   Did we cherry-pick this for 0.5.2 release ? 




[GitHub] [incubator-hudi] yanghua edited a comment on issue #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
yanghua edited a comment on issue #1390: [HUDI-634] Write release blog and 
document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#issuecomment-597037052
 
 
   Please temporarily ignore those links.




[GitHub] [incubator-hudi] yanghua commented on issue #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
yanghua commented on issue #1390: [HUDI-634] Write release blog and document 
breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#issuecomment-597037052
 
 
   Please temporarily ignore those links.




[jira] [Updated] (HUDI-634) Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-634:

Labels: pull-request-available  (was: )

> Write release blog and document breaking changes for 0.5.2 release
> --
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 





[GitHub] [incubator-hudi] yanghua opened a new pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
yanghua opened a new pull request #1390: [HUDI-634] Write release blog and 
document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390
 
 
   
   
   ## What is the purpose of the pull request
   
   *This pull request writes release blog and document breaking changes for 
0.5.2 release*
   
   ## Brief change log
   
 - *Write release blog and document breaking changes for 0.5.2 release*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-634) Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-634:
--
Summary: Write release blog and document breaking changes for 0.5.2 release 
 (was: Document breaking changes for 0.5.2 release)

> Write release blog and document breaking changes for 0.5.2 release
> --
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 





[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread Suneel Marthi (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055810#comment-17055810
 ] 

Suneel Marthi commented on HUDI-688:


We are back to what [~lresende] had suggested the NOTICE file should contain 
- nothing whatsoever. The confusion stemmed from Justin's unclear comments 
on the last release. Let's pare down the NOTICE file to what it should be. 

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>






[GitHub] [incubator-hudi] codecov-io commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
codecov-io commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-596951199
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=h1) 
Report
   > Merging 
[#1389](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/f93e64fee413ed1b774156e688794ee7937cc01a?src=pr&el=desc)
 will **increase** coverage by `0.22%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1389/graphs/tree.svg?width=650&token=VTTXabwbs2&height=150&src=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1389     +/-   ##
   ===========================================
   + Coverage     67.18%    67.4%    +0.22% 
   - Complexity      221      230        +9 
   ===========================================
     Files           335      336        +1 
     Lines         16272    16376      +104 
     Branches       1661     1673       +12 
   ===========================================
   + Hits          10933    11039      +106 
   + Misses         4604     4602        -2 
     Partials        735      735
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...a/org/apache/hudi/common/table/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `93.24% <100%> (+1.05%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh)
 | `80.99% <100%> (+1.65%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   | 
[...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=)
 | `54.38% <0%> (-0.88%)` | `0% <0%> (ø)` | |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `39.36% <0%> (ø)` | `7% <0%> (?)` | |
   | 
[...in/java/org/apache/hudi/metrics/HoodieMetrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Ib29kaWVNZXRyaWNzLmphdmE=)
 | `87.5% <0%> (+57.69%)` | `0% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=footer).
 Last update 
[f93e64f...e639491](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services