[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by 
default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368398474
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##
 @@ -117,6 +117,13 @@ public HoodieTableMetaClient(Configuration conf, String 
basePath, boolean loadAc
 TableNotFoundException.checkTableValidity(fs, basePathDir, metaPathDir);
 this.tableConfig = new HoodieTableConfig(fs, metaPath, payloadClassName);
 this.tableType = tableConfig.getTableType();
+if (layoutVersion.isPresent()) {
+  // Ensure layout version passed in config is not lower than the one seen 
in hoodie.properties
+  TimelineLayoutVersion tableConfigVersion = 
tableConfig.getTimelineLayoutVersion();
 
 Review comment:
   You are correct: the table config version can be null. I have changed 
`tableConfig.getTimelineLayoutVersion()` to return an Option instead.
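
The null-safety concern in this review comment can be sketched as follows. This is a minimal illustration and not the code from the PR: it assumes `java.util.Optional` as a stand-in for Hudi's own `Option` type and plain integers in place of `TimelineLayoutVersion`, and the names `LayoutVersionCheck` and `validate` are hypothetical.

```java
import java.util.Optional;

public class LayoutVersionCheck {

  // Hypothetical helper: rejects a requested version that is lower than the
  // version recorded for the table. An absent table version (for example an
  // older table that never recorded one) means there is nothing to compare
  // against, so the request is accepted.
  public static void validate(Optional<Integer> requested, Optional<Integer> tableVersion) {
    if (requested.isPresent() && tableVersion.isPresent()
        && requested.get().compareTo(tableVersion.get()) < 0) {
      throw new IllegalArgumentException("Layout version " + requested.get()
          + " is lower than the table's version " + tableVersion.get());
    }
  }

  public static void main(String[] args) {
    validate(Optional.of(1), Optional.empty()); // no table version: accepted
    validate(Optional.of(1), Optional.of(1));   // equal versions: accepted
    try {
      validate(Optional.of(0), Optional.of(1)); // lower than the table: rejected
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because both sides are wrapped, `compareTo` is only reached when both values exist, which is the case the reviewer asked about.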


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by 
default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368398278
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/ClientUtils.java
 ##
 @@ -37,6 +37,8 @@
   public static HoodieTableMetaClient createMetaClient(JavaSparkContext jsc, 
HoodieWriteConfig config,
   boolean loadActiveTimelineOnLoad) {
 return new HoodieTableMetaClient(jsc.hadoopConfiguration(), 
config.getBasePath(), loadActiveTimelineOnLoad,
-config.getConsistencyGuardConfig(), Option.of(new 
TimelineLayoutVersion(config.getTimelineLayoutVersion())));
+config.getConsistencyGuardConfig(),
+Option.ofNullable((config.getTimelineLayoutVersion() != null)
+? new TimelineLayoutVersion(config.getTimelineLayoutVersion()) : 
null));
 
 Review comment:
   Yeah, this can be simplified. HoodieWriteConfig's TimelineLayoutVersion will 
not be null, as it has a default. I will change it based on that.
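
A getter that always falls back to a default could look roughly like the sketch below. The class name `WriteConfigDefaults`, the property key and the default value `"1"` are assumptions made for this illustration; they are not taken from the PR.

```java
import java.util.Properties;

public class WriteConfigDefaults {

  // Assumed key and default value, chosen only for this illustration.
  public static final String TIMELINE_LAYOUT_VERSION = "hoodie.timeline.layout.version";
  public static final String DEFAULT_TIMELINE_LAYOUT_VERSION = "1";

  private final Properties props;

  public WriteConfigDefaults(Properties props) {
    this.props = props;
  }

  // Never returns null: the default is used when the key is not set.
  public int getTimelineLayoutVersion() {
    return Integer.parseInt(
        props.getProperty(TIMELINE_LAYOUT_VERSION, DEFAULT_TIMELINE_LAYOUT_VERSION));
  }

  public static void main(String[] args) {
    // Key not set: the default applies.
    System.out.println(new WriteConfigDefaults(new Properties()).getTimelineLayoutVersion());

    // Key set explicitly: the configured value applies.
    Properties explicit = new Properties();
    explicit.setProperty(TIMELINE_LAYOUT_VERSION, "0");
    System.out.println(new WriteConfigDefaults(explicit).getTimelineLayoutVersion());
  }
}
```

With a getter like this, the caller can wrap the value directly and the null-checking ternary in the diff above is no longer needed.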




[jira] [Comment Edited] (HUDI-538) Restructuring hudi client module for multi engine support

2020-01-19 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019265#comment-17019265
 ] 

vinoyang edited comment on HUDI-538 at 1/20/20 7:14 AM:


[~vinoth] OK, there is another thing we may need to consider. Based on our 
discussion, we agreed to put {{hudi-utilities}} aside. However, both Flink and 
Spark follow the {{source -> transform -> sink}} model. Currently, the sources 
are hosted in the {{hudi-utilities}} package and they are not Spark-free. So it 
seems we also need to take this into account. WDYT?


was (Author: yanghua):
[~vinoth] OK, there is another thing we may need to consider. Based on our 
discussion, we agreed to put {{hudi-utilities}} aside. However, both Flink and 
Spark observe the {{source -> transform -> sink}} model. Currently, the sources 
are hosted in the {{hudi-utilities}} package and they are not Spark-free. So it 
seems we also need to take this into account. WDYT?

> Restructuring hudi client module for multi engine support
> -
>
> Key: HUDI-538
> URL: https://issues.apache.org/jira/browse/HUDI-538
> Project: Apache Hudi (incubating)
>  Issue Type: Wish
>  Components: Code Cleanup
>Reporter: vinoyang
>Priority: Major
>
> Hudi is currently tightly coupled with the Spark framework, which makes 
> integration with other computing engines more difficult. We plan to decouple 
> it from Spark. This umbrella issue is used to track this work.
> Some thoughts are written here: 
> https://docs.google.com/document/d/1Q9w_4K6xzGbUrtTS0gAlzNYOmRXjzNUdbbe0q59PX9w/edit?usp=sharing
> The feature branch is {{restructure-hudi-client}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-538) Restructuring hudi client module for multi engine support

2020-01-19 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019265#comment-17019265
 ] 

vinoyang commented on HUDI-538:
---

[~vinoth] OK, there is another thing we may need to consider. Based on our 
discussion, we agreed to put {{hudi-utilities}} aside. However, both Flink and 
Spark observe the {{source -> transform -> sink}} model. Currently, the sources 
are hosted in the {{hudi-utilities}} package and they are not Spark-free. So it 
seems we also need to take this into account. WDYT?



[GitHub] [incubator-hudi] dengziming edited a comment on issue #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-01-19 Thread GitBox
dengziming edited a comment on issue #1151: [WIP] [HUDI-476] Add hudi-examples 
module
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-576130421
 
 
   @vinothchandar Hi Vinoth, I have added the DeltaStreamExample.
   I ran `mvn test -B` successfully locally, but the Travis CI build failed 
with:
   ```
   [ERROR] Failed to execute goal on project hudi-examples: Could not resolve 
dependencies for project org.apache.hudi:hudi-examples:jar:0.5.1-SNAPSHOT: The 
following artifacts could not be resolved: 
org.apache.hudi:hudi-utilities:jar:0.5.1-SNAPSHOT, 
org.apache.hudi:hudi-spark:jar:0.5.1-SNAPSHOT: Failure to find 
org.apache.hudi:hudi-utilities:jar:0.5.1-SNAPSHOT in 
https://oss.sonatype.org/content/repositories/snapshots/ was cached in the 
local repository, resolution will not be reattempted until the update interval 
of sonatype-snapshots has elapsed or updates are forced -> [Help 1]
   ```
   I searched for this error and found that it can be solved by deleting the file 
cached in the local repository, but I don't have the privilege to do that. Could 
you help me solve this problem?
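
The "deleting the file cached in the local repository" step mentioned above can be sketched as below, assuming the cached failure is recorded in Maven's `*.lastUpdated` marker files. The class and method names are hypothetical, and whether this resolves the CI failure in this thread is not established by the discussion; running Maven with `-U` to force updates is the usual alternative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleMarkerCleaner {

  // Deletes every "*.lastUpdated" marker under repoRoot and returns the
  // paths that were removed. Other files (jars, poms) are left untouched.
  public static List<Path> deleteLastUpdatedMarkers(Path repoRoot) throws IOException {
    List<Path> markers;
    try (Stream<Path> paths = Files.walk(repoRoot)) {
      markers = paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(".lastUpdated"))
          .collect(Collectors.toList());
    }
    for (Path marker : markers) {
      Files.delete(marker);
    }
    return markers;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      // Requires an explicit directory so nothing is deleted by accident.
      System.out.println("usage: StaleMarkerCleaner <local-repository-dir>");
      return;
    }
    List<Path> removed = deleteLastUpdatedMarkers(Paths.get(args[0]));
    System.out.println("removed " + removed.size() + " marker file(s)");
  }
}
```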




[GitHub] [incubator-hudi] dengziming commented on issue #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-01-19 Thread GitBox
dengziming commented on issue #1151: [WIP] [HUDI-476] Add hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-576130421
 
 
   @vinothchandar why it 
   I ran `mvn test -B` successfully locally, but the Travis CI build failed 
with:
   ```
   [ERROR] Failed to execute goal on project hudi-examples: Could not resolve 
dependencies for project org.apache.hudi:hudi-examples:jar:0.5.1-SNAPSHOT: The 
following artifacts could not be resolved: 
org.apache.hudi:hudi-utilities:jar:0.5.1-SNAPSHOT, 
org.apache.hudi:hudi-spark:jar:0.5.1-SNAPSHOT: Failure to find 
org.apache.hudi:hudi-utilities:jar:0.5.1-SNAPSHOT in 
https://oss.sonatype.org/content/repositories/snapshots/ was cached in the 
local repository, resolution will not be reattempted until the update interval 
of sonatype-snapshots has elapsed or updates are forced -> [Help 1]
   ```
   I searched for this error and found that it can be solved by deleting the file 
cached in the local repository, but I don't have the privilege to do that. Could 
you help me solve this problem?




[GitHub] [incubator-hudi] dengziming edited a comment on issue #1151: [WIP] [HUDI-476] Add hudi-examples module

2020-01-19 Thread GitBox
dengziming edited a comment on issue #1151: [WIP] [HUDI-476] Add hudi-examples 
module
URL: https://github.com/apache/incubator-hudi/pull/1151#issuecomment-576130421
 
 
   @vinothchandar Hi Vinoth, 
   I ran `mvn test -B` successfully locally, but the Travis CI build failed 
with:
   ```
   [ERROR] Failed to execute goal on project hudi-examples: Could not resolve 
dependencies for project org.apache.hudi:hudi-examples:jar:0.5.1-SNAPSHOT: The 
following artifacts could not be resolved: 
org.apache.hudi:hudi-utilities:jar:0.5.1-SNAPSHOT, 
org.apache.hudi:hudi-spark:jar:0.5.1-SNAPSHOT: Failure to find 
org.apache.hudi:hudi-utilities:jar:0.5.1-SNAPSHOT in 
https://oss.sonatype.org/content/repositories/snapshots/ was cached in the 
local repository, resolution will not be reattempted until the update interval 
of sonatype-snapshots has elapsed or updates are forced -> [Help 1]
   ```
   I searched for this error and found that it can be solved by deleting the file 
cached in the local repository, but I don't have the privilege to do that. Could 
you help me solve this problem?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make 
sure by default table layout version honors the configuration in 
hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368386515
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
 ##
 @@ -117,6 +117,13 @@ public HoodieTableMetaClient(Configuration conf, String 
basePath, boolean loadAc
 TableNotFoundException.checkTableValidity(fs, basePathDir, metaPathDir);
 this.tableConfig = new HoodieTableConfig(fs, metaPath, payloadClassName);
 this.tableType = tableConfig.getTableType();
+if (layoutVersion.isPresent()) {
+  // Ensure layout version passed in config is not lower than the one seen 
in hoodie.properties
+  TimelineLayoutVersion tableConfigVersion = 
tableConfig.getTimelineLayoutVersion();
 
 Review comment:
   Can't this be `null`? If so, would `compareTo` below still be happy?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make 
sure by default table layout version honors the configuration in 
hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368386980
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/client/utils/ClientUtils.java
 ##
 @@ -37,6 +37,8 @@
   public static HoodieTableMetaClient createMetaClient(JavaSparkContext jsc, 
HoodieWriteConfig config,
   boolean loadActiveTimelineOnLoad) {
 return new HoodieTableMetaClient(jsc.hadoopConfiguration(), 
config.getBasePath(), loadActiveTimelineOnLoad,
-config.getConsistencyGuardConfig(), Option.of(new 
TimelineLayoutVersion(config.getTimelineLayoutVersion())));
+config.getConsistencyGuardConfig(),
+Option.ofNullable((config.getTimelineLayoutVersion() != null)
+? new TimelineLayoutVersion(config.getTimelineLayoutVersion()) : 
null));
 
 Review comment:
   Can the default in HoodieWriteConfig be `Option.empty` instead of null? 
Then we can simply make `config.getTimelineLayoutVersion()` return an Option 
directly.
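
The suggestion to return an Option from the getter might look like this sketch. It assumes `java.util.Optional` in place of Hudi's `Option`; the class name `OptionalVersionConfig` and the property key are likewise assumptions for illustration only.

```java
import java.util.Optional;
import java.util.Properties;

public class OptionalVersionConfig {

  // Assumed property key, used only for this illustration.
  public static final String TIMELINE_LAYOUT_VERSION = "hoodie.timeline.layout.version";

  private final Properties props;

  public OptionalVersionConfig(Properties props) {
    this.props = props;
  }

  // Empty when the key is unset, so callers never see a null Integer.
  public Optional<Integer> getTimelineLayoutVersion() {
    return Optional.ofNullable(props.getProperty(TIMELINE_LAYOUT_VERSION))
        .map(Integer::parseInt);
  }

  public static void main(String[] args) {
    // Key not set: the getter returns an empty Optional.
    System.out.println(new OptionalVersionConfig(new Properties()).getTimelineLayoutVersion());

    // Key set: the parsed value is wrapped.
    Properties explicit = new Properties();
    explicit.setProperty(TIMELINE_LAYOUT_VERSION, "1");
    System.out.println(new OptionalVersionConfig(explicit).getTimelineLayoutVersion());
  }
}
```

The call site can then map the returned value to a layout-version object without a ternary or an explicit null check.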




[jira] [Commented] (HUDI-538) Restructuring hudi client module for multi engine support

2020-01-19 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019249#comment-17019249
 ] 

Vinoth Chandar commented on HUDI-538:
-

>Do you mean the logic in the {{DeltaSync#readFromSource}}? A little bit more 
>specific, do you mean {{KeyGenerator}}?

Sort of. We have logic there that constructs a `HoodieRecord` from a Spark 
`Row` or `GenericRecord`. I am saying we should push this further down the 
stack and do it lazily at write/index time, as needed. The alternative is to work 
with a Spark `DataSet` or Flink `DataStream`, 
similar to JavaRDD now. At least for Spark, I am not sure if 
anyone uses anything other than `Row` with DataSet. 
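
One way to picture "push the key extraction down and do it lazily" is the sketch below. Everything in it is hypothetical: `LazyKeyedWriter` is not a Hudi class, and plain strings stand in for engine-specific row types. The writer receives raw records plus a key-extraction function and derives keys only when `write` runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LazyKeyExtraction {

  // Hypothetical engine-agnostic writer. It accepts raw records of any type T
  // plus a key-extraction function, and only derives keys inside write().
  public static final class LazyKeyedWriter<T> {
    private final Function<T, String> keyExtractor;

    public LazyKeyedWriter(Function<T, String> keyExtractor) {
      this.keyExtractor = keyExtractor;
    }

    // Keys are computed here, at write time, not when the input is read.
    public List<String> write(List<T> rawRecords) {
      return rawRecords.stream()
          .map(r -> keyExtractor.apply(r) + " -> " + r)
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) {
    AtomicInteger extractions = new AtomicInteger();
    LazyKeyedWriter<String> writer = new LazyKeyedWriter<>(row -> {
      extractions.incrementAndGet();
      return row.split(",")[0]; // first column acts as the record key
    });

    List<String> rows = Arrays.asList("id1,a", "id2,b");
    System.out.println("extractions before write: " + extractions.get());
    System.out.println(writer.write(rows));
    System.out.println("extractions after write: " + extractions.get());
  }
}
```

The input type stays generic, so the same writer shape could in principle sit behind a Spark or Flink source without building a record object up front.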



[jira] [Commented] (HUDI-538) Restructuring hudi client module for multi engine support

2020-01-19 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019243#comment-17019243
 ] 

vinoyang commented on HUDI-538:
---

[~vinoth] Agreed on both core issues. About the second issue, especially 
this sentence:

bq. We need a way to do these lazily by pushing the key extraction function 
into the entire writing path.

Do you mean the logic in the {{DeltaSync#readFromSource}}? A little bit more 
specific, do you mean {{KeyGenerator}}?





[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1245: [MINOR] Replace Collection.size > 0 with Collection.isEmpty()

2020-01-19 Thread GitBox
lamber-ken edited a comment on issue #1245: [MINOR] Replace Collection.size > 0 
with Collection.isEmpty()
URL: https://github.com/apache/incubator-hudi/pull/1245#issuecomment-576026737
 
 
   Hi @vinothchandar @smarthi, thanks for reviewing this PR.
   
   Here are my thoughts:
   1, It's true that most collections implement `isEmpty()` as `size() == 0`, 
but not all of them do; one example is 
`java.util.concurrent.ConcurrentSkipListSet`. 
   The [ConcurrentSkipListSet 
documentation](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentSkipListSet.html)
 says:
   ```
   Beware that, unlike in most collections, the size method is not a 
constant-time operation.
   ```
   This means that `.size()` can be O(1) or O(N), depending on the data 
structure, whereas `.isEmpty()` does not need to count the elements.
   
   
   
   2, The IDE keeps prompting me to make this change.
   
   
![image](https://user-images.githubusercontent.com/20113411/72685195-af0ccd00-3b22-11ea-8e11-24051be6e26f.png)
   
   
   
   3, In the Hudi project, some code uses `isEmpty()` and some uses `size() > 0`.
   
   
![image](https://user-images.githubusercontent.com/20113411/72685273-715c7400-3b23-11ea-9fb8-090c81f469d6.png)
   
   
   
   4, `isEmpty()` states more clearly what we actually care about; 
IMO, it's easier to understand.
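
The points above can be illustrated with a short sketch. The class and method names are made up for this example; note that `size() > 0` corresponds to `!isEmpty()`, not to `isEmpty()` itself.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentSkipListSet;

public class EmptinessCheck {

  // Preferred form: states the intent directly and avoids counting elements.
  public static boolean hasElements(Collection<?> c) {
    return !c.isEmpty();
  }

  public static void main(String[] args) {
    ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
    for (int i = 0; i < 100_000; i++) {
      set.add(i);
    }
    boolean viaSize = set.size() > 0;      // size() has to count the elements here
    boolean viaIsEmpty = hasElements(set); // no counting needed
    System.out.println(viaSize == viaIsEmpty);
  }
}
```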
   
   
   
   
   
   




[GitHub] [incubator-hudi] lamber-ken commented on issue #1250: [HUDI-557] Additional work for supporting multiple version docs

2020-01-19 Thread GitBox
lamber-ken commented on issue #1250: [HUDI-557] Additional work for supporting 
multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#issuecomment-576113149
 
 
   > Thanks @lamber-ken !
   
   You're welcome. :)




[incubator-hudi] branch asf-site updated: [HUDI-557] Additional work for supporting multiple version docs (#1250)

2020-01-19 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 4c3cf71  [HUDI-557] Additional work for supporting multiple version 
docs (#1250)
4c3cf71 is described below

commit 4c3cf71bf57aa0523bc06239904d9cc5fd9786fd
Author: lamber-ken 
AuthorDate: Mon Jan 20 13:06:28 2020 +0800

[HUDI-557] Additional work for supporting multiple version docs (#1250)
---
 content/404.html |  7 
 content/activity.html|  7 
 content/asf.html |  7 
 content/assets/css/main.css  |  2 +-
 content/assets/js/lunr/lunr-store.js |  8 ++---
 content/cn/activity.html |  7 
 content/cn/community.html|  7 
 content/cn/contributing.html |  7 
 content/cn/docs/0.5.0-admin_guide.html   | 39 +
 content/cn/docs/0.5.0-comparison.html| 39 +
 content/cn/docs/0.5.0-concepts.html  | 39 +
 content/cn/docs/0.5.0-configurations.html| 39 +
 content/cn/docs/0.5.0-docker_demo.html   | 39 +
 content/cn/docs/0.5.0-docs-versions.html | 51 ++--
 content/cn/docs/0.5.0-gcs_hoodie.html| 39 +
 content/cn/docs/0.5.0-migration_guide.html   | 39 +
 content/cn/docs/0.5.0-performance.html   | 39 +
 content/cn/docs/0.5.0-powered_by.html| 39 +
 content/cn/docs/0.5.0-privacy.html   | 39 +
 content/cn/docs/0.5.0-querying_data.html | 39 +
 content/cn/docs/0.5.0-quick-start-guide.html | 39 +
 content/cn/docs/0.5.0-s3_hoodie.html | 39 +
 content/cn/docs/0.5.0-use_cases.html | 39 +
 content/cn/docs/0.5.0-writing_data.html  | 39 +
 content/cn/docs/admin_guide.html |  7 
 content/cn/docs/comparison.html  |  7 
 content/cn/docs/concepts.html|  7 
 content/cn/docs/configurations.html  |  7 
 content/cn/docs/docker_demo.html |  7 
 content/cn/docs/docs-versions.html   | 19 +--
 content/cn/docs/gcs_hoodie.html  |  7 
 content/cn/docs/migration_guide.html |  7 
 content/cn/docs/performance.html |  7 
 content/cn/docs/powered_by.html  |  7 
 content/cn/docs/privacy.html |  7 
 content/cn/docs/querying_data.html   |  7 
 content/cn/docs/quick-start-guide.html   |  7 
 content/cn/docs/s3_hoodie.html   |  7 
 content/cn/docs/use_cases.html   |  7 
 content/cn/docs/writing_data.html|  7 
 content/cn/releases.html |  7 
 content/community.html   |  7 
 content/contributing.html|  7 
 content/docs/0.5.0-admin_guide.html  | 11 ++
 content/docs/0.5.0-comparison.html   | 11 ++
 content/docs/0.5.0-concepts.html | 11 ++
 content/docs/0.5.0-configurations.html   | 11 ++
 content/docs/0.5.0-docker_demo.html  | 11 ++
 content/docs/0.5.0-docs-versions.html| 23 ++---
 content/docs/0.5.0-gcs_hoodie.html   | 11 ++
 content/docs/0.5.0-migration_guide.html  | 11 ++
 content/docs/0.5.0-performance.html  | 11 ++
 content/docs/0.5.0-powered_by.html   | 11 ++
 content/docs/0.5.0-privacy.html  | 11 ++
 content/docs/0.5.0-querying_data.html| 11 ++
 content/docs/0.5.0-quick-start-guide.html| 11 ++
 content/docs/0.5.0-s3_hoodie.html| 11 ++
 content/docs/0.5.0-structure.html| 11 ++
 content/docs/0.5.0-use_cases.html| 11 ++
 content/docs/0.5.0-writing_data.html | 11 ++
 content/docs/admin_guide.html|  7 
 content/docs/comparison.html |  7 
 content/docs/concepts.html   |  7 
 content/docs/configurations.html |  7 
 content/docs/docker_demo.html|  7 
 content/docs/docs-versions.html  | 19 +--
 content/docs/gcs_hoodie.html |  7 
 content/docs/migration_guide.html|  7 
 content/docs/performance.html|  7 
 content/docs/powered_by.html |  7 
 content/docs/privacy.html|  7 
 content/docs/querying_data.html  |  7 
 content/docs/quick-start-guide.html  |  7 
 content/docs

[GitHub] [incubator-hudi] vinothchandar merged pull request #1250: [HUDI-557] Additional work for supporting multiple version docs

2020-01-19 Thread GitBox
vinothchandar merged pull request #1250: [HUDI-557] Additional work for 
supporting multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250
 
 
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #1250: [HUDI-557] Additional work for supporting multiple version docs

2020-01-19 Thread GitBox
vinothchandar commented on issue #1250: [HUDI-557] Additional work for 
supporting multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#issuecomment-576107876
 
 
   Thanks @lamber-ken ! 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make 
sure by default table layout version honors the configuration in 
hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368363395
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+return props.containsKey(TIMELINE_LAYOUT_VERSION)
+? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   Good point.. In an ideal world, `hoodie.properties` is the source of truth 
and once you set this version there, both timeline writers and readers respect 
that.. but the issue here seems to be that we want to change this to say 
VERSION_1 even for older tables and rely on the fact that null version and 
version_1 both can be read by older readers.. 
   
   > If we keep version_0 as default, it would override the version even for 
new tables which has Version_1 in hoodie.properties
   
   I'd imagine this will happen only if at some point, the user set the value 
to `version_1` and then switched back to default? This can always happen right, 
like the user going back to the previous release.. ? 
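
The override behaviour quoted in this comment can be made concrete with a small sketch. It is an assumption-laden illustration, not the PR's logic: `java.util.Optional` and plain integers stand in for Hudi's types, and `effectiveVersion` is a hypothetical helper in which an explicit write-config value wins over the value stored in `hoodie.properties`.

```java
import java.util.Optional;

public class LayoutVersionPrecedence {

  // Hypothetical precedence rule: a value present in the write config
  // overrides the table's value; otherwise the table's value is used.
  public static Optional<Integer> effectiveVersion(
      Optional<Integer> writeConfigVersion, Optional<Integer> tableVersion) {
    return writeConfigVersion.isPresent() ? writeConfigVersion : tableVersion;
  }

  public static void main(String[] args) {
    // Write config carries a non-null default of 0 while the table records 1:
    // the default wins, which is the override described in the quoted concern.
    System.out.println(effectiveVersion(Optional.of(0), Optional.of(1)));

    // Write config leaves the value unset: the table's version is honored.
    System.out.println(effectiveVersion(Optional.empty(), Optional.of(1)));
  }
}
```

Under this rule, leaving the write-config value unset is what lets the version in `hoodie.properties` take effect.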
   








[GitHub] [incubator-hudi] lamber-ken commented on issue #1250: [HUDI-557] Additional work for supporting multiple version docs

2020-01-19 Thread GitBox
lamber-ken commented on issue #1250: [HUDI-557] Additional work for supporting 
multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#issuecomment-576098873
 
 
   > > I think we need to tell users which version of the docs they are visiting
   > 
   > this is valid. I still think we need a different mechanism than a banner 
for this. We can check out other sites for inspiration
   > 
   > > guide users to use the latest version as much as possible.
   > 
   > Sure. but we should not be doing this via a banner IMO.. Lets get rid of 
the banner and we can land this?
   
   Got it, I have removed the banner for now. When I find a better way, I will 
discuss it with you :)




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make 
sure by default table layout version honors the configuration in 
hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368363395
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+return props.containsKey(TIMELINE_LAYOUT_VERSION)
+? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   The main issue here seems to be that we want this config to be respected at 
the write level even when the value is not in `hoodie.properties`, e.g. to 
support timeline version 1 even for older tables. `hoodie.properties` is 
involved here purely because the query (any timeline reader) needs to know how 
to read the timeline, correct?
   
   Code style aside, this is still not very clear to me.. :(


[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
bvaradar commented on a change in pull request #1255: [HUDI-559] : Make sure by 
default table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368362263
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+return props.containsKey(TIMELINE_LAYOUT_VERSION)
+? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   @vinothchandar : Unlike other configs, this config is defined at two levels: 
in hoodie.properties and in HoodieWriteConfig. If we keep a non-null value in 
the write config, the semantics are that it overrides the one in 
hoodie.properties. If we keep VERSION_0 as the default, it would override the 
version even for new tables that have VERSION_1 in hoodie.properties? Let me 
know if I am missing something.
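
The two-level semantics described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the class name `LayoutVersionSketch`, the helper names, the fallback to version 0 for tables without the key, and the property key string are all assumed for the example.

```java
import java.util.Optional;
import java.util.Properties;

public class LayoutVersionSketch {
  // Assumed key name, used here for illustration only.
  static final String TIMELINE_LAYOUT_VERSION = "hoodie.timeline.layout.version";

  /** Writer-side value: empty when the writer did not set the key at all. */
  static Optional<Integer> writerVersion(Properties writeProps) {
    String raw = writeProps.getProperty(TIMELINE_LAYOUT_VERSION);
    return raw == null ? Optional.empty() : Optional.of(Integer.parseInt(raw));
  }

  /**
   * An explicit writer value overrides the table's value. Without one, the
   * value stored with the table is honored. Tables that predate the key are
   * treated as version 0 here, which is an assumption of this sketch.
   */
  static int effectiveVersion(Properties writeProps, Properties tableProps) {
    Optional<Integer> fromWriter = writerVersion(writeProps);
    if (fromWriter.isPresent()) {
      return fromWriter.get();
    }
    String raw = tableProps.getProperty(TIMELINE_LAYOUT_VERSION);
    return raw == null ? 0 : Integer.parseInt(raw);
  }

  public static void main(String[] args) {
    Properties writer = new Properties();
    Properties table = new Properties();
    table.setProperty(TIMELINE_LAYOUT_VERSION, "1");
    // The writer sets nothing, so the table's own version is used.
    System.out.println(effectiveVersion(writer, table));
    // If the writer carried a hard default of 0, it would override the table.
    writer.setProperty(TIMELINE_LAYOUT_VERSION, "0");
    System.out.println(effectiveVersion(writer, table));
  }
}
```

The second call in `main` shows the concern raised above: once the writer side always carries a concrete default, a newer table's stored version is silently overridden.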


[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #1250: [HUDI-557] Additional work for supporting multiple version docs

2020-01-19 Thread GitBox
vinothchandar edited a comment on issue #1250: [HUDI-557] Additional work for 
supporting multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#issuecomment-576096431
 
 
   > I think we need to tell users which version of the docs they are visiting
   
   This is valid. I still think we need a different mechanism than a banner for 
this. We can check out other sites for inspiration.
   
   > guide users to use the latest version as much as possible.
   
   Sure, but we should not be doing this via a banner IMO. Let's get rid of the 
banner and then we can land this?


Build failed in Jenkins: hudi-snapshot-deployment-0.5 #164

2020-01-19 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.01 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4:
bin
boot
conf
lib
LICENSE
NOTICE
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi [pom]
[INFO] hudi-common [jar]
[INFO] hudi-timeline-service [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client [jar]
[INFO] hudi-hive [jar]
[INFO] hudi-spark_2.11 [jar]
[INFO] hudi-utilities_2.11 [jar]
[INFO] hudi-cli [jar]
[INFO] hudi-hadoop-mr-bundle [jar]
[INFO] hudi-hive-bundle [jar]
[INFO] hudi-spark-bundle_2.11 [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle_2.11

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make 
sure by default table layout version honors the configuration in 
hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368356422
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+return props.containsKey(TIMELINE_LAYOUT_VERSION)
+? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   Also, can't we do this the regular way, with defaults? Why the special 
handling with containsKey etc.? Why not `props.getProperty(k, default)`? 
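
For comparison, here is a minimal sketch of the two lookup styles discussed in this comment, using plain `java.util.Properties`. The class name `DefaultVsAbsent`, the method names and the key string are hypothetical and chosen only for the example; the default of "0" is likewise assumed.

```java
import java.util.Properties;

public class DefaultVsAbsent {
  // Assumed key name, used here for illustration only.
  static final String KEY = "hoodie.timeline.layout.version";

  /** Regular defaults style: a missing key silently becomes 0. */
  static Integer withDefault(Properties props) {
    return Integer.parseInt(props.getProperty(KEY, "0"));
  }

  /** containsKey style: a missing key stays distinguishable as null. */
  static Integer nullable(Properties props) {
    return props.containsKey(KEY)
        ? Integer.parseInt(props.getProperty(KEY))
        : null;
  }

  public static void main(String[] args) {
    Properties unset = new Properties();
    Properties explicitZero = new Properties();
    explicitZero.setProperty(KEY, "0");
    // Both calls yield the same value, so the caller cannot tell the cases apart.
    System.out.println(withDefault(unset) + " " + withDefault(explicitZero));
    // Here the unset case is visible to the caller.
    System.out.println(nullable(unset) + " " + nullable(explicitZero));
  }
}
```

The defaults style is simpler, but it hides whether the writer actually chose a value. That distinction only matters if an unset value is meant to defer to another source such as `hoodie.properties`.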


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
vinothchandar commented on a change in pull request #1255: [HUDI-559] : Make 
sure by default table layout version honors the configuration in 
hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255#discussion_r368356051
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 ##
 @@ -145,7 +145,8 @@ public Boolean shouldAssumeDatePartitioning() {
   }
 
   public Integer getTimelineLayoutVersion() {
-return Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION));
+return props.containsKey(TIMELINE_LAYOUT_VERSION)
+? Integer.parseInt(props.getProperty(TIMELINE_LAYOUT_VERSION)) : null;
 
 Review comment:
   Should we just use VERSION_0 instead of `null`? 


[GitHub] [incubator-hudi] liujianhuiouc commented on a change in pull request #1216: [HUDI-525] lack of insert info in delta_commit inflight

2020-01-19 Thread GitBox
liujianhuiouc commented on a change in pull request #1216: [HUDI-525] lack of 
insert info in delta_commit inflight
URL: https://github.com/apache/incubator-hudi/pull/1216#discussion_r368354556
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##
 @@ -428,6 +428,12 @@ private void 
saveWorkloadProfileMetadataToInflight(WorkloadProfile profile, Hood
   HoodieCommitMetadata metadata = new HoodieCommitMetadata();
   profile.getPartitionPaths().forEach(path -> {
 WorkloadStat partitionStat = profile.getWorkloadStat(path.toString());
+HoodieWriteStat insertStat = new HoodieWriteStat();
+insertStat.setNumInserts(partitionStat.getNumInserts());
+insertStat.setFileId("");
 
 Review comment:
   Leaving it unset makes the test fail with an NPE.


[jira] [Updated] (HUDI-559) Make the timeline layout version default to be null version

2020-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-559:

Labels: pull-request-available  (was: )

> Make the timeline layout version default to be null version
> ---
>
> Key: HUDI-559
> URL: https://issues.apache.org/jira/browse/HUDI-559
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
> We mulled it over, and it seems safer to turn this on as needed in the next release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar opened a new pull request #1255: [HUDI-559] : Make sure by default table layout version honors the configuration in hoodie.properties

2020-01-19 Thread GitBox
bvaradar opened a new pull request #1255: [HUDI-559] : Make sure by default 
table layout version honors the configuration in hoodie.properties
URL: https://github.com/apache/incubator-hudi/pull/1255
 
 
   
   ## What is the purpose of the pull request
   
   Make sure by default table layout version honors the configuration in 
hoodie.properties
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


[jira] [Resolved] (HUDI-556) Check if any license attribution is needed for PR 1233

2020-01-19 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-556.

Resolution: Fixed

Fixed via master: 7087e7d7668e39e5c0ce98c965750d228481748f

> Check if any license attribution is needed for PR 1233
> --
>
> Key: HUDI-556
> URL: https://issues.apache.org/jira/browse/HUDI-556
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We need to resolve this before we prepare a vote



[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1250: [HUDI-557] Additional work for supporting multiple version docs

2020-01-19 Thread GitBox
lamber-ken edited a comment on issue #1250: [HUDI-557] Additional work for 
supporting multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#issuecomment-576068152
 
 
   Hi @vinothchandar, how about this style :)
   
   
![image](https://user-images.githubusercontent.com/20113411/72691677-3168b180-3b62-11ea-9b5d-63e11bc234ed.png)
   


[jira] [Updated] (HUDI-69) Support realtime view in Spark datasource #136

2020-01-19 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-69:
---
Fix Version/s: 0.6.0

> Support realtime view in Spark datasource #136
> --
>
> Key: HUDI-69
> URL: https://issues.apache.org/jira/browse/HUDI-69
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> https://github.com/uber/hudi/issues/136



[GitHub] [incubator-hudi] yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-01-19 Thread GitBox
yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta 
Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-576069548
 
 
   @bvaradar @leesf Could any of you review this PR by EOD?


[GitHub] [incubator-hudi] yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-01-19 Thread GitBox
yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta 
Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-576069136
 
 
   @vinothchandar From my side, the code change is ready.  I'm not sure if it 
can be reviewed and merged in time.  I'm fine with pushing this to v0.6.0.


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1250: [HUDI-557] Additional job for supporting multiple version docs

2020-01-19 Thread GitBox
lamber-ken edited a comment on issue #1250: [HUDI-557] Additional job for 
supporting multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#issuecomment-576062938
 
 
   > This LGTM overall.. have you tested this with the current site? should we 
also refresh the content
   
   Yes, I generated the site and synced it to 
https://lamber-ken.github.io/docs/0.5.0-quick-start-guide.html 
   
   When we reach an agreement, I will generate the content. This keeps the PR 
easy to review for now. 😄 


[GitHub] [incubator-hudi] lamber-ken commented on issue #1250: [HUDI-557] Additional job for supporting multiple version docs

2020-01-19 Thread GitBox
lamber-ken commented on issue #1250: [HUDI-557] Additional job for supporting 
multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#issuecomment-576062580
 
 
   > Can we remove the "out of date" warning? I feel it's not necessary. Also, 
depending on the situation, using the latest version may not always be the 
right thing, e.g. if the user is still on 2.2.
   
   I think we need to tell users which version of the docs they are visiting, 
and guide them to use the latest version as much as possible.
   
   


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1250: [HUDI-557] Additional job for supporting multiple version docs

2020-01-19 Thread GitBox
lamber-ken commented on a change in pull request #1250: [HUDI-557] Additional 
job for supporting multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#discussion_r368334483
 
 

 ##
 File path: docs/_docs/3_2_docs_versions.md
 ##
 @@ -9,7 +9,7 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 
   {% for d in site.previous_docs %}
 
-{{ d.version }}
+ {{ d.version 
}}
 
 Review comment:
   On my side, the versions seem to stick together.
   
   
![image](https://user-images.githubusercontent.com/20113411/72690729-23af2e00-3b5a-11ea-8780-3a31f467aa4d.png)
   


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1250: [HUDI-557] Additional job for supporting multiple version docs

2020-01-19 Thread GitBox
vinothchandar commented on a change in pull request #1250: [HUDI-557] 
Additional job for supporting multiple version docs
URL: https://github.com/apache/incubator-hudi/pull/1250#discussion_r368333402
 
 

 ##
 File path: docs/_docs/3_2_docs_versions.md
 ##
 @@ -9,7 +9,7 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 
   {% for d in site.previous_docs %}
 
-{{ d.version }}
+ {{ d.version 
}}
 
 Review comment:
   why this change?


[GitHub] [incubator-hudi] vinothchandar commented on issue #757: spark-hoodie-bundle using hive-serde to sync hive table(Hive2.3.5)

2020-01-19 Thread GitBox
vinothchandar commented on issue #757: spark-hoodie-bundle using hive-serde to 
sync hive table(Hive2.3.5)
URL: https://github.com/apache/incubator-hudi/issues/757#issuecomment-576060622
 
 
   @duongnt could you open a new one with details for your environment?


[GitHub] [incubator-hudi] vinothchandar commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-01-19 Thread GitBox
vinothchandar commented on issue #1165: [HUDI-76] Add CSV Source support for 
Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-576060472
 
 
   @yihua are you still targeting this for the next release?


[jira] [Updated] (HUDI-559) Make the timeline layout version default to be null version

2020-01-19 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-559:

Status: Open  (was: New)

> Make the timeline layout version default to be null version
> ---
>
> Key: HUDI-559
> URL: https://issues.apache.org/jira/browse/HUDI-559
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Blocker
> Fix For: 0.5.1
>
>
> We mulled it over, and it seems safer to turn this on as needed in the next release.



[jira] [Created] (HUDI-559) Make the timeline layout version default to be null version

2020-01-19 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-559:
---

 Summary: Make the timeline layout version default to be null 
version
 Key: HUDI-559
 URL: https://issues.apache.org/jira/browse/HUDI-559
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Writer Core
Reporter: Vinoth Chandar
Assignee: Balaji Varadarajan
 Fix For: 0.5.1


We mulled it over, and it seems safer to turn this on as needed in the next release.



[jira] [Updated] (HUDI-543) Carefully draft release notes for 0.5.1 with all breaking/user impacting changes

2020-01-19 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-543:

Description: 
Call out all breaking changes : 
 * Spark 2.4 support drop, avro version change etc.  "Hudi 0.5.1+ above needs 
Spark 2.4+"
 * Need for shading custom Payloads 
 * --packages for spark-shell 
 * key generator changes 
 * _ro suffix for read optimized views.. 
 * Delta streamer command line changes
 * Scala version changes.. packages names now have _2.11

 

Also need to call out major release highlights (quoting docs/blogs as available)
 * better delete support
 * dynamic bloom filters
 * DMS support

 

 

I am also linking the different jiras as subtaks

  was:
Call out all breaking changes : 
 * Spark 2.4 support drop, avro version change etc.  "Hudi 0.5.1+ above needs 
Spark 2.4+"
 * Need for shading custom Payloads 
 * --packages for spark-shell 
 * key generator changes 
 * _ro suffix for read optimized views.. 

 

Also need to call out major release highlights (quoting docs/blogs as available)
 * better delete support
 * dynamic bloom filters
 * DMS support

 

 

I am also linking the different jiras as subtaks


> Carefully draft release notes for 0.5.1 with all breaking/user impacting 
> changes
> 
>
> Key: HUDI-543
> URL: https://issues.apache.org/jira/browse/HUDI-543
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Call out all breaking changes : 
>  * Spark 2.4 support drop, avro version change etc.  "Hudi 0.5.1+ above needs 
> Spark 2.4+"
>  * Need for shading custom Payloads 
>  * --packages for spark-shell 
>  * key generator changes 
>  * _ro suffix for read optimized views.. 
>  * Delta streamer command line changes
>  * Scala version changes.. packages names now have _2.11
>  
> Also need to call out major release highlights (quoting docs/blogs as 
> available)
>  * better delete support
>  * dynamic bloom filters
>  * DMS support
>  
>  
> I am also linking the different jiras as subtaks



[jira] [Updated] (HUDI-543) Carefully draft release notes for 0.5.1 with all breaking/user impacting changes

2020-01-19 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-543:

Description: 
Call out all breaking changes : 
 * Spark 2.4 support drop, avro version change etc.  "Hudi 0.5.1+ above needs 
Spark 2.4+"
 * Need for shading custom Payloads 
 * --packages for spark-shell 
 * key generator changes 
 * _ro suffix for read optimized views.. 

 

Also need to call out major release highlights (quoting docs/blogs as available)
 * better delete support
 * dynamic bloom filters
 * DMS support

 

 

I am also linking the different jiras as subtaks

  was:
Call out all breaking changs : 
 * Spark 2.4 support drop, avro version change etc.
 * Need for shading custom Payloads 
 * --packages for spark-shell 
 * key generator changes 
 * _ro suffix for read optimized views.. 

 

Also need to call out major release highlights (quoting docs/blogs as available)
 * better delete support
 * dynamic bloom filters
 * DMS support

 

 

I am also linking the different jiras as subtaks


> Carefully draft release notes for 0.5.1 with all breaking/user impacting 
> changes
> 
>
> Key: HUDI-543
> URL: https://issues.apache.org/jira/browse/HUDI-543
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Call out all breaking changes : 
>  * Spark 2.4 support drop, avro version change etc.  "Hudi 0.5.1+ above needs 
> Spark 2.4+"
>  * Need for shading custom Payloads 
>  * --packages for spark-shell 
>  * key generator changes 
>  * _ro suffix for read optimized views.. 
>  
> Also need to call out major release highlights (quoting docs/blogs as 
> available)
>  * better delete support
>  * dynamic bloom filters
>  * DMS support
>  
>  
> I am also linking the different jiras as subtaks



[jira] [Commented] (HUDI-519) Document the need for Avro dependency shading/relocation for custom payloads, need for spark-avro

2020-01-19 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019064#comment-17019064
 ] 

Vinoth Chandar commented on HUDI-519:
-

[~uditme] Are you able to do this? If not, I will find an owner. :) Let me know.

> Document the need for Avro dependency shading/relocation for custom payloads, 
> need for spark-avro
> -
>
> Key: HUDI-519
> URL: https://issues.apache.org/jira/browse/HUDI-519
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs, Usability
>Reporter: Udit Mehrotra
>Priority: Major
> Fix For: 0.5.1
>
>
> In [https://github.com/apache/incubator-hudi/pull/1005] we are migrating Hudi 
> to Spark 2.4.4. As part of this migration, we also had to migrate Hudi to use 
> Avro 1.8.2 (required by Spark), while Hive still uses an older version of Avro.
> This has resulted in the need to shade Avro in *hadoop-mr-bundle*. This has 
> implications for users of Hudi who implement custom record payloads. They 
> would have to start shading Avro in their custom jars, similar to how it is 
> shaded in *hadoop-mr-bundle*.
> This Jira is to track the documentation of this caveat in the release notes and, 
> if needed, in other places such as the website.





[jira] [Updated] (HUDI-558) Introduce ability to compress bloom filters while storing in parquet

2020-01-19 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-558:

Status: Open  (was: New)

> Introduce ability to compress bloom filters while storing in parquet
> 
>
> Key: HUDI-558
> URL: https://issues.apache.org/jira/browse/HUDI-558
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Based on performance study 
> [https://docs.google.com/spreadsheets/d/1KCmmdgaFTWBmpOk9trePdQ2m6wPVj2G328fTcRnQP1M/edit?usp=sharing]
>  we found that there is benefit in compressing bloom filters when storing in 
> parquet. As this is an experimental feature, we will need to disable this 
> feature by default.





[GitHub] [incubator-hudi] vinothchandar commented on issue #1254: mvn clean package -DskipTests -DskipITs -Dhadoop.version=2.4.0 -Dhive.version=2.1.1

2020-01-19 Thread GitBox
vinothchandar commented on issue #1254: mvn clean package -DskipTests -DskipITs 
-Dhadoop.version=2.4.0  -Dhive.version=2.1.1  
URL: https://github.com/apache/incubator-hudi/issues/1254#issuecomment-576058818
 
 
   @rongyousu 2.4.0 is a really old Hadoop version. The Hudi code may be using 
newer APIs, and Hadoop does not always provide backwards-compatible 
alternatives for them. Are you able to try a higher version like 2.6 or 2.7? 
Even Spark may not work with these old Hadoop versions.
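
As a rough illustration of the point above (a sketch, not something Hudi itself does), one can probe whether the Hadoop jars on the classpath contain a class that the build log in this issue reports as missing, such as `org.apache.hadoop.fs.XAttrSetFlag`. The class name `HadoopApiProbe` and the method `isClassAvailable` are hypothetical names chosen for this example.

```java
public class HadoopApiProbe {
    /** Returns true if the named class can be located on the current classpath. */
    static boolean isClassAvailable(String className) {
        try {
            // 'false' avoids running static initializers; we only want to know it exists.
            Class.forName(className, false, HadoopApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the "cannot find symbol" errors in the issue's build log.
        String probe = "org.apache.hadoop.fs.XAttrSetFlag";
        System.out.println(probe + " available: " + isClassAvailable(probe));
    }
}
```

Running this against the Hadoop version you intend to build with shows whether that particular class is present; it does not check the missing methods the log also lists.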




[GitHub] [incubator-hudi] lamber-ken commented on issue #1245: [MINOR] Replace Collection.size > 0 with Collection.isEmpty()

2020-01-19 Thread GitBox
lamber-ken commented on issue #1245: [MINOR] Replace Collection.size > 0 with 
Collection.isEmpty()
URL: https://github.com/apache/incubator-hudi/pull/1245#issuecomment-576026737
 
 
   Hi @vinothchandar @smarthi, thanks for reviewing this PR.
   
   Here are my thoughts:
   1. It's true that some collections implement `isEmpty()` as `size() == 0`, 
but not all of them do; `java.util.concurrent.ConcurrentSkipListSet` is one 
example. 
   The [ConcurrentSkipListSet 
documentation](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentSkipListSet.html)
 says:
   ```
   Beware that, unlike in most collections, the size method is not a 
constant-time operation.
   ```
   It means that `.size()` can be O(1) or O(N), depending on the data 
structure; `.isEmpty()` is never O(N).
   
   
   
   2. The IDE always prompts me to make this change.
   
   
![image](https://user-images.githubusercontent.com/20113411/72685195-af0ccd00-3b22-11ea-8e11-24051be6e26f.png)
   
   
   
   3. In the Hudi project, some code uses `isEmpty()` and some uses `size() > 0`.
   
   
![image](https://user-images.githubusercontent.com/20113411/72685273-715c7400-3b23-11ea-9fb8-090c81f469d6.png)
   
   
   
   4. `isEmpty()` states more clearly what we actually care about; IMO it is 
more understandable.
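
The points above can be sketched as follows. This is only an illustration, not code from the PR; the class `EmptinessCheck` and its two helper methods are hypothetical names.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentSkipListSet;

public class EmptinessCheck {
    /** States the intent directly and does not need to count elements. */
    static boolean hasElements(Collection<?> c) {
        return !c.isEmpty();
    }

    /** Same result, but size() may traverse the whole collection (e.g. ConcurrentSkipListSet). */
    static boolean hasElementsViaSize(Collection<?> c) {
        return c.size() > 0;
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        System.out.println(hasElements(set));        // false
        set.add(1);
        System.out.println(hasElements(set));        // true
        System.out.println(hasElementsViaSize(set)); // true
    }
}
```

Both helpers return the same answer; the difference is only in how much work the collection may do to produce it, and in how directly the call expresses the question being asked.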
   
   
   
   
   
   




[incubator-hudi] branch master updated (9489d0f -> 7087e7d)

2020-01-19 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 9489d0f  [HUDI-551] Abstract a test case class for DFS Source to make 
it extensible (#1239)
 add 7087e7d  [HUDI-556] Add lisence for PR#1233

No new revisions were added by this update.

Summary of changes:
 LICENSE | 10 ++
 1 file changed, 10 insertions(+)



[GitHub] [incubator-hudi] bvaradar merged pull request #1252: [HUDI-556] Add license for PR#1233

2020-01-19 Thread GitBox
bvaradar merged pull request #1252: [HUDI-556] Add license for PR#1233
URL: https://github.com/apache/incubator-hudi/pull/1252
 
 
   




[GitHub] [incubator-hudi] rongyousu commented on issue #1254: mvn clean package -DskipTests -DskipITs -Dhadoop.version=2.4.0 -Dhive.version=2.1.1

2020-01-19 Thread GitBox
rongyousu commented on issue #1254: mvn clean package -DskipTests -DskipITs 
-Dhadoop.version=2.4.0  -Dhive.version=2.1.1  
URL: https://github.com/apache/incubator-hudi/issues/1254#issuecomment-576000413
 
 
   Why does the Maven build fail with hadoop.version=2.4.0? What can I do?




[GitHub] [incubator-hudi] rongyousu opened a new issue #1254: mvn clean package -DskipTests -DskipITs -Dhadoop.version=2.4.0 -Dhive.version=2.1.1

2020-01-19 Thread GitBox
rongyousu opened a new issue #1254: mvn clean package -DskipTests -DskipITs 
-Dhadoop.version=2.4.0  -Dhive.version=2.1.1  
URL: https://github.com/apache/incubator-hudi/issues/1254
 
 
   [INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ 
hudi-common ---
   [INFO] Changes detected - recompiling the module!
   [INFO] Compiling 178 source files to 
/tol/app/hudi/incubator-hudi/hudi-common/target/classes
   [INFO] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:
 Some input files use or override a deprecated API.
   [INFO] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:
 Recompile with -Xlint:deprecation for details.
   [INFO] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java:
 Some input files use unchecked or unsafe operations.
   [INFO] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/util/ReflectionUtils.java:
 Recompile with -Xlint:unchecked for details.
   [INFO] -
   [ERROR] COMPILATION ERROR : 
   [INFO] -
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[44,28]
 cannot find symbol
 symbol:   class XAttrSetFlag
 location: package org.apache.hadoop.fs
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[791,70]
 cannot find symbol
 symbol:   class XAttrSetFlag
 location: class org.apache.hudi.common.io.storage.HoodieWrapperFileSystem
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[670,3]
 method does not override or implement a method from a supertype
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[672,15]
 cannot find symbol
 symbol:   method 
access(org.apache.hadoop.fs.Path,org.apache.hadoop.fs.permission.FsAction)
 location: variable fileSystem of type org.apache.hadoop.fs.FileSystem
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[700,3]
 method does not override or implement a method from a supertype
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[702,22]
 method getFileChecksum in class org.apache.hadoop.fs.FileSystem cannot be 
applied to given types;
 required: org.apache.hadoop.fs.Path
 found: org.apache.hadoop.fs.Path,long
 reason: actual and formal argument lists differ in length
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[785,3]
 method does not override or implement a method from a supertype
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[787,15]
 cannot find symbol
 symbol:   method 
setXAttr(org.apache.hadoop.fs.Path,java.lang.String,byte[])
 location: variable fileSystem of type org.apache.hadoop.fs.FileSystem
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[790,3]
 method does not override or implement a method from a supertype
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[795,3]
 method does not override or implement a method from a supertype
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[797,22]
 cannot find symbol
 symbol:   method getXAttr(org.apache.hadoop.fs.Path,java.lang.String)
 location: variable fileSystem of type org.apache.hadoop.fs.FileSystem
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[800,3]
 method does not override or implement a method from a supertype
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[802,22]
 cannot find symbol
 symbol:   method getXAttrs(org.apache.hadoop.fs.Path)
 location: variable fileSystem of type org.apache.hadoop.fs.FileSystem
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[805,3]
 method does not override or implement a method from a supertype
   [ERROR] 
/tol/app/hudi/incubator-hudi/hudi-common/src/main/java/org/apache/hudi/common/io/storage/HoodieWrapperFileSystem.java:[807,22]
 ca

[jira] [Updated] (HUDI-558) Introduce ability to compress bloom filters while storing in parquet

2020-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-558:

Labels: pull-request-available  (was: )

> Introduce ability to compress bloom filters while storing in parquet
> 
>
> Key: HUDI-558
> URL: https://issues.apache.org/jira/browse/HUDI-558
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
> Based on performance study 
> [https://docs.google.com/spreadsheets/d/1KCmmdgaFTWBmpOk9trePdQ2m6wPVj2G328fTcRnQP1M/edit?usp=sharing]
>  we found that there is benefit in compressing bloom filters when storing in 
> parquet. As this is an experimental feature, we will need to disable this 
> feature by default.





[GitHub] [incubator-hudi] bvaradar opened a new pull request #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-01-19 Thread GitBox
bvaradar opened a new pull request #1253: [HUDI-558] Introduce ability to 
compress bloom filters while storing in parquet
URL: https://github.com/apache/incubator-hudi/pull/1253
 
 
   
   ## What is the purpose of the pull request
   
   * Introduce ability to compress bloom filters while storing in parquet
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Created] (HUDI-558) Introduce ability to compress bloom filters while storing in parquet

2020-01-19 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-558:
---

 Summary: Introduce ability to compress bloom filters while storing 
in parquet
 Key: HUDI-558
 URL: https://issues.apache.org/jira/browse/HUDI-558
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Common Core
Reporter: Balaji Varadarajan
 Fix For: 0.5.1


Based on performance study 
[https://docs.google.com/spreadsheets/d/1KCmmdgaFTWBmpOk9trePdQ2m6wPVj2G328fTcRnQP1M/edit?usp=sharing]
 we found that there is benefit in compressing bloom filters when storing in 
parquet. As this is an experimental feature, we will need to disable this 
feature by default.
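
A minimal sketch of the idea, not the change in the linked pull request: it assumes the serialized bloom filter is written as a string value in the file's key-value metadata, and shows GZIP compression applied before Base64 encoding behind an opt-in flag. The class name `BloomFilterFooterCodec`, the methods `encode`/`decode`, and the `compress` flag are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BloomFilterFooterCodec {
    /** Encodes the serialized filter as a string; compression is applied only when requested. */
    static String encode(byte[] serializedFilter, boolean compress) {
        byte[] payload = serializedFilter;
        if (compress) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
                gzip.write(serializedFilter);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            payload = bos.toByteArray();
        }
        return Base64.getEncoder().encodeToString(payload);
    }

    /** Reverses encode(); the caller must pass the same flag that was used when writing. */
    static byte[] decode(String footerValue, boolean compressed) {
        byte[] payload = Base64.getDecoder().decode(footerValue);
        if (!compressed) {
            return payload;
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

Passing `compress = false` at call sites unless a setting says otherwise would mirror the "disabled by default" intent stated above. How much is saved depends on how sparse the filter's bit array is.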





[jira] [Updated] (HUDI-556) Check if any license attribution is needed for PR 1233

2020-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-556:

Labels: pull-request-available  (was: )

> Check if any license attribution is needed for PR 1233
> --
>
> Key: HUDI-556
> URL: https://issues.apache.org/jira/browse/HUDI-556
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
> We need to resolve this before we prepare a vote





[GitHub] [incubator-hudi] leesf opened a new pull request #1252: [HUDI-556] Add license for PR#1233

2020-01-19 Thread GitBox
leesf opened a new pull request #1252: [HUDI-556] Add license for PR#1233
URL: https://github.com/apache/incubator-hudi/pull/1252
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Add license for PR#1233
   
   ## Brief change log
   
   Update LICENSE file.
   
   ## Verify this pull request
   
   
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Resolved] (HUDI-551) Abstract a test case class for DFS Source to make it extensible

2020-01-19 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-551.

Resolution: Fixed

Fixed via master: 9489d0fb844208443a18964be878102e9560bd0d

> Abstract a test case class for DFS Source to make it extensible
> ---
>
> Key: HUDI-551
> URL: https://issues.apache.org/jira/browse/HUDI-551
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Testing
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Create a new class {{AbstractDFSSourceTestBase}} based on 
> {{DFSSourceTestCase}} in the last commit
>  * The common test logic still resides in {{AbstractDFSSourceTestBase}}
>  * For each DFS Source class, extend from {{AbstractDFSSourceTestBase}} to 
> add source-specific test logic

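The structure described in the issue (shared test logic in an abstract base, source-specific pieces in subclasses) can be sketched as below. This is a simplified, framework-free illustration and not the contents of {{AbstractDFSSourceTestBase}}; every class and method name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DfsSourceTestSketch {
    /** Shared test logic lives here; subclasses supply only the format-specific parts. */
    abstract static class AbstractSourceTestBase {
        abstract String fileExtension();
        abstract String encode(List<String> records);
        abstract List<String> decode(String fileContents);

        /** Common scenario reused by every subclass: write records, read them back, compare. */
        final boolean roundTrips(List<String> records) {
            return decode(encode(records)).equals(records);
        }
    }

    /** One record per line. */
    static class LineDelimitedSourceTest extends AbstractSourceTestBase {
        String fileExtension() { return ".txt"; }
        String encode(List<String> records) { return String.join("\n", records); }
        List<String> decode(String c) { return new ArrayList<>(Arrays.asList(c.split("\n"))); }
    }

    /** Records separated by commas on a single line. */
    static class CommaSeparatedSourceTest extends AbstractSourceTestBase {
        String fileExtension() { return ".csv"; }
        String encode(List<String> records) { return String.join(",", records); }
        List<String> decode(String c) { return new ArrayList<>(Arrays.asList(c.split(","))); }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(new LineDelimitedSourceTest().roundTrips(records));
        System.out.println(new CommaSeparatedSourceTest().roundTrips(records));
    }
}
```

Adding a test for a new source format then means adding one small subclass, while the shared scenario stays in one place.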




[jira] [Updated] (HUDI-551) Abstract a test case class for DFS Source to make it extensible

2020-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-551:

Labels: pull-request-available  (was: )

> Abstract a test case class for DFS Source to make it extensible
> ---
>
> Key: HUDI-551
> URL: https://issues.apache.org/jira/browse/HUDI-551
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Testing
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
> * Create a new class {{AbstractDFSSourceTestBase}} based on 
> {{DFSSourceTestCase}} in the last commit
>  * The common test logic still resides in {{AbstractDFSSourceTestBase}}
>  * For each DFS Source class, extend from {{AbstractDFSSourceTestBase}} to 
> add source-specific test logic





[GitHub] [incubator-hudi] leesf merged pull request #1239: [HUDI-551] Abstract a test case class for DFS Source to make it extensible

2020-01-19 Thread GitBox
leesf merged pull request #1239: [HUDI-551] Abstract a test case class for DFS 
Source to make it extensible
URL: https://github.com/apache/incubator-hudi/pull/1239
 
 
   




[incubator-hudi] branch master updated (d0ee95e -> 9489d0f)

2020-01-19 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from d0ee95e  [HUDI-552] Fix the schema mismatch in Row-to-Avro conversion 
(#1246)
 add 9489d0f  [HUDI-551] Abstract a test case class for DFS Source to make 
it extensible (#1239)

No new revisions were added by this update.

Summary of changes:
 .../sources/AbstractDFSSourceTestBase.java | 178 +++
 .../hudi/utilities/sources/TestDFSSource.java  | 194 -
 .../hudi/utilities/sources/TestJsonDFSSource.java  |  55 ++
 .../utilities/sources/TestParquetDFSSource.java|  32 ++--
 4 files changed, 253 insertions(+), 206 deletions(-)
 create mode 100644 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/AbstractDFSSourceTestBase.java
 delete mode 100644 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestDFSSource.java
 create mode 100644 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJsonDFSSource.java
 copy 
hudi-common/src/main/java/org/apache/hudi/common/util/NoOpConsistencyGuard.java 
=> 
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestParquetDFSSource.java
 (51%)