[GitHub] [hudi] codecov-commenter edited a comment on pull request #1699: [HUDI-989]Support long options for prepare_integration_suite

2020-06-03 Thread GitBox


codecov-commenter edited a comment on pull request #1699:
URL: https://github.com/apache/hudi/pull/1699#issuecomment-638613017


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=h1) Report
   > Merging 
[#1699](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=desc) into 
[hudi_test_suite_refactor](https://codecov.io/gh/apache/hudi/commit/6a0f4191ac34fc393d72f36530c8573273d4d045&el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1699/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff               @@
   ##   hudi_test_suite_refactor    #1699    +/-   ##
   ==================================================
   - Coverage     18.23%    18.21%    -0.02%
   + Complexity      857       856        -1
   ==================================================
     Files           348       348
     Lines         15346     15346
     Branches       1524      1524
   ==================================================
   - Hits           2798      2796        -2
   - Misses        12191     12193        +2
     Partials        357       357
   ```
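
   As a sanity check, the headline percentages in the table above follow directly from the Hits and Lines counts. A minimal sketch, assuming (as the numbers suggest) that Codecov truncates to two decimal places rather than rounding:

   ```python
   def coverage_pct(hits: int, lines: int) -> float:
       """Coverage percentage truncated (not rounded) to two decimal places."""
       return int(hits / lines * 10000) / 100

   # Counts taken from the diff table above (PR #1699).
   base = coverage_pct(2798, 15346)   # base branch: 18.23
   head = coverage_pct(2796, 15346)   # PR head:     18.21
   delta = round(head - base, 2)      # -0.02, matching the "+/-" column
   ```

   Note that rounding instead of truncating would give 18.22% for the head, which would not match the report.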
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/hudi/pull/1699/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `21.98% <0.00%> (-0.71%)` | `28.00% <0.00%> (-1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=footer). Last 
update 
[6a0f419...3340a10](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter commented on pull request #1699: [HUDI-989]Support long options for prepare_integration_suite

2020-06-03 Thread GitBox


codecov-commenter commented on pull request #1699:
URL: https://github.com/apache/hudi/pull/1699#issuecomment-638613017


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=h1) Report
   > Merging 
[#1699](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=desc) into 
[hudi_test_suite_refactor](https://codecov.io/gh/apache/hudi/commit/6a0f4191ac34fc393d72f36530c8573273d4d045&el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1699/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff               @@
   ##   hudi_test_suite_refactor    #1699    +/-   ##
   ==================================================
   - Coverage     18.23%    18.21%    -0.02%
   + Complexity      857       856        -1
   ==================================================
     Files           348       348
     Lines         15346     15346
     Branches       1524      1524
   ==================================================
   - Hits           2798      2796        -2
   - Misses        12191     12193        +2
     Partials        357       357
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/hudi/pull/1699/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `21.98% <0.00%> (-0.71%)` | `28.00% <0.00%> (-1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=footer). Last 
update 
[6a0f419...3340a10](https://codecov.io/gh/apache/hudi/pull/1699?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-commenter edited a comment on pull request #1697: [WIP][HUDI-988] Fix issues causing Unit Test Flakiness

2020-06-03 Thread GitBox


codecov-commenter edited a comment on pull request #1697:
URL: https://github.com/apache/hudi/pull/1697#issuecomment-637484534


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=h1) Report
   > Merging 
[#1697](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/a9a97d6af47841caaa745497ec425267db0873c8&el=desc)
 will **increase** coverage by `0.00%`.
   > The diff coverage is `33.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1697/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##            master     #1697    +/-   ##
   =========================================
     Coverage    18.18%    18.19%
   - Complexity     856       857        +1
   =========================================
     Files          348       348
     Lines        15351     15358        +7
     Branches      1524      1525        +1
   =========================================
   + Hits          2792      2794        +2
   - Misses       12202     12206        +4
   - Partials       357       358        +1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...on/rollback/CopyOnWriteRollbackActionExecutor.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL3JvbGxiYWNrL0NvcHlPbldyaXRlUm9sbGJhY2tBY3Rpb25FeGVjdXRvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...on/rollback/MergeOnReadRollbackActionExecutor.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvYWN0aW9uL3JvbGxiYWNrL01lcmdlT25SZWFkUm9sbGJhY2tBY3Rpb25FeGVjdXRvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `52.30% <0.00%> (ø)` | `28.00 <1.00> (ø)` | |
   | 
[.../hudi/client/embedded/EmbeddedTimelineService.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2VtYmVkZGVkL0VtYmVkZGVkVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | `72.22% <50.00%> (-2.07%)` | `7.00 <1.00> (ø)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `62.68% <50.00%> (-0.81%)` | `9.00 <1.00> (+1.00)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=footer). Last 
update 
[a9a97d6...7ec34e0](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-994) Identify functional tests that are convertible to unit tests with mocks

2020-06-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-994:

Description: 
* Identify convertible functional tests and re-implement them using mocks
 * Remove/merge duplicate or overlapping functional tests where possible

  was:Identify convertible functional tests and re-implement by using mock


> Identify functional tests that are convertible to unit tests with mocks
> ---
>
> Key: HUDI-994
> URL: https://issues.apache.org/jira/browse/HUDI-994
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Priority: Major
>
> * Identify convertible functional tests and re-implement them using mocks
>  * Remove/merge duplicate or overlapping functional tests where possible
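
To illustrate the kind of conversion proposed here: a functional test that stands up a real file system can often become a unit test that mocks the storage layer. A minimal, hypothetical sketch using Python's `unittest.mock` (the names are illustrative, not Hudi APIs):

```python
from unittest import mock

def count_data_files(fs, path):
    """Toy version of logic that would normally scan a real file system."""
    return sum(1 for f in fs.list_files(path) if f.endswith(".parquet"))

# Instead of a real FS (functional test), mock the listing (unit test).
fake_fs = mock.Mock()
fake_fs.list_files.return_value = ["a.parquet", "b.parquet", "x.log"]

assert count_data_files(fake_fs, "/tmp/table") == 2
fake_fs.list_files.assert_called_once_with("/tmp/table")
```

The test now exercises only the counting logic, needs no cluster or temp directories, and runs in microseconds.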



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-781) Re-design test utilities

2020-06-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-781:

Status: Open  (was: New)

> Re-design test utilities
> 
>
> Key: HUDI-781
> URL: https://issues.apache.org/jira/browse/HUDI-781
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: Raymond Xu
>Priority: Major
>
> Test utility classes are to be re-designed with considerations like
>  * Use more mockings
>  * Reduce spark context setup
>  * Improve/clean up data generator
> An RFC would be preferred for illustrating the design work.
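
On the "improve/clean up data generator" point above: making the generator deterministic and seedable is usually the first cleanup step, so tests stop depending on hidden randomness. A hypothetical sketch (not Hudi's actual `HoodieTestDataGenerator`):

```python
import random

def gen_records(n, partitions, seed=42):
    """Deterministic test-record generator: same seed -> same records."""
    rng = random.Random(seed)
    return [
        {
            "key": f"key-{i}",
            "partition": rng.choice(partitions),
            "value": rng.randint(0, 1000),
        }
        for i in range(n)
    ]

# Two calls with the same seed produce identical batches, so a test
# can regenerate its expected data instead of carrying fixtures around.
batch_a = gen_records(5, ["2020/06/01", "2020/06/02"])
batch_b = gen_records(5, ["2020/06/01", "2020/06/02"])
assert batch_a == batch_b
```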



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-995) Add hudi-testutils module

2020-06-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-995:

Description: 
* add a new module {{hudi-testutils}} and add it to all other modules as test 
dep and remove {{hudi-common}} etc from test dep list
 * selectively migrate test util classes like data gen to {{hudi-testutils}}
 * provide utils to be able to generalize base file/log file style testing.

  was:
* add a new module {{hudi-testutils}} and add it to all other modules as test 
dep and remove {{hudi-common}} etc from test dep list
 * selectively migrate test util classes like data gen to {{hudi-testutils}}


> Add hudi-testutils module
> -
>
> Key: HUDI-995
> URL: https://issues.apache.org/jira/browse/HUDI-995
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Priority: Major
>
> * add a new module {{hudi-testutils}} and add it to all other modules as test 
> dep and remove {{hudi-common}} etc from test dep list
>  * selectively migrate test util classes like data gen to {{hudi-testutils}}
>  * provide utils to be able to generalize base file/log file style testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-996) Use shared spark session provider

2020-06-03 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-996:
---

 Summary: Use shared spark session provider 
 Key: HUDI-996
 URL: https://issues.apache.org/jira/browse/HUDI-996
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Testing
Reporter: Raymond Xu


* implement a shared spark session provider to be used for test suites, setting 
up and tearing down fewer spark sessions and other mini servers
 * add functional tests with setup logic similar to the test suites, to make use 
of the shared spark session
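
The provider described above reduces to a lazily-initialized shared resource: expensive setup happens once, and suites reuse the same instance. A language-neutral sketch in Python (the real work would manage a SparkSession; a plain object stands in for it here):

```python
class SharedSessionProvider:
    """Lazily create one shared 'session' and reuse it across test suites."""
    _session = None
    _init_count = 0

    @classmethod
    def get(cls):
        if cls._session is None:
            cls._init_count += 1          # expensive setup happens only once
            cls._session = object()       # stand-in for a real SparkSession
        return cls._session

    @classmethod
    def stop(cls):
        cls._session = None               # single teardown at suite end

# Many "suites" asking for a session still pay the setup cost only once.
s1 = SharedSessionProvider.get()
s2 = SharedSessionProvider.get()
assert s1 is s2
assert SharedSessionProvider._init_count == 1
```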



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-995) Add hudi-testutils module

2020-06-03 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-995:
---

 Summary: Add hudi-testutils module
 Key: HUDI-995
 URL: https://issues.apache.org/jira/browse/HUDI-995
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Testing
Reporter: Raymond Xu


* add a new module {{hudi-testutils}} and add it to all other modules as test 
dep and remove {{hudi-common}} etc from test dep list
 * selectively migrate test util classes like data gen to {{hudi-testutils}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-994) Identify functional tests that are convertible to unit tests with mocks

2020-06-03 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-994:
---

 Summary: Identify functional tests that are convertible to unit 
tests with mocks
 Key: HUDI-994
 URL: https://issues.apache.org/jira/browse/HUDI-994
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Testing
Reporter: Raymond Xu


Identify convertible functional tests and re-implement them using mocks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-896) Parallelize CI testing to reduce CI wait time

2020-06-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-896:

Parent: HUDI-781
Issue Type: Sub-task  (was: Improvement)

> Parallelize CI testing to reduce CI wait time
> -
>
> Key: HUDI-896
> URL: https://issues.apache.org/jira/browse/HUDI-896
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> - 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] lamber-ken commented on pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-06-03 Thread GitBox


lamber-ken commented on pull request #1469:
URL: https://github.com/apache/hudi/pull/1469#issuecomment-638574852


   > @lamber-ken : LMK once the patch is ready to be reviewed again.
   
   Many thanks for reviewing this PR. Sorry for the delay; I'm working 
on something else at the moment and will ping you when it's ready 👍 
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] prashanthpdesai closed issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


prashanthpdesai closed issue #1695:
URL: https://github.com/apache/hudi/issues/1695


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] prashanthpdesai edited a comment on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


prashanthpdesai edited a comment on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638572601


   @bvaradar: Thank you for the clarification. I am able to read the Hudi 
parquet files. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] prashanthpdesai commented on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


prashanthpdesai commented on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638572601


   @bvaradar: Thank you for the clarification. I am able to read the parquet 
files. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 commented on a change in pull request #1702: Bootstrap datasource changes

2020-06-03 Thread GitBox


garyli1019 commented on a change in pull request #1702:
URL: https://github.com/apache/hudi/pull/1702#discussion_r434962196



##
File path: hudi-spark/src/main/scala/org/apache/hudi/HudiBootstrapRelation.scala
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.common.model.HoodieBaseFile
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView
+import org.apache.hudi.exception.HoodieException
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources.PartitionedFile
+import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
+import org.apache.spark.sql.{Row, SQLContext}
+import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
+import org.apache.spark.sql.types.StructType
+
+import scala.collection.JavaConverters._
+
+/**
+  * This is Spark relation that can be used for querying metadata/fully bootstrapped query hudi tables, as well as
+  * non-bootstrapped tables. It implements PrunedFilteredScan interface in order to support column pruning and filter
+  * push-down. For metadata bootstrapped files, if we query columns from both metadata and actual data then it will
+  * perform a merge of both to return the result.
+  *
+  * Caveat: Filter push-down does not work when querying both metadata and actual data columns over metadata
+  * bootstrapped files, because then the metadata file and data file can return different number of rows causing errors
+  * merging.
+  *
+  * @param _sqlContext Spark SQL Context
+  * @param userSchema User specified schema in the datasource query
+  * @param globPaths Globbed paths obtained from the user provided path for querying
+  * @param metaClient Hudi table meta client
+  * @param optParams DataSource options passed by the user
+  */
+class HudiBootstrapRelation(@transient val _sqlContext: SQLContext,
+                            val userSchema: StructType,
+                            val globPaths: Seq[Path],
+                            val metaClient: HoodieTableMetaClient,
+                            val optParams: Map[String, String])
+  extends BaseRelation with PrunedFilteredScan with Logging {
+
+  val skeletonSchema: StructType = HudiSparkUtils.getHudiMetadataSchema
+  var dataSchema: StructType = _
+  var fullSchema: StructType = _
+
+  val fileIndex: HudiBootstrapFileIndex = buildFileIndex()
+
+  override def sqlContext: SQLContext = _sqlContext
+
+  override val needConversion: Boolean = false
+
+  override def schema: StructType = inferFullSchema()
+
+  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
+    logInfo("Starting scan..")
+
+    // Compute splits
+    val bootstrapSplits = fileIndex.files.map(hoodieBaseFile => {
+      var skeletonFile: Option[PartitionedFile] = Option.empty
+      var dataFile: PartitionedFile = null
+
+      if (hoodieBaseFile.getExternalBaseFile.isPresent) {
+        skeletonFile = Option(PartitionedFile(InternalRow.empty, hoodieBaseFile.getPath, 0, hoodieBaseFile.getFileLen))
+        dataFile = PartitionedFile(InternalRow.empty, hoodieBaseFile.getExternalBaseFile.get().getPath, 0,
+          hoodieBaseFile.getExternalBaseFile.get().getFileLen)
+      } else {
+        dataFile = PartitionedFile(InternalRow.empty, hoodieBaseFile.getPath, 0, hoodieBaseFile.getFileLen)
+      }
+      HudiBootstrapSplit(dataFile, skeletonFile)
+    })
+    val tableState = HudiBootstrapTableState(bootstrapSplits)
+
+    // Get required schemas for column pruning
+    var requiredDataSchema = StructType(Seq())
+    var requiredSkeletonSchema = StructType(Seq())
+    requiredColumns.foreach(col => {
+      var field = dataSchema.find(_.name == col)
+      if (field.isDefined) {
+        requiredDataSchema = requiredDataSchema.add(field.get)
+      } else {
+        field = skeletonSchema.find(_.name == col)
+        requiredSkeletonSchema = requiredSkeletonSchem

[GitHub] [hudi] bvaradar commented on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


bvaradar commented on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638568039


   @prashanthpdesai : You should be using spark.read.format("hudi") instead of 
parquet to read Hudi datasets.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-06-03 Thread GitBox


nsivabalan commented on pull request #1469:
URL: https://github.com/apache/hudi/pull/1469#issuecomment-638566215


   @lamber-ken : LMK once the patch is ready to be reviewed again. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-03 Thread GitBox


garyli1019 edited a comment on pull request #1602:
URL: https://github.com/apache/hudi/pull/1602#issuecomment-638550734


   > are you able to verify this patch fixes the issues in your prod though? 
   
   @vinothchandar Yes, worked as expected. It will skip the small commit.
   Edit: Added a unit test to cover this case as well.
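
For context on what "skip the small commit" means here: average record size is typically estimated as bytes written divided by records written over recent commits, and a tiny commit can badly skew that average. A hypothetical sketch of such an estimator (illustrative only, not the actual HUDI-494 implementation):

```python
def estimate_record_size(commits, min_commit_bytes=1024 * 1024, default=1024):
    """Estimate average record size from (total_bytes, total_records) commit
    stats, ignoring commits smaller than min_commit_bytes, which would
    otherwise skew the estimate."""
    usable = [(b, r) for b, r in commits if b >= min_commit_bytes and r > 0]
    if not usable:
        return default  # no reliable history yet: fall back to a default
    total_bytes = sum(b for b, _ in usable)
    total_records = sum(r for _, r in usable)
    return total_bytes // total_records

commits = [
    (100 * 1024 * 1024, 200_000),  # normal commit: ~524 bytes/record
    (10 * 1024, 5),                # tiny commit: skipped by the threshold
]
assert estimate_record_size(commits) == 524
```

Without the threshold, the tiny commit's 2 KB-per-record figure would drag the estimate far from reality.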



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] kwondw opened a new pull request #1703: [HUDI-993] Let delete API use "hoodie.delete.shuffle.parallelism"

2020-06-03 Thread GitBox


kwondw opened a new pull request #1703:
URL: https://github.com/apache/hudi/pull/1703


   ## What is the purpose of the pull request
   
   For the Delete API, I noticed "hoodie.delete.shuffle.parallelism" isn't used, 
whereas "hoodie.upsert.shuffle.parallelism" is used for 
[upsert](https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/table/action/commit/WriteHelper.java#L104).
 This creates a performance difference between deleting via the upsert API with 
"EmptyHoodieRecordPayload" and the delete API in certain cases.
   
   https://issues.apache.org/jira/browse/HUDI-993 has more detail.
   
   ## Brief change log
   
   * Let the 
[deduplicateKeys](https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/table/action/commit/DeleteHelper.java#L51-L57)
 method use "hoodie.delete.shuffle.parallelism"
   * Repartition the input RDD to "hoodie.delete.shuffle.parallelism" when 
"hoodie.combine.before.delete" is false
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [X] Necessary doc changes done or have another open PR
  
- [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-993) Use hoodie.delete.shuffle.parallelism for Delete API

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-993:

Labels: pull-request-available  (was: )

> Use hoodie.delete.shuffle.parallelism for Delete API
> 
>
> Key: HUDI-993
> URL: https://issues.apache.org/jira/browse/HUDI-993
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Dongwook Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> While HUDI-328 introduced the Delete API, I noticed the 
> [deduplicateKeys|https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/table/action/commit/DeleteHelper.java#L51-L57]
>  method doesn't apply any parallelism to its RDD operation, while 
> [deduplicateRecords|https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/table/action/commit/WriteHelper.java#L104]
>  for upsert does.
> "hoodie.delete.shuffle.parallelism" doesn't seem to be used at all.
>  
> I found certain cases, e.g. when the input RDD has little parallelism but the 
> target table has large files, where a Spark job's performance suffers from the 
> low parallelism; in such cases, upsert with "EmptyHoodieRecordPayload" is 
> faster than the delete API.
> This is also because "hoodie.combine.before.upsert" is true by default; when 
> combining is disabled, the issue is the same.
> So I wonder whether the input RDD should be repartitioned to 
> "hoodie.delete.shuffle.parallelism" when "hoodie.combine.before.delete" is 
> false, for better performance regardless of that setting.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-03 Thread GitBox


garyli1019 commented on pull request #1602:
URL: https://github.com/apache/hudi/pull/1602#issuecomment-638550734


   > are you able to verify this patch fixes the issues in your prod though? 
   
   @vinothchandar Yes, worked as expected. It will skip the small commit.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-993) Use hoodie.delete.shuffle.parallelism for Delete API

2020-06-03 Thread Dongwook Kwon (Jira)
Dongwook Kwon created HUDI-993:
--

 Summary: Use hoodie.delete.shuffle.parallelism for Delete API
 Key: HUDI-993
 URL: https://issues.apache.org/jira/browse/HUDI-993
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Performance
Reporter: Dongwook Kwon


While HUDI-328 introduced the Delete API, I noticed the 
[deduplicateKeys|https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/table/action/commit/DeleteHelper.java#L51-L57]
 method doesn't apply any parallelism to its RDD operation, while 
[deduplicateRecords|https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/table/action/commit/WriteHelper.java#L104]
 for upsert does.

"hoodie.delete.shuffle.parallelism" doesn't seem to be used at all.

I found certain cases, e.g. when the input RDD has little parallelism but the 
target table has large files, where a Spark job's performance suffers from the 
low parallelism; in such cases, upsert with "EmptyHoodieRecordPayload" is 
faster than the delete API.

This is also because "hoodie.combine.before.upsert" is true by default; when 
combining is disabled, the issue is the same.

So I wonder whether the input RDD should be repartitioned to 
"hoodie.delete.shuffle.parallelism" when "hoodie.combine.before.delete" is 
false, for better performance regardless of that setting.
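
The low-parallelism symptom described above is easy to reproduce in miniature: if delete keys are never repartitioned, a few workers do all the work. A pure-Python sketch of the proposed fix, spreading keys across a configured number of partitions before deduplicating (names are illustrative, not Hudi's actual code):

```python
def repartition(keys, parallelism):
    """Round-robin keys into `parallelism` partitions, mimicking an RDD repartition."""
    parts = [[] for _ in range(parallelism)]
    for i, k in enumerate(keys):
        parts[i % parallelism].append(k)
    return parts

def dedup_keys(keys, parallelism):
    """Deduplicate delete keys after spreading them across partitions
    (a stand-in for honoring hoodie.delete.shuffle.parallelism)."""
    parts = repartition(keys, parallelism)
    seen, out = set(), []
    for part in parts:              # each part could run on a separate worker
        for k in part:
            if k not in seen:
                seen.add(k)
                out.append(k)
    return out

keys = ["k1", "k2", "k1", "k3", "k2"]
assert sorted(dedup_keys(keys, parallelism=4)) == ["k1", "k2", "k3"]
```

The result is the same either way; what changes is that with repartitioning each of the `parallelism` slices is small enough to process concurrently.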

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-03 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-992:
---
Parent: HUDI-242
Issue Type: Sub-task  (was: Bug)

> For hive-style partitioned source data, partition columns synced with Hive 
> will always have String type
> ---
>
> Key: HUDI-992
> URL: https://issues.apache.org/jira/browse/HUDI-992
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Udit Mehrotra
>Priority: Major
>
> Currently the bootstrap implementation is not able to handle partition 
> columns correctly when the source data has *hive-style partitioning*, as is 
> also mentioned in https://jira.apache.org/jira/browse/HUDI-915
> The schema inferred while performing bootstrap and stored in the commit 
> metadata does not have the partition column schema (in the case of 
> hive-partitioned data). As a result, during hive-sync, when Hudi tries to 
> determine the type of the partition column from that schema, it does not 
> find it and assumes the default data type *string*.
> Here is where the partition column schema is determined for hive-sync:
> [https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]
>  
> Thus, no matter what the data type of the partition column is in the source 
> data (at least as Spark infers it from the path), it will always be synced 
> as a string.
>  





[jira] [Created] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-03 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-992:
--

 Summary: For hive-style partitioned source data, partition columns 
synced with Hive will always have String type
 Key: HUDI-992
 URL: https://issues.apache.org/jira/browse/HUDI-992
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Udit Mehrotra


Currently the bootstrap implementation is not able to handle partition columns 
correctly when the source data has *hive-style partitioning*, as is also 
mentioned in https://jira.apache.org/jira/browse/HUDI-915

The schema inferred while performing bootstrap and stored in the commit 
metadata does not have the partition column schema (in the case of 
hive-partitioned data). As a result, during hive-sync, when Hudi tries to 
determine the type of the partition column from that schema, it does not find 
it and assumes the default data type *string*.

Here is where the partition column schema is determined for hive-sync:

[https://github.com/apache/hudi/blob/master/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java#L417]

Thus, no matter what the data type of the partition column is in the source 
data (at least as Spark infers it from the path), it will always be synced as 
a string.
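
For illustration, hive-style layouts encode partition values directly in the 
path (e.g. `year=2020/month=06/day=03`); when no schema carries the column 
types, a sync layer can only recover raw strings from the path, which is why 
the fallback type is *string*. A small Python sketch of that behavior 
(`parse_hive_partition_path` is a hypothetical helper, not Hudi code):

```python
def parse_hive_partition_path(path):
    """Split a hive-style partition path into {column: value} pairs.
    Without schema information, every value is necessarily a string."""
    segments = [seg.split("=", 1) for seg in path.strip("/").split("/") if "=" in seg]
    return dict(segments)

cols = parse_hive_partition_path("year=2020/month=06/day=03")
# All values come back as strings -- the type information is lost,
# which is why hive-sync falls back to the default 'string' type.
assert cols == {"year": "2020", "month": "06", "day": "03"}
assert all(isinstance(v, str) for v in cols.values())
```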

 





[GitHub] [hudi] vinothchandar commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-03 Thread GitBox


vinothchandar commented on pull request #1602:
URL: https://github.com/apache/hudi/pull/1602#issuecomment-638534258


   Reviewing again .. are you able to verify this patch fixes the issues in 
your prod though? Seems like a good thing to do.. 
   
   In general it’s good to be verifying in parallel without blocking on reviews 
here 😎



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-991) Bootstrap Implementation Bugs

2020-06-03 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-991:
--

 Summary: Bootstrap Implementation Bugs
 Key: HUDI-991
 URL: https://issues.apache.org/jira/browse/HUDI-991
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: Udit Mehrotra


This story tracks all the bugs we encounter while testing bootstrap changes





[GitHub] [hudi] umehrot2 commented on pull request #1475: [HUDI-426][WIP] Initial implementation for Bootstrapping data source

2020-06-03 Thread GitBox


umehrot2 commented on pull request #1475:
URL: https://github.com/apache/hudi/pull/1475#issuecomment-638522351


   Closing this pull request in favor of the new pull request 
https://github.com/apache/hudi/pull/1702, where I have consolidated all the 
datasource-related changes into one PR for review. It includes this read 
datasource part as well. 







[GitHub] [hudi] umehrot2 closed pull request #1475: [HUDI-426][WIP] Initial implementation for Bootstrapping data source

2020-06-03 Thread GitBox


umehrot2 closed pull request #1475:
URL: https://github.com/apache/hudi/pull/1475


   







[GitHub] [hudi] umehrot2 opened a new pull request #1702: Bootstrap datasource changes

2020-06-03 Thread GitBox


umehrot2 opened a new pull request #1702:
URL: https://github.com/apache/hudi/pull/1702


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[GitHub] [hudi] vinothchandar commented on issue #1670: Error opening Hive split: Unknown converted type TIMESTAMP_MICROS

2020-06-03 Thread GitBox


vinothchandar commented on issue #1670:
URL: https://github.com/apache/hudi/issues/1670#issuecomment-638521271


   https://issues.apache.org/jira/browse/HUDI-83 Should have all the context







[GitHub] [hudi] vinothchandar commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-03 Thread GitBox


vinothchandar commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-638519702


   Beyond the initial shuffle, Hudi will auto-tune everything, so I am not 
surprised. 
   
   On countByKey at HoodieBloomIndex, what's the line number?
   
   count at HoodieSparkSqlWriter is the actual writing of data. We send 100K 
records to the same insert partition to write larger file sizes. Can you see if 
there's a skew in that stage? It's tunable. 







[GitHub] [hudi] codecov-commenter edited a comment on pull request #1701: [HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread GitBox


codecov-commenter edited a comment on pull request #1701:
URL: https://github.com/apache/hudi/pull/1701#issuecomment-638466637


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=h1) Report
   > Merging 
[#1701](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=desc) into 
[release-0.5.3](https://codecov.io/gh/apache/hudi/commit/5fcc461647e197e805836c6aea24e9df8c09cf0f&el=desc)
 will **decrease** coverage by `0.07%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1701/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=tree)
   
   ```diff
   @@               Coverage Diff                @@
   ##           release-0.5.3    #1701     +/-   ##
   ================================================
   - Coverage          69.87%   69.80%    -0.08%
   + Complexity           993      204      -789
   ================================================
     Files                322      322
     Lines              15514    15521        +7
     Branches            1602     1603        +1
   ================================================
   - Hits               10841    10834        -7
   - Misses              3958     3972       +14
     Partials             715      715
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../hudi/client/embedded/EmbeddedTimelineService.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2VtYmVkZGVkL0VtYmVkZGVkVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | `75.00% <100.00%> (+0.71%)` | `0.00 <0.00> (-7.00)` | :arrow_up: |
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `90.71% <100.00%> (+0.02%)` | `0.00 <0.00> (-9.00)` | :arrow_up: |
   | 
[.../org/apache/hudi/table/HoodieMergeOnReadTable.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllTWVyZ2VPblJlYWRUYWJsZS5qYXZh)
 | `85.71% <100.00%> (+0.08%)` | `0.00 <0.00> (ø)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `92.18% <100.00%> (ø)` | `0.00 <0.00> (-29.00)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `85.07% <100.00%> (+0.94%)` | `0.00 <0.00> (-8.00)` | :arrow_up: |
   | 
[...g/apache/hudi/exception/HoodieRemoteException.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZVJlbW90ZUV4Y2VwdGlvbi5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | `77.59% <0.00%> (-5.47%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...e/hudi/timeline/service/FileSystemViewHandler.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvRmlsZVN5c3RlbVZpZXdIYW5kbGVyLmphdmE=)
 | `89.20% <0.00%> (-2.35%)` | `0.00% <0.00%> (-11.00%)` | |
   | 
[.../hudi/common/table/view/FileSystemViewManager.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdNYW5hZ2VyLmphdmE=)
 | `84.48% <0.00%> (+3.44%)` | `0.00% <0.00%> (-13.00%)` | :arrow_up: |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `94.44% <0.00%> (+5.55%)` | `0.00% <0.00%> (-4.00%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codec

[GitHub] [hudi] codecov-commenter commented on pull request #1701: [HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread GitBox


codecov-commenter commented on pull request #1701:
URL: https://github.com/apache/hudi/pull/1701#issuecomment-638466637


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=h1) Report
   > Merging 
[#1701](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=desc) into 
[release-0.5.3](https://codecov.io/gh/apache/hudi/commit/5fcc461647e197e805836c6aea24e9df8c09cf0f&el=desc)
 will **decrease** coverage by `0.07%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1701/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=tree)
   
   ```diff
   @@               Coverage Diff                @@
   ##           release-0.5.3    #1701     +/-   ##
   ================================================
   - Coverage          69.87%   69.80%    -0.08%
   + Complexity           993      204      -789
   ================================================
     Files                322      322
     Lines              15514    15521        +7
     Branches            1602     1603        +1
   ================================================
   - Hits               10841    10834        -7
   - Misses              3958     3972       +14
     Partials             715      715
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../hudi/client/embedded/EmbeddedTimelineService.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2VtYmVkZGVkL0VtYmVkZGVkVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | `75.00% <100.00%> (+0.71%)` | `0.00 <0.00> (-7.00)` | :arrow_up: |
   | 
[.../org/apache/hudi/table/HoodieCopyOnWriteTable.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29weU9uV3JpdGVUYWJsZS5qYXZh)
 | `90.71% <100.00%> (+0.02%)` | `0.00 <0.00> (-9.00)` | :arrow_up: |
   | 
[.../org/apache/hudi/table/HoodieMergeOnReadTable.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllTWVyZ2VPblJlYWRUYWJsZS5qYXZh)
 | `85.71% <100.00%> (+0.08%)` | `0.00 <0.00> (ø)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `92.18% <100.00%> (ø)` | `0.00 <0.00> (-29.00)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `85.07% <100.00%> (+0.94%)` | `0.00 <0.00> (-8.00)` | :arrow_up: |
   | 
[...g/apache/hudi/exception/HoodieRemoteException.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZVJlbW90ZUV4Y2VwdGlvbi5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...on/table/view/RemoteHoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvUmVtb3RlSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | `77.59% <0.00%> (-5.47%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...e/hudi/timeline/service/FileSystemViewHandler.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvRmlsZVN5c3RlbVZpZXdIYW5kbGVyLmphdmE=)
 | `89.20% <0.00%> (-2.35%)` | `0.00% <0.00%> (-11.00%)` | |
   | 
[.../hudi/common/table/view/FileSystemViewManager.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdNYW5hZ2VyLmphdmE=)
 | `84.48% <0.00%> (+3.44%)` | `0.00% <0.00%> (-13.00%)` | :arrow_up: |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/hudi/pull/1701/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `94.44% <0.00%> (+5.55%)` | `0.00% <0.00%> (-4.00%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1701?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/g

[jira] [Commented] (HUDI-983) Add Metrics section to asf-site

2020-06-03 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125171#comment-17125171
 ] 

Raymond Xu commented on HUDI-983:
-

[~shenhong] Sure. Thanks for taking the initiative!

> Add Metrics section to asf-site
> ---
>
> Key: HUDI-983
> URL: https://issues.apache.org/jira/browse/HUDI-983
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Hong Shen
>Priority: Minor
>  Labels: documentation, newbie
> Fix For: 0.6.0
>
>
> Document the use of the metrics system in Hudi, including all supported 
> metrics reporters.
> See the example
> https://user-images.githubusercontent.com/20113411/83055820-f5e97100-a086-11ea-9ea3-52b342aca9d4.png





[GitHub] [hudi] prashanthpdesai edited a comment on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


prashanthpdesai edited a comment on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638335917


   @nsivabalan : thank you, I was able to write successfully with the global 
index after pointing to the newer version of the jar, but I see the below 
exception while reading the parquet files. 
   Could you please check whether this is something you can help with?
   
   Not sure why it's trying to read the .commit file, which is causing the 
magic-byte exception. 
   
   spark.read.parquet(basepath+"/*").show(false)
   
   **Caused by: org.apache.spark.SparkException: Exception thrown in 
awaitResult:**
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:123)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
   **Caused by: java.io.IOException: Could not read footer for file: 
FileStatus{path=maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit;**
 isDirectory=false; length=4366; replication=0; blocksize=0; 
modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; 
isSymlink=false}
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:551)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:538)
 at 
org.apache.spark.util.ThreadUtils$$anonfun$3$$anonfun$apply$1.apply(ThreadUtils.scala:287)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
 at 
scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
   **Caused by: java.lang.RuntimeException: 
maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit is not a 
Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 48, 
10, 125]**
   
   
   
   info:
   /basepath/.hoodie/
   drwxr-sr-x. 2 xxx xgc0 Jun  3 11:55 archived
   -rwxr-xr-x. 1 xxx xgc  207 Jun  3 11:55 hoodie.properties
   -rwxr-xr-x. 1 xxx xgc0 Jun  3 11:55 20200603115556.commit.requested
   -rwxr-xr-x. 1 xxx xgc  380 Jun  3 11:56 20200603115556.inflight
   -rwxr-xr-x. 1 xxx xgc 4366 Jun  3 11:56 20200603115556.commit
   -rwxr-xr-x. 1 xxx xgc0 Jun  3 11:57 20200603115719.commit.requested
   -rwxr-xr-x. 1 xxx xgc  380 Jun  3 11:57 20200603115719.inflight
   -rwxr-xr-x. 1 xxx xgc 5906 Jun  3 11:57 20200603115719.commit








[GitHub] [hudi] prashanthpdesai edited a comment on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


prashanthpdesai edited a comment on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638335917


   @nsivabalan : thank you, I was able to write successfully with the global 
index after pointing to the newer version of the jar, but I see the below 
exception while reading the parquet files.
   Could you please check whether this is something you can help with?
   
   spark.read.parquet(basepath+"/*").show(false)
   
   **Caused by: org.apache.spark.SparkException: Exception thrown in 
awaitResult:**
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:123)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
   **Caused by: java.io.IOException: Could not read footer for file: 
FileStatus{path=maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit;**
 isDirectory=false; length=4366; replication=0; blocksize=0; 
modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; 
isSymlink=false}
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:551)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:538)
 at 
org.apache.spark.util.ThreadUtils$$anonfun$3$$anonfun$apply$1.apply(ThreadUtils.scala:287)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
 at 
scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
   **Caused by: java.lang.RuntimeException: 
maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit is not a 
Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 48, 
10, 125]**
   
   
   info:
   /basepath/.hoodie/
   drwxr-sr-x. 2 xxx xgc0 Jun  3 11:55 archived
   -rwxr-xr-x. 1 xxx xgc  207 Jun  3 11:55 hoodie.properties
   -rwxr-xr-x. 1 xxx xgc0 Jun  3 11:55 20200603115556.commit.requested
   -rwxr-xr-x. 1 xxx xgc  380 Jun  3 11:56 20200603115556.inflight
   -rwxr-xr-x. 1 xxx xgc 4366 Jun  3 11:56 20200603115556.commit
   -rwxr-xr-x. 1 xxx xgc0 Jun  3 11:57 20200603115719.commit.requested
   -rwxr-xr-x. 1 xxx xgc  380 Jun  3 11:57 20200603115719.inflight
   -rwxr-xr-x. 1 xxx xgc 5906 Jun  3 11:57 20200603115719.commit








[GitHub] [hudi] prashanthpdesai edited a comment on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


prashanthpdesai edited a comment on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638335917


   @nsivabalan : thank you, I was able to write successfully with the global 
index after pointing to the newer version of the jar, but I see the below 
exception while reading the parquet files.
   Could you please check whether this is something you can help with?
   
   spark.read.parquet(basepath+"/*").show(false)
   
   **Caused by: org.apache.spark.SparkException: Exception thrown in 
awaitResult:**
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:123)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
   **Caused by: java.io.IOException: Could not read footer for file: 
FileStatus{path=maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit;**
 isDirectory=false; length=4366; replication=0; blocksize=0; 
modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; 
isSymlink=false}
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:551)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:538)
 at 
org.apache.spark.util.ThreadUtils$$anonfun$3$$anonfun$apply$1.apply(ThreadUtils.scala:287)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
 at 
scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
   **Caused by: java.lang.RuntimeException: 
maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit is not a 
Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 48, 
10, 125]**
   
   







[GitHub] [hudi] prashanthpdesai commented on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


prashanthpdesai commented on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638335917


   @nsivabalan : thank you, I was able to write successfully with the global 
index after pointing to the newer version of the jar, but I see the below 
exception while reading the parquet files.
   Could you please check whether this is something you can help with?
   
   spark.read.parquet(basepath+"/*").show(false)
   
   **Caused by: org.apache.spark.SparkException: Exception thrown in 
awaitResult:**
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
 at org.apache.spark.util.ThreadUtils$.parmap(ThreadUtils.scala:290)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readParquetFootersInParallel(ParquetFileFormat.scala:538)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$9.apply(ParquetFileFormat.scala:611)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$9.apply(ParquetFileFormat.scala:603)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
 at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:123)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
   **Caused by: java.io.IOException: Could not read footer for file: 
FileStatus{path=maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit;**
 isDirectory=false; length=4366; replication=0; blocksize=0; 
modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; 
isSymlink=false}
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:551)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:538)
 at 
org.apache.spark.util.ThreadUtils$$anonfun$3$$anonfun$apply$1.apply(ThreadUtils.scala:287)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
 at 
scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
   **Caused by: java.lang.RuntimeException: 
maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit is not a 
Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 48, 
10, 125]**
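   The magic-number mismatch above indicates the glob pulled in Hudi commit 
metadata rather than a data file: `[80, 65, 82, 49]` is ASCII for the parquet 
footer magic `PAR1`, while `[32, 48, 10, 125]` is `" 0\n}"`, the tail of a JSON 
`.commit` file under `.hoodie/`. A sketch of two ways to keep the metadata 
folder out of the read (the partition-level glob assumes the single-level `ts` 
partitioning shown in this thread):
   
   ```scala
   // Sketch, not from the thread: read through the Hudi datasource, whose
   // path filter skips files under .hoodie/ and resolves the latest file slices:
   val viaHudi = spark.read.format("org.apache.hudi").load(basepath + "/*")
   
   // Or glob only parquet files inside the partition directories, which
   // naturally excludes the JSON commit files in .hoodie/:
   val viaGlob = spark.read.parquet(basepath + "/*/*.parquet")
   ```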
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-03 Thread GitBox


garyli1019 commented on pull request #1602:
URL: https://github.com/apache/hudi/pull/1602#issuecomment-638335182


   @vinothchandar @bvaradar @nsivabalan Any thoughts on this PR?
   This bug is happening quite often in my production environment. One small 
commit will screw up the table.







[GitHub] [hudi] codecov-commenter edited a comment on pull request #1697: [WIP][HUDI-988] Fix issues causing Unit Test Flakiness

2020-06-03 Thread GitBox


codecov-commenter edited a comment on pull request #1697:
URL: https://github.com/apache/hudi/pull/1697#issuecomment-637484534


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=h1) Report
   > Merging 
[#1697](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/a9a97d6af47841caaa745497ec425267db0873c8&el=desc)
 will **increase** coverage by `0.02%`.
   > The diff coverage is `42.85%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1697/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1697      +/-   ##
   ============================================
   + Coverage     18.18%   18.20%    +0.02%     
   - Complexity      856      858        +2     
   ============================================
     Files           348      348               
     Lines         15351    15356        +5     
     Branches       1524     1525        +1     
   ============================================
   + Hits           2792     2796        +4     
     Misses        12202    12202               
   - Partials        357      358        +1     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `52.30% <0.00%> (ø)` | `28.00 <1.00> (ø)` | |
   | 
[.../hudi/client/embedded/EmbeddedTimelineService.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2VtYmVkZGVkL0VtYmVkZGVkVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | `72.22% <50.00%> (-2.07%)` | `7.00 <1.00> (ø)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `62.68% <50.00%> (-0.81%)` | `9.00 <1.00> (+1.00)` | :arrow_down: |
   | 
[...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=)
 | `22.69% <0.00%> (+0.70%)` | `29.00% <0.00%> (+1.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=footer). Last 
update 
[a9a97d6...cb950c1](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [hudi] nsivabalan commented on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


nsivabalan commented on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638300159


   ```
   spark.read.parquet(basepath+"/*").show(false)
   
+---++--+--++--+--+--++
   
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name
   |fanme |lname |ts
|uuid|
   
+---++--+--++--+--+--++
   |20200603155652 |20200603155652_0_2  |20|2020-01-30  
  
|a9e4f829-1a0d-49e0-9ed5-254808a4a4bf-0_0-22-12006_20200603155652.parquet|prabil|bal
   |2020-01-30|20  |
   |20200603155652 |20200603155652_2_1  |10|2019-10-15  
  
|0e790488-ebf3-479a-9044-d819b620d085-0_2-22-12008_20200603155652.parquet|pd
|desai1|2019-10-15|10  |
   |20200603155652 |20200603155652_1_3  |11|2019-10-14  
  
|2ec0b40d-7a44-496b-9c92-68fe920c6111-0_1-22-12007_20200603155652.parquet|pp
|sai   |2019-10-14|11  |
   
+---++--+--++--+--+--++
   ```
   
   After update:
   ```
   spark.read.parquet(basepath+"/*").show(false)
   
+---++--+--++--+-+--++
   
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name
   |fanme |lname|ts
|uuid|
   
+---++--+--++--+-+--++
   |20200603160032 |20200603160032_1_4  |11|2019-10-18  
  
|1fed9735-8932-4ac3-bdb0-d5e94e23267c-0_1-56-25537_20200603160032.parquet|pp
|sai  |2019-10-18|11  |
   |20200603160032 |20200603160032_1_5  |25|2019-10-18  
  
|1fed9735-8932-4ac3-bdb0-d5e94e23267c-0_1-56-25537_20200603160032.parquet|rg
|fg   |2019-10-18|25  |
   |20200603155652 |20200603155652_0_2  |20|2020-01-30  
  
|a9e4f829-1a0d-49e0-9ed5-254808a4a4bf-0_0-22-12006_20200603155652.parquet|prabil|bal
  |2020-01-30|20  |
   |20200603160032 |20200603160032_0_6  |10|2019-10-17  
  
|17f34208-cb9e-4a39-b796-a7168d3d089a-0_0-56-25536_20200603160032.parquet|pd
|desai|2019-10-17|10  |
   
+---++--+--++--+-+--++
   ```
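   The two outputs above show the behavior being tested: with partition-path 
updates enabled on the global index, record key 10 moved from partition 
2019-10-15 to 2019-10-17 and key 11 from 2019-10-14 to 2019-10-18. A minimal 
sketch of the two options driving this (key names as used in this thread, 
assumed to match Hudi 0.5.x):
   
   ```scala
   // Sketch of the relevant write options:
   val globalBloomOpts = Map(
     // look up record keys across every partition, not just the incoming one
     "hoodie.index.type" -> "GLOBAL_BLOOM",
     // on update, remove the old record and rewrite it under its new partition path
     "hoodie.bloom.index.update.partition.path" -> "true"
   )
   ```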
   







[GitHub] [hudi] nsivabalan commented on issue #1695: [SUPPORT] : Global Bloom Index config issue

2020-06-03 Thread GitBox


nsivabalan commented on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638296488


   @prashanthpdesai : I tried this and it works for me. I am using the 
0.5.2-incubating bundle (org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating); 
not sure if that makes any difference.
   
   Can you try the following and let me know what output you see?
   
   ```
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME
   import org.apache.spark.sql.SaveMode._
   
   val table = "hudi_cow1"
   val basepath = "/datalake/globalndextest"
   
   val df3 = spark.read.option("header", "true").csv("/datalake/888/test3.csv")
   df3.write.format("org.apache.hudi").
     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     option(PARTITIONPATH_FIELD_OPT_KEY, "ts").
     option("hoodie.index.type", "GLOBAL_BLOOM").
     option("hoodie.bloom.index.update.partition.path", "true").
     option(TABLE_NAME, table).
     mode(Append).save(basepath)
   
   spark.read.parquet(basepath + "/*").show(false)
   
   val df5 = spark.read.option("header", "true").csv("/datalake/888/test4.csv")
   df5.write.format("org.apache.hudi").
     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     option(PARTITIONPATH_FIELD_OPT_KEY, "ts").
     option("hoodie.index.type", "GLOBAL_BLOOM").
     option("hoodie.bloom.index.update.partition.path", "true").
     option(TABLE_NAME, table).
     mode(Append).save(basepath)
   
   spark.read.parquet(basepath + "/*").show(false)
   ```







[jira] [Assigned] (HUDI-983) Add Metrics section to asf-site

2020-06-03 Thread Hong Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen reassigned HUDI-983:
--

Assignee: Hong Shen

> Add Metrics section to asf-site
> ---
>
> Key: HUDI-983
> URL: https://issues.apache.org/jira/browse/HUDI-983
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Hong Shen
>Priority: Minor
>  Labels: documentation, newbie
> Fix For: 0.6.0
>
>
> Document the use of the metrics system in Hudi, including all supported 
> metrics reporters.
> See the example
> https://user-images.githubusercontent.com/20113411/83055820-f5e97100-a086-11ea-9ea3-52b342aca9d4.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-983) Add Metrics section to asf-site

2020-06-03 Thread Hong Shen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125009#comment-17125009
 ] 

Hong Shen commented on HUDI-983:


[~rxu] I am interested in it; I will open a pull request this weekend.



> Add Metrics section to asf-site
> ---
>
> Key: HUDI-983
> URL: https://issues.apache.org/jira/browse/HUDI-983
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Priority: Minor
>  Labels: documentation, newbie
> Fix For: 0.6.0
>
>
> Document the use of the metrics system in Hudi, including all supported 
> metrics reporters.
> See the example
> https://user-images.githubusercontent.com/20113411/83055820-f5e97100-a086-11ea-9ea3-52b342aca9d4.png





[GitHub] [hudi] xushiyan commented on pull request #1698: [HUDI-986] Support staging site for per pull request

2020-06-03 Thread GitBox


xushiyan commented on pull request #1698:
URL: https://github.com/apache/hudi/pull/1698#issuecomment-638229425


   > @xushiyan are you able to help review this 
   
   Sure, I can help with this.







[jira] [Commented] (HUDI-957) Umbrella ticket for sequencing common tasks required to progress/unblock RFC-08, RFC-15 & RFC-19

2020-06-03 Thread Hong Shen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124926#comment-17124926
 ] 

Hong Shen commented on HUDI-957:


[~nishith29] We also want to do this; please @ me if needed.

> Umbrella ticket for sequencing common tasks required to progress/unblock 
> RFC-08, RFC-15 & RFC-19
> 
>
> Key: HUDI-957
> URL: https://issues.apache.org/jira/browse/HUDI-957
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Common Core, Compaction, Index, Storage Management
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>
> There are 3 different designs proposed in following RFC's 08, 15 & 19. On 
> further analysis there are a bunch of common changes that will benefit all 3 
> and some that are specific to each individual design. This ticket is to track 
> most of the common changes so those can be parallelized and all members in 
> the community can help contribute to land these soon.





[GitHub] [hudi] codecov-commenter edited a comment on pull request #1697: [WIP][HUDI-988] Fix issues causing Unit Test Flakiness

2020-06-03 Thread GitBox


codecov-commenter edited a comment on pull request #1697:
URL: https://github.com/apache/hudi/pull/1697#issuecomment-637484534


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=h1) Report
   > Merging 
[#1697](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/a9a97d6af47841caaa745497ec425267db0873c8&el=desc)
 will **increase** coverage by `0.00%`.
   > The diff coverage is `42.85%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/1697/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##            master    #1697     +/-   ##
   ===========================================
     Coverage    18.18%   18.19%             
   - Complexity     856      857       +1     
   ===========================================
     Files          348      348              
     Lines        15351    15356       +5     
     Branches      1524     1525       +1     
   ===========================================
   + Hits          2792     2794       +2     
   - Misses       12202    12204       +2     
   - Partials       357      358       +1     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `52.30% <0.00%> (ø)` | `28.00 <1.00> (ø)` | |
   | 
[.../hudi/client/embedded/EmbeddedTimelineService.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2VtYmVkZGVkL0VtYmVkZGVkVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | `72.22% <50.00%> (-2.07%)` | `7.00 <1.00> (ø)` | |
   | 
[...common/table/view/FileSystemViewStorageConfig.java](https://codecov.io/gh/apache/hudi/pull/1697/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlQ29uZmlnLmphdmE=)
 | `62.68% <50.00%> (-0.81%)` | `9.00 <1.00> (+1.00)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=footer). Last 
update 
[a9a97d6...cb950c1](https://codecov.io/gh/apache/hudi/pull/1697?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   







[GitHub] [hudi] bvaradar commented on pull request #1697: [WIP][HUDI-988] Fix issues causing Unit Test Flakiness

2020-06-03 Thread GitBox


bvaradar commented on pull request #1697:
URL: https://github.com/apache/hudi/pull/1697#issuecomment-638027639


   @vinothchandar : Yes, a subset of the related changes is present in 0.5.3 
as well. It would be better to cherry-pick once we resolve all the issues. 







[GitHub] [hudi] bvaradar commented on pull request #1701: [HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread GitBox


bvaradar commented on pull request #1701:
URL: https://github.com/apache/hudi/pull/1701#issuecomment-638023012


   cc @vinothchandar @nsivabalan 







[jira] [Updated] (HUDI-990) Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-990:

Labels: pull-request-available  (was: )

> Timeline API : filterCompletedAndCompactionInstants needs to handle requested 
> state correctly
> -
>
> Key: HUDI-990
> URL: https://issues.apache.org/jira/browse/HUDI-990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
>
> This bug was causing timeline server API calls during the index lookup phase 
> to fail, with a backup local view getting constructed instead. It manifested 
> when the new "requested" state was introduced for commits. 
>  
>  





[GitHub] [hudi] bvaradar opened a new pull request #1701: [HUDI-990] Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread GitBox


bvaradar opened a new pull request #1701:
URL: https://github.com/apache/hudi/pull/1701


   Contains:
   
   1. Code changes to fix HUDI-990.
   2. Fallback for the remote file-system view is disabled for tests; unit 
tests now fail when they cannot get a response from remote file-system view 
calls.







[jira] [Commented] (HUDI-990) Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124707#comment-17124707
 ] 

Balaji Varadarajan commented on HUDI-990:
-

[~shivnarayan] : FYI : This would need to go to 0.5.3 release

> Timeline API : filterCompletedAndCompactionInstants needs to handle requested 
> state correctly
> -
>
> Key: HUDI-990
> URL: https://issues.apache.org/jira/browse/HUDI-990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0, 0.5.3
>
>
> This bug caused timeline server API calls during the index-lookup phase to 
> fail, with a backup local view being constructed instead. It manifested when 
> the new "requested" state was introduced for commits.
>  
>  





[jira] [Updated] (HUDI-990) Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-990:

Status: Open  (was: New)

> Timeline API : filterCompletedAndCompactionInstants needs to handle requested 
> state correctly
> -
>
> Key: HUDI-990
> URL: https://issues.apache.org/jira/browse/HUDI-990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> This bug caused timeline server API calls during the index-lookup phase to 
> fail, with a backup local view being constructed instead. It manifested when 
> the new "requested" state was introduced for commits.
>  
>  





[jira] [Assigned] (HUDI-990) Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-990:
---

Assignee: Balaji Varadarajan

> Timeline API : filterCompletedAndCompactionInstants needs to handle requested 
> state correctly
> -
>
> Key: HUDI-990
> URL: https://issues.apache.org/jira/browse/HUDI-990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> This bug caused timeline server API calls during the index-lookup phase to 
> fail, with a backup local view being constructed instead. It manifested when 
> the new "requested" state was introduced for commits.
>  
>  





[jira] [Updated] (HUDI-990) Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-990:

Fix Version/s: 0.5.3
   0.6.0

> Timeline API : filterCompletedAndCompactionInstants needs to handle requested 
> state correctly
> -
>
> Key: HUDI-990
> URL: https://issues.apache.org/jira/browse/HUDI-990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0, 0.5.3
>
>
> This bug caused timeline server API calls during the index-lookup phase to 
> fail, with a backup local view being constructed instead. It manifested when 
> the new "requested" state was introduced for commits.
>  
>  





[jira] [Created] (HUDI-990) Timeline API : filterCompletedAndCompactionInstants needs to handle requested state correctly

2020-06-03 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-990:
---

 Summary: Timeline API : filterCompletedAndCompactionInstants needs 
to handle requested state correctly
 Key: HUDI-990
 URL: https://issues.apache.org/jira/browse/HUDI-990
 Project: Apache Hudi
  Issue Type: Bug
  Components: Common Core
Reporter: Balaji Varadarajan


This bug caused timeline server API calls during the index-lookup phase to 
fail, with a backup local view being constructed instead. It manifested when 
the new "requested" state was introduced for commits.

 

 


