[GitHub] [hudi] codecov-io commented on pull request #2761: [HUDI-1676] Support SQL with spark3

2021-04-02 Thread GitBox


codecov-io commented on pull request #2761:
URL: https://github.com/apache/hudi/pull/2761#issuecomment-812815750


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2761?src=pr=h1) Report
   > Merging 
[#2761](https://codecov.io/gh/apache/hudi/pull/2761?src=pr=desc) (f404051) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc)
 (e970e1f) will **decrease** coverage by `42.94%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2761/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2761?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2761       +/-   ##
    =============================================
    - Coverage     52.32%    9.38%    -42.95%     
    + Complexity     3689       48      -3641     
    =============================================
      Files           483       54       -429     
      Lines         23095     1993     -21102     
      Branches       2460      236      -2224     
    =============================================
    - Hits          12084      187     -11897     
    + Misses         9942     1793      -8149     
    + Partials       1069       13      -1056     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.32%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2761?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2761/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #2761: [HUDI-1676] Support SQL with spark3

2021-04-02 Thread GitBox


xiarixiaoyao commented on a change in pull request #2761:
URL: https://github.com/apache/hudi/pull/2761#discussion_r606622137



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala
##
@@ -302,6 +302,10 @@ case class HoodieFileIndex(
   PartitionRowPath(partitionRow, partitionPath)
 }
 
+if (partitionRowPaths.isEmpty) {
+  partitionRowPaths = Seq(PartitionRowPath(InternalRow.empty, "")).toBuffer
+}
+

Review comment:
   A simple fix for the bug introduced by HUDI-1591.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on pull request #2761: [HUDI-1676] Support SQL with spark3

2021-04-02 Thread GitBox


xiarixiaoyao commented on pull request #2761:
URL: https://github.com/apache/hudi/pull/2761#issuecomment-812815163


   @vinothchandar, could you help me review this PR? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1676) Support SQL with spark3

2021-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1676:
-
Labels: pull-request-available  (was: )

> Support SQL with spark3
> ---
>
> Key: HUDI-1676
> URL: https://issues.apache.org/jira/browse/HUDI-1676
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> 1. Support CTAS for Spark 3
> 2. Support INSERT for Spark 3
> 3. Support MERGE, UPDATE, and DELETE without a RowKey constraint for Spark 3
> 4. Support DataSource V2 for Spark 3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xiarixiaoyao opened a new pull request #2761: [HUDI-1676] Support SQL with spark3

2021-04-02 Thread GitBox


xiarixiaoyao opened a new pull request #2761:
URL: https://github.com/apache/hudi/pull/2761


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Support SQL with Spark 3, compatible with DataSource V1, DataSource V2, and Hive tables.
   1. Support CTAS for Spark 3 (for MOR tables, the ro and rt tables will be created at the same time).
   2. Support INSERT for Spark 3.
   3. Support MERGE, UPDATE, and DELETE without a RowKey constraint for Spark 3.
   
   The PR for Hudi DataSource V2 support will be put forward in the next few days.
   This PR is a supplement to https://github.com/apache/hudi/pull/2645, which implements basic SQL support and mergeInto with the RowKey constraint.
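
   For illustration, here is a hedged Java sketch of the statement shapes listed above. The SQL syntax, table properties, and session extension class are assumptions based on this description rather than the final merged behavior, and the table and column names are made up.
   
   ```java
   import org.apache.spark.sql.SparkSession;
   
   public class HudiSparkSqlSketch {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder()
           .appName("hudi-sql-sketch")
           .master("local[2]")
           .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
           // assumed extension class enabling Hudi SQL support on Spark 3
           .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
           .getOrCreate();
   
       // CTAS: create a Hudi table from a query result (property names are assumptions).
       spark.sql("CREATE TABLE h0 USING hudi TBLPROPERTIES (primaryKey = 'id', preCombineField = 'ts') "
           + "AS SELECT 1 AS id, 'a' AS name, 1000L AS ts");
   
       // INSERT into the Hudi table.
       spark.sql("INSERT INTO h0 VALUES (2, 'b', 1001)");
   
       // MERGE / UPDATE / DELETE without a RowKey constraint.
       spark.sql("MERGE INTO h0 USING (SELECT 3 AS id, 'd' AS name, 1002L AS ts) s ON h0.id = s.id "
           + "WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *");
       spark.sql("UPDATE h0 SET name = 'c' WHERE id = 2");
       spark.sql("DELETE FROM h0 WHERE id = 1");
   
       spark.sql("SELECT * FROM h0").show();
       spark.stop();
     }
   }
   ```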
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer

2021-04-02 Thread GitBox


codecov-io edited a comment on pull request #2757:
URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=h1) Report
   > Merging 
[#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=desc) (9372602) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc)
 (9804662) will **decrease** coverage by `42.74%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2757       +/-   ##
    =============================================
    - Coverage     52.12%    9.38%    -42.75%     
    + Complexity     3646       48      -3598     
    =============================================
      Files           480       54       -426     
      Lines         22867     1993     -20874     
      Branches       2417      236      -2181     
    =============================================
    - Hits          11920      187     -11733     
    + Misses         9916     1793      -8123     
    + Partials       1031       13      -1018     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.36%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] codecov-io edited a comment on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer

2021-04-02 Thread GitBox


codecov-io edited a comment on pull request #2757:
URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=h1) Report
   > Merging 
[#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=desc) (8dd3a6f) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc)
 (9804662) will **increase** coverage by `17.56%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2757       +/-   ##
    =============================================
    + Coverage     52.12%   69.69%    +17.56%     
    + Complexity     3646      371      -3275     
    =============================================
      Files           480       54       -426     
      Lines         22867     1993     -20874     
      Branches       2417      236      -2181     
    =============================================
    - Hits          11920     1389     -10531     
    + Misses         9916      473      -9443     
    + Partials       1031      131       -900     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.69% <ø> (-0.04%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.08% <0.00%> (-0.30%)` | `55.00% <0.00%> (ø%)` | |
   | 
[...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | | | |
   | 
[...g/apache/hudi/common/util/RocksDBSchemaHelper.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUm9ja3NEQlNjaGVtYUhlbHBlci5qYXZh)
 | | | |
   | 
[...e/hudi/metadata/FileSystemBackedTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvRmlsZVN5c3RlbUJhY2tlZFRhYmxlTWV0YWRhdGEuamF2YQ==)
 | | | |
   | 
[...main/scala/org/apache/hudi/HoodieWriterUtils.scala](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVdyaXRlclV0aWxzLnNjYWxh)
 | | | |
   | 
[.../org/apache/hudi/common/engine/EngineProperty.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9FbmdpbmVQcm9wZXJ0eS5qYXZh)
 | | | |
   | 
[...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZU1lcmdlT25SZWFkUkRELnNjYWxh)
 | | | |
   | 
[...he/hudi/common/util/HoodieRecordSizeEstimator.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvSG9vZGllUmVjb3JkU2l6ZUVzdGltYXRvci5qYXZh)
 | | | |
   | 
[...a/org/apache/hudi/streamer/OperationConverter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zdHJlYW1lci9PcGVyYXRpb25Db252ZXJ0ZXIuamF2YQ==)
 | | | |
   | 
[...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh)
 | | | |
   | ... and [404 
more](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree-more) | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

2021-04-02 Thread GitBox


nsivabalan commented on issue #2338:
URL: https://github.com/apache/hudi/issues/2338#issuecomment-812713376


   Closing due to inactivity, but feel free to reopen or create a new ticket. We would be happy to assist you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan closed issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

2021-04-02 Thread GitBox


nsivabalan closed issue #2338:
URL: https://github.com/apache/hudi/issues/2338


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-04-02 Thread GitBox


nsivabalan commented on issue #2284:
URL: https://github.com/apache/hudi/issues/2284#issuecomment-812713158


   CC @n3nash 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?

2021-04-02 Thread GitBox


nsivabalan commented on issue #2586:
URL: https://github.com/apache/hudi/issues/2586#issuecomment-812712095


   Closing this for now. Please feel free to reopen or open a new ticket.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan closed issue #2586: [SUPPORT] - How to guarantee snapshot isolation when reading Hudi tables in S3?

2021-04-02 Thread GitBox


nsivabalan closed issue #2586:
URL: https://github.com/apache/hudi/issues/2586


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan closed issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-04-02 Thread GitBox


nsivabalan closed issue #2675:
URL: https://github.com/apache/hudi/issues/2675


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2675: [SUPPORT] Unable to query MOR table after schema evolution

2021-04-02 Thread GitBox


nsivabalan commented on issue #2675:
URL: https://github.com/apache/hudi/issues/2675#issuecomment-812710497


   Closing this as we have a tracking JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-79) how to query hoodie tables with 'Hive on Spark' engine?

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-79:

Labels: sev:normal user-support-issues  (was: sev:critical 
user-support-issues)

> how to query hoodie tables with 'Hive on Spark' engine?
> ---
>
> Key: HUDI-79
> URL: https://issues.apache.org/jira/browse/HUDI-79
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Hive Integration
>Reporter: t oo
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: sev:normal, user-support-issues
>
>  
> [https://cwiki.apache.org//confluence/display/Hive/Hive+on+Spark:+Getting+Started]
> recommends not having any hive*.jar in the spark/jars folder. But when 
> running the Hive-on-Spark execution engine with a non-local spark.master and querying a 
> hoodie external table, I get ClassNotFoundException errors for 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)

2021-04-02 Thread GitBox


nsivabalan commented on issue #2756:
URL: https://github.com/apache/hudi/issues/2756#issuecomment-812703790


   CC @n3nash 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1036) HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit

2021-04-02 Thread Nishith Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314046#comment-17314046
 ] 

Nishith Agarwal commented on HUDI-1036:
---

[~shivnarayan] Thanks for the reminder, I will take a look at this.

> HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit
> ---
>
> Key: HUDI-1036
> URL: https://issues.apache.org/jira/browse/HUDI-1036
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
>Reporter: Bhavani Sudha
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Opening this Jira based on the GitHub issue reported here - 
> [https://github.com/apache/hudi/issues/1735] when hive.input.format = 
> org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat, it is not able to 
> create a HoodieRealtimeFileSplit for querying the _rt table. Please see the GitHub 
> issue for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1036) HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit

2021-04-02 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-1036:
--
Labels: sev:normal user-support-issues  (was: sev:critical 
user-support-issues)

> HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit
> ---
>
> Key: HUDI-1036
> URL: https://issues.apache.org/jira/browse/HUDI-1036
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
>Reporter: Bhavani Sudha
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: sev:normal, user-support-issues
> Fix For: 0.9.0
>
>
> Opening this Jira based on the GitHub issue reported here - 
> [https://github.com/apache/hudi/issues/1735] when hive.input.format = 
> org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat, it is not able to 
> create a HoodieRealtimeFileSplit for querying the _rt table. Please see the GitHub 
> issue for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] jkdll opened a new issue #2760: [SUPPORT] Possibly Incorrect Documentation

2021-04-02 Thread GitBox


jkdll opened a new issue #2760:
URL: https://github.com/apache/hudi/issues/2760


   Hi,
   I am using the HudiWriteClient library and have been following the 
documentation at [this 
link](https://hudi.apache.org/docs/configurations.html#writeclient-configs) to 
instantiate the HoodieWriteConfig object.
   
   The documentation indicates that the WriteConfig can define the 
[withAssumeDatePartitioning](https://hudi.apache.org/docs/configurations.html#withAssumeDatePartitioning)
 attribute. However, upon further inspection, it turns out that this attribute 
is not present [in the HoodieWriteConfig 
class](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java).
 Instead, this field appears to be present in 
[HoodieMetadataConfig](https://github.com/apache/hudi/blob/03668dbaf1a60428d7e0d68c6622605e0809150a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java).
   
   Similarly 
[withConsistencyCheckEnabled](https://hudi.apache.org/docs/configurations.html#withConsistencyCheckEnabled)
 is also not in the HoodieWriteConfig class and its 
[usage](https://github.com/apache/hudi/blob/03668dbaf1a60428d7e0d68c6622605e0809150a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieClientTestBase.java#L146)
 is not properly documented.
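
   For reference, here is a minimal Java sketch, assuming the builder APIs of the classes linked above (method and package names unverified against 0.7.0), of wiring the consistency check through ConsistencyGuardConfig rather than looking for it directly on HoodieWriteConfig; the path and table name are placeholders.
   
   ```java
   import org.apache.hudi.common.fs.ConsistencyGuardConfig;
   import org.apache.hudi.config.HoodieWriteConfig;
   
   public class WriteConfigSketch {
     public static void main(String[] args) {
       HoodieWriteConfig writeConfig = HoodieWriteConfig.newBuilder()
           .withPath("/tmp/hudi_example_table")      // placeholder base path
           .forTable("hudi_example_table")
           // consistency checks are configured on ConsistencyGuardConfig,
           // not directly on the HoodieWriteConfig builder
           .withConsistencyGuardConfig(ConsistencyGuardConfig.newBuilder()
               .withConsistencyCheckEnabled(true)
               .build())
           .build();
   
       System.out.println(writeConfig.getBasePath());
     }
   }
   ```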
   
   
   Versions from pom.xml:
   ```xml
   <dependency>
     <groupId>org.apache.hudi</groupId>
     <artifactId>hudi-java-client</artifactId>
     <version>0.7.0</version>
   </dependency>
   <dependency>
     <groupId>org.apache.hudi</groupId>
     <artifactId>hudi-client-common</artifactId>
     <version>0.7.0</version>
   </dependency>
   <dependency>
     <groupId>org.apache.hudi</groupId>
     <artifactId>hudi-client</artifactId>
     <version>0.7.0</version>
     <type>pom</type>
   </dependency>
   ```
   
   Thus I suggest that the documentation be updated.
   
   Thanks,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific partition

2021-04-02 Thread GitBox


codecov-io edited a comment on pull request #2452:
URL: https://github.com/apache/hudi/pull/2452#issuecomment-761259726


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=h1) Report
   > Merging 
[#2452](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=desc) (8052abf) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e970e1f48302aec3af7eeca009a2c793757cd501?el=desc)
 (e970e1f) will **decrease** coverage by `0.16%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2452/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #2452      +/-   ##
    ============================================
    - Coverage     52.32%   52.16%    -0.17%     
      Complexity     3689     3689              
    ============================================
      Files           483      484       +1     
      Lines         23095    23159      +64     
      Branches       2460     2466       +6     
    ============================================
    - Hits          12084    12080       -4     
    - Misses         9942    10010      +68     
      Partials       1069     1069              
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.78% <ø> (-0.05%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `56.71% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `71.33% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `67.57% <0.00%> (-2.12%)` | `0.00 <0.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/hudi/utilities/HoodiePartitionCleaner.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVBhcnRpdGlvbkNsZWFuZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | 
[...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==)
 | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2452/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.42% <0.00%> (+0.34%)` | `56.00% <0.00%> (+1.00%)` | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific partition

2021-04-02 Thread GitBox


codecov-io edited a comment on pull request #2452:
URL: https://github.com/apache/hudi/pull/2452#issuecomment-761259726






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ssdong edited a comment on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-02 Thread GitBox


ssdong edited a comment on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487


   @jsbali To give some extra insight and detail, here is what @zherenyu831 posted at the beginning:
   ```
   [20210323080718__replacecommit__COMPLETED]: size : 0
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
   If we keep everything the same and let the archive logic handle everything, it 
fails because `partitionToReplaceFileIds` is empty for 
`20210323080718__replacecommit__COMPLETED` (the first item in the list above); 
this is a known issue.
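
   For reference, a small Java sketch (the builder-style HoodieTableMetaClient construction and the base path are assumptions; the timeline calls are the same ones that appear in the stack trace below) that prints each completed replacecommit and the size of its metadata, which is how a zero-size instant like the first one above shows up:
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
   import org.apache.hudi.common.table.timeline.HoodieTimeline;
   import org.apache.hudi.common.util.Option;
   
   public class ReplaceCommitSizes {
     public static void main(String[] args) {
       HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
           .setConf(new Configuration())
           .setBasePath("s3://xxx/data")   // placeholder base path from this issue
           .build();
   
       HoodieActiveTimeline timeline = metaClient.getActiveTimeline();
       timeline.getInstants()
           .filter(instant -> HoodieTimeline.REPLACE_COMMIT_ACTION.equals(instant.getAction())
               && instant.isCompleted())
           .forEach(instant -> {
             // getInstantDetails is the call that later fails in the stack trace below
             Option<byte[]> details = timeline.getInstantDetails(instant);
             int size = details.isPresent() ? details.get().length : 0;
             System.out.println("[" + instant.getTimestamp() + "__replacecommit__COMPLETED]: size : " + size);
           });
     }
   }
   ```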
   
   To make the archive work, we tried to _manually_ delete the first _empty_ 
commit file, `20210323080718__replacecommit__COMPLETED` (the first item 
in the list above). The archive then succeeded, but the job subsequently failed 
with `User class threw exception: org.apache.hudi.exception.HoodieIOException: 
Could not read commit details from 
s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the list 
above).
   
   Now, to reason through the underlying mechanism of this error: since the 
archive was successful, a few commit files have been moved into the `.archive` 
folder; let's say 
   ```
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   ```
   have been successfully moved and placed in `.archive`. At this moment, the 
timeline has been updated and there are 3 remaining commit files which are:
   ```
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
   
   Now, pay attention to the stack trace behind `User class threw 
exception: org.apache.hudi.exception.HoodieIOException: Could not read commit 
details from s3://xxx/data/.hoodie/20210323081449.replacecommit`, which I am 
pasting again:
   ```
   User class threw exception: org.apache.hudi.exception.HoodieIOException: 
Could not read commit details from 
s3://xxx/data/.hoodie/20210323081449.replacecommit
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
   at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
   at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
   at 
java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
   at 
org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
   at 
org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
   at 
org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
   ```
   
   After a `close` action is triggered on `TimelineService`, which is 
understandable, it propagates to `HoodieTableFileSystemView.close`, and there is:
   ```
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at 

[jira] [Commented] (HUDI-1453) Throw Exception when input data schema is not equal to the hoodie table schema

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313858#comment-17313858
 ] 

sivabalan narayanan commented on HUDI-1453:
---

Double to int is not a backwards-compatible schema evolution. If the schema 
compatibility check is enabled, it will fail. 

 

scala> dfFromData5.write.format("hudi").
 |   options(getQuickstartWriteConfigs).
 |   option(PRECOMBINE_FIELD_OPT_KEY, "preComb").
 |   option(RECORDKEY_FIELD_OPT_KEY, "rowId").
 |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionId").
 |   option("hoodie.index.type","SIMPLE").
 |   option(TABLE_NAME, tableName).
 |   option("hoodie.avro.schema.validate","true").
 | mode(Append).
 |   save(basePath)
org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema 
compatibility check.
  at 
org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:629)
  at 
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:152)
  at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:186)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
  at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
  at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
  at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
  at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
  at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
  at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
  ... 72 elided
Caused by: org.apache.hudi.exception.HoodieException: Failed schema 
compatibility check for writerSchema 
:\{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},\{"name":"rowId","type":["string","null"]},\{"name":"partitionId","type":["string","null"]},\{"name":"preComb","type":["long","null"]},\{"name":"name","type":["string","null"]},\{"name":"versionId","type":["string","null"]},\{"name":"doubleToInt","type":["int","null"]}]},
 table schema 
:\{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},\{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},\{"name":"rowId","type":["string","null"]},\{"name":"partitionId","type":["string","null"]},\{"name":"preComb","type":["long","null"]},\{"name":"name","type":["string","null"]},\{"name":"versionId","type":["string","null"]},\{"name":"doubleToInt","type":["double","null"]}]},
 base path :file:/tmp/hudi_trips_cow
  at 

[jira] [Resolved] (HUDI-1453) Throw Exception when input data schema is not equal to the hoodie table schema

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1453.
---
Resolution: Invalid

> Throw Exception when input data schema is not equal to the hoodie table schema
> --
>
> Key: HUDI-1453
> URL: https://issues.apache.org/jira/browse/HUDI-1453
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> The hoodie table *h0's* schema is :
> {code:java}
> (id long, price double){code}
> when I write the *dataframe* to *h0* with the following schema:
> {code:java}
> (id long, price int){code}
> An exception is thrown as follows:
> {code:java}
> at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136) at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>  at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:102)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ... 4 
> moreCaused by: java.lang.UnsupportedOperationException: 
> org.apache.parquet.avro.AvroConverters$FieldIntegerConverter at 
> org.apache.parquet.io.api.PrimitiveConverter.addDouble(PrimitiveConverter.java:84)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl$2$2.writeValue(ColumnReaderImpl.java:228)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
>  ... 11 more
> {code}
> I have enabled *AVRO_SCHEMA_VALIDATE*; it *can pass the schema validation 
> in HoodieTable#validateUpsertSchema*, so it is right to write the "int" data 
> to the "double" field in hoodie.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1453) Throw Exception when input data schema is not equal to the hoodie table schema

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1453:
--
Status: Open  (was: New)

> Throw Exception when input data schema is not equal to the hoodie table schema
> --
>
> Key: HUDI-1453
> URL: https://issues.apache.org/jira/browse/HUDI-1453
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available, sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> The hoodie table *h0's* schema is :
> {code:java}
> (id long, price double){code}
> when I write the *dataframe* to *h0* with the following schema:
> {code:java}
> (id long, price int){code}
> An exception is thrown as follows:
> {code:java}
> at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136) at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>  at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:102)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ... 4 
> moreCaused by: java.lang.UnsupportedOperationException: 
> org.apache.parquet.avro.AvroConverters$FieldIntegerConverter at 
> org.apache.parquet.io.api.PrimitiveConverter.addDouble(PrimitiveConverter.java:84)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl$2$2.writeValue(ColumnReaderImpl.java:228)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:367)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226)
>  ... 11 more
> {code}
> I have enabled *AVRO_SCHEMA_VALIDATE*; it *can pass the schema validation 
> in HoodieTable#validateUpsertSchema*, so it is right to write the "int" data 
> to the "double" field in hoodie.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] aditiwari01 edited a comment on issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)

2021-04-02 Thread GitBox


aditiwari01 edited a comment on issue #2756:
URL: https://github.com/apache/hudi/issues/2756#issuecomment-812516535


   I think I couldn't explain myself clearly. I am using DefaultHoodieRecordPayload 
only; I have attached a sample command regarding the same.
   
   The issue is not with "combineAndGetUpdateValue", but rather with "preCombine". 
As per my understanding, combineAndGetUpdateValue is used to merge the record 
from parquet with the in-memory record, whereas preCombine is used to dedupe 
multiple in-memory records with the same key. The preCombine function uses the 
orderingVal field to sort, and while creating a record from the log file we do not set 
this ordering field; hence the issue.
   
   The constructors are as follows:
   
   1. DefaultHoodieRecordPayload(Option<GenericRecord> record) { this(record, 0); }
   2. DefaultHoodieRecordPayload(GenericRecord record, Comparable orderingVal) 
{ super(record, orderingVal); }
   
   In the read path we only call the 1st constructor and hence lose the 
ordering value.
   
   Also, if we compact after each commit we don't see this issue, since 
"combineAndGetUpdateValue" works absolutely fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-04-02 Thread GitBox


nsivabalan commented on pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#issuecomment-812516130


   Yeah, I did verify by enabling the schema compatibility check. It will fail if 
we try to evolve a field from double to int.
   
   ```
   scala> dfFromData5.write.format("hudi").
|   options(getQuickstartWriteConfigs).
|   option(PRECOMBINE_FIELD_OPT_KEY, "preComb").
|   option(RECORDKEY_FIELD_OPT_KEY, "rowId").
|   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionId").
|   option("hoodie.index.type","SIMPLE").
|   option(TABLE_NAME, tableName).
|   option("hoodie.avro.schema.validate","true").
| mode(Append).
|   save(basePath)
   org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema 
compatibility check.
 at 
org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:629)
 at 
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:152)
 at 
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214)
 at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:186)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
 at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
 at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
 ... 72 elided
   Caused by: org.apache.hudi.exception.HoodieException: Failed schema 
compatibility check for writerSchema 
:{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]},{"name":"versionId","type":["string","null"]},{"name":"doubleToInt","type":["int","null"]}]},
 table schema 
:{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"_hoodie_co
 
mmit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]},{"name":"versionId","type":["string","null"]},{"name":"doubleToInt","type":["double","null"]}]},
 base path :file:/tmp/hudi_trips_cow
 at 

[jira] [Commented] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313842#comment-17313842
 ] 

sivabalan narayanan commented on HUDI-1716:
---

related issue: HUDI-774

> rt view w/ MOR tables fails after schema evolution
> --
>
> Key: HUDI-1716
> URL: https://issues.apache.org/jira/browse/HUDI-1716
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Storage Management
>Reporter: sivabalan narayanan
>Assignee: Aditya Tiwari
>Priority: Major
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Looks like the realtime view of a MOR table fails if the schema present in an existing log 
> file is evolved to add a new field. There are no issues with writing, but reading fails.
> More info: [https://github.com/apache/hudi/issues/2675]
>  
> gist of the stack trace:
> Caused by: org.apache.avro.AvroTypeException: Found 
> hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting 
> hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field 
> evolvedFieldCaused by: org.apache.avro.AvroTypeException: Found 
> hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting 
> hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field 
> evolvedField at 
> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292) at 
> org.apache.avro.io.parsing.Parser.advance(Parser.java:88) at 
> org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130) 
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
>  at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>  at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) 
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) 
> at 
> org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
>  at 
> org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
>  at 
> org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
>  at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
>  at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
>  at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
>  ... 24 more21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 
> 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): 
> org.apache.hudi.exception.HoodieException: Exception when reading log file  
> at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
>  at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
>  at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.(HoodieMergedLogRecordScanner.java:93)
>  at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.(HoodieMergedLogRecordScanner.java:75)
>  at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
>  at 
> org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328) 
> at 
> org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.(HoodieMergeOnReadRDD.scala:210)
>  at 
> org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
>  at 
> org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)
>  
> Logs from local run: 
> [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]
> diff with which above logs were generated: 
> [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]
>  
> Steps to reproduce in spark shell (a minimal sketch follows below):
>  # create MOR table w/ schema1. 
>  # Ingest (with schema1) until log files are created. // verify via hudi-cli. 
> It took me 2 batches of updates to see a log file.
>  # create a new schema2 with one additional field. Ingest a batch with 
> schema2 that updates existing records. 
>  # read the entire dataset. 
>  
>  
>  
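
For reference, a minimal spark-shell (Scala) sketch of the steps above. The table name, 
field names, paths and the glob depth are illustrative assumptions, not taken from the issue:

{code}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.lit
import spark.implicits._

val basePath = "file:///tmp/hudi_mor_evolution"
val opts = Map(
  "hoodie.table.name" -> "hudi_mor_evolution",
  "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
  "hoodie.datasource.write.recordkey.field" -> "rowId",
  "hoodie.datasource.write.partitionpath.field" -> "partitionId",
  "hoodie.datasource.write.precombine.field" -> "preComb")

// steps 1 and 2: insert, then update, with schema1 until hudi-cli shows a log file
val batch1 = Seq(("row1", "p1", 1L, "v1")).toDF("rowId", "partitionId", "preComb", "name")
batch1.write.format("hudi").options(opts).mode(SaveMode.Overwrite).save(basePath)
batch1.withColumn("preComb", lit(2L))
  .write.format("hudi").options(opts).mode(SaveMode.Append).save(basePath)

// step 3: schema2 adds one field and updates the existing records
val batch2 = Seq(("row1", "p1", 3L, "v1", "x"))
  .toDF("rowId", "partitionId", "preComb", "name", "evolvedField")
batch2.write.format("hudi").options(opts).mode(SaveMode.Append).save(basePath)

// step 4: read the entire dataset; this is where the AvroTypeException shows up
spark.read.format("hudi").load(basePath + "/*/*").show() // adjust the glob to your partition depth
{code}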



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-774) Spark to Avro converter incorrectly generates optional fields

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313841#comment-17313841
 ] 

sivabalan narayanan commented on HUDI-774:
--

related issue : HUDI-1716

 

> Spark to Avro converter incorrectly generates optional fields
> -
>
> Key: HUDI-774
> URL: https://issues.apache.org/jira/browse/HUDI-774
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think https://issues.apache.org/jira/browse/SPARK-28008 is a good 
> descriptions of what is happening.
>  
> It can cause a situation when schema in the MOR log files is incompatible 
> with the schema produced by RowBasedSchemaProvider, so compactions will stall.
>  
> I have a fix, which is a bit hacky: post-process the schema produced by the 
> converter and
> 1) make sure unions with null types have those null types at position 0
> 2) give them default values of null (a rough sketch of the idea follows below)
> I couldn't find a way to do a clean fix, as some of the problematic classes 
> come from Hive and are called from Spark.
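
A rough sketch of that post-processing idea using the plain Avro API (Scala); this is an 
illustration only, not the actual patch attached to this issue:

{code}
import org.apache.avro.{JsonProperties, Schema}
import scala.collection.JavaConverters._

// Rebuild a record schema so that nullable unions are ["null", T] and default to null.
def fixNullableFields(record: Schema): Schema = {
  val fixedFields = record.getFields.asScala.map { f =>
    val s = f.schema()
    if (s.getType == Schema.Type.UNION && s.getTypes.asScala.exists(_.getType == Schema.Type.NULL)) {
      // move the null branch to position 0 and give the field a null default
      val reordered = Schema.createUnion(
        (s.getTypes.asScala.filter(_.getType == Schema.Type.NULL) ++
          s.getTypes.asScala.filterNot(_.getType == Schema.Type.NULL)).asJava)
      new Schema.Field(f.name(), reordered, f.doc(), JsonProperties.NULL_VALUE)
    } else {
      new Schema.Field(f.name(), s, f.doc(), f.defaultVal())
    }
  }.asJava
  Schema.createRecord(record.getName, record.getDoc, record.getNamespace, record.isError, fixedFields)
}
{code}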



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-874) Schema evolution does not work with AWS Glue catalog

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313840#comment-17313840
 ] 

sivabalan narayanan edited comment on HUDI-874 at 4/2/21, 12:10 PM:


[~uditme]: is someone from AWS looking into this? Can you give us an update?


was (Author: shivnarayan):
[~uditme]: is someone from AWS looking into this? 

> Schema evolution does not work with AWS Glue catalog
> 
>
> Key: HUDI-874
> URL: https://issues.apache.org/jira/browse/HUDI-874
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: Udit Mehrotra
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> This issue has been discussed here 
> [https://github.com/apache/incubator-hudi/issues/1581] and at other places as 
> well. The Glue catalog currently does not support *cascade* for *ALTER TABLE* 
> statements. As a result, features like adding new columns to an existing table 
> do not work with the Glue catalog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-874) Schema evolution does not work with AWS Glue catalog

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313840#comment-17313840
 ] 

sivabalan narayanan commented on HUDI-874:
--

[~uditme]: is someone from AWS looking into this? 

> Schema evolution does not work with AWS Glue catalog
> 
>
> Key: HUDI-874
> URL: https://issues.apache.org/jira/browse/HUDI-874
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: Udit Mehrotra
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> This issue has been discussed here 
> [https://github.com/apache/incubator-hudi/issues/1581] and at other places as 
> well. The Glue catalog currently does not support *cascade* for *ALTER TABLE* 
> statements. As a result, features like adding new columns to an existing table 
> do not work with the Glue catalog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1036) HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313837#comment-17313837
 ] 

sivabalan narayanan commented on HUDI-1036:
---

[~nishith29]: this has been lying around for some time. Please fix the sev labels 
as appropriate. 

> HoodieCombineHiveInputFormat not picking up HoodieRealtimeFileSplit
> ---
>
> Key: HUDI-1036
> URL: https://issues.apache.org/jira/browse/HUDI-1036
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
>Reporter: Bhavani Sudha
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Opening this Jira based on the GitHub issue reported here - 
> [https://github.com/apache/hudi/issues/1735] when hive.input.format = 
> org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat it is not able to 
> create HoodieRealtimeFileSplit for querying the _rt table. Please see the GitHub 
> issue for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] li36909 commented on a change in pull request #2754: [HUDI-1751] DeltaStreamer print many unnecessary warn log

2021-04-02 Thread GitBox


li36909 commented on a change in pull request #2754:
URL: https://github.com/apache/hudi/pull/2754#discussion_r606207765



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
##
@@ -173,9 +173,11 @@ public KafkaOffsetGen(TypedProperties props) {
 this.props = props;
 
 kafkaParams = new HashMap<>();
-for (Object prop : props.keySet()) {
+props.keySet().stream().filter(prop -> {

Review comment:
   BTW, I found a UT failure caused by concurrent writes to a hudi table; I will 
try to analyze it later.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313836#comment-17313836
 ] 

sivabalan narayanan commented on HUDI-1063:
---

[~WaterKnight]: Were you able to resolve your issue? If disabling the embedded 
timeline server worked for you (a sketch of that option is below), feel free to close 
out the ticket if it is no longer required. If not, we would appreciate any updates. 
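
For reference, a minimal spark-shell (Scala) sketch of that workaround, using the write 
option "hoodie.embed.timeline.server"; the table name, fields and bucket below are 
illustrative assumptions, not taken from the issue:

{code}
import org.apache.spark.sql.SaveMode
import spark.implicits._

val tableName = "forecasts"
val basePath  = "gs://hudi-datalake/" + tableName

// tiny illustrative frame carrying the fields the options below refer to
val df = Seq(("k1", "p1", 1L, 10.0)).toDF("uuid", "partitionpath", "ts", "sales")

df.write.format("hudi").
  option("hoodie.table.name", tableName).
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.embed.timeline.server", "false"). // the workaround: skip the embedded timeline server
  mode(SaveMode.Overwrite).
  save(basePath)
{code}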

> Save in Google Cloud Storage not working
> 
>
> Key: HUDI-1063
> URL: https://issues.apache.org/jira/browse/HUDI-1063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: David Lacalle Castillo
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> I added to spark submit the following properties: 
> {{--packages 
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
>  \  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
> Spark version 2.4.5 and Hadoop version 3.2.1
>  
> I am trying to save a DataFrame to Google Cloud Storage as follows:
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>  'hoodie.table.name': tableName,
>  'hoodie.datasource.write.recordkey.field': 'uuid',
>  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>  'hoodie.datasource.write.table.name': tableName,
>  'hoodie.datasource.write.operation': 'insert',
>  'hoodie.datasource.write.precombine.field': 'ts',
>  'hoodie.upsert.shuffle.parallelism': 2, 
>  'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>  "ds as date",
>  "store",
>  "item",
>  "y as sales",
>  "yhat as sales_predicted",
>  "yhat_upper as sales_predicted_upper",
>  "yhat_lower as sales_predicted_lower",
>  "training_date")
> results.write.format("hudi"). \
>  options(**hudi_options). \
>  mode("overwrite"). \
>  save(basePath)
> I am getting the following error:
> Py4JJavaError: An error occurred while calling o312.save. : 
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at 
> io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>  at io.javalin.Javalin.(Javalin.java:94) at 
> io.javalin.Javalin.create(Javalin.java:107) at 
> org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>  at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:69)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:83)
>  at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:137) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:124) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:120) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>  at 
> 

[jira] [Resolved] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1288.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> DeltaSync:writeToSink fails with Unknown datum type 
> org.apache.avro.JsonProperties$Null
> ---
>
> Key: HUDI-1288
> URL: https://issues.apache.org/jira/browse/HUDI-1288
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Michal Swiatowy
>Priority: Major
>  Labels: sev:critical, user-support-issues
> Fix For: 0.6.0
>
>
> After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I run into 
> following error message on write to HDFS:
> {code:java}
> 2020-09-18 12:54:38,651 [Driver] INFO  
> HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing 
> Table of type MERGE_ON_READ from 
> /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC
> 2020-09-18 12:54:38,663 [Driver] INFO  DeltaSync:setupWriteClient:470 - 
> Setting up Hoodie Write Client
> 2020-09-18 12:54:38,695 [Driver] INFO  DeltaSync:registerAvroSchemas:522 - 
> Registering Schema 
> 

[GitHub] [hudi] li36909 commented on a change in pull request #2754: [HUDI-1751] DeltaStreamer print many unnecessary warn log

2021-04-02 Thread GitBox


li36909 commented on a change in pull request #2754:
URL: https://github.com/apache/hudi/pull/2754#discussion_r606207354



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
##
@@ -173,9 +173,11 @@ public KafkaOffsetGen(TypedProperties props) {
 this.props = props;
 
 kafkaParams = new HashMap<>();
-for (Object prop : props.keySet()) {
+props.keySet().stream().filter(prop -> {

Review comment:
   How about changing it to this: "DeltaStream print many unnecessary warn log 
because of passing hoodie config to kafka consumer". The warn logs are printed by 
the KafkaConsumer: when hudi creates the kafka consumer, it passes some non-kafka 
parameters to it, which leads to these warn logs. To solve this problem we just need 
to filter out the hoodie configs (a rough sketch of the idea is below).
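
   A rough Scala sketch of that filtering idea (the real change is the Java diff above in 
KafkaOffsetGen; this only illustrates dropping the `hoodie.*` keys before handing the 
properties to the Kafka consumer):
   ```scala
   import java.util.Properties
   import scala.collection.JavaConverters._

   // keep only the non-hoodie keys for the Kafka consumer
   def toKafkaParams(props: Properties): java.util.Map[String, Object] = {
     val filtered: Map[String, Object] = props.asScala.toMap
       .filter { case (k, _) => !k.startsWith("hoodie.") } // drop Hudi's own configs
       .map { case (k, v) => k -> (v: Object) }
     filtered.asJava
   }
   ```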




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313835#comment-17313835
 ] 

sivabalan narayanan commented on HUDI-1288:
---

Closing out this Jira as we don't have any plans to back port fixes. 

> DeltaSync:writeToSink fails with Unknown datum type 
> org.apache.avro.JsonProperties$Null
> ---
>
> Key: HUDI-1288
> URL: https://issues.apache.org/jira/browse/HUDI-1288
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Michal Swiatowy
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I run into 
> following error message on write to HDFS:
> {code:java}
> 2020-09-18 12:54:38,651 [Driver] INFO  
> HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing 
> Table of type MERGE_ON_READ from 
> /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC
> 2020-09-18 12:54:38,663 [Driver] INFO  DeltaSync:setupWriteClient:470 - 
> Setting up Hoodie Write Client
> 2020-09-18 12:54:38,695 [Driver] INFO  DeltaSync:registerAvroSchemas:522 - 
> Registering Schema 
> 

[jira] [Updated] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1288:
--
Status: Open  (was: New)

> DeltaSync:writeToSink fails with Unknown datum type 
> org.apache.avro.JsonProperties$Null
> ---
>
> Key: HUDI-1288
> URL: https://issues.apache.org/jira/browse/HUDI-1288
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Michal Swiatowy
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I run into 
> following error message on write to HDFS:
> {code:java}
> 2020-09-18 12:54:38,651 [Driver] INFO  
> HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing 
> Table of type MERGE_ON_READ from 
> /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC
> 2020-09-18 12:54:38,663 [Driver] INFO  DeltaSync:setupWriteClient:470 - 
> Setting up Hoodie Write Client
> 2020-09-18 12:54:38,695 [Driver] INFO  DeltaSync:registerAvroSchemas:522 - 
> Registering Schema 
> 

[jira] [Commented] (HUDI-1528) hudi-sync-tools error

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313834#comment-17313834
 ] 

sivabalan narayanan commented on HUDI-1528:
---

[~Trevorzhang]: can you update the Jira with how you fixed the issue or what the 
resolution was? We can close it out if you don't have any more issues. 

> hudi-sync-tools error
> -
>
> Key: HUDI-1528
> URL: https://issues.apache.org/jira/browse/HUDI-1528
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Trevorzhang
>Assignee: Trevorzhang
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.9.0
>
>
> When using hudi-sync-tools to synchronize to a remote Hive, the Hive metastore 
> throws exceptions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1528) hudi-sync-tools error

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1528:
--
Labels: pull-request-available user-support-issues  (was: 
pull-request-available sev:critical user-support-issues)

> hudi-sync-tools error
> -
>
> Key: HUDI-1528
> URL: https://issues.apache.org/jira/browse/HUDI-1528
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Trevorzhang
>Assignee: Trevorzhang
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.9.0
>
>
> When using hudi-sync-tools to synchronize to a remote Hive, the Hive metastore 
> throws exceptions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1528) hudi-sync-tools error

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1528:
--
Labels: user-support-issues  (was: pull-request-available 
user-support-issues)

> hudi-sync-tools error
> -
>
> Key: HUDI-1528
> URL: https://issues.apache.org/jira/browse/HUDI-1528
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Trevorzhang
>Assignee: Trevorzhang
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.9.0
>
>
> When using hudi-sync-tools to synchronize to a remote Hive, the Hive metastore 
> throws exceptions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer

2021-04-02 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1751:
--
Summary: DeltaStream print many unnecessary warn log because of passing 
hoodie config to kafka consumer  (was: DeltaStream print many unnecessary warn 
log)

> DeltaStream print many unnecessary warn log because of passing hoodie config 
> to kafka consumer
> --
>
> Key: HUDI-1751
> URL: https://issues.apache.org/jira/browse/HUDI-1751
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Because we put both kafka parameters and hudi configs in the same properties 
> file, such as kafka-source.properties, the kafkaParams object created from it 
> also contains some hoodie configs, which leads to the warn logs being printed:
> !https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan edited a comment on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-04-02 Thread GitBox


nsivabalan edited a comment on pull request #2449:
URL: https://github.com/apache/hudi/pull/2449#issuecomment-779341027


   @Trevor-zhang : sorry, I didn't suggest closing this out. I am also still getting 
conversant w/ hive sync in general, so I was trying to clarify a few things. 
   
   If I am not wrong, the metastore flow (non-JDBC flow) is already supported. It is just 
that the config value for "hive.metastore.uris" is taken from the Hadoop configuration 
and there is no direct way to pass it in as an argument. If your intention is to 
add an argument to make that convenient, then this patch is the right approach. 
If not, let's discuss what exactly we are trying to achieve here. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1652) DiskBasedMap:As time goes by, the number of /temp/***** file handles held by the executor process is increasing

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313832#comment-17313832
 ] 

sivabalan narayanan commented on HUDI-1652:
---

[~hainanzhongjian]: can we close the Jira then, since it's already fixed in 
hudi-0.7? 

> DiskBasedMap:As time goes by, the number of /temp/* file handles held by 
> the executor process is increasing
> ---
>
> Key: HUDI-1652
> URL: https://issues.apache.org/jira/browse/HUDI-1652
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Affects Versions: 0.6.0
>Reporter: wangmeng
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> We encountered a problem in the hudi production environment, which is very 
> similar to the HUDI-945 problem.
>  *Software environment:* spark 2.4.5, hudi 0.6
>  *Scenario:* consume Kafka data and write hudi, using spark streaming 
> (non-StructedStreaming).
>  *Problem:* As time goes by, the number of /temp/* file handles held by 
> the executor process is increasing.
> "
> /tmp/10ded0f7-1bcc-4316-91e9-9b4d0507e1e0
>  /tmp/49251680-0efd-4cc4-a55e-1af2038d3900
>  /tmp/cc7dd284-3444-4c17-a5c8-84b3090c17f9
> "
>  *Reason analysis:* ExternalSpillableMap is used in HoodieMergeHandle class, 
> and DiskBasedMap is used to flush overflowed data to the disk. But the file 
> stream can only be closed and deleted by the hook when the jvm exits. When 
> the clear method is executed in the program, the stream is not closed and the 
> file is not deleted. As a result, over time, more and more file handles are 
> still held, leading to errors. This error is similar to Hudi-945.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313828#comment-17313828
 ] 

sivabalan narayanan commented on HUDI-1723:
---

[~xushiyan]: I don't have much experience on the query side, so some noob questions. 

What is the granularity of the modification time? If it is milliseconds, do you mean 
that we will have a lot of files with exactly the same modification time at ms 
granularity? 

Did you see this happen in a prod env, or is it just theoretical? 

I understand the problem, I am just trying to gauge the severity and probability of 
it occurring. 

> DFSPathSelector skips files with the same modify date when read up to source 
> limit
> --
>
> Key: HUDI-1723
> URL: https://issues.apache.org/jira/browse/HUDI-1723
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
> Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
>
>
> org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles 
> filters the input files based on the last saved checkpoint, which is the 
> modification time of the last file read. However, that modification time can be 
> shared by multiple files, so some of them are skipped when reading up to the 
> source limit. An illustration is shown in the attached picture.
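
A tiny Scala sketch of that failure mode, with purely illustrative values:

{code}
case class FileInfo(name: String, modTime: Long)

val files = Seq(FileInfo("a", 100L), FileInfo("b", 100L), FileInfo("c", 100L), FileInfo("d", 200L))

// suppose only "a" fit under the source limit, so the checkpoint becomes its mod time
val checkpoint = 100L

// next run: "b" and "c" are silently skipped because they share the checkpoint's mod time
val nextBatch = files.filter(_.modTime > checkpoint) // -> List(FileInfo("d", 200))
println(nextBatch)
{code}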



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] li36909 edited a comment on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail

2021-04-02 Thread GitBox


li36909 edited a comment on pull request #2752:
URL: https://github.com/apache/hudi/pull/2752#issuecomment-812496806


   Just run any rollback/compaction command and make it fail by injecting a 
fault; the command will then hang.
   For example, currently hudi only supports rolling back the latest commit, so we 
can make a rollback fail by rolling back to an older commit and observe the 
hang. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] li36909 edited a comment on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail

2021-04-02 Thread GitBox


li36909 edited a comment on pull request #2752:
URL: https://github.com/apache/hudi/pull/2752#issuecomment-812496806


   Just run any rollback/compaction command and make it fail by injecting a 
fault; the command will then hang.
   For example, currently hudi only supports rolling back the latest commit, so we can 
make a rollback fail by rolling back to an older commit and observe the hang. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-1734) Hive sync script (run_sync_tool.sh) fails w/ ClassNotFoundError : org/apache/log4j/LogManager

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1734.
---
Fix Version/s: 0.8.0
   Resolution: Invalid

> Hive sync script (run_sync_tool.sh) fails w/ ClassNotFoundError : 
> org/apache/log4j/LogManager
> -
>
> Key: HUDI-1734
> URL: https://issues.apache.org/jira/browse/HUDI-1734
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: sev:critical, user-support-issues
> Fix For: 0.8.0
>
>
> ./run_sync_tool.sh --jdbc-url jdbc:hive://dxbigdata102:1000 \ --user appuser 
> \ --pass '' \ --base-path 
> 'hdfs://dxbigdata101:8020/user/hudi/test/data/hudi_trips_cow' \ --database 
> test \ --table hudi_trips_cow
>  
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/log4j/LogManager
> at org.apache.hudi.hive.HiveSyncTool.(HiveSyncTool.java:55)
> Caused by: java.lang.ClassNotFoundException: org.apache.log4j.LogManager
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 1 more
>  
> https://github.com/apache/hudi/issues/2728



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on issue #2728: [SUPPORT]Hive sync error by using run_sync_tool.sh

2021-04-02 Thread GitBox


nsivabalan commented on issue #2728:
URL: https://github.com/apache/hudi/issues/2728#issuecomment-812497980


   thanks for letting us know. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-1745) Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable spark api

2021-04-02 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1745.
---
Resolution: Fixed

> Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable 
> spark api
> -
>
> Key: HUDI-1745
> URL: https://issues.apache.org/jira/browse/HUDI-1745
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: sev:critical
>
> [https://github.com/apache/hudi/issues/2748]
>  
> PR that's of interest: [https://github.com/apache/hudi/pull/2431]
>  
> I see we have three options. Let me know if we have more.
> Option 1:
> Similar to SparkRowSerDe, we might have to introduce an interface for 
> translateSqlOptions and override it based on the spark version. But we already have 
> two sub-modules for spark2 and spark3, and now we might have to add more such 
> modules for different spark2 versions, which needs more thought to do 
> elegantly.
> Option 2:
> Since this feature was added only in 0.8.0 and is not a particularly sought-after 
> feature, we could revert the commit and unblock ourselves for 0.8.0. Once the 
> release is complete, we can decide how to go about doing this and get the 
> feature in for the next release.
> Option 3:
> We say that hudi does not support spark versions < 2.4.4 with 0.8.0. I don't think 
> we can go this route, but just listing it out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1745) Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable spark api

2021-04-02 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313827#comment-17313827
 ] 

sivabalan narayanan commented on HUDI-1745:
---

We have always been testing w/ spark 2.4.4, and at Uber it's 2.4.3. So we 
will just fix the documentation to state that the minimum supported spark version is 2.4.3. 

> Hudi compilation fails w/ spark version < 2.4.4 due to usage of unavailable 
> spark api
> -
>
> Key: HUDI-1745
> URL: https://issues.apache.org/jira/browse/HUDI-1745
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: sev:critical
>
> [https://github.com/apache/hudi/issues/2748]
>  
> PR that's of interest: [https://github.com/apache/hudi/pull/2431]
>  
> I see we have three options. Let me know if we have more.
> Option 1:
> Similar to SparkRowSerDe, we might have to introduce an interface for 
> translateSqlOptions and override it based on the spark version. But we already have 
> two sub-modules for spark2 and spark3, and now we might have to add more such 
> modules for different spark2 versions, which needs more thought to do 
> elegantly.
> Option 2:
> Since this feature was added only in 0.8.0 and is not a particularly sought-after 
> feature, we could revert the commit and unblock ourselves for 0.8.0. Once the 
> release is complete, we can decide how to go about doing this and get the 
> feature in for the next release.
> Option 3:
> We say that hudi does not support spark versions < 2.4.4 with 0.8.0. I don't think 
> we can go this route, but just listing it out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)

2021-04-02 Thread GitBox


nsivabalan commented on issue #2756:
URL: https://github.com/apache/hudi/issues/2756#issuecomment-812496813


   Yes, this is expected. If you are using OverwriteWithLatestAvroPayload as 
your payload class, combineAndGetUpdateValue does not honor the ordering value, so 
we added another payload for this purpose: DefaultHoodieRecordPayload. While 
using it, ensure you set "hoodie.payload.ordering.field" accordingly (a sketch is 
below). Let me know how it goes. 
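
   A minimal Scala sketch of that setup; the option keys are the standard Hudi configs 
mentioned above, while the table name, path and field names are illustrative assumptions:
   ```scala
   import org.apache.spark.sql.SaveMode
   import spark.implicits._

   val basePath = "file:///tmp/hudi_mor_ordering" // illustrative path
   val df = Seq(("key1", "p1", 5L, "new-value")).toDF("rowId", "partitionId", "ts", "name")

   df.write.format("hudi").
     option("hoodie.table.name", "hudi_mor_ordering").
     option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
     option("hoodie.datasource.write.recordkey.field", "rowId").
     option("hoodie.datasource.write.partitionpath.field", "partitionId").
     option("hoodie.datasource.write.precombine.field", "ts").
     option("hoodie.payload.ordering.field", "ts"). // must match the precombine/ordering field
     option("hoodie.datasource.write.payload.class",
       "org.apache.hudi.common.model.DefaultHoodieRecordPayload"). // honours the ordering value on merge
     mode(SaveMode.Append).
     save(basePath)
   ```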


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] li36909 commented on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail

2021-04-02 Thread GitBox


li36909 commented on pull request #2752:
URL: https://github.com/apache/hudi/pull/2752#issuecomment-812496806


   Just run any rollback/compaction command and make it fail by injecting a 
fault; the command will then hang.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-04-02 Thread GitBox


pengzhiwei2018 commented on pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#issuecomment-812496550


   > https://gist.github.com/nsivabalan/91f12109e0fe1ca9749ff5290c946778
   
   Hi @nsivabalan , I have taken a look at your test code. First you write an 
"int" to the table, so the table schema type is "int". Then you write a 
"double" to the table, so the table schema becomes "double". The table schema 
changed from "int" to "double"; I think this is more reasonable.
   
   My original idea was that the first write schema (e.g. "int") is the table 
schema forever, and that incoming records after that should be compatible with the 
original table schema (e.g. "int"); that is what this PR wanted to solve. I 
understand it more clearly now: the table schema should change to a more 
generic type (e.g. from "int" to "double"), not always stay the first write 
schema.
   So I can close this PR now. Thanks @nsivabalan for correcting me.
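
   A small Scala illustration of that point using plain Avro compatibility checks (toy 
schemas, using Avro's SchemaCompatibility API):
   ```scala
   import org.apache.avro.{Schema, SchemaCompatibility}

   val intRec = new Schema.Parser().parse(
     """{"type":"record","name":"r","fields":[{"name":"v","type":"int"}]}""")
   val dblRec = new Schema.Parser().parse(
     """{"type":"record","name":"r","fields":[{"name":"v","type":"double"}]}""")

   // a "double" reader can read data written with "int" (numeric promotion) ...
   println(SchemaCompatibility.checkReaderWriterCompatibility(dblRec, intRec).getType)
   // ... but an "int" reader cannot read "double" data, hence the table schema should widen
   println(SchemaCompatibility.checkReaderWriterCompatibility(intRec, dblRec).getType)
   ```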


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 closed pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-04-02 Thread GitBox


pengzhiwei2018 closed pull request #2334:
URL: https://github.com/apache/hudi/pull/2334


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1591) Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning

2021-04-02 Thread pengzhiwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengzhiwei updated HUDI-1591:
-
Summary: Implement Spark's FileIndex for Hudi to support queries via Hudi 
DataSource using non-globbed table path and partition pruning  (was: Improve 
Hoodie Table Query Performance And Ease Of Use For Spark)

> Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource 
> using non-globbed table path and partition pruning
> --
>
> Key: HUDI-1591
> URL: https://issues.apache.org/jira/browse/HUDI-1591
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Affects Versions: 0.9.0
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> We have found some problems when querying hoodie tables on spark:
> 1. Users must specify "*" globs in the path to tell spark the partition level for the query.
> 2. Partition pruning is not supported for COW tables.
> This issue wants to achieve the following goals (a small usage sketch is shown below):
> 1. Support "no stars" (non-globbed path) queries for hoodie tables.
> 2. Support partition pruning for COW tables.
> Refer to the documentation for more details about this: [Optimization For 
> Hudi COW 
> Query|https://docs.google.com/document/d/1qG014M3VZg3lMswsZv7cYB9Tb0vz8yXgqvlI_Jlnnsc/edit#heading=h.k6ro6dhgwh8y]
>  
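
A minimal spark-shell (Scala) sketch of the current vs. desired usage; the path and the 
number of "*" globs are illustrative assumptions:

{code}
val basePath = "file:///tmp/hudi_cow_table"

// today: the load path must carry "*" globs matching the partition depth
val globbed = spark.read.format("hudi").load(basePath + "/*/*")

// goal of this issue: plain base path plus partition pruning through normal predicates
val pruned = spark.read.format("hudi").load(basePath).where("partitionId = 'p1'")
{code}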



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hddong commented on pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand

2021-04-02 Thread GitBox


hddong commented on pull request #2325:
URL: https://github.com/apache/hudi/pull/2325#issuecomment-812460441


   @yanghua @wangxianghu: I have addressed them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hddong commented on a change in pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand

2021-04-02 Thread GitBox


hddong commented on a change in pull request #2325:
URL: https://github.com/apache/hudi/pull/2325#discussion_r606166490



##
File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/CompactionCommand.java
##
@@ -175,25 +174,26 @@ public String compactionShowArchived(
 HoodieArchivedTimeline archivedTimeline = client.getArchivedTimeline();
 HoodieInstant instant = new HoodieInstant(HoodieInstant.State.COMPLETED,
 HoodieTimeline.COMPACTION_ACTION, compactionInstantTime);
-String startTs = CommitUtil.addHours(compactionInstantTime, -1);
-String endTs = CommitUtil.addHours(compactionInstantTime, 1);

Review comment:
   > if we want to load a `ts` equals `compactionInstantTime`, can we add a 
new method that takes only one `instantTime` as input param? WDYT
   
   Yes, I have added it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606123928



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CommitUtils.java
##
@@ -59,14 +61,24 @@ public static HoodieCommitMetadata 
buildMetadata(List writeStat
Option> 
extraMetadata,
WriteOperationType 
operationType,
String 
schemaToStoreInCommit,
-   String commitActionType) {
+   String commitActionType,
+   Boolean updatePartialFields,
+   HoodieTableMetaClient 
metaClient) {
 
 HoodieCommitMetadata commitMetadata = buildMetadataFromStats(writeStats, 
partitionToReplaceFileIds, commitActionType, operationType);
 
 // add in extra metadata
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(commitMetadata::addMetadata);
 }
+if (updatePartialFields) {
+  try {

Review comment:
   Good, I will delete it.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ssdong edited a comment on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-02 Thread GitBox


ssdong edited a comment on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487


   @jsbali To give out extra insights and details, as @zherenyu831 has posted 
in the beginning:
   ```
   [20210323080718__replacecommit__COMPLETED]: size : 0
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
   If we keep everything the same and let the archive logic handle everything, it 
would fail at 0 `partitionToReplaceFileIds` against 
`20210323080718__replacecommit__COMPLETED` (the first item in the list above), 
and this is a known issue. 
   
   To make the archive work, we tried to _manually_ delete that first _empty_ 
commit file, `20210323080718__replacecommit__COMPLETED` (the first item 
in the list above). This let the archive succeed, but it then failed 
with `User class threw exception: org.apache.hudi.exception.HoodieIOException: 
Could not read commit details from 
s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the list 
above).
   
   Now to reason through the underlying mechanism of this error, given the 
archive was successful, that means a few commit files have been placed within 
the `.archive` folder, let's say 
   ```
   [20210323081449__replacecommit__COMPLETED]: size : 1
   [20210323082046__replacecommit__COMPLETED]: size : 1
   [20210323082758__replacecommit__COMPLETED]: size : 1
   [20210323084004__replacecommit__COMPLETED]: size : 1
   [20210323085044__replacecommit__COMPLETED]: size : 1
   ```
   have been successfully moved and placed in `.archive`. At this moment, the 
timeline has been updated and there are 3 remaining commit files which are:
   ```
   [20210323085823__replacecommit__COMPLETED]: size : 1
   [20210323090550__replacecommit__COMPLETED]: size : 1
   [20210323091700__replacecommit__COMPLETED]: size : 1
   ```
   
   Now, if you pay attention to the stack trace which caused `User class threw 
exception: org.apache.hudi.exception.HoodieIOException: Could not read commit 
details from s3://xxx/data/.hoodie/20210323081449.replacecommit`, and I am just 
pasting them again:
   ```
   User class threw exception: org.apache.hudi.exception.HoodieIOException: 
Could not read commit details from 
s3://xxx/data/.hoodie/20210323081449.replacecommit
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
   at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
   at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
   at 
java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
   at 
org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
   at 
org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
   at 
org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
   ```
   
   After a `close` action is triggered on `TimelineService`, which is 
understandable, it propagates to `HoodieTableFileSystemView.close`, and there is:
   ```
   at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
   at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
   at 

[hudi] branch release-0.8.0 updated: [MINOR] Update release version to reflect published version 0.8.0

2021-04-02 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a commit to branch release-0.8.0
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/release-0.8.0 by this push:
 new da65d3c  [MINOR] Update release version to reflect published version 
0.8.0
da65d3c is described below

commit da65d3cae99e8fee0ede9b5ed8630a3716d284c8
Author: garyli1019 
AuthorDate: Fri Apr 2 17:19:07 2021 +0800

[MINOR] Update release version to reflect published version 0.8.0
---
 docker/hoodie/hadoop/base/pom.xml   | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml   | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml  | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml  | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml   | 2 +-
 docker/hoodie/hadoop/pom.xml| 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml| 2 +-
 hudi-cli/pom.xml| 2 +-
 hudi-client/hudi-client-common/pom.xml  | 4 ++--
 hudi-client/hudi-flink-client/pom.xml   | 4 ++--
 hudi-client/hudi-java-client/pom.xml| 4 ++--
 hudi-client/hudi-spark-client/pom.xml   | 4 ++--
 hudi-client/pom.xml | 2 +-
 hudi-common/pom.xml | 2 +-
 hudi-examples/pom.xml   | 2 +-
 hudi-flink/pom.xml  | 2 +-
 hudi-hadoop-mr/pom.xml  | 2 +-
 hudi-integ-test/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark2/pom.xml   | 4 ++--
 hudi-spark-datasource/hudi-spark3/pom.xml   | 4 ++--
 hudi-spark-datasource/pom.xml   | 2 +-
 hudi-sync/hudi-dla-sync/pom.xml | 2 +-
 hudi-sync/hudi-hive-sync/pom.xml| 2 +-
 hudi-sync/hudi-sync-common/pom.xml  | 2 +-
 hudi-sync/pom.xml   | 2 +-
 hudi-timeline-service/pom.xml   | 2 +-
 hudi-utilities/pom.xml  | 2 +-
 packaging/hudi-flink-bundle/pom.xml | 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +-
 packaging/hudi-hive-sync-bundle/pom.xml | 2 +-
 packaging/hudi-integ-test-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml| 2 +-
 packaging/hudi-spark-bundle/pom.xml | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml   | 2 +-
 packaging/hudi-utilities-bundle/pom.xml | 2 +-
 pom.xml | 2 +-
 42 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml 
b/docker/hoodie/hadoop/base/pom.xml
index 3e2bc48..85e84d0 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ b/docker/hoodie/hadoop/base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.8.0-rc1
+0.8.0
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/datanode/pom.xml 
b/docker/hoodie/hadoop/datanode/pom.xml
index 561d1a9..b57a19c 100644
--- a/docker/hoodie/hadoop/datanode/pom.xml
+++ b/docker/hoodie/hadoop/datanode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.8.0-rc1
+0.8.0
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/historyserver/pom.xml 
b/docker/hoodie/hadoop/historyserver/pom.xml
index b06a238..e04d446 100644
--- a/docker/hoodie/hadoop/historyserver/pom.xml
+++ b/docker/hoodie/hadoop/historyserver/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.8.0-rc1
+0.8.0
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/hive_base/pom.xml 
b/docker/hoodie/hadoop/hive_base/pom.xml
index c17c3da..3f85692 100644
--- a/docker/hoodie/hadoop/hive_base/pom.xml
+++ b/docker/hoodie/hadoop/hive_base/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.8.0-rc1
+0.8.0
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/namenode/pom.xml 
b/docker/hoodie/hadoop/namenode/pom.xml
index ab7251c..2806990 100644
--- a/docker/hoodie/hadoop/namenode/pom.xml
+++ b/docker/hoodie/hadoop/namenode/pom.xml
@@ -19,7 +19,7 @@
   
 hudi-hadoop-docker
 org.apache.hudi
-0.8.0-rc1
+0.8.0
   
   4.0.0
   pom
diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml
index deff4ba..e8300ae 100644
--- a/docker/hoodie/hadoop/pom.xml
+++ b/docker/hoodie/hadoop/pom.xml
@@ -19,7 +19,7 @@
   
 hudi
 org.apache.hudi
-0.8.0-rc1
+0.8.0
 ../../../pom.xml
   
   4.0.0
diff --git a/docker/hoodie/hadoop/prestobase/pom.xml 
b/docker/hoodie/hadoop/prestobase/pom.xml
index 

[GitHub] [hudi] li36909 commented on pull request #2759: [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false

2021-04-02 Thread GitBox


li36909 commented on pull request #2759:
URL: https://github.com/apache/hudi/pull/2759#issuecomment-812403045


   cc @nsivabalan could you help to take a look, thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] li36909 commented on pull request #2759: [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false

2021-04-02 Thread GitBox


li36909 commented on pull request #2759:
URL: https://github.com/apache/hudi/pull/2759#issuecomment-812402879


   The retry issue is caused by the fact that closing metaStoreClient, sessionState, or 
hiveDriver each calls 'Hive.closeCurrent()', so both sessionState and 
hiveDriver should be class members.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1759:
-
Labels: pull-request-available  (was: )

> Save one connection retry when hiveSyncTool run with useJdbc=false
> --
>
> Key: HUDI-1759
> URL: https://issues.apache.org/jira/browse/HUDI-1759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-04-02-15-43-15-854.png, 
> image-2021-04-02-15-48-42-895.png
>
>
> When syncing metadata to hive with useJdbc=false, there are two problems:
> First: if the hive server has kerberos enabled and hudi syncs to hive with 
> useJdbc=false, then the metadata will be missing the owner; check the metadata here (I 
> tested with hive 3.1.1):
> !image-2021-04-02-15-43-15-854.png!
> Second: we can see a connection retry to the hive metastore every time we sync 
> to hive; this exception also happens in the UT "TestHiveSyncTool.testBasicSync":
> !image-2021-04-02-15-48-42-895.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] li36909 opened a new pull request #2759: [HUDI-1759] Save one connection retry to hive metastore when hiveSyncTool run with useJdbc=false

2021-04-02 Thread GitBox


li36909 opened a new pull request #2759:
URL: https://github.com/apache/hudi/pull/2759


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   1) Set the owner when starting the Hive session.
   2) Save one connection retry to the Hive metastore when syncing to Hive.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   Run the UT "TestHiveSyncTool.testBasicSync" and check the log; after this 
fix, there should be no connection retry exception.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-04-02 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1759:
--
Description: 
When syncing metadata to Hive with useJdbc=false, there are two problems:

First, if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
useJdbc=false, the metadata will be missing its owner; check the metadata here (I 
tested with Hive 3.1.1):

!image-2021-04-02-15-43-15-854.png!

Second, there is a connection retry to the Hive metastore on every 
syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":

!image-2021-04-02-15-48-42-895.png!

 

  was:
When syncing metadata to Hive with useJdbc=false, there are two problems:

First, if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
useJdbc=false, the metadata will be missing its owner; check the metadata here (I 
tested with Hive 3.1.1):

!image-2021-04-02-15-43-15-854.png!

Second, there is a connection retry to the Hive metastore on every 
syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":

 


> Save one connection retry when hiveSyncTool run with useJdbc=false
> --
>
> Key: HUDI-1759
> URL: https://issues.apache.org/jira/browse/HUDI-1759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: image-2021-04-02-15-43-15-854.png, 
> image-2021-04-02-15-48-42-895.png
>
>
> When syncing metadata to Hive with useJdbc=false, there are two problems:
> First, if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
> useJdbc=false, the metadata will be missing its owner; check the metadata here (I 
> tested with Hive 3.1.1):
> !image-2021-04-02-15-43-15-854.png!
> Second, there is a connection retry to the Hive metastore on every 
> syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":
> !image-2021-04-02-15-48-42-895.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-04-02 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1759:
--
Attachment: image-2021-04-02-15-48-42-895.png

> Save one connection retry when hiveSyncTool run with useJdbc=false
> --
>
> Key: HUDI-1759
> URL: https://issues.apache.org/jira/browse/HUDI-1759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: image-2021-04-02-15-43-15-854.png, 
> image-2021-04-02-15-48-42-895.png
>
>
> When syncing metadata to Hive with useJdbc=false, there are two problems:
> First, if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
> useJdbc=false, the metadata will be missing its owner; check the metadata here (I 
> tested with Hive 3.1.1):
> !image-2021-04-02-15-43-15-854.png!
> Second, there is a connection retry to the Hive metastore on every 
> syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false

2021-04-02 Thread lrz (Jira)
lrz created HUDI-1759:
-

 Summary: Save one connection retry when hiveSyncTool run with 
useJdbc=false
 Key: HUDI-1759
 URL: https://issues.apache.org/jira/browse/HUDI-1759
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: lrz
 Fix For: 0.9.0
 Attachments: image-2021-04-02-15-43-15-854.png

When syncing metadata to Hive with useJdbc=false, there are two problems:

First, if the Hive server has Kerberos enabled and Hudi syncs to Hive with 
useJdbc=false, the metadata will be missing its owner; check the metadata here (I 
tested with Hive 3.1.1):

!image-2021-04-02-15-43-15-854.png!

Second, there is a connection retry to the Hive metastore on every 
syncToHive; this exception also shows up in the UT "TestHiveSyncTool.testBasicSync":

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606117534



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.util.List;
+
+public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload {
+  public PartialUpdatePayload(GenericRecord record, Comparable orderingVal) {
+super(record, orderingVal);
+  }
+
+  public PartialUpdatePayload(Option<GenericRecord> record) {
+this(record.get(), (record1) -> 0); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord lastValue, Schema schema) throws IOException {

Review comment:
   > existing
   
   ok




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1657) build failed on AArch64, Fedora 33

2021-04-02 Thread shenjinxin (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313672#comment-17313672
 ] 

shenjinxin commented on HUDI-1657:
--

I also encountered the same problem. My Java is JDK 1.8_281.

> build failed on AArch64, Fedora 33 
> ---
>
> Key: HUDI-1657
> URL: https://issues.apache.org/jira/browse/HUDI-1657
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lutz Weischer
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> [jw@cn05 hudi]$ mvn package -DskipTests
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-java-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark/pom.xml, line 26, 
> column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark2_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark2_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark2/pom.xml, line 24, 
> column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-utilities_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-utilities_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-utilities/pom.xml, line 26, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-spark-bundle/pom.xml, line 26, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-utilities-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-utilities-bundle/pom.xml, line 26, column 
> 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-flink_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-flink/pom.xml, line 28, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-flink-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-flink-bundle/pom.xml, line 28, column 15
> [WARNING]
> [WARNING] It is highly recommended to fix these problems because they 
> threaten the stability of your build.
> [WARNING]
> [WARNING] For this reason, future Maven versions might no longer support 
> building such malformed projects.
> [WARNING]
> [INFO] 
> 
> [INFO] Reactor Build Order:
> [INFO]
> [INFO] Hudi   
> [pom]
> [INFO] hudi-common
> [jar]
> [INFO] hudi-timeline-service  
> [jar]
> [INFO] hudi-client
> [pom]
> [INFO] hudi-client-common 
> [jar]
> [INFO] 

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606116196



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.util.List;
+
+public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload {
+  public PartialUpdatePayload(GenericRecord record, Comparable orderingVal) {
+super(record, orderingVal);
+  }
+
+  public PartialUpdatePayload(Option<GenericRecord> record) {
+this(record.get(), (record1) -> 0); // natural order
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord lastValue, Schema schema) throws IOException {

Review comment:
   > schema here refers to incoming partial schema right?
   yes
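   To make that concrete, a rough sketch of a field-by-field merge driven by the 
incoming partial schema (hypothetical helper, not the code in this PR):
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericData;
   import org.apache.avro.generic.GenericRecord;

   // Copy the previous full record, then overwrite only the fields that the
   // incoming partial schema actually carries. Illustrative sketch only.
   final class PartialMergeSketch {
     static GenericRecord merge(GenericRecord previousFull, GenericRecord incomingPartial,
                                Schema fullSchema, Schema partialSchema) {
       GenericRecord merged = new GenericData.Record(fullSchema);
       for (Schema.Field field : fullSchema.getFields()) {
         merged.put(field.name(), previousFull.get(field.name()));
       }
       for (Schema.Field field : partialSchema.getFields()) {
         if (fullSchema.getField(field.name()) != null) {
           merged.put(field.name(), incomingPartial.get(field.name()));
         }
       }
       return merged;
     }
   }
   ```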
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606108825



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.util.List;
+
+public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload {

Review comment:
   ok,i will add




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606108625



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -106,7 +110,7 @@
   public HoodieMergeHandle(HoodieWriteConfig config, String instantTime, HoodieTable<T, I, K, O> hoodieTable,
Iterator<HoodieRecord<T>> recordItr, String partitionPath, String fileId,
TaskContextSupplier taskContextSupplier) {
-super(config, instantTime, partitionPath, fileId, hoodieTable, 
taskContextSupplier);
+super(config, instantTime, partitionPath, fileId, hoodieTable, 
getWriterSchemaIncludingAndExcludingMetadataPair(config, hoodieTable), 
taskContextSupplier);

Review comment:
   I found through debugging that when config.setLastSchema is successfully updated, 
table.getConfig.getLastSchema is updated as well; they both take their values 
from the same properties object.
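   A tiny sketch of that observation: both accessors are just views over one shared 
java.util.Properties instance (the key is the one proposed in this PR; the rest is 
illustrative):
   ```java
   import java.util.Properties;

   final class SharedPropsSketch {
     static final String LAST_AVRO_SCHEMA = "hoodie.last.avro.schema"; // key from this PR

     static void demo() {
       Properties shared = new Properties();
       Properties writerView = shared; // what config.setLastSchema(...) writes to
       Properties tableView = shared;  // what table.getConfig().getLastSchema() reads from

       writerView.setProperty(LAST_AVRO_SCHEMA, "{\"type\":\"record\"}");
       // Both views observe the update because they wrap the same object.
       System.out.println(tableView.getProperty(LAST_AVRO_SCHEMA));
     }
   }
   ```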




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1758) Flink insert command does not update the record

2021-04-02 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-1758:
--
Description: 
[^Screen Shot 2021-04-02 at 12.10.08 AM.zip]

 

Followed the steps mentioned in 
[https://hudi.apache.org/docs/flink-quick-start-guide.html]

but the second insert command that is supposed to perform an `update` did not 
update the record. 

 

[~danny0405] Would you be able to help here ?

  was:
!Screen Shot 2021-04-02 at 12.10.08 AM.png!

 

Followed the steps mentioned in 
[https://hudi.apache.org/docs/flink-quick-start-guide.html]

but the second insert command that is supposed to perform an `update` did not 
update the record. 

 

[~danny0405] Would you be able to help here ?


> Flink insert command does not update the record
> ---
>
> Key: HUDI-1758
> URL: https://issues.apache.org/jira/browse/HUDI-1758
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>
> [^Screen Shot 2021-04-02 at 12.10.08 AM.zip]
>  
> Followed the steps mentioned in 
> [https://hudi.apache.org/docs/flink-quick-start-guide.html]
> but the second insert command that is supposed to perform an `update` did not 
> update the record. 
>  
> [~danny0405] Would you be able to help here ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1758) Flink insert command does not update the record

2021-04-02 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-1758:
--
Description: 
!Screen Shot 2021-04-02 at 12.10.08 AM.png!

 

Followed the steps mentioned in 
[https://hudi.apache.org/docs/flink-quick-start-guide.html]

but the second insert command that is supposed to perform an `update` did not 
update the record. 

 

[~danny0405] Would you be able to help here ?

  was:
!image (1).png!

 

Followed the steps mentioned in 
[https://hudi.apache.org/docs/flink-quick-start-guide.html]

but the second insert command that is supposed to perform an `update` did not 
update the record. 

 

[~danny0405] Would you be able to help here ?


> Flink insert command does not update the record
> ---
>
> Key: HUDI-1758
> URL: https://issues.apache.org/jira/browse/HUDI-1758
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>
> !Screen Shot 2021-04-02 at 12.10.08 AM.png!
>  
> Followed the steps mentioned in 
> [https://hudi.apache.org/docs/flink-quick-start-guide.html]
> but the second insert command that is supposed to perform an `update` did not 
> update the record. 
>  
> [~danny0405] Would you be able to help here ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1758) Flink insert command does not update the record

2021-04-02 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-1758:
--
Description: 
!image (1).png!

 

Followed the steps mentioned in 
[https://hudi.apache.org/docs/flink-quick-start-guide.html]

but the second insert command that is supposed to perform an `update` did not 
update the record. 

 

[~danny0405] Would you be able to help here ?

  was:!image (1).png!


> Flink insert command does not update the record
> ---
>
> Key: HUDI-1758
> URL: https://issues.apache.org/jira/browse/HUDI-1758
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>
> !image (1).png!
>  
> Followed the steps mentioned in 
> [https://hudi.apache.org/docs/flink-quick-start-guide.html]
> but the second insert command that is supposed to perform an `update` did not 
> update the record. 
>  
> [~danny0405] Would you be able to help here ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1758) Flink insert command does not update the record

2021-04-02 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1758:
-

 Summary: Flink insert command does not update the record
 Key: HUDI-1758
 URL: https://issues.apache.org/jira/browse/HUDI-1758
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Nishith Agarwal
Assignee: Nishith Agarwal


!image (1).png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606101834



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -123,6 +127,22 @@ public HoodieMergeHandle(HoodieWriteConfig config, String 
instantTime, HoodieTab
 init(fileId, this.partitionPath, dataFileToBeMerged);
   }
 
+  protected static Pair<Schema, Schema> 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, 
HoodieTable hoodieTable) {
+Schema originalSchema = new Schema.Parser().parse(config.getSchema());
+Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+boolean updatePartialFields = config.updatePartialFields();
+if (updatePartialFields) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(hoodieTable.getMetaClient());
+Schema lastSchema = resolver.getTableAvroSchema();
+config.setLastSchema(lastSchema.toString());

Review comment:
   > in fact, would prefer to throw an exception if table schema is not 
present and if someone is trying to use updatePartialFields.
   
   Isn't that too strict? I don't think it's necessary to throw that exception 
here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606100489



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -123,6 +127,22 @@ public HoodieMergeHandle(HoodieWriteConfig config, String 
instantTime, HoodieTab
 init(fileId, this.partitionPath, dataFileToBeMerged);
   }
 
+  protected static Pair<Schema, Schema> 
getWriterSchemaIncludingAndExcludingMetadataPair(HoodieWriteConfig config, 
HoodieTable hoodieTable) {
+Schema originalSchema = new Schema.Parser().parse(config.getSchema());
+Schema hoodieSchema = HoodieAvroUtils.addMetadataFields(originalSchema);
+boolean updatePartialFields = config.updatePartialFields();
+if (updatePartialFields) {
+  try {
+TableSchemaResolver resolver = new 
TableSchemaResolver(hoodieTable.getMetaClient());
+Schema lastSchema = resolver.getTableAvroSchema();
+config.setLastSchema(lastSchema.toString());

Review comment:
   > lastSchema could be null if this is first commit or last commit is an 
operation which does not inject schema in commit metadata. Can we check if not 
null and then set the last schema. If not, .toString() could throw 
NullPointerException.
   
   ok
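   A sketch of the guarded version being agreed on here (method shape and 
setLastSchema borrowed from the diff above; not the merged code):
   ```java
   import org.apache.avro.Schema;
   import org.apache.hudi.common.table.TableSchemaResolver;
   import org.apache.hudi.config.HoodieWriteConfig;
   import org.apache.hudi.table.HoodieTable;

   final class LastSchemaGuardSketch {
     // Record the table's last schema only when it can be resolved; on the first
     // commit (or a commit without schema metadata) leave the config untouched and
     // fall back to the writer schema.
     static void maybeSetLastSchema(HoodieWriteConfig config, HoodieTable hoodieTable) {
       try {
         TableSchemaResolver resolver = new TableSchemaResolver(hoodieTable.getMetaClient());
         Schema lastSchema = resolver.getTableAvroSchema();
         if (lastSchema != null) {
           config.setLastSchema(lastSchema.toString());
         }
       } catch (Exception e) {
         // no resolvable table schema yet; keep using config.getSchema()
       }
     }
   }
   ```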




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2751: [HUDI-1748] Read operation will possiblity fail on mor table rt view when a write operations is concurrency running

2021-04-02 Thread GitBox


n3nash commented on pull request #2751:
URL: https://github.com/apache/hudi/pull/2751#issuecomment-812357498


   Okay, thanks for the confirmation, I will try to reproduce this issue on my 
end and get back. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-02 Thread GitBox


liujinhui1994 commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r606098330



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -79,6 +80,7 @@
   public static final String TIMELINE_LAYOUT_VERSION = 
"hoodie.timeline.layout.version";
   public static final String BASE_PATH_PROP = "hoodie.base.path";
   public static final String AVRO_SCHEMA = "hoodie.avro.schema";
+  public static final String LAST_AVRO_SCHEMA = "hoodie.last.avro.schema";

Review comment:
   OK, I think it's good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2752: [HUDI-1749] Clean/Compaction/Rollback command maybe never exit when operation fail

2021-04-02 Thread GitBox


n3nash commented on pull request #2752:
URL: https://github.com/apache/hudi/pull/2752#issuecomment-812341544


   @li36909 Have you run into a scenario where these commands hang?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] li36909 commented on pull request #2751: [HUDI-1748] Read operation will possiblity fail on mor table rt view when a write operations is concurrency running

2021-04-02 Thread GitBox


li36909 commented on pull request #2751:
URL: https://github.com/apache/hudi/pull/2751#issuecomment-812341340


   @n3nash yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #2754: [HUDI-1751] DeltaStreamer print many unnecessary warn log

2021-04-02 Thread GitBox


n3nash commented on a change in pull request #2754:
URL: https://github.com/apache/hudi/pull/2754#discussion_r606087966



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
##
@@ -173,9 +173,11 @@ public KafkaOffsetGen(TypedProperties props) {
 this.props = props;
 
 kafkaParams = new HashMap<>();
-for (Object prop : props.keySet()) {
+props.keySet().stream().filter(prop -> {

Review comment:
   @li36909 This change does not change any logging unlike what the title 
of the PR says, can you correct the title and explain this change please ?
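   For what it's worth, the filtering idea can be sketched like this (the actual 
predicate in the PR may differ; this just forwards keys the Kafka consumer 
recognizes so it stops warning about every Hudi/DeltaStreamer property):
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Properties;
   import org.apache.kafka.clients.consumer.ConsumerConfig;

   final class KafkaParamsFilterSketch {
     // Forward only the properties ConsumerConfig knows about, so the consumer does
     // not warn that each Hudi key "was supplied but isn't a known config".
     static Map<String, Object> toKafkaParams(Properties props) {
       Map<String, Object> kafkaParams = new HashMap<>();
       props.stringPropertyNames().stream()
           .filter(ConsumerConfig.configNames()::contains)
           .forEach(key -> kafkaParams.put(key, props.getProperty(key)));
       return kafkaParams;
     }
   }
   ```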




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (94a5e72 -> e970e1f)

2021-04-02 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 94a5e72  [HUDI-1737][hudi-client] Code Cleanup: Extract common method 
in HoodieCreateHandle & FlinkCreateHandle (#2745)
 add e970e1f  [HUDI-1696] add apache commons-codec dependency to 
flink-bundle explicitly (#2758)

No new revisions were added by this update.

Summary of changes:
 packaging/hudi-flink-bundle/pom.xml | 1 +
 1 file changed, 1 insertion(+)


[GitHub] [hudi] n3nash merged pull request #2758: [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly

2021-04-02 Thread GitBox


n3nash merged pull request #2758:
URL: https://github.com/apache/hudi/pull/2758


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on issue #2696: Metadata and runtime exceptions in Hudi 0.7.0 on AWS Glue

2021-04-02 Thread GitBox


n3nash commented on issue #2696:
URL: https://github.com/apache/hudi/issues/2696#issuecomment-812339166


   @kimberlyamandalu Yes, you should be able to switch off your metadata table 
without any side effect. However, if you want to turn the metadata table back on 
later, you will need to delete the data under `basepath/.hoodie/metadata`. Once the 
metadata folder is empty, you can toggle the metadata back on and things will 
work fine. 
   There is a change in master now to do this automatically, but for your case, to 
debug the issue, I would recommend just deleting the metadata folder. 
   
   Sure, I can help you make the changes so you can build and deploy the custom 
build. Before turning the metadata table `off`, you should make the following 
changes, deploy your custom build, and collect the logs, because once you turn the 
metadata table `off` and then `on` the problem may no longer be reproducible. 
   
   1. Check out the 0.7.0 release tag by doing `git checkout tags/version 0.7.0`
   2. Add a bunch of logs to the following method -> 
https://github.com/apache/hudi/blob/release-0.7.0/hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/FileSystemViewHandler.java#L342.
 We ideally want to know which line of code is throwing the runtime exception 
and why, so adding logs before and after any method invocation inside it will be 
helpful (see the sketch below). 
   3. Run the following command: mvn clean package -DskipTests
   4. New bundle jars will be available under package/*. Use these new bundle 
jars to deploy and run to collect logs. 
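   For step 2, the logging can be as simple as wrapping the suspect invocations 
with a helper like this (names are placeholders, not the actual 
FileSystemViewHandler code):
   ```java
   import java.util.concurrent.Callable;
   import org.apache.log4j.LogManager;
   import org.apache.log4j.Logger;

   final class HandlerLoggingSketch {
     private static final Logger LOG = LogManager.getLogger(HandlerLoggingSketch.class);

     // Wrap each method invocation inside the handler so the logs show exactly
     // which call threw the runtime exception and with what message.
     static <T> T logged(String what, Callable<T> call) throws Exception {
       LOG.info("Entering " + what);
       try {
         T result = call.call();
         LOG.info("Finished " + what);
         return result;
       } catch (RuntimeException e) {
         LOG.error("Failed " + what, e);
         throw e;
       }
     }
   }
   ```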


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org