date:20210921

[GitHub] [hudi] hudi-bot edited a comment on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * 2831b493e97cdc9e2a5aa99e170f15d48028de51 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2315)
 
   * 1c37ce18b451091cdcd751af679be6833d713c68 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2317)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] hudi-bot edited a comment on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * 2831b493e97cdc9e2a5aa99e170f15d48028de51 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2315)
 
   * 1c37ce18b451091cdcd751af679be6833d713c68 UNKNOWN
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3702: [HUDI-2479] HoodieFileIndex throws NPE for FileSlice with pure log files

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3702:
URL: https://github.com/apache/hudi/pull/3702#issuecomment-924636021


   
   ## CI report:
   
   * 163949191ab006195970b6acd209e282dd3cc068 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2316)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #3702: [HUDI-2479] HoodieFileIndex throws NPE for FileSlice with pure log files

2021-09-21 Thread GitBox



hudi-bot commented on pull request #3702:
URL: https://github.com/apache/hudi/pull/3702#issuecomment-924636021


   
   ## CI report:
   
   * 163949191ab006195970b6acd209e282dd3cc068 UNKNOWN
   
   

[jira] [Updated] (HUDI-2479) HoodieFileIndex throws NPE for FileSlice with pure log files

2021-09-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2479:
-
Labels: pull-request-available  (was: )

> HoodieFileIndex throws NPE for FileSlice with pure log files
> 
>
> Key: HUDI-2479
> URL: https://issues.apache.org/jira/browse/HUDI-2479
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [hudi] danny0405 opened a new pull request #3702: [HUDI-2479] HoodieFileIndex throws NPE for FileSlice with pure log files

2021-09-21 Thread GitBox



danny0405 opened a new pull request #3702:
URL: https://github.com/apache/hudi/pull/3702


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   



[jira] [Created] (HUDI-2479) HoodieFileIndex throws NPE for FileSlice with pure log files

2021-09-21 Thread Danny Chen (Jira)

Danny Chen created HUDI-2479:


 Summary: HoodieFileIndex throws NPE for FileSlice with pure log 
files
 Key: HUDI-2479
 URL: https://issues.apache.org/jira/browse/HUDI-2479
 Project: Apache Hudi
  Issue Type: Bug
  Components: Spark Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0







[GitHub] [hudi] hudi-bot edited a comment on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * 2831b493e97cdc9e2a5aa99e170f15d48028de51 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2315)
 
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * a4b4afdf7b5dd711cc3978e2c02f9efb0c7b5514 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2305)
 
   * 2831b493e97cdc9e2a5aa99e170f15d48028de51 UNKNOWN
   
   

[jira] [Updated] (HUDI-2441) To support partial update function which can move and update the data from the old partition to the new partition , when the data with same key change it's partition

2021-09-21 Thread David_Liang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David_Liang updated HUDI-2441:
--
Description: 
Consider the following scenario: two records arrive *in different batches*, as follows:
||post_id||position||weight||ts||day||
| 1|shengzhen|3KG|1630480027|{color:#ff}20210901{color}|
| 1|beijing|3KG|1630652828|{color:#ff}20210903{color}|

When using the {color:#ff}*Global Index*{color} with the following SQL:

{code:java}
merge into target_hudi_table  t
   using (
        select post_id, position, ts , day from source_table
   ) as s
on t.post_id = s.post_id
when matched then update set  t.position = s.position, t.ts=s.ts, t.day = s.day
when not matched then insert *
{code}

Because the Hudi engine does not yet support *cross-partition partial merge into*, the result in the target table is:

||post_id (as primary key)||position||weight||ts||day||
| 1|beijing| |1630652828|*{color:#ff}20210903{color}*|

The record is still in the old partition.

But the *expected* result is:
||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|*{color:#ff}3KG{color}*|1630652828|{color:#ff}*20210903*{color}|

 

 

 

  was:
Consider the following scenario: two records arrive in different batches, as follows:
||post_id||position||weight||ts||day||
| 1|shengzhen|3KG|1630480027|{color:#ff}20210901{color}|
| 1|beijing|3KG|1630652828|{color:#ff}20210903{color}|

When using the {color:#ff}*Global Index*{color} with the following SQL:

{code:java}
merge into target_hudi_table  t
   using (
        select post_id, position, ts , day from source_table
   ) as s
on t.post_id = s.post_id
when matched then update set  t.position = s.position, t.ts=s.ts, t.day = s.day
when not matched then insert *
{code}

Because the Hudi engine does not yet support *cross-partition partial merge into*, the result in the target table is:

||post_id (as primary key)||position||weight||ts||day||
| 1|beijing| |1630652828|*{color:#ff}20210903{color}*|

The record is still in the old partition.

But the *expected* result is:
||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|*{color:#FF}3KG{color}*|1630652828|{color:#ff}*20210903*{color}|

 

 

 


> To support partial update function which can move and update the data from 
> the old partition to the new partition , when the data with same key change 
> it's partition
> -
>
> Key: HUDI-2441
> URL: https://issues.apache.org/jira/browse/HUDI-2441
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Storage Management
>Reporter: David_Liang
>Assignee: Nicholas Jiang
>Priority: Major
>
> Consider the following scenario: two records arrive *in different batches*, as follows:
> ||post_id||position||weight||ts||day||
> | 1|shengzhen|3KG|1630480027|{color:#ff}20210901{color}|
> | 1|beijing|3KG|1630652828|{color:#ff}20210903{color}|
>  
> When using the {color:#ff}*Global Index*{color} with the following SQL:
>  
> {code:java}
> merge into target_hudi_table  t
>    using (
>         select post_id, position, ts , day from source_table
>    ) as s
> on t.post_id = s.post_id
> when matched then update set  t.position = s.position, t.ts=s.ts, t.day = s.day
> when not matched then insert *
> {code}
>  
> Because the Hudi engine does not yet support *cross-partition partial merge into*, the result in the target table is:
>  
> ||post_id (as primary key)||position||weight||ts||day||
> | 1|beijing| |1630652828|*{color:#ff}20210903{color}*|
> The record is still in the old partition.
>  
> But the *expected* result is:
> ||post_id (as primary key)||position||weight||ts||day||
> | 1|beijing|*{color:#ff}3KG{color}*|1630652828|{color:#ff}*20210903*{color}|
>  
>  
>  
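
The expected behaviour described in this issue can be modelled with a small sketch. This is only a toy in-memory model in Python, not Hudi code: `Record`, `GlobalIndexTable`, `insert`, and `partial_merge` are hypothetical names, and the global index is assumed to be a plain key-to-partition map. It shows a partial update that carries the untouched column (`weight`) over from the old row and moves the row to the new partition.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    post_id: int
    position: Optional[str]
    weight: Optional[str]
    ts: Optional[int]
    day: str  # partition column


class GlobalIndexTable:
    """Toy table: a global index maps each key to the single partition holding it."""

    def __init__(self) -> None:
        self.index: Dict[int, str] = {}                     # post_id -> day
        self.partitions: Dict[str, Dict[int, Record]] = {}  # day -> rows by key

    def insert(self, rec: Record) -> None:
        self.partitions.setdefault(rec.day, {})[rec.post_id] = rec
        self.index[rec.post_id] = rec.day

    def partial_merge(self, post_id: int, **updates) -> Record:
        old_day = self.index.get(post_id)
        if old_day is None:
            # Not matched: insert, leaving unspecified columns empty.
            merged = Record(post_id, updates.get("position"),
                            updates.get("weight"), updates.get("ts"),
                            updates["day"])
        else:
            # Matched: remove the old row from its partition, then overlay
            # only the updated columns on top of it.
            old = self.partitions[old_day].pop(post_id)
            merged = replace(old, **updates)
        self.insert(merged)
        return merged


# The two batches from the issue description.
table = GlobalIndexTable()
table.insert(Record(1, "shengzhen", "3KG", 1630480027, "20210901"))
result = table.partial_merge(1, position="beijing", ts=1630652828, day="20210903")
print(result)
```

In this model the update statement never mentions `weight`, yet the merged row keeps `3KG` because the old row is looked up through the global index before the new values are applied.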




[GitHub] [hudi] hudi-bot edited a comment on pull request #3623: [WIP][HUDI-2409] Using HBase shaded jars in Hudi presto bundle

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3623:
URL: https://github.com/apache/hudi/pull/3623#issuecomment-915056982


   
   ## CI report:
   
   * a34260e3bc4c2344005feefe4c7672b9589569af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2313)
 
   
   

[GitHub] [hudi] xushiyan commented on issue #3699: [SUPPORT] Job hanging on toRdd at HoodieSparkUtils

2021-09-21 Thread GitBox



xushiyan commented on issue #3699:
URL: https://github.com/apache/hudi/issues/3699#issuecomment-924585220


   We need more details to understand what's going on:
   - Which line of code from HoodieSparkUtils was run here?
   - What Hudi actions are you trying to perform?
   - What is the total input data size you are reading?
   - How many executors were actually created during the run?



[GitHub] [hudi] hudi-bot edited a comment on pull request #3623: [WIP][HUDI-2409] Using HBase shaded jars in Hudi presto bundle

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3623:
URL: https://github.com/apache/hudi/pull/3623#issuecomment-915056982


   
   ## CI report:
   
   * 427429a67168db05df942087f2cbf950f853196a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2260)
 
   * a34260e3bc4c2344005feefe4c7672b9589569af Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2313)
 
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3623: [WIP][HUDI-2409] Using HBase shaded jars in Hudi presto bundle

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3623:
URL: https://github.com/apache/hudi/pull/3623#issuecomment-915056982


   
   ## CI report:
   
   * 427429a67168db05df942087f2cbf950f853196a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2260)
 
   * a34260e3bc4c2344005feefe4c7672b9589569af UNKNOWN
   
   

[jira] [Updated] (HUDI-2441) To support partial update function which can move and update the data from the old partition to the new partition , when the data with same key change it's partition

2021-09-21 Thread David_Liang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David_Liang updated HUDI-2441:
--
Description: 
Consider the following scenario: two records arrive in different batches, as follows:
||post_id||position||weight||ts||day||
| 1|shengzhen|3KG|1630480027|{color:#ff}20210901{color}|
| 1|beijing|3KG|1630652828|{color:#ff}20210903{color}|

When using the {color:#ff}*Global Index*{color} with the following SQL:

{code:java}
merge into target_hudi_table  t
   using (
        select post_id, position, ts , day from source_table
   ) as s
on t.post_id = s.post_id
when matched then update set  t.position = s.position, t.ts=s.ts, t.day = s.day
when not matched then insert *
{code}

Because the Hudi engine does not yet support *cross-partition partial merge into*, the result in the target table is:

||post_id (as primary key)||position||weight||ts||day||
| 1|beijing| |1630652828|*{color:#ff}20210903{color}*|

The record is still in the old partition.

But the *expected* result is:
||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|*{color:#FF}3KG{color}*|1630652828|{color:#ff}*20210903*{color}|

 

 

 

  was:
Consider the following scenario: the source table contains two records, as follows:
||post_id||position||weight||ts||day||
| 1|shengzhen|3KG|1630480027|{color:#ff}20210901{color}|
| 1|beijing|3KG|1630652828|{color:#ff}20210903{color}|

When using the {color:#ff}*Global Index*{color} with the following SQL:

{code:java}
merge into target_hudi_table  t
   using (
        select post_id, position, ts , day from source_table
   ) as s
on t.post_id = s.post_id
when matched then update set  t.position = s.position, t.ts=s.ts, t.day = s.day
when not matched then insert *
{code}

Because the Hudi engine does not yet support *cross-partition partial merge into*, the result in the target table is:

||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|3KG|1630652828|*{color:#ff}20210901{color}*|

The record is still in the old partition.

But the *expected* result is:
||post_id (as primary key)||position||weight||ts||day||
| 1|beijing|3KG|1630652828|{color:#ff}*20210903*{color}|

 

 

 


> To support partial update function which can move and update the data from 
> the old partition to the new partition , when the data with same key change 
> it's partition
> -
>
> Key: HUDI-2441
> URL: https://issues.apache.org/jira/browse/HUDI-2441
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Storage Management
>Reporter: David_Liang
>Assignee: Nicholas Jiang
>Priority: Major
>
> Consider the following scenario: two records arrive in different batches, as follows:
> ||post_id||position||weight||ts||day||
> | 1|shengzhen|3KG|1630480027|{color:#ff}20210901{color}|
> | 1|beijing|3KG|1630652828|{color:#ff}20210903{color}|
>  
> When using the {color:#ff}*Global Index*{color} with the following SQL:
>  
> {code:java}
> merge into target_hudi_table  t
>    using (
>         select post_id, position, ts , day from source_table
>    ) as s
> on t.post_id = s.post_id
> when matched then update set  t.position = s.position, t.ts=s.ts, t.day = s.day
> when not matched then insert *
> {code}
>  
> Because the Hudi engine does not yet support *cross-partition partial merge into*, the result in the target table is:
>  
> ||post_id (as primary key)||position||weight||ts||day||
> | 1|beijing| |1630652828|*{color:#ff}20210903{color}*|
> The record is still in the old partition.
>  
> But the *expected* result is:
> ||post_id (as primary key)||position||weight||ts||day||
> | 1|beijing|*{color:#FF}3KG{color}*|1630652828|{color:#ff}*20210903*{color}|
>  
>  
>  




[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * a7a59556703d2ea881abee407f8fd88291d04d80 UNKNOWN
   * 99414ba1ee89c6cdd2f482425001aec2392d65e9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2312)
 
   
   

[hudi] branch master updated (55df8f6 -> e813dae)

2021-09-21 Thread danny0405

This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 55df8f6  [MINOR] Fix typo."funcitons" corrected to "functions" (#3681)
 add e813dae  [MINOR] Cosmetic changes for flink (#3701)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/sink/StreamWriteFunction.java   |  2 +-
 .../sink/partitioner/profile/EmptyWriteProfile.java |  7 ++-
 .../apache/hudi/streamer/HoodieFlinkStreamer.java   |  5 +++--
 .../org/apache/hudi/table/HoodieTableFactory.java   |  2 +-
 .../org/apache/hudi/table/HoodieTableSource.java| 11 ---
 .../table/format/mor/MergeOnReadInputFormat.java|  2 +-
 .../Transformer.java => util/InputFormats.java} | 21 +
 .../java/org/apache/hudi/util/StreamerUtil.java | 16 ++--
 8 files changed, 35 insertions(+), 31 deletions(-)
 copy hudi-flink/src/main/java/org/apache/hudi/{sink/transform/Transformer.java 
=> util/InputFormats.java} (67%)

[GitHub] [hudi] danny0405 merged pull request #3701: [MINOR] Cosmetic changes for flink

2021-09-21 Thread GitBox



danny0405 merged pull request #3701:
URL: https://github.com/apache/hudi/pull/3701


   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3701: [MINOR] Cosmetic changes for flink

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3701:
URL: https://github.com/apache/hudi/pull/3701#issuecomment-924551270


   
   ## CI report:
   
   * 4e1b5f8accb6d597d4fe2244ae381a4a56b6f109 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2311)
 
   
   

[GitHub] [hudi] YannByron commented on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-21 Thread GitBox



YannByron commented on pull request #3693:
URL: https://github.com/apache/hudi/pull/3693#issuecomment-924562478


   @leesf do you have time to review this?



[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



zhangyue19921010 commented on a change in pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#discussion_r713565743



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/UtilitiesTestBase.java
##
@@ -364,5 +385,32 @@ public static String toJsonString(HoodieRecord hr) {
 public static String[] jsonifyRecords(List records) {
   return records.stream().map(Helpers::toJsonString).toArray(String[]::new);
 }
+
+public static void addAvroRecord(
+VectorizedRowBatch batch,
+GenericRecord record,
+TypeDescription orcSchema,
+int orcBatchSize,
+Writer writer
+) throws IOException {
+  for (int c = 0; c < batch.numCols; c++) {
+ColumnVector colVector = batch.cols[c];
+final String thisField = orcSchema.getFieldNames().get(c);
+final TypeDescription type = orcSchema.getChildren().get(c);
+
+Object fieldValue = record.get(thisField);
+Schema.Field avroField = record.getSchema().getField(thisField);
+AvroOrcUtils.addToVector(type, colVector, avroField.schema(), fieldValue, batch.size);
+  }
+
+  batch.size++;
+
+  if (batch.size % orcBatchSize == 0 || batch.size == batch.getMaxSize()) {

Review comment:
   Sure thing, changed :)





[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 32223149bbb3d0c23e710fd338de4ed63e5f8be8 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2310)
 
   * a7a59556703d2ea881abee407f8fd88291d04d80 UNKNOWN
   * 99414ba1ee89c6cdd2f482425001aec2392d65e9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2312)
 
   
   

[GitHub] [hudi] danny0405 commented on a change in pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-21 Thread GitBox



danny0405 commented on a change in pull request #3698:
URL: https://github.com/apache/hudi/pull/3698#discussion_r713564251



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##
@@ -318,6 +323,7 @@ protected void completeCompaction(HoodieCommitMetadata 
metadata, JavaRDD compact(String compactionInstantTime, boolean 
shouldComplete) {
 HoodieSparkTable table = HoodieSparkTable.create(config, context);
+table.getHoodieView().sync();
 preWrite(compactionInstantTime, WriteOperationType.COMPACT, 
table.getMetaClient());

Review comment:
   Is it possible to sync the view only when the metadata table is enabled?
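
One way to read this suggestion is as a guard around the sync call. The sketch below is a standalone Python illustration, not the Hudi API: `WriteConfig.metadata_table_enabled`, `TableView`, and `prepare_compaction` are hypothetical stand-ins for the write config, the object returned by `table.getHoodieView()`, and the start of `compact(...)`.

```python
from dataclasses import dataclass


@dataclass
class WriteConfig:
    metadata_table_enabled: bool  # hypothetical flag name


class TableView:
    """Stand-in for the table's file-system view; counts sync() calls."""

    def __init__(self) -> None:
        self.sync_calls = 0

    def sync(self) -> None:
        self.sync_calls += 1


def prepare_compaction(config: WriteConfig, view: TableView) -> TableView:
    # Refresh the view only when the metadata table is in use;
    # otherwise skip the extra sync entirely.
    if config.metadata_table_enabled:
        view.sync()
    return view


with_metadata = prepare_compaction(WriteConfig(True), TableView())
without_metadata = prepare_compaction(WriteConfig(False), TableView())
```

Whether skipping the sync is safe when the metadata table is disabled is exactly the open question in this review thread; the sketch only shows the shape of the guard.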





[GitHub] [hudi] hudi-bot edited a comment on pull request #3701: [MINOR] Cosmetic changes for flink

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3701:
URL: https://github.com/apache/hudi/pull/3701#issuecomment-924551270


   
   ## CI report:
   
   * 4e1b5f8accb6d597d4fe2244ae381a4a56b6f109 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2311)
 
   
   

[GitHub] [hudi] hudi-bot commented on pull request #3701: [MINOR] Cosmetic changes for flink

2021-09-21 Thread GitBox



hudi-bot commented on pull request #3701:
URL: https://github.com/apache/hudi/pull/3701#issuecomment-924551270


   
   ## CI report:
   
   * 4e1b5f8accb6d597d4fe2244ae381a4a56b6f109 UNKNOWN
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 32223149bbb3d0c23e710fd338de4ed63e5f8be8 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2310)
 
   * a7a59556703d2ea881abee407f8fd88291d04d80 UNKNOWN
   * 99414ba1ee89c6cdd2f482425001aec2392d65e9 UNKNOWN
   
   

[GitHub] [hudi] danny0405 opened a new pull request #3701: [MINOR] Cosmetic changes for flink

2021-09-21 Thread GitBox



danny0405 opened a new pull request #3701:
URL: https://github.com/apache/hudi/pull/3701


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   



[GitHub] [hudi] nsivabalan commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.

2021-09-21 Thread GitBox



nsivabalan commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-924546245


   If the cardinality of your partition field is low, we can try partitioning
on a different field with higher cardinality. The amount of parallel
processing we can leverage depends on the number of partitions. Within each
partition we can't parallelize much, so we are limited. To be clear, Hudi does
assign one file group to each executor; I am talking about indexing here.



[GitHub] [hudi] nsivabalan commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.

2021-09-21 Thread GitBox



nsivabalan commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-924545165


   By the way, an orthogonal point.
   I see your record key is {segmentId,uuid} and your partition path is
segmentId. I'm not sure you need to prefix segmentId to your record keys if
you are using them solely to identify unique records and apply updates within
Hudi. If there is no external-facing requirement for record keys to be the
pair {segmentId,uuid}, you can just use uuid.
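
   A minimal sketch of what that suggestion could look like in writer
properties, assuming the two field names from the issue (`segmentId`, `uuid`).
The class and method names are hypothetical; the two config keys are the
standard datasource write keys. Whether uuid alone is unique enough depends on
the data and on the index type in use, so treat this as an illustration only.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: collects the key-related writer settings.
final class RecordKeySketch {

    // Key records by uuid alone and keep segmentId only as the partition path.
    static Properties writerProps() {
        Properties props = new Properties();
        props.setProperty("hoodie.datasource.write.recordkey.field", "uuid");
        props.setProperty("hoodie.datasource.write.partitionpath.field", "segmentId");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be copied into a properties file.
        writerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```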



[GitHub] [hudi] nsivabalan commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.

2021-09-21 Thread GitBox



nsivabalan commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-924544144


   Got it. Would you mind sharing screenshots of the Spark stages? That will
give us an idea of where most of the time is spent.



[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 32223149bbb3d0c23e710fd338de4ed63e5f8be8 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2310)
 
   * a7a59556703d2ea881abee407f8fd88291d04d80 UNKNOWN
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 7a21d39bce12b04c3663d8966e9923145b2ce234 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2100)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2103)
 
   * 32223149bbb3d0c23e710fd338de4ed63e5f8be8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2310)
 
   * a7a59556703d2ea881abee407f8fd88291d04d80 UNKNOWN
   
   

[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



zhangyue19921010 commented on a change in pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#discussion_r713553429



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/UtilitiesTestBase.java
##
@@ -364,5 +385,32 @@ public static String toJsonString(HoodieRecord hr) {
 public static String[] jsonifyRecords(List records) {
   return records.stream().map(Helpers::toJsonString).toArray(String[]::new);
 }
+
+public static void addAvroRecord(

Review comment:
   Sure thing. changed.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3700: [HUDI-2471] Add support ignoring case when column name matches in merge into

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3700:
URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979


   
   ## CI report:
   
   * 98de9c0ec2e814c3c8c20276e6d1457c4eb7243d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2309)
 
   
   

[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



zhangyue19921010 commented on a change in pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#discussion_r713552896



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1398,6 +1399,34 @@ private void testParquetDFSSource(boolean useSchemaProvider, List transf
     testNum++;
   }
 
+  private void testORCDFSSource(boolean useSchemaProvider, List transformerClassNames) throws Exception {
+    // prepare ORCDFSSource
+    TypedProperties orcProps = new TypedProperties();
+
+    // Properties used for testing delta-streamer with orc source
+    orcProps.setProperty("include", "base.properties");
+    orcProps.setProperty("hoodie.embed.timeline.server","false");
+    orcProps.setProperty("hoodie.datasource.write.recordkey.field", "_row_key");
+    orcProps.setProperty("hoodie.datasource.write.partitionpath.field", "not_there");
+    if (useSchemaProvider) {
+      orcProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file", dfsBasePath + "/" + "source.avsc");
+      if (transformerClassNames != null) {
+        orcProps.setProperty("hoodie.deltastreamer.schemaprovider.target.schema.file", dfsBasePath + "/" + "target.avsc");
+      }
+    }
+    orcProps.setProperty("hoodie.deltastreamer.source.dfs.root", ORC_SOURCE_ROOT);
+    UtilitiesTestBase.Helpers.savePropsToDFS(orcProps, dfs, dfsBasePath + "/" + PROPS_FILENAME_TEST_ORC);
+
+    String tableBasePath = dfsBasePath + "/test_orc_source_table" + testNum;
+    HoodieDeltaStreamer deltaStreamer = new HoodieDeltaStreamer(
+        TestHelpers.makeConfig(tableBasePath, WriteOperationType.INSERT, ORCDFSSource.class.getName(),
+            transformerClassNames, PROPS_FILENAME_TEST_ORC, false,
+            useSchemaProvider, 10, false, null, null, "timestamp", null), jsc);
+    deltaStreamer.sync();
+    TestHelpers.assertRecordCount(ORC_NUM_RECORDS, tableBasePath + "/*/*.parquet", sqlContext);

Review comment:
   Hi @nsivabalan, thanks for your review. I think .parquet is correct here.
This patch adds ORCDFSSource, which lets HoodieDeltaStreamer read ORC files
into a Hudi table, while the table still uses Parquet as its base file format.
So we need to use .parquet when reading the Hudi table data.





[jira] [Updated] (HUDI-2472) Tests failure follow up when metadata is enabled by default

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2472:
--
Description: 
We plan to enable metadata by default, but some tests fail with this setting. 
Dumping details on tests for which metadata is disabled for now. We need 
to fix them one by one.  

 

hudi-spark-client: // this is the module that has lot of tests that could 
potentially have issues. 

TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
 disabled metadata for now. directly accesses files.

TestHoodieIndex.
 testSimpleTagLocationAndUpdateWithRollback. known issue.  
https://issues.apache.org/jira/browse/HUDI-2468

testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test table. 
disabled metadata.

TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. not a 
real issue. 
 
TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
 restore fails because there is an inflight rollback in the dataset timeline. 
disabling for now. https://issues.apache.org/jira/browse/HUDI-2477
 TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
HoodieSparkWriteableTestTable. disabled metadata for now.
 TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
HoodieSparkWriteableTestTable. have disabled metadata.
 TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
 TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468

TestCleaner. lot of tests. uses test table.
 TestHoodieTimelineArchiveLog. lot of tests. uses test table. 

hudi-utilities:

TestHoodieDeltaStreamer. testCleanerDeleteReplacedDataWithArchive. fails. 
relating to archival. disabling metadata. need to look into it.

hudi-client-common: all passed.
 hudi-flink-client: all passed. 
 hudi-java-client: disabled metadata for java. all ok.
 hudi-common: all passed. 
 hudi-spark java: Testbootstrap class fully fails. rollback of 1st commit. have 
disabled metadata. https://issues.apache.org/jira/browse/HUDI-2477

hudi-spark scala tests: all good.
 hudi-utilities: one test in deltastreamer. 
 hudi-timelineserver: all good.
 hudi-sync: 
     hudi-dla-sync: all good. 
     hudi-hive-sync: all good. 
 hudi-spark3: all good.
 hudi-spark2: all good.
 hudi-examples: no tests.

 

pending modules.

hudi-cli

hudi-integ-test 

 

 

 

 

  was:
We plan to enable metadata by default. but there are some tests that fail with 
this. Dumping details on tests for which metadata is disabled for now. We need 
to fix them one by one.  

 

hudi-spark-client: // this is the module that has lot of tests that could 
potentially have issues. 

TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
 disabled metadata for now. directly accesses files.

TestHoodieIndex.
 testSimpleTagLocationAndUpdateWithRollback. known issue.  
https://issues.apache.org/jira/browse/HUDI-2468

testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test table. 
disabled metadata.

TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. not a 
real issue. 
 
TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
 restore fails because there is an inflight rollback in the dataset timeline. 
disabling for now. https://issues.apache.org/jira/browse/HUDI-2477
 TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
HoodieSparkWriteableTestTable. disabled metadata for now.
 TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
HoodieSparkWriteableTestTable. have disabled metadata.
 TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
 TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468

TestCleaner. lot of tests. uses test table.
 TestHoodieTimelineArchiveLog. lot of tests. uses test table. 

hudi-client-common: all passed.
 hudi-flink-client: all passed. 
 hudi-java-client: disabled metadata for java. all ok.
 hudi-common: all passed. 
 hudi-spark java: Testbootstrap class fully fails. rollback of 1st commit. have 
disabled metadata. https://issues.apache.org/jira/browse/HUDI-2477

hudi-spark scala tests: all good.
 hudi-utilities: one test in deltastreamer. 
 hudi-timelineserver: all good.
 hudi-sync: 
     hudi-dla-sync: all good. 
     hudi-hive-sync: all good. 
 hudi-spark3: all good.
 hudi-spark2: all good.
 hudi-examples: no tests.

 

pending modules.

hudi-cli

hudi-integ-test 

 

 

 

 


> Tests failure follow up when metadat

[GitHub] [hudi] nsivabalan commented on pull request #3455: [HUDI-2297] Estimate available memory size accurately for spillable map.

2021-09-21 Thread GitBox



nsivabalan commented on pull request #3455:
URL: https://github.com/apache/hudi/pull/3455#issuecomment-924533840


   @rmahindra123 : a gentle reminder to review the PR. 



[GitHub] [hudi] liujinhui1994 commented on pull request #3614: [HUDI-2370] Supports data encryption

2021-09-21 Thread GitBox



liujinhui1994 commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-924533202


   I tested and verified last week. After upgrading the parquet version, many 
unit tests and integration tests failed. I am still looking for a solution.
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 7a21d39bce12b04c3663d8966e9923145b2ce234 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2100)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2103)
 
   * 32223149bbb3d0c23e710fd338de4ed63e5f8be8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2310)
 
   
   

[GitHub] [hudi] nsivabalan edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-09-21 Thread GitBox



nsivabalan edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-924531159


   @liujinhui1994 : how did your testing go? Can you update with your findings?
@vinothchandar : apart from the parquet version upgrade, I also see that we
are enabling vectorized reading by default in this patch. Just wanted to flag
it in case we need to watch out for something. Also, should we do the parquet
upgrade in a separate patch, so that we can do some testing around different
query types, engines, etc. to certify the upgrade?



[GitHub] [hudi] nsivabalan commented on pull request #3614: [HUDI-2370] Supports data encryption

2021-09-21 Thread GitBox



nsivabalan commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-924531159


   @liujinhui1994 : how did your testing go? @vinothchandar : apart from the
parquet version upgrade, I also see that we are enabling vectorized reading by
default in this patch. Just wanted to flag it in case we need to watch out for
something.



[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 7a21d39bce12b04c3663d8966e9923145b2ce234 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2100)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2103)
 
   * 32223149bbb3d0c23e710fd338de4ed63e5f8be8 UNKNOWN
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3700: [HUDI-2471] Add support ignoring case when column name matches in merge into

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3700:
URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979


   
   ## CI report:
   
   * 98de9c0ec2e814c3c8c20276e6d1457c4eb7243d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2309)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] hudi-bot commented on pull request #3700: [HUDI-2471] Add support ignoring case when column name matches in merge into

2021-09-21 Thread GitBox



hudi-bot commented on pull request #3700:
URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979


   
   ## CI report:
   
   * 98de9c0ec2e814c3c8c20276e6d1457c4eb7243d UNKNOWN
   
   

[jira] [Commented] (HUDI-2470) use commit_time in the WHERE STATEMENT to optimize the incremental query

2021-09-21 Thread David_Liang (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418375#comment-17418375
 ] 

David_Liang commented on HUDI-2470:
---

Could you please assign this to me?

> use commit_time in the WHERE STATEMENT to optimize the  incremental query
> -
>
> Key: HUDI-2470
> URL: https://issues.apache.org/jira/browse/HUDI-2470
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Incremental Pull, Performance
>Reporter: David_Liang
>Priority: Major
>
> In the DeltaStreamer module, the QUERY_TYPE_OPT_KEY and 
> BEGIN_INSTANTTIME_OPT_KEY options are used to tell the DeltaStreamer to query 
> data after a specific time.  
> This method is not very convenient for users. If we implement a way for users 
> to set these options in the SQL itself, it would be both more convenient for 
> users and a more elegant implementation.
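
To make the contrast concrete, here is a small sketch of the two styles the
issue compares: read options set through configuration, and a filter on the
commit time written directly in SQL. The class, method, and table names are
hypothetical. The option keys are the string values behind QUERY_TYPE_OPT_KEY
and BEGIN_INSTANTTIME_OPT_KEY, and `_hoodie_commit_time` is Hudi's commit-time
metadata column. The SQL form is what the issue proposes, not something this
sketch claims is already optimized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi.
final class IncrementalQuerySketch {

    // Option-based style: ask for an incremental read after a given instant.
    static Map<String, String> incrementalReadOptions(String beginInstant) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.datasource.query.type", "incremental");
        opts.put("hoodie.datasource.read.begin.instanttime", beginInstant);
        return opts;
    }

    // SQL style proposed in the issue: express the same bound as a WHERE clause.
    static String proposedSql(String table, String beginInstant) {
        return "SELECT * FROM " + table
            + " WHERE _hoodie_commit_time > '" + beginInstant + "'";
    }

    public static void main(String[] args) {
        System.out.println(incrementalReadOptions("20210921000000"));
        System.out.println(proposedSql("hudi_table", "20210921000000"));
    }
}
```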



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-2471) Add support ignoring case when column name matches in merge into

2021-09-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2471:
-
Labels: pull-request-available  (was: )

> Add support ignoring case when  column name matches in merge into
> -
>
> Key: HUDI-2471
> URL: https://issues.apache.org/jira/browse/HUDI-2471
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>





[GitHub] [hudi] dongkelun opened a new pull request #3700: [HUDI-2471] Add support ignoring case when column name matches in merge into

2021-09-21 Thread GitBox



dongkelun opened a new pull request #3700:
URL: https://github.com/apache/hudi/pull/3700


   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *Add support ignoring case when column name matches in merge into*
   
   ## Brief change log
   
   *(for example:)*
 - *Add support ignoring case when column name matches in merge into*
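
   As an illustration of the idea only (this is not the code in the pull
request, and the class and method names are hypothetical), case-insensitive
column matching can be sketched as resolving a name from the merge statement
against the table's column names while ignoring case:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hudi or Spark.
final class ColumnMatchSketch {

    // Returns the table's own spelling of the column, if any column matches
    // the requested name when case is ignored.
    static Optional<String> resolve(List<String> tableColumns, String requested) {
        return tableColumns.stream()
            .filter(c -> c.equalsIgnoreCase(requested))
            .findFirst();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "Name", "ts");
        System.out.println(resolve(columns, "NAME").orElse("<no match>"));
        System.out.println(resolve(columns, "price").orElse("<no match>"));
    }
}
```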
   
   ## Verify this pull request
   
   *(example:)*
   
 - *Added unit test in TestMergeIntoTable2*

   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120


   
   ## CI report:
   
   * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN
   * 9c0123c0f27f990d009b323bab75b76ceecf3dab Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2307)
 
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120


   
   ## CI report:
   
   * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN
   * 7793fbdb9b93a129ef606cb2d73ea6e1e9074957 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2299)
 
   * 9c0123c0f27f990d009b323bab75b76ceecf3dab Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2307)
 
   
   

[GitHub] [hudi] rubenssoto commented on issue #3607: [SUPPORT]Presto query hudi data with metadata table enable un-successfully.

2021-09-21 Thread GitBox



rubenssoto commented on issue #3607:
URL: https://github.com/apache/hudi/issues/3607#issuecomment-924497696


   @nsivabalan Does presto already support Hudi metadata?



[GitHub] [hudi] nsivabalan commented on a change in pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2021-09-21 Thread GitBox



nsivabalan commented on a change in pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#discussion_r713508760



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##
@@ -512,6 +517,11 @@ public Builder retainCommits(int commitsRetained) {
   return this;
 }
 
+public Builder retainNumberOfHours(int numberOfHours) {

Review comment:
   Can we name the argument after its usage, i.e. cleanerHoursRetained?

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCleaningPolicy.java
##
@@ -22,5 +22,5 @@
  * Hoodie cleaning policies.
  */
 public enum HoodieCleaningPolicy {
-  KEEP_LATEST_FILE_VERSIONS, KEEP_LATEST_COMMITS;
+  KEEP_LATEST_FILE_VERSIONS, KEEP_LATEST_COMMITS, KEEP_LAST_X_HOURS;

Review comment:
   Suggest naming this KEEP_LATEST_BY_HOURS.

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
##
@@ -1240,6 +1244,154 @@ public void testKeepLatestCommits(boolean simulateFailureRetry, boolean enableIn
     assertTrue(testTable.baseFileExists(p0, "05", file3P0C2));
   }
 
+  /**
+   * Test cleaning policy based on number of hours retained policy. This test case covers the case when files will not be cleaned.
+   */
+  @ParameterizedTest
+  @MethodSource("argumentsForTestKeepLatestCommits")
+  public void testKeepXHoursNoCleaning(boolean simulateFailureRetry, boolean enableIncrementalClean, boolean enableBootstrapSourceClean) throws Exception {

Review comment:
   Again, is it possible to reuse the existing tests for keepLatestCommits 
rather than rewriting entire tests?

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##
@@ -402,9 +478,16 @@ private String 
getLatestVersionBeforeCommit(List fileSliceList, Hoodi
   public Option getEarliestCommitToRetain() {
 Option earliestCommitToRetain = Option.empty();
 int commitsRetained = config.getCleanerCommitsRetained();
+int hoursRetained = config.getCleanerHoursRetained();
 if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS
 && commitTimeline.countInstants() > commitsRetained) {
-  earliestCommitToRetain = 
commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained);
+  earliestCommitToRetain = 
commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); 
//15 instants total, 10 commits to retain, this gives 6th instant in the list
+} else if (config.getCleanerPolicy() == 
HoodieCleaningPolicy.KEEP_LAST_X_HOURS) {
+  Instant instant = Instant.now();
+  ZonedDateTime commitDateTime = ZonedDateTime.ofInstant(instant, 
ZoneId.systemDefault());

Review comment:
   Do we have any precedent in the Hudi code base for doing time-based 
calculations? Can you explore and let me know? I want to maintain some 
uniformity if we have any. 
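
For reference while looking for precedent, here is a minimal sketch of the hour-based cutoff computed against an injected `Clock` and an explicit zone, instead of `Instant.now()` with `ZoneId.systemDefault()`. The class and method names are hypothetical and not existing Hudi APIs, and the `yyyyMMddHHmmss` commit-time format is an assumption.

```java
import java.time.Clock;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of the Hudi code base.
final class HoursRetainedCutoff {
  // Assumed commit-time format with second granularity.
  private static final DateTimeFormatter COMMIT_TIME_FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  private HoursRetainedCutoff() {}

  // Returns the commit-time string that lies hoursRetained hours before "now".
  // The clock and zone are parameters so tests can pin them to fixed values.
  static String earliestTimeToRetain(Clock clock, ZoneId zone, int hoursRetained) {
    ZonedDateTime now = ZonedDateTime.ofInstant(clock.instant(), zone);
    return now.minusHours(hoursRetained).format(COMMIT_TIME_FORMAT);
  }
}
```

With a clock fixed at 2021-09-21T12:00:00Z in UTC and 24 hours retained, this yields `20210920120000`. Whether the zone should be UTC or the writer's local zone depends on how the table's commit times are generated, which this sketch does not decide.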

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##
@@ -69,6 +69,11 @@
   .withDocumentation("Number of commits to retain, without cleaning. This 
will be retained for num_of_commits * time_between_commits "
   + "(scheduled). This also directly translates into how much data 
retention the table supports for incremental queries.");
 
+  public static final ConfigProperty CLEANER_HOURS_RETAINED = 
ConfigProperty.key("hoodie.cleaner.hours.retained")
+  .defaultValue("5")

Review comment:
   Let's have 24, maybe. 5 hours is very aggressive. 1 day seems reasonable. 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##
@@ -330,6 +336,74 @@ public CleanPlanner(HoodieEngineContext context, 
HoodieTable hoodieT
 }
 return deletePaths;
   }
+
+  /**
+   * This method finds the files to be cleaned based on the number of hours. 
If {@code config.getCleanerHoursRetained()} is set to 5,
+   * all the files with commit time earlier than 5 hours will be removed. Also 
the latest file for any file group is retained.
+   * This policy gives much more flexibility to users for retaining data for 
running incremental queries as compared to
+   * KEEP_LATEST_COMMITS cleaning policy. The default number of hours is 5.
+   * @param partitionPath partition path to check
+   * @return list of files to clean
+   */
+  private List getFilesToCleanKeepingLatestHours(String 
partitionPath) {

Review comment:
   Can't we re-use getFilesToCleanKeepingLatestCommits()? All we need to do 
is to move most of these to a private method and reuse it across both. 
   For getFilesToCleanKeepingLatestCommits(), you can pass in the config value 
for N commits to retain, whereas for getFilesToCleanKeepingLatestHours, we can 
determine how many commits can be retained and then pass in the N value.
   lets try to re-use code as
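
The reuse suggested above could look roughly like the sketch below: the hours-based policy first works out how many of the latest commits fall inside the retention window, then hands that N to the same logic the commits-based policy uses. All names here are hypothetical and this is not the CleanPlanner code; commit times are assumed to be fixed-width `yyyyMMddHHmmss` strings sorted in ascending order, so string comparison matches chronological order.

```java
import java.util.List;

// Hypothetical illustration of sharing one code path between the two policies.
final class CommitsRetainedResolver {
  private CommitsRetainedResolver() {}

  // Counts the commits at or after cutoffTime, scanning from the newest commit backwards.
  static int commitsWithinHours(List<String> sortedCommitTimes, String cutoffTime) {
    int retained = 0;
    for (int i = sortedCommitTimes.size() - 1; i >= 0; i--) {
      if (sortedCommitTimes.get(i).compareTo(cutoffTime) < 0) {
        break; // everything older than this is outside the window
      }
      retained++;
    }
    return retained;
  }

  // Shared step: zero-based index of the earliest commit to retain when keeping
  // the latest commitsRetained commits, or -1 when there is nothing to clean.
  static int earliestIndexToRetain(int totalCommits, int commitsRetained) {
    return totalCommits > commitsRetained ? totalCommits - commitsRetained : -1;
  }
}
```

For 15 instants and 10 commits to retain, `earliestIndexToRetain` returns index 5, i.e. the 6th instant in the list, which matches the inline comment in the diff above.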

[GitHub] [hudi] nsivabalan commented on pull request #3630: [HUDI-313] NPE when select count start from a realtime table

2021-09-21 Thread GitBox



nsivabalan commented on pull request #3630:
URL: https://github.com/apache/hudi/pull/3630#issuecomment-924490394


   @codope : can you please review this as you are working w/ realtime input 
format recently. 



[GitHub] [hudi] nsivabalan commented on pull request #3648: [HUDI-2413] fix Sql source's checkpoint issue

2021-09-21 Thread GitBox



nsivabalan commented on pull request #3648:
URL: https://github.com/apache/hudi/pull/3648#issuecomment-924487255


   @codope : can you review this as well.



[hudi] branch master updated (5a94043 -> 55df8f6)

2021-09-21 Thread sivabalan

This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 5a94043  [HUDI-2343]Fix the exception for mergeInto when the 
primaryKey and preCombineField of source table and target table differ in case 
only (#3517)
 add 55df8f6  [MINOR] Fix typo."funcitons" corrected to "functions" (#3681)

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/hudi/hadoop/HoodieColumnProjectionUtils.java   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

[GitHub] [hudi] nsivabalan merged pull request #3681: [MINOR]Fix typo."funcitons" corrected to "functions"

2021-09-21 Thread GitBox



nsivabalan merged pull request #3681:
URL: https://github.com/apache/hudi/pull/3681


   



[GitHub] [hudi] nsivabalan commented on pull request #3691: [HUDI-2455] Adding spark_avro dependency to hudi-integ-test

2021-09-21 Thread GitBox



nsivabalan commented on pull request #3691:
URL: https://github.com/apache/hudi/pull/3691#issuecomment-924486497


   ```
   mvn package -DskipTests
   ```



[GitHub] [hudi] nsivabalan commented on pull request #3691: [HUDI-2455] Adding spark_avro dependency to hudi-integ-test

2021-09-21 Thread GitBox



nsivabalan commented on pull request #3691:
URL: https://github.com/apache/hudi/pull/3691#issuecomment-924486402


   yeah, I could build successfully w/ master. 
   



[GitHub] [hudi] nsivabalan removed a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-21 Thread GitBox



nsivabalan removed a comment on pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#issuecomment-92856


   So, given this approach, we could also support async compaction and 
clustering in metadata table.
   
   Here is what we could do.
   all things stay same wrt data table. i.e. 
   take locks and do conflict resolution for all regular writes, commit and 
release locks. 
   take locks and do conflict resolution while scheduling compaction/clustering 
and release locks. 
   take locks and commit compaction and clustering. 
   
   When it comes to the metadata table, we will enable multi-writer mode in 
the metadata table. (As of this patch, we have only single writer mode for the 
metadata table.)
   All writes to the metadata table happen within the data table lock. And so 
wrt new delta commits to the metadata table, it is always going to be a single 
writer implicitly. 
   After committing to the metadata table, we can just schedule compaction and 
cleaning if something is available. This internally will take the lock for 
the metadata table and check for any conflicts, but since there are no other 
writers, we should be good. 
   Once the commit and scheduling complete, we return to the data table, make 
the commit and release the lock.
   
   Later, when async compaction for the metadata table is about to get 
committed, we take the metadata table lock and make the commit. This ensures it 
does not collide with regular delta commits happening to the metadata table. We 
may not need to invoke any conflict resolution here, similar to how it is done 
in the data table. 
   But one major issue we need to fix here is the ConflictResolutionStrategy: 
as of now, if there are any pending or complete compactions after the current 
commit of interest, writes will fail. Since all of them are going to operate on 
the same partition with one file group, there will definitely be a conflict. 
So, just for metadata, we might want to consider whether we can come up with a 
special conflict resolution strategy where we consider only new writes as 
conflicts (and not any scheduled compaction). I need to understand the 
implications of this in finer detail. But just putting it out here. 
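
To make the last idea concrete, here is a minimal sketch of a metadata-only conflict check that treats only newer writes as conflicts and ignores compaction instants. Everything in it is hypothetical: the types are stand-ins rather than Hudi's ConflictResolutionStrategy API, and ignoring completed compactions as well as scheduled ones is an assumption that the comment above leaves open.

```java
import java.util.List;

// Hypothetical stand-in types; not Hudi's conflict resolution classes.
final class MetadataConflictSketch {
  enum Action { DELTA_COMMIT, COMPACTION }

  static final class TimelineInstant {
    final String time;   // instant time; larger string means later
    final Action action;

    TimelineInstant(String time, Action action) {
      this.time = time;
      this.action = action;
    }
  }

  private MetadataConflictSketch() {}

  // Returns true only if a regular write (delta commit) exists after
  // currentInstantTime. Compaction instants never count as conflicts here.
  static boolean hasConflict(String currentInstantTime, List<TimelineInstant> timeline) {
    for (TimelineInstant instant : timeline) {
      boolean isLater = instant.time.compareTo(currentInstantTime) > 0;
      if (isLater && instant.action == Action.DELTA_COMMIT) {
        return true;
      }
    }
    return false;
  }
}
```

Under this rule a compaction scheduled after the current commit no longer fails the write, while a later delta commit still does.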
   
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2468][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120


   
   ## CI report:
   
   * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN
   * 7793fbdb9b93a129ef606cb2d73ea6e1e9074957 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2299)
 
   * 9c0123c0f27f990d009b323bab75b76ceecf3dab UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Assigned] (HUDI-2472) Tests failure follow up when metadata is enabled by default

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2472:
-

Assignee: sivabalan narayanan

> Tests failure follow up when metadata is enabled by default
> ---
>
> Key: HUDI-2472
> URL: https://issues.apache.org/jira/browse/HUDI-2472
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> We plan to enable metadata by default. but there are some tests that fail 
> with this. Dumping details on tests for which metadata is disabled for now. 
> We need to fix them one by one.  
>  
> hudi-spark-client: // this is the module that has lot of tests that could 
> potentially have issues. 
> TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
>  disabled metadata for now. directly accesses files.
> TestHoodieIndex.
>  testSimpleTagLocationAndUpdateWithRollback. known issue.  
> https://issues.apache.org/jira/browse/HUDI-2468
> testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test 
> table. disabled metadata.
> TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. not a 
> real issue. 
>  
> TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
>  restore fails because there is an inflight rollback in dataset timeline. 
> disabling for now. https://issues.apache.org/jira/browse/HUDI-2477
>  TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
> HoodieSparkWriteableTestTable. disabled metadata for now.
>  TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
> HoodieSparkWriteableTestTable. have disabled metadata.
>  TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
> commit. known issue. disabling metadata. 
> https://issues.apache.org/jira/browse/HUDI-2468
>  TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
> commit. known issue. disabling metadata. 
> https://issues.apache.org/jira/browse/HUDI-2468
> TestCleaner. lot of tests. uses test table.
>  TestHoodieTimelineArchiveLog. lot of tests. uses test table. 
> hudi-client-common: all passed.
>  hudi-flink-client: all passed. 
>  hudi-java-client: disabled metadata for java. all ok.
>  hudi-common: all passed. 
>  hudi-spark java: Testbootstrap class fully fails. rollback of 1st commit. 
> have disabled metadata. https://issues.apache.org/jira/browse/HUDI-2477
> hudi-spark scala tests: all good.
>  hudi-utilities: one test in deltastreamer. 
>  hudi-timelineserver: all good.
>  hudi-sync: 
>      hudi-dla-sync: all good. 
>      hudi-hive-sync: all good. 
>  hudi-spark3: all good.
>  hudi-spark2: all good.
>  hudi-examples: no tests.
>  
> pending modules.
> hudi-cli
> hudi-integ-test 
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-2472) Tests failure follow up when metadata is enabled by default

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2472:
--
Summary: Tests failure follow up when metadata is enabled by default  (was: 
Test failure follow up when metadata is enabled by default)

> Tests failure follow up when metadata is enabled by default
> ---
>
> Key: HUDI-2472
> URL: https://issues.apache.org/jira/browse/HUDI-2472
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Priority: Major
>
> We plan to enable metadata by default. but there are some tests that fail 
> with this. Dumping details on tests for which metadata is disabled for now. 
> We need to fix them one by one.  
>  
> hudi-spark-client: // this is the module that has lot of tests that could 
> potentially have issues. 
> TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
>  disabled metadata for now. directly accesses files.
> TestHoodieIndex.
>  testSimpleTagLocationAndUpdateWithRollback. known issue.  
> https://issues.apache.org/jira/browse/HUDI-2468
> testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test 
> table. disabled metadata.
> TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. not a 
> real issue. 
>  
> TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
>  restore fails because there is an inflight rollback in dataset timeline. 
> disabling for now. https://issues.apache.org/jira/browse/HUDI-2477
>  TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
> HoodieSparkWriteableTestTable. disabled metadata for now.
>  TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
> HoodieSparkWriteableTestTable. have disabled metadata.
>  TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
> commit. known issue. disabling metadata. 
> https://issues.apache.org/jira/browse/HUDI-2468
>  TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
> commit. known issue. disabling metadata. 
> https://issues.apache.org/jira/browse/HUDI-2468
> TestCleaner. lot of tests. uses test table.
>  TestHoodieTimelineArchiveLog. lot of tests. uses test table. 
> hudi-client-common: all passed.
>  hudi-flink-client: all passed. 
>  hudi-java-client: disabled metadata for java. all ok.
>  hudi-common: all passed. 
>  hudi-spark java: Testbootstrap class fully fails. rollback of 1st commit. 
> have disabled metadata. https://issues.apache.org/jira/browse/HUDI-2477
> hudi-spark scala tests: all good.
>  hudi-utilities: one test in deltastreamer. 
>  hudi-timelineserver: all good.
>  hudi-sync: 
>      hudi-dla-sync: all good. 
>      hudi-hive-sync: all good. 
>  hudi-spark3: all good.
>  hudi-spark2: all good.
>  hudi-examples: no tests.
>  
> pending modules.
> hudi-cli
> hudi-integ-test 
>  
>  
>  
>  




[jira] [Created] (HUDI-2478) Handle failure mid-way during init buckets

2021-09-21 Thread sivabalan narayanan (Jira)

sivabalan narayanan created HUDI-2478:
-

 Summary: Handle failure mid-way during init buckets
 Key: HUDI-2478
 URL: https://issues.apache.org/jira/browse/HUDI-2478
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: sivabalan narayanan


If the process crashes mid-way while instantiating buckets, a retry should 
work seamlessly. 
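
One way to read this requirement is that bucket initialization should be idempotent: a retry creates only what is missing and leaves existing buckets alone. The sketch below illustrates that idea with an in-memory set standing in for storage. The class, the method, and the bucket naming scheme are all hypothetical, and what a "bucket" maps to in Hudi is not specified in this ticket.

```java
import java.util.Set;

// Hypothetical illustration of idempotent initialization; not Hudi code.
final class BucketInitializer {
  private BucketInitializer() {}

  // Ensures buckets 0..bucketCount-1 exist in existingBuckets.
  // Returns how many buckets this attempt had to create.
  static int initBuckets(Set<String> existingBuckets, int bucketCount) {
    int created = 0;
    for (int i = 0; i < bucketCount; i++) {
      String name = String.format("bucket-%04d", i);
      // Set.add returns false when the bucket survived an earlier, partial attempt.
      if (existingBuckets.add(name)) {
        created++;
      }
    }
    return created;
  }
}
```

If an earlier attempt crashed after creating two of four buckets, the retry creates the remaining two, and a further call creates none.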




[jira] [Updated] (HUDI-2478) Handle failure mid-way during init buckets

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2478:
--
Parent: HUDI-1292
Issue Type: Sub-task  (was: Improvement)

> Handle failure mid-way during init buckets
> --
>
> Key: HUDI-2478
> URL: https://issues.apache.org/jira/browse/HUDI-2478
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> If the process crashes mid-way while instantiating buckets, a retry should 
> work seamlessly. 




[jira] [Assigned] (HUDI-2478) Handle failure mid-way during init buckets

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2478:
-

Assignee: sivabalan narayanan

> Handle failure mid-way during init buckets
> --
>
> Key: HUDI-2478
> URL: https://issues.apache.org/jira/browse/HUDI-2478
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> If the process crashes mid-way while instantiating buckets, a retry should 
> work seamlessly. 




[jira] [Updated] (HUDI-2478) Handle failure mid-way during init buckets

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2478:
--
Fix Version/s: 0.10.0

> Handle failure mid-way during init buckets
> --
>
> Key: HUDI-2478
> URL: https://issues.apache.org/jira/browse/HUDI-2478
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> If the process crashes mid-way while instantiating buckets, a retry should 
> work seamlessly. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3698:
URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554


   
   ## CI report:
   
   * 78605ec9bca5f63118d4e9c93010e32d2c6f1d0a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2306)
 
   
   
   



[GitHub] [hudi] ZeMirella opened a new issue #3699: [SUPPORT] Job hanging on toRdd at HoodieSparkUtils

2021-09-21 Thread GitBox



ZeMirella opened a new issue #3699:
URL: https://github.com/apache/hudi/issues/3699


   **Describe the problem you faced**
   My job hangs during the toRdd task. Here are some screenshots of the task size:
   
   https://user-images.githubusercontent.com/75490501/134254359-2f2d0cb9-8bb8-48d2-b02d-f84fd9dab9d6.png
   
   https://user-images.githubusercontent.com/75490501/134254377-963509f6-3075-4509-bf88-7188dc71d548.png
   
   **My spark-submit**
   `spark-submit --deploy-mode cluster --conf spark.executor.cores=5 --conf 
spark.executor.memoryOverhead=6g --conf spark.executor.memory=43g --conf 
spark.dynamicAllocation.maxExecutors=50 --conf 
spark.sql.hive.convertMetastoreParquet=false --conf spark.rdd.compress=true 
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
spark.kryoserializer.buffer.max=512m --packages 
org.apache.hudi:hudi-spark-bundle_2.12:0.8.0,org.apache.spark:spark-avro_2.12:3.0.1,com.audienceproject:spark-dynamodb_2.12:1.1.2
 --py-files s3://bucket/modules.zip --files s3://bucket/config.yml 
s3://bucket/main.py`
   
   **My cluster configuration**
   
   https://user-images.githubusercontent.com/75490501/134254642-ffdbc239-77b2-4d47-be1e-8a3872280533.png
   
   I also set shuffle.parallelism=2000
   
   **Expected behavior**
   It should run without hanging.
   
   **Environment Description**
   
   * Hudi version : 0.8
   
   * Spark version : 3.0
   
   * Hive version : 3.1.2
   
   * Hadoop version : Amazon 3.2.1
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) : no
   
   
   



[jira] [Updated] (HUDI-2472) Test failure follow up when metadata is enabled by default

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2472:
--
Description: 
We plan to enable metadata by default. but there are some tests that fail with 
this. Dumping details on tests for which metadata is disabled for now. We need 
to fix them one by one.  

 

hudi-spark-client: // this is the module that has lot of tests that could 
potentially have issues. 

TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
 disabled metadata for now. directly accesses files.

TestHoodieIndex.
 testSimpleTagLocationAndUpdateWithRollback. known issue.  
https://issues.apache.org/jira/browse/HUDI-2468

testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test table. 
disabled metadata.

TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. not a 
real issue. 
 
TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
 restore fails because there is an inflight rollback in dataset timeline. 
disabling for now. https://issues.apache.org/jira/browse/HUDI-2477
 TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
HoodieSparkWriteableTestTable. disabled metadata for now.
 TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
HoodieSparkWriteableTestTable. have disabled metadata.
 TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
 TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468

TestCleaner. lot of tests. uses test table.
 TestHoodieTimelineArchiveLog. lot of tests. uses test table. 

hudi-client-common: all passed.
 hudi-flink-client: all passed. 
 hudi-java-client: disabled metadata for java. all ok.
 hudi-common: all passed. 
 hudi-spark java: Testbootstrap class fully fails. rollback of 1st commit. have 
disabled metadata. https://issues.apache.org/jira/browse/HUDI-2477

hudi-spark scala tests: all good.
 hudi-utilities: one test in deltastreamer. 
 hudi-timelineserver: all good.
 hudi-sync: 
     hudi-dla-sync: all good. 
     hudi-hive-sync: all good. 
 hudi-spark3: all good.
 hudi-spark2: all good.
 hudi-examples: no tests.

 

pending modules.

hudi-cli

hudi-integ-test 

 

 

 

 

  was:
We plan to enable metadata by default. but there are some tests that fail with 
this. Dumping details on tests for which metadata is disabled for now. We need 
to fix them one by one.  

 

hudi-spark-client: // this is the module that has lot of tests that could 
potentially have issues. 

TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
 disabled metadata for now. directly accesses files.

TestHoodieIndex.
 testSimpleTagLocationAndUpdateWithRollback. known issue.  
https://issues.apache.org/jira/browse/HUDI-2468

testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test table. 
disabled metadata.

TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. 
 
TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
 restore fails because there is an inflight rollback in dataset timeline. 
disabling for now. 
 TestHoodieSparkMergeOnReadTableInsertUpdateDelete.testSimpleInsertAndUpdate. 
fixed sync for schedule internal table service. and now succeeds. 
 TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
HoodieSparkWriteableTestTable. disabled metadata for now. 
 TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime. all succeeding 
now.
 TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime. all succeeding 
now. 
 TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
HoodieSparkWriteableTestTable. have disabled metadata.
 TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
 TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
 TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict. 
succeeded on retry.

TestCleaner. lot of tests. uses test table.
 TestHoodieTimelineArchiveLog. lot of tests. uses test table. 


hudi-client-common: all passed.
hudi-flink-client: all passed. 
hudi-java-client: disabled metadata for java. all ok.
hudi-common: all passed. 
hudi-spark java: Testbootstrap class fully fails. rollback of 1st commit. have 
disabled metadata. https://issues.apache.org/jira/browse/HUDI-2477

hudi-spark scala tests: all good.
hudi-utilities: one test in deltastreamer. 
hudi-timelineserver: all good.
hudi-sync: 
    hudi-dla-sync: all good. 
    hudi-hive-sync

[jira] [Updated] (HUDI-2472) Test failure follow up when metadata is enabled by default

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2472:
--
Description: 
We plan to enable metadata by default. but there are some tests that fail with 
this. Dumping details on tests for which metadata is disabled for now. We need 
to fix them one by one.  

 

hudi-spark-client: // this is the module that has lot of tests that could 
potentially have issues. 

TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
 disabled metadata for now. directly accesses files.

TestHoodieIndex.
 testSimpleTagLocationAndUpdateWithRollback. known issue.  
https://issues.apache.org/jira/browse/HUDI-2468

testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test table. 
disabled metadata.

TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. 
 
TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
 restore fails because there is an inflight rollback in dataset timeline. 
disabling for now. 
 TestHoodieSparkMergeOnReadTableInsertUpdateDelete.testSimpleInsertAndUpdate. 
fixed sync for schedule internal table service. and now succeeds. 
 TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
HoodieSparkWriteableTestTable. disabled metadata for now. 
 TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime. all succeeding 
now.
 TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime. all succeeding 
now. 
 TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
HoodieSparkWriteableTestTable. have disabled metadata.
 TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
 TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
 TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict. 
succeeded on retry.

TestCleaner. lot of tests. uses test table.
 TestHoodieTimelineArchiveLog. lot of tests. uses test table. 


hudi-client-common: all passed.
hudi-flink-client: all passed. 
hudi-java-client: disabled metadata for java. all ok.
hudi-common: all passed. 
hudi-spark java: Testbootstrap class fully fails. rollback of 1st commit. have 
disabled metadata. https://issues.apache.org/jira/browse/HUDI-2477

hudi-spark scala tests: all good.
hudi-utilities: one test in deltastreamer. 
hudi-timelineserver: all good.
hudi-sync: 
    hudi-dla-sync: all good. 
    hudi-hive-sync: all good. 
hudi-spark3: all good.
hudi-spark2: all good.
hudi-examples: no tests.

 

pending modules.

hudi-cli

hudi-integ-test 

 

 

 

 

  was:
We plan to enable metadata by default. but there are some tests that fail with 
this. Dumping details on tests for which metadata is disabled for now. We need 
to fix them one by one.  

 

hudi-spark-client: 

TestHoodieSparkMergeOnReadTableIncrementalRead.testIncrementalReadsWithCompaction.
 disabled metadata for now. directly accesses files.

TestHoodieIndex.
testSimpleTagLocationAndUpdateWithRollback. known issue.  
https://issues.apache.org/jira/browse/HUDI-2468

testSimpleGlobalIndexTagLocationWhenShouldUpdatePartitionPath. uses test table. 
disabled metadata.

TestHoodieRowCreateHandle.testInstantiationFailure. disabled metadata. 
TestHoodieSparkMergeOnReadTableRollback.testMultiRollbackWithDeltaAndCompactionCommit.
 restore fails because there is an inflight rollback in dataset timeline. 
disabling for now. 
TestHoodieSparkMergeOnReadTableInsertUpdateDelete.testSimpleInsertAndUpdate. 
fixed sync for schedule internal table service. and now succeeds. 
TestHoodieMergeOnReadTable.testLogFileCountsAfterCompaction. uses 
HoodieSparkWriteableTestTable. disabled metadata for now. 
TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime. all succeeding 
now.
TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime. all succeeding 
now. 
TestHoodieCompactor.testWriteStatusContentsAfterCompaction. uses 
HoodieSparkWriteableTestTable. have disabled metadata.
TestHbaseIndex.testEnsureTagLocationUsesCommitTimeline. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
TestHbaseIndex.testSimpleTagLocationAndUpdateWithRollback. rolling back 1st 
commit. known issue. disabling metadata. 
https://issues.apache.org/jira/browse/HUDI-2468
TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict. 
succeeded on retry.

TestCleaner. lot of tests. uses test table.
TestHoodieTimelineArchiveLog. lot of tests. uses test table. 

 


hudi-client-common: all passed.
hudi-flink-client: all passed. 
hudi-java-client: disabled metadata for java. all ok.
hudi-common: all passed. 
hudi-spark: java Testbootstrap class fully fails.

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3698:
URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554


   
   ## CI report:
   
   * 78605ec9bca5f63118d4e9c93010e32d2c6f1d0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2306)
   
   

[GitHub] [hudi] hudi-bot commented on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-21 Thread GitBox



hudi-bot commented on pull request #3698:
URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554


   
   ## CI report:
   
   * 78605ec9bca5f63118d4e9c93010e32d2c6f1d0a UNKNOWN
   
   

[GitHub] [hudi] nsivabalan commented on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-21 Thread GitBox



nsivabalan commented on pull request #3698:
URL: https://github.com/apache/hudi/pull/3698#issuecomment-924410099


   @nbalajee @n3nash : Can you folks please review this? This is a blocker for 
the synchronous metadata patch. 



[jira] [Updated] (HUDI-2474) Fix refreshing timeline for every operation

2021-09-21 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2474:
-
Labels: pull-request-available  (was: )

> Fix refreshing timeline for every operation
> ---
>
> Key: HUDI-2474
> URL: https://issues.apache.org/jira/browse/HUDI-2474
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> With metadata enabled, we need to refresh the timeline for the table before 
> every operation. If not, some states might be missed. At least for 
> deltastreamer continuous mode, we need this to be fixed. Also, some tests are 
> failing due to this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [hudi] nsivabalan opened a new pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-21 Thread GitBox



nsivabalan opened a new pull request #3698:
URL: https://github.com/apache/hudi/pull/3698


   ## What is the purpose of the pull request
   
   - The timeline is not refreshed before every operation. With metadata 
enabled, some operations could fail if it is not refreshed. 
   - Without metadata enabled, this is not an issue, since subsequent 
operations list the fs anyway to fetch values, which is not the case with the 
metadata table. 
   
   ## Brief change log
   
   - Added refresh of timeline before every operation.
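
   The failure mode can be pictured with a small toy sketch in Python. This 
is not Hudi code: `Storage`, `TableView` and `reload_timeline` are hypothetical 
names, and the sketch only assumes that a long-lived table handle caches the 
list of completed instants and that a metadata-backed reader trusts that cache 
instead of listing the fs.

```python
class Storage:
    """Stand-in for the table's timeline directory (hypothetical)."""

    def __init__(self):
        self.completed_instants = []

    def commit(self, instant):
        self.completed_instants.append(instant)


class TableView:
    """Caches the timeline when created, like a long-lived table handle."""

    def __init__(self, storage):
        self.storage = storage
        self.cached_timeline = list(storage.completed_instants)

    def reload_timeline(self):
        # Re-read the instants from storage, replacing the cached copy.
        self.cached_timeline = list(self.storage.completed_instants)

    def latest_instant(self, refresh):
        if refresh:
            self.reload_timeline()
        return self.cached_timeline[-1] if self.cached_timeline else None


storage = Storage()
storage.commit("001")
view = TableView(storage)
storage.commit("002")  # written after the view cached its timeline

stale = view.latest_instant(refresh=False)  # still sees only the cached state
fresh = view.latest_instant(refresh=True)   # sees the new commit
print(stale, fresh)
```

   Refreshing on every operation trades one extra timeline read per operation 
for never acting on a stale view.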
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
   - Enabled metadata for TestHoodieSparkMergeOnReadTableInsertUpdateDelete to 
verify the fix. w/o the fix, the test fails. 
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   



[jira] [Updated] (HUDI-2477) Restore fails after adding rollback plan and rollback.requested instant w/ metadata enabled

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2477:
--
Description: 
Restore triggers a rollback of N commits and then finally commits the 
restore. None of the rollbacks are published to the timeline. 

But since we added the rollback.requested instant, restore is breaking with 
metadata enabled. 

Here is what is happening:

Restore

    Schedule rollback for all N commits. This adds rollback.requested 
instants to the timeline. Remember that we can't skip this publishing, because 
the rollback action executor depends on it. 

    Trigger the rollback action executor, which executes the rollback. But this 
time we may not publish the rollbacks, so there won't be a completed rollback 
instant. 

Now, to finalize the restore, we apply the changes to the metadata table before 
we can commit the restore to the data table. This is where the issue is. We 
check whether bootstrapping is required. Chances are that the last instant 
synced to the metadata table is no longer active in the data table, so this 
triggers a bootstrap. But we allow a bootstrap only if there are no pending 
operations in the data table. All the rollbacks surface as pending operations, 
and hence we fail here. 

This could also be an issue when we try to play with bootstrap in the original 
dataset. 

Say you bootstrap a table and, for some reason, want to roll back the 
bootstrap. This might end up in this state too. 

To illustrate clearly: 

bootstrap

    Also applies changes to metadata. There is only one commit.

rollback bootstrap

    This is a restore operation. So we first do a rollback, which creates a 
rollback.requested instant. 

    Then, to finalize the restore, we try to apply the restore to metadata. 

        This goes into the bootstrap code path. The last synced instant is 
not found in the data timeline. We assume it is archived, so we trigger a 
re-bootstrap and delete the metadata table. 

        Then we try to do the actual bootstrap. But since there is a pending 
operation in the data timeline (rollback.requested), we do not bootstrap at 
all. So the state remains: the metadata table is deleted, and actually applying 
the restore commit fails. 

We also need to think about whether, even if metadata is enabled, we should 
leave the rollback instant in the timeline or clean it up after committing the 
restore to the timeline. 
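
The sequence described above can be sketched as a toy model in Python. This is 
not Hudi code: `finalize_restore` and the tuple layout are hypothetical, and 
the sketch assumes only the two rules stated in the description (a missing 
last-synced instant triggers a re-bootstrap that deletes the metadata table, 
and a bootstrap is allowed only when nothing is pending in the data timeline).

```python
def finalize_restore(data_timeline, last_synced_instant, metadata_exists):
    """Toy model of applying a restore to the metadata table.

    data_timeline: list of (instant, action, state) tuples in the data table.
    Returns (metadata_exists, outcome).
    """
    active = {instant for instant, _, _ in data_timeline}
    pending = [entry for entry in data_timeline if entry[2] != "completed"]

    if last_synced_instant in active:
        return metadata_exists, "applied restore to metadata"

    # The last synced instant is gone: assume it was archived and
    # re-bootstrap, which first deletes the existing metadata table.
    metadata_exists = False
    if pending:
        # Bootstrap is allowed only when nothing is pending, so it is
        # skipped and the metadata table stays deleted.
        return metadata_exists, "failed: bootstrap blocked by pending " + pending[0][1]
    return True, "re-bootstrapped and applied restore"


# Bootstrap commit 001 was rolled back; only rollback.requested remains.
with_requested = finalize_restore(
    data_timeline=[("002", "rollback", "requested")],
    last_synced_instant="001",
    metadata_exists=True,
)

# Same situation if the rollback.requested instant had not been left behind.
without_requested = finalize_restore(
    data_timeline=[],
    last_synced_instant="001",
    metadata_exists=True,
)
print(with_requested)
print(without_requested)
```

The second call is only there to show why the open question about cleaning up 
the rollback instant matters: in this toy model, the leftover pending instant 
is the only thing that blocks the re-bootstrap.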

 

 

 


 

 

 


> Restore fails after adding rollback plan and rollback.requested instant w/ 
> metadata enabled
> ---
>
> Key: HUDI-2477
> URL: https://issues.apache.org/jira/browse/HUDI-2477
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>

[jira] [Updated] (HUDI-2477) Restore fails after adding rollback plan and rollback.requested instant w/ metadata enabled

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2477:
--
Description: 
Restore triggers a rollback of N commits and then finally commits the 
restore. None of the rollbacks are published to the timeline. 

But since we added the rollback.requested instant, restore is breaking with 
metadata enabled. 

Here is what is happening:

Restore

    Schedule rollback for all N commits. This adds rollback.requested 
instants to the timeline. Remember that we can't skip this publishing, because 
the rollback action executor depends on it. 

    Trigger the rollback action executor, which executes the rollback. But this 
time we may not publish the rollbacks, so there won't be a completed rollback 
instant. 

Now, to finalize the restore, we apply the changes to the metadata table before 
we can commit the restore to the data table. This is where the issue is. We 
check whether bootstrapping is required. Chances are that the last instant 
synced to the metadata table is no longer active in the data table, so this 
triggers a bootstrap. But we allow a bootstrap only if there are no pending 
operations in the data table. All the rollbacks surface as pending operations, 
and hence we fail here. 

This could also be an issue when we try to play with bootstrap in the original 
dataset. 

Say you bootstrap a table and, for some reason, want to roll back the 
bootstrap. This might end up in this state too. 

We also need to think about whether, even if metadata is enabled, we should 
leave the rollback instant in the timeline or clean it up after committing the 
restore to the timeline. 

 

 

 


 

 

 

 

 


> Restore fails after adding rollback plan and rollback.requested instant w/ 
> metadata enabled
> ---
>
> Key: HUDI-2477
> URL: https://issues.apache.org/jira/browse/HUDI-2477
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>

[jira] [Updated] (HUDI-2477) Restore fails after adding rollback plan and rollback.requested instant w/ metadata enabled

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2477:
--
Summary: Restore fails after adding rollback plan and rollback.requested 
instant w/ metadata enabled  (was: Restore fails after adding rollback plan and 
rollback.requested instant)

> Restore fails after adding rollback plan and rollback.requested instant w/ 
> metadata enabled
> ---
>
> Key: HUDI-2477
> URL: https://issues.apache.org/jira/browse/HUDI-2477
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>

[jira] [Created] (HUDI-2477) Restore fails after adding rollback plan and rollback.requested instant

2021-09-21 Thread sivabalan narayanan (Jira)

sivabalan narayanan created HUDI-2477:
-

 Summary: Restore fails after adding rollback plan and 
rollback.requested instant
 Key: HUDI-2477
 URL: https://issues.apache.org/jira/browse/HUDI-2477
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Writer Core
Reporter: sivabalan narayanan


Restore triggers a rollback of N commits and then finally commits the 
restore. None of the rollbacks are published to the timeline. 

But since we added the rollback.requested instant, restore is breaking with 
metadata enabled. 

Here is what is happening:

Restore

    Schedule rollback for all N commits. This adds rollback.requested 
instants to the timeline. Remember that we can't skip this publishing, because 
the rollback action executor depends on it. 

    Trigger the rollback action executor, which executes the rollback. But this 
time we may not publish the rollbacks, so there won't be a completed rollback 
instant. 

Now, to finalize the restore, we apply the changes to the metadata table before 
we can commit the restore to the data table. This is where the issue is. We 
check whether bootstrapping is required. Chances are that the last instant 
synced to the metadata table is no longer active in the data table, so this 
triggers a bootstrap. But we allow a bootstrap only if there are no pending 
operations in the data table. All the rollbacks surface as pending operations, 
and hence we fail here. 

This could also be an issue when we try to play with bootstrap in the original 
dataset. 

Say you bootstrap a table and, for some reason, want to roll back the 
bootstrap. This might end up in this state too. 

 

 

 

 

 




[jira] [Assigned] (HUDI-2477) Restore fails after adding rollback plan and rollback.requested instant

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2477:
-

Assignee: sivabalan narayanan

> Restore fails after adding rollback plan and rollback.requested instant
> ---
>
> Key: HUDI-2477
> URL: https://issues.apache.org/jira/browse/HUDI-2477
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>

[jira] [Updated] (HUDI-2477) Restore fails after adding rollback plan and rollback.requested instant

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2477:
--
Fix Version/s: 0.10.0

> Restore fails after adding rollback plan and rollback.requested instant
> ---
>
> Key: HUDI-2477
> URL: https://issues.apache.org/jira/browse/HUDI-2477
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>

[jira] [Updated] (HUDI-2477) Restore fails after adding rollback plan and rollback.requested instant

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2477:
--
Parent: HUDI-1292
Issue Type: Sub-task  (was: Improvement)

> Restore fails after adding rollback plan and rollback.requested instant
> ---
>
> Key: HUDI-2477
> URL: https://issues.apache.org/jira/browse/HUDI-2477
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>

[GitHub] [hudi] Rap70r opened a new issue #3697: [SUPPORT] Performance Tuning: How to speed up stages?

2021-09-21 Thread GitBox



Rap70r opened a new issue #3697:
URL: https://github.com/apache/hudi/issues/3697


   Hello,
   We are using Spark and Hudi to upsert records into parquet in S3, extracted 
from Kafka, using EMR. The events could be either inserts or updates.
   Currently, it takes 41 minutes for the process to extract and upsert 
1,430,000 records (1714 Megabytes).
   We are trying to increase the speed of this process. Below are the details 
of our environment.
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   
   * EMR version : 6.4.0
   > Master Instance: 1 r5.xlarge
   > Core Instance: 1 c5.xlarge
   > Task Instance: 25 c5.xlarge
   
   * Spark version : 3.1.2
   
   * Hive version : n/a
   
   * Hadoop version : 3.2.1
   
   * Source : Kafka
   
   * Storage : S3 (as parquet)
   
   * Partitions: 1100
   
   * Partition Size: ~1MB to 30MB each
   
   * Parallelism: 3000
   
   * Operation: Upsert
   
   * Partition : Concatenation of year, month and week of a date field
   
   * Storage Type: COPY_ON_WRITE
   
   * Running on Docker? : no
   
   **Spark-Submit Configs**
   `spark-submit --deploy-mode cluster --conf 
spark.dynamicAllocation.enabled=true --conf 
spark.dynamicAllocation.cachedExecutorIdleTimeout=300s --conf 
spark.dynamicAllocation.executorIdleTimeout=300s --conf 
spark.scheduler.mode=FAIR --conf spark.memory.fraction=0.4 --conf 
spark.memory.storageFraction=0.1 --conf spark.shuffle.service.enabled=true 
--conf spark.sql.hive.convertMetastoreParquet=false --conf 
spark.sql.parquet.mergeSchema=true --conf spark.driver.maxResultSize=4g --conf 
spark.driver.memory=4g --conf spark.executor.cores=4 --conf 
spark.driver.memoryOverhead=1g --conf spark.executor.instances=100 --conf 
spark.executor.memoryOverhead=1g --conf spark.driver.cores=4 --conf 
spark.executor.memory=4g --conf spark.rdd.compress=true --conf 
spark.kryoserializer.buffer.max=512m --conf 
spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
spark.yarn.nodemanager.vmem-check-enabled=false --conf 
yarn.nodemanager.pmem-check-enabled=false --conf 
spark.sql.shuffle.partitions=100 --conf spark.default.parallelism=100 --conf 
spark.task.cpus=2`
   
   **Spark Job**
   
![image](https://user-images.githubusercontent.com/22181358/134231023-4aa94788-5f68-4610-843c-1e98187aa810.png)
   
   From the job above, it seems that most of the time is consumed by 
UpsertPartitioner and SparkUpsertCommitActionExecutor events.
   
   Do you have any suggestions on how to reduce the time the above job takes 
to complete?
   
   Thank you
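
   As a rough aid for reading these numbers, the short Python sketch below 
derives a few figures from the values reported above. It is plain arithmetic, 
not a diagnosis: the variable names are made up for illustration, and the task 
slot count assumes that all 100 requested executors are actually granted, 
which the issue does not state.

```python
records = 1_430_000
size_mb = 1714
elapsed_s = 41 * 60

executor_instances = 100   # spark.executor.instances
executor_cores = 4         # spark.executor.cores
task_cpus = 2              # spark.task.cpus
hudi_parallelism = 3000    # "Parallelism" from the environment list

# Observed end-to-end throughput of the upsert job.
records_per_s = records / elapsed_s
mb_per_s = size_mb / elapsed_s

# Each task reserves task_cpus cores, so an executor runs
# executor_cores // task_cpus tasks at a time.
max_concurrent_tasks = executor_instances * (executor_cores // task_cpus)

# Average input handled per task if the data is spread evenly.
mb_per_task = size_mb / hudi_parallelism

print(f"{records_per_s:.0f} records/s, {mb_per_s:.2f} MB/s")
print(f"at most {max_concurrent_tasks} concurrent tasks")
print(f"{mb_per_task:.2f} MB per task at parallelism {hudi_parallelism}")
```

   With well under 1 MB of input per task on average, per-task scheduling 
overhead may matter more than raw data volume, though only the Spark UI 
stage timings can confirm that.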
   
   



[jira] [Updated] (HUDI-2476) Fix retried compaction commit in datatable fails when applied to metadata w/ sync updates

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2476:
--
Fix Version/s: 0.10.0

> Fix retried compaction commit in datatable fails when applied to metadata w/ 
> sync updates
> -
>
> Key: HUDI-2476
> URL: https://issues.apache.org/jira/browse/HUDI-2476
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> Compaction and clustering have a static instant time. So, when retried, the 
> operation may not get a new instant time but reuses the same one. 
> So, let's walk through what happens when compaction fails after it has been 
> synced to the metadata table. 
> c1, c2, cc3 (compaction commit), c4. 
> c1, c2 and c4 are applied to the metadata table on completion. 
> cc3 also gets synced to the metadata table, but before it is committed to the 
> data table, it fails (process crashed). It is a small window, but still a 
> possibility. 
> So, from a timeline standpoint, this is what it looks like:
> data timeline: 
> c1 complete, c2 complete, cc3 inflight, c4 complete.
> metadata timeline:
> dc_c1, dc_c2, dc_cc3, dc_c4
> Let's say a few more commits went in. 
> data timeline: 
> c1 complete, c2 complete, cc3 inflight, c4 complete, c5 complete, c6 complete.
> metadata timeline:
> dc_c1, dc_c2, dc_cc3, dc_c4, dc_c5, dc_c6
>  
> Now, compaction in the data table is re-attempted. So, first we roll back the 
> pending compaction in the data table. This triggers an upsert to the metadata 
> table. Even though this is a rollback, every update to the metadata table is 
> an upsert, which results in a delta commit. 
> Then the compaction is retried in the data table. When it is nearing 
> completion, we try to upsert to the metadata table, which fails because we 
> already have a completed dc_cc3 in the metadata table. 
>  
> Fix: 
> When a commit is being retried, we delete the completed instant and then 
> proceed with the upsert. So, when log blocks/files are merged together, the 
> final state will be intact, ensuring that only the files added in the 2nd 
> attempt are returned and those added during the 1st attempt are not (since 
> there will be a complementing log block corresponding to the rollback). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-2476) Fix retried compaction commit in datatable fails when applied to metadata w/ sync updates

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2476:
--
Parent: HUDI-1292
Issue Type: Sub-task  (was: Improvement)

> Fix retried compaction commit in datatable fails when applied to metadata w/ 
> sync updates
> -
>
> Key: HUDI-2476
> URL: https://issues.apache.org/jira/browse/HUDI-2476
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> Compaction and clustering have a static instant time, so a retried attempt
> reuses the same instant time rather than getting a new one.
> Let's walk through what happens when a compaction fails after it has been
> synced to the metadata table.
> Commits: c1, c2, cc3 (compaction commit), c4.
> c1, c2 and c4 are applied to the metadata table on completion.
> cc3 also gets synced to the metadata table, but the process crashes before
> cc3 is committed to the data table. It is a small window, but still a
> possibility.
> From a timeline standpoint, this is what it looks like:
> data timeline:
> c1 complete, c2 complete, cc3 inflight, c4 complete.
> metadata timeline:
> dc_c1, dc_c2, dc_cc3, dc_c4
> Let's say a few more commits went in.
> data timeline:
> c1 complete, c2 complete, cc3 inflight, c4 complete, c5 complete, c6 complete.
> metadata timeline:
> dc_c1, dc_c2, dc_cc3, dc_c4, dc_c5, dc_c6
>
> Now the compaction in the data table is re-attempted. First, we roll back
> the pending compaction in the data table, which triggers an upsert to the
> metadata table: even though this is a rollback, every update to the metadata
> table is an upsert, which results in a delta commit.
> Then the compaction is retried in the data table. When it nears completion,
> we try to upsert to the metadata table, which fails because the metadata
> table already has a completed dc_cc3.
>
> Fix:
> When a commit is being retried, we delete the completed instant and then
> proceed with the upsert. When log blocks/files are merged together, the final
> state stays intact: only the files added in the 2nd attempt are returned, and
> those added during the 1st attempt are not (since there will be a
> complementing log block corresponding to the rollback).
>  
>  
>  




[jira] [Assigned] (HUDI-2476) Fix retried compaction commit in datatable fails when applied to metadata w/ sync updates

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2476:
-

Assignee: sivabalan narayanan

> Fix retried compaction commit in datatable fails when applied to metadata w/ 
> sync updates
> -
>
> Key: HUDI-2476
> URL: https://issues.apache.org/jira/browse/HUDI-2476
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Compaction and clustering have a static instant time, so a retried attempt
> reuses the same instant time rather than getting a new one.
> Let's walk through what happens when a compaction fails after it has been
> synced to the metadata table.
> Commits: c1, c2, cc3 (compaction commit), c4.
> c1, c2 and c4 are applied to the metadata table on completion.
> cc3 also gets synced to the metadata table, but the process crashes before
> cc3 is committed to the data table. It is a small window, but still a
> possibility.
> From a timeline standpoint, this is what it looks like:
> data timeline:
> c1 complete, c2 complete, cc3 inflight, c4 complete.
> metadata timeline:
> dc_c1, dc_c2, dc_cc3, dc_c4
> Let's say a few more commits went in.
> data timeline:
> c1 complete, c2 complete, cc3 inflight, c4 complete, c5 complete, c6 complete.
> metadata timeline:
> dc_c1, dc_c2, dc_cc3, dc_c4, dc_c5, dc_c6
>
> Now the compaction in the data table is re-attempted. First, we roll back
> the pending compaction in the data table, which triggers an upsert to the
> metadata table: even though this is a rollback, every update to the metadata
> table is an upsert, which results in a delta commit.
> Then the compaction is retried in the data table. When it nears completion,
> we try to upsert to the metadata table, which fails because the metadata
> table already has a completed dc_cc3.
>
> Fix:
> When a commit is being retried, we delete the completed instant and then
> proceed with the upsert. When log blocks/files are merged together, the final
> state stays intact: only the files added in the 2nd attempt are returned, and
> those added during the 1st attempt are not (since there will be a
> complementing log block corresponding to the rollback).
>  
>  
>  




[jira] [Created] (HUDI-2476) Fix retried compaction commit in datatable fails when applied to metadata w/ sync updates

2021-09-21 Thread sivabalan narayanan (Jira)

sivabalan narayanan created HUDI-2476:
-

 Summary: Fix retried compaction commit in datatable fails when 
applied to metadata w/ sync updates
 Key: HUDI-2476
 URL: https://issues.apache.org/jira/browse/HUDI-2476
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Writer Core
Reporter: sivabalan narayanan


Compaction and clustering have a static instant time, so a retried attempt
reuses the same instant time rather than getting a new one.

Let's walk through what happens when a compaction fails after it has been
synced to the metadata table.

Commits: c1, c2, cc3 (compaction commit), c4.

c1, c2 and c4 are applied to the metadata table on completion.

cc3 also gets synced to the metadata table, but the process crashes before cc3
is committed to the data table. It is a small window, but still a possibility.

From a timeline standpoint, this is what it looks like:

data timeline:

c1 complete, c2 complete, cc3 inflight, c4 complete.

metadata timeline:

dc_c1, dc_c2, dc_cc3, dc_c4

Let's say a few more commits went in.

data timeline:

c1 complete, c2 complete, cc3 inflight, c4 complete, c5 complete, c6 complete.

metadata timeline:

dc_c1, dc_c2, dc_cc3, dc_c4, dc_c5, dc_c6

 

Now the compaction in the data table is re-attempted. First, we roll back the
pending compaction in the data table, which triggers an upsert to the metadata
table: even though this is a rollback, every update to the metadata table is an
upsert, which results in a delta commit.

Then the compaction is retried in the data table. When it nears completion, we
try to upsert to the metadata table, which fails because the metadata table
already has a completed dc_cc3.
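
The failure sequence described so far can be mimicked with a toy model. This is only a minimal sketch, not Hudi code: `ToyMetadataTimeline`, `upsert`, `replay_failed_retry` and the instant name `rollback_of_cc3` are hypothetical, and the only behaviour assumed is the one stated in this description, namely that an upsert for an instant time that is already completed in the metadata table is rejected.

```python
class ToyMetadataTimeline:
    """Hypothetical stand-in for the metadata table timeline (not a Hudi class)."""

    def __init__(self):
        self.completed = []  # completed delta commits, in arrival order

    def upsert(self, instant):
        # Assumption taken from the description: a second delta commit for an
        # instant time that is already completed is rejected.
        if instant in self.completed:
            raise ValueError(f"dc_{instant} is already completed in the metadata table")
        self.completed.append(instant)


def replay_failed_retry():
    metadata = ToyMetadataTimeline()
    # c1..c6 are all synced to the metadata table; cc3 crashed before it was
    # committed to the data table, so it stays inflight there.
    for instant in ["c1", "c2", "cc3", "c4", "c5", "c6"]:
        metadata.upsert(instant)
    # Re-attempt: rolling back the pending compaction is itself an upsert.
    metadata.upsert("rollback_of_cc3")
    # The retried compaction reuses the static instant time cc3.
    try:
        metadata.upsert("cc3")
    except ValueError as err:
        return str(err)
    return "no conflict"
```

Calling `replay_failed_retry()` in this model returns the conflict message for cc3, which corresponds to the failed upsert in the scenario.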

 

Fix:

When a commit is being retried, we delete the completed instant and then
proceed with the upsert. When log blocks/files are merged together, the final
state stays intact: only the files added in the 2nd attempt are returned, and
those added during the 1st attempt are not (since there will be a
complementing log block corresponding to the rollback).
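
The intent of the fix can be illustrated with a small model. This is a hedged sketch under stated assumptions, not the actual change: the function names, the tuple layout of a "log block", the rollback instant `r7` and the file names are all hypothetical. It only shows the idea that the stale completed instant is dropped before the upsert, and that replaying add and rollback blocks in order leaves only the files from the second attempt visible.

```python
def upsert_with_retry_fix(completed, log_blocks, instant, added_files):
    """Hypothetical sketch of the proposed fix (not Hudi code)."""
    if instant in completed:
        # Retry of a static instant time: drop the stale completed instant first.
        completed.remove(instant)
    completed.append(instant)
    log_blocks.append(("add", instant, added_files))


def rollback(completed, log_blocks, rollback_instant, files_to_remove):
    # A rollback is also an upsert; it writes a block that removes the files.
    completed.append(rollback_instant)
    log_blocks.append(("delete", rollback_instant, files_to_remove))


def merge(log_blocks):
    """Replay log blocks in order and return the files that remain visible."""
    visible = set()
    for kind, _instant, files in log_blocks:
        if kind == "add":
            visible |= set(files)
        else:
            visible -= set(files)
    return visible


def demo():
    completed, log_blocks = [], []
    upsert_with_retry_fix(completed, log_blocks, "cc3", ["f1_attempt1"])  # 1st attempt
    rollback(completed, log_blocks, "r7", ["f1_attempt1"])                # rollback
    upsert_with_retry_fix(completed, log_blocks, "cc3", ["f1_attempt2"])  # retry
    return completed, merge(log_blocks)
```

In this model the retry no longer conflicts with the earlier dc_cc3, and the merged view contains only the file written by the second attempt.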

 

 

 




[GitHub] [hudi] hudi-bot edited a comment on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * a4b4afdf7b5dd711cc3978e2c02f9efb0c7b5514 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2305)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Assigned] (HUDI-2475) Upgrade downgrade infra for enabling metadata

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2475:
-

Assignee: sivabalan narayanan

> Upgrade downgrade infra for enabling metadata
> -
>
> Key: HUDI-2475
> URL: https://issues.apache.org/jira/browse/HUDI-2475
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Upgrade downgrade infra for enabling metadata




[jira] [Updated] (HUDI-2475) Upgrade downgrade infra for enabling metadata

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2475:
--
Fix Version/s: 0.10.0

> Upgrade downgrade infra for enabling metadata
> -
>
> Key: HUDI-2475
> URL: https://issues.apache.org/jira/browse/HUDI-2475
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> Upgrade downgrade infra for enabling metadata




[jira] [Created] (HUDI-2475) Upgrade downgrade infra for enabling metadata

2021-09-21 Thread sivabalan narayanan (Jira)

sivabalan narayanan created HUDI-2475:
-

 Summary: Upgrade downgrade infra for enabling metadata
 Key: HUDI-2475
 URL: https://issues.apache.org/jira/browse/HUDI-2475
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: sivabalan narayanan


Upgrade downgrade infra for enabling metadata




[GitHub] [hudi] hudi-bot edited a comment on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * 67d66c27fb4cf090d7d4ca43daf691ed337f6971 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2304)
 
   * a4b4afdf7b5dd711cc3978e2c02f9efb0c7b5514 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Updated] (HUDI-2474) Fix refreshing timeline for every operation

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2474:
--
Parent: HUDI-1292
Issue Type: Sub-task  (was: Improvement)

> Fix refreshing timeline for every operation
> ---
>
> Key: HUDI-2474
> URL: https://issues.apache.org/jira/browse/HUDI-2474
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> With metadata enabled, we need to refresh the timeline for the table before
> every operation. If not, some states might be missed. At least for
> deltastreamer continuous mode, we need this to be fixed. Also, some tests are
> failing due to this.




[jira] [Updated] (HUDI-2474) Fix refreshing timeline for every operation

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2474:
--
Fix Version/s: 0.10.0

> Fix refreshing timeline for every operation
> ---
>
> Key: HUDI-2474
> URL: https://issues.apache.org/jira/browse/HUDI-2474
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> With metadata enabled, we need to refresh the timeline for the table before
> every operation. If not, some states might be missed. At least for
> deltastreamer continuous mode, we need this to be fixed. Also, some tests are
> failing due to this.




[jira] [Assigned] (HUDI-2474) Fix refreshing timeline for every operation

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2474:
-

Assignee: sivabalan narayanan

> Fix refreshing timeline for every operation
> ---
>
> Key: HUDI-2474
> URL: https://issues.apache.org/jira/browse/HUDI-2474
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> With metadata enabled, we need to refresh the timeline for the table before
> every operation. If not, some states might be missed. At least for
> deltastreamer continuous mode, we need this to be fixed. Also, some tests are
> failing due to this.




[jira] [Created] (HUDI-2474) Fix refreshing timeline for every operation

2021-09-21 Thread sivabalan narayanan (Jira)

sivabalan narayanan created HUDI-2474:
-

 Summary: Fix refreshing timeline for every operation
 Key: HUDI-2474
 URL: https://issues.apache.org/jira/browse/HUDI-2474
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Writer Core
Reporter: sivabalan narayanan


With metadata enabled, we need to refresh the timeline for the table before
every operation. If not, some states might be missed. At least for
deltastreamer continuous mode, we need this to be fixed. Also, some tests are
failing due to this.




[jira] [Updated] (HUDI-2473) Fix compaction action type in commit metadata

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2473:
--
Fix Version/s: 0.10.0

> Fix compaction action type in commit metadata
> -
>
> Key: HUDI-2473
> URL: https://issues.apache.org/jira/browse/HUDI-2473
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> Fix compaction action type in commit metadata.
> As of now, it is empty for compaction commits.




[jira] [Assigned] (HUDI-2473) Fix compaction action type in commit metadata

2021-09-21 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2473:
-

Assignee: sivabalan narayanan

> Fix compaction action type in commit metadata
> -
>
> Key: HUDI-2473
> URL: https://issues.apache.org/jira/browse/HUDI-2473
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Fix compaction action type in commit metadata.
> As of now, it is empty for compaction commits.




[jira] [Created] (HUDI-2473) Fix compaction action type in commit metadata

2021-09-21 Thread sivabalan narayanan (Jira)

sivabalan narayanan created HUDI-2473:
-

 Summary: Fix compaction action type in commit metadata
 Key: HUDI-2473
 URL: https://issues.apache.org/jira/browse/HUDI-2473
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: sivabalan narayanan


Fix compaction action type in commit metadata.

As of now, it is empty for compaction commits.




[GitHub] [hudi] hudi-bot edited a comment on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot edited a comment on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * 67d66c27fb4cf090d7d4ca43daf691ed337f6971 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2304)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #3696: [WIP][HUDI-2439] Refactor commit actions in hudi-client module

2021-09-21 Thread GitBox



hudi-bot commented on pull request #3696:
URL: https://github.com/apache/hudi/pull/3696#issuecomment-924126593


   
   ## CI report:
   
   * 67d66c27fb4cf090d7d4ca43daf691ed337f6971 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


