[jira] [Created] (HUDI-3477) Sync timeline for client from embedded timeline service instead scan filesystem

2022-02-22 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3477:


 Summary: Sync timeline for client from embedded timeline service 
instead scan filesystem
 Key: HUDI-3477
 URL: https://issues.apache.org/jira/browse/HUDI-3477
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink, spark, timeline-server
Reporter: yuzhaojing
Assignee: yuzhaojing


Currently, the Hudi meta client obtains the timeline by scanning the 
filesystem. But as the number of tasks and jobs increases, the file system 
comes under enormous pressure, which affects system stability. In this issue, 
I hope to use the embedded timeline service instead of scanning the filesystem 
to solve this problem.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3475) Support run compaction / clustering job in Service

2022-02-22 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3475:


 Summary: Support run compaction / clustering job in Service 
 Key: HUDI-3475
 URL: https://issues.apache.org/jira/browse/HUDI-3475
 Project: Apache Hudi
  Issue Type: New Feature
  Components: core
Reporter: yuzhaojing


This is an implementation of RFC-43.





[jira] [Updated] (HUDI-3420) Remove duplicates type in HoodieClusteringGroup.avsc

2022-02-13 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-3420:
-
Summary: Remove duplicates type in HoodieClusteringGroup.avsc  (was: Remove 
duplicates type in HoodieCleanMetadata.avsc)

> Remove duplicates type in HoodieClusteringGroup.avsc
> 
>
> Key: HUDI-3420
> URL: https://issues.apache.org/jira/browse/HUDI-3420
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: yuzhaojing
>Priority: Major
>






[jira] [Created] (HUDI-3420) Remove duplicates type in HoodieCleanMetadata.avsc

2022-02-13 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3420:


 Summary: Remove duplicates type in HoodieCleanMetadata.avsc
 Key: HUDI-3420
 URL: https://issues.apache.org/jira/browse/HUDI-3420
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: yuzhaojing








[jira] [Updated] (HUDI-3418) Save timeout option for remote RemoteFileSystemView

2022-02-13 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-3418:
-
Component/s: core

> Save timeout option for remote RemoteFileSystemView
> ---
>
> Key: HUDI-3418
> URL: https://issues.apache.org/jira/browse/HUDI-3418
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core, flink
>Reporter: yuzhaojing
>Priority: Major
>






[jira] [Updated] (HUDI-3418) Save timeout option for remote Embedded timeline server

2022-02-13 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-3418:
-
Summary: Save timeout option for remote Embedded timeline server  (was: 
Save timeout option for Embedded timeline server in flink)

> Save timeout option for remote Embedded timeline server
> ---
>
> Key: HUDI-3418
> URL: https://issues.apache.org/jira/browse/HUDI-3418
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: yuzhaojing
>Priority: Major
>






[jira] [Updated] (HUDI-3418) Save timeout option for remote RemoteFileSystemView

2022-02-13 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-3418:
-
Summary: Save timeout option for remote RemoteFileSystemView  (was: Save 
timeout option for remote Embedded timeline server)

> Save timeout option for remote RemoteFileSystemView
> ---
>
> Key: HUDI-3418
> URL: https://issues.apache.org/jira/browse/HUDI-3418
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: yuzhaojing
>Priority: Major
>






[jira] [Created] (HUDI-3418) Save timeout option for Embedded timeline server in flink

2022-02-13 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3418:


 Summary: Save timeout option for Embedded timeline server in flink
 Key: HUDI-3418
 URL: https://issues.apache.org/jira/browse/HUDI-3418
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: yuzhaojing








[jira] [Created] (HUDI-3124) Bootstrap when timeline have completed instant

2021-12-28 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3124:


 Summary: Bootstrap when timeline have completed instant
 Key: HUDI-3124
 URL: https://issues.apache.org/jira/browse/HUDI-3124
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Updated] (HUDI-3120) Cache compactionPlan in buffer

2021-12-28 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-3120:
-
Summary: Cache compactionPlan in buffer  (was: Clear buffer when compaction 
is failed)

> Cache compactionPlan in buffer
> --
>
> Key: HUDI-3120
> URL: https://issues.apache.org/jira/browse/HUDI-3120
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>






[jira] [Created] (HUDI-3120) Clear buffer when compaction is failed

2021-12-28 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3120:


 Summary: Clear buffer when compaction is failed
 Key: HUDI-3120
 URL: https://issues.apache.org/jira/browse/HUDI-3120
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Created] (HUDI-3086) Add retry in RemoteHoodieTableFileSystemView

2021-12-21 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3086:


 Summary: Add retry in RemoteHoodieTableFileSystemView
 Key: HUDI-3086
 URL: https://issues.apache.org/jira/browse/HUDI-3086
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Common Core
Reporter: yuzhaojing








[jira] [Updated] (HUDI-3046) Claim RFC number for RFC for Compaction / Clustering Service

2021-12-16 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-3046:
-
Labels: pull-request-available  (was: )

> Claim RFC number for RFC for Compaction / Clustering Service
> 
>
> Key: HUDI-3046
> URL: https://issues.apache.org/jira/browse/HUDI-3046
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (HUDI-3046) Claim RFC number for RFC for Compaction / Clustering Service

2021-12-16 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3046:


 Summary: Claim RFC number for RFC for Compaction / Clustering 
Service
 Key: HUDI-3046
 URL: https://issues.apache.org/jira/browse/HUDI-3046
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: yuzhaojing








[jira] [Created] (HUDI-3016) [RFC-42] Implement Compaction/Clustering Service for Hudi

2021-12-14 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-3016:


 Summary: [RFC-42] Implement Compaction/Clustering Service for Hudi
 Key: HUDI-3016
 URL: https://issues.apache.org/jira/browse/HUDI-3016
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Common Core, Compaction
Reporter: yuzhaojing
Assignee: yuzhaojing
 Fix For: 0.11.0


Implement a Compaction/Clustering Service to manage compaction and 
clustering actions for Hudi tables.





[jira] [Created] (HUDI-2913) Disable auto clean in writer task

2021-12-01 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2913:


 Summary: Disable auto clean in writer task
 Key: HUDI-2913
 URL: https://issues.apache.org/jira/browse/HUDI-2913
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Created] (HUDI-2912) Fix CompactionPlanOperator typo

2021-12-01 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2912:


 Summary: Fix CompactionPlanOperator typo
 Key: HUDI-2912
 URL: https://issues.apache.org/jira/browse/HUDI-2912
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Assigned] (HUDI-2817) Sync the configuration inference for HoodieFlinkStreamer

2021-11-22 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2817:


Assignee: yuzhaojing

> Sync the configuration inference for HoodieFlinkStreamer
> 
>
> Key: HUDI-2817
> URL: https://issues.apache.org/jira/browse/HUDI-2817
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Created] (HUDI-2817) Sync the configuration inference for HoodieFlinkStreamer

2021-11-22 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2817:


 Summary: Sync the configuration inference for HoodieFlinkStreamer
 Key: HUDI-2817
 URL: https://issues.apache.org/jira/browse/HUDI-2817
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing








[jira] [Closed] (HUDI-2377) Add rateLimiter in Bootstrap

2021-11-14 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing closed HUDI-2377.

Resolution: Fixed

> Add rateLimiter in Bootstrap
> 
>
> Key: HUDI-2377
> URL: https://issues.apache.org/jira/browse/HUDI-2377
> Project: Apache Hudi
>  Issue Type: Wish
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Add a rateLimiter in Bootstrap to avoid taskManager heartbeat timeouts when 
> the CPU is overloaded.





[jira] [Created] (HUDI-2738) Remove the bucketAssignFunction useless context

2021-11-10 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2738:


 Summary: Remove the bucketAssignFunction useless context
 Key: HUDI-2738
 URL: https://issues.apache.org/jira/browse/HUDI-2738
 Project: Apache Hudi
  Issue Type: Task
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Assigned] (HUDI-2730) Move EventTimeAvroPayload into hudi-common module

2021-11-09 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2730:


Assignee: yuzhaojing

> Move EventTimeAvroPayload into hudi-common module
> -
>
> Key: HUDI-2730
> URL: https://issues.apache.org/jira/browse/HUDI-2730
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: yuzhaojing
>Priority: Major
> Fix For: 0.10.0
>
>
> So that the reader jar can see that.





[jira] [Assigned] (HUDI-2698) Remove the table source options validation

2021-11-05 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2698:


Assignee: yuzhaojing

> Remove the table source options validation
> --
>
> Key: HUDI-2698
> URL: https://issues.apache.org/jira/browse/HUDI-2698
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Danny Chen
>Assignee: yuzhaojing
>Priority: Major
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2686) Proccess record after all bootstrap operator ready

2021-11-04 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2686:
-
Summary: Proccess record after all bootstrap operator ready  (was: Add DFS 
based message queue for bootstrap operator)

> Proccess record after all bootstrap operator ready
> --
>
> Key: HUDI-2686
> URL: https://issues.apache.org/jira/browse/HUDI-2686
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>






[jira] [Assigned] (HUDI-2686) Add DFS based message queue for bootstrap operator

2021-11-04 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2686:


Assignee: yuzhaojing

> Add DFS based message queue for bootstrap operator
> --
>
> Key: HUDI-2686
> URL: https://issues.apache.org/jira/browse/HUDI-2686
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>






[jira] [Updated] (HUDI-2686) Add DFS based message queue for bootstrap operator

2021-11-04 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2686:
-
Summary: Add DFS based message queue for bootstrap operator  (was: Fix)

> Add DFS based message queue for bootstrap operator
> --
>
> Key: HUDI-2686
> URL: https://issues.apache.org/jira/browse/HUDI-2686
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: yuzhaojing
>Priority: Major
>






[jira] [Created] (HUDI-2686) Fix

2021-11-04 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2686:


 Summary: Fix
 Key: HUDI-2686
 URL: https://issues.apache.org/jira/browse/HUDI-2686
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: yuzhaojing








[jira] [Assigned] (HUDI-2685) Support scheduling online compaction plan when there are no commit data

2021-11-04 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2685:


Assignee: yuzhaojing  (was: Danny Chen)

> Support scheduling online compaction plan when there are no commit data
> ---
>
> Key: HUDI-2685
> URL: https://issues.apache.org/jira/browse/HUDI-2685
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: yuzhaojing
>Priority: Major
> Fix For: 0.10.0
>
>






[jira] [Assigned] (HUDI-2631) In CompactFunction, set up the write schema each time with the latest schema

2021-11-01 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2631:


Assignee: yuzhaojing

> In CompactFunction, set up the write schema each time with the latest schema
> 
>
> Key: HUDI-2631
> URL: https://issues.apache.org/jira/browse/HUDI-2631
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: yuzhaojing
>Priority: Major
> Fix For: 0.10.0
>
>
> To support schema evolution.
> We only need to do that when flag {{asyncCompaction}} is true, because for 
> offline compaction, we already infer the schema from the latest data file.





[jira] [Assigned] (HUDI-2651) Sync all the missing sql options for HoodieFlinkStreamer

2021-11-01 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2651:


Assignee: yuzhaojing

> Sync all the missing sql options for HoodieFlinkStreamer
> 
>
> Key: HUDI-2651
> URL: https://issues.apache.org/jira/browse/HUDI-2651
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: yuzhaojing
>Priority: Major
> Fix For: 0.10.0
>
>






[jira] [Assigned] (HUDI-2624) Implement Non Index type for HUDI

2021-10-25 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing reassigned HUDI-2624:


Assignee: yuzhaojing

> Implement Non Index type for HUDI
> -
>
> Key: HUDI-2624
> URL: https://issues.apache.org/jira/browse/HUDI-2624
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Supports scenarios where the data does not have a primary key. At present, a 
> common practice for these scenarios is to give a uuid as the primary key, but 
> this will generate unnecessary indexes and consume more resources during 
> Compaction.
> In order to better support this scenario, the idea of non index is proposed. 
> This solution will not generate an index and can perform compaction faster.





[jira] [Created] (HUDI-2624) Implement Non Index type for HUDI

2021-10-25 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2624:


 Summary: Implement Non Index type for HUDI
 Key: HUDI-2624
 URL: https://issues.apache.org/jira/browse/HUDI-2624
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Common Core
Reporter: yuzhaojing


Supports scenarios where the data does not have a primary key. At present, a 
common practice for these scenarios is to give a uuid as the primary key, but 
this will generate unnecessary indexes and consume more resources during 
Compaction.
In order to better support this scenario, the idea of non index is proposed. 
This solution will not generate an index and can perform compaction faster.





[jira] [Updated] (HUDI-2547) Schedule Flink compaction in service

2021-10-12 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2547:
-
Summary: Schedule Flink compaction in service  (was: Schedule Flink 
Compaction in service)

> Schedule Flink compaction in service
> 
>
> Key: HUDI-2547
> URL: https://issues.apache.org/jira/browse/HUDI-2547
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Currently, each run of HoodieFlinkCompactor triggers only one compaction; we 
> can add a CompactService to schedule compactions.





[jira] [Created] (HUDI-2547) Schedule Flink Compaction in service

2021-10-12 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2547:


 Summary: Schedule Flink Compaction in service
 Key: HUDI-2547
 URL: https://issues.apache.org/jira/browse/HUDI-2547
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Currently, each run of HoodieFlinkCompactor triggers only one compaction; we 
can add a CompactService to schedule compactions.





[jira] [Created] (HUDI-2377) Add rateLimiter in Bootstrap

2021-08-30 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2377:


 Summary: Add rateLimiter in Bootstrap
 Key: HUDI-2377
 URL: https://issues.apache.org/jira/browse/HUDI-2377
 Project: Apache Hudi
  Issue Type: Wish
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Add a rateLimiter in Bootstrap to avoid taskManager heartbeat timeouts when 
the CPU is overloaded.
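
The idea can be sketched as a small blocking limiter that spaces out work in the bootstrap loop. This is only an illustration under stated assumptions: the class name SimpleRateLimiterSketch, the permitsPerSecond parameter, and the acquire() method are hypothetical and are not taken from the Hudi code base or from the eventual patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: hands out at most `permitsPerSecond` permits per second
// by making callers wait until the next permit slot is due.
public class SimpleRateLimiterSketch {
    private final long intervalNanos; // minimum spacing between two permits
    private long nextFreeNanos;       // earliest time the next permit may be granted

    public SimpleRateLimiterSketch(int permitsPerSecond) {
        if (permitsPerSecond <= 0) {
            throw new IllegalArgumentException("permitsPerSecond must be positive");
        }
        this.intervalNanos = TimeUnit.SECONDS.toNanos(1) / permitsPerSecond;
        this.nextFreeNanos = System.nanoTime();
    }

    // Blocks the caller until a permit is available. The lock is held while
    // sleeping, which keeps the sketch short but serializes all callers.
    public synchronized void acquire() {
        long now = System.nanoTime();
        long waitNanos = nextFreeNanos - now;
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the caller can react to it.
                Thread.currentThread().interrupt();
            }
        }
        nextFreeNanos = Math.max(now, nextFreeNanos) + intervalNanos;
    }

    public static void main(String[] args) {
        SimpleRateLimiterSketch limiter = new SimpleRateLimiterSketch(100);
        long start = System.nanoTime();
        for (int i = 0; i < 11; i++) {
            limiter.acquire(); // a bootstrap loop would emit one record here
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("11 permits took about " + elapsedMillis + " ms");
    }
}
```

Throttling the loop this way trades bootstrap throughput for CPU headroom, which is the property the issue asks for so that heartbeats are not starved.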





[jira] [Created] (HUDI-2376) Add pipeline for Append mode

2021-08-30 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2376:


 Summary: Add pipeline for Append mode
 Key: HUDI-2376
 URL: https://issues.apache.org/jira/browse/HUDI-2376
 Project: Apache Hudi
  Issue Type: Wish
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Add pipeline for Append mode.





[jira] [Created] (HUDI-2368) Catch Throwable in BoundedInMemoryExecutor

2021-08-26 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2368:


 Summary: Catch Throwable in BoundedInMemoryExecutor
 Key: HUDI-2368
 URL: https://issues.apache.org/jira/browse/HUDI-2368
 Project: Apache Hudi
  Issue Type: Wish
  Components: Common Core
Reporter: yuzhaojing
Assignee: yuzhaojing


Currently BoundedInMemoryExecutor catches only Exception in the async producer 
thread and does not process the callback result. But the async producer thread 
may throw an Error, in which case we cannot identify the problem and do not 
know why data is missing!
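
The difference between catching Exception and catching Throwable can be illustrated with a generic sketch. It does not use BoundedInMemoryExecutor itself; the class ProducerFailureSketch and its methods are hypothetical names chosen for this example, and the producer simply throws a plain java.lang.Error to simulate the failure described above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run a producer on another thread and make sure that
// any failure, including java.lang.Error, is reported back to the caller.
public class ProducerFailureSketch {

    // Simulated producer that fails with an Error rather than an Exception.
    static void failingProducer() {
        throw new Error("simulated error in producer thread");
    }

    // Returns the Throwable raised by the producer, or null if it succeeded.
    static Throwable runAndCollectFailure(Runnable producer) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Throwable> task = () -> {
                try {
                    producer.run();
                    return null;
                } catch (Throwable t) {
                    // `catch (Exception e)` here would let the Error escape
                    // this handler; Throwable covers both branches.
                    return t;
                }
            };
            Future<Throwable> result = pool.submit(task);
            // Reading the result is what surfaces the failure to the caller.
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        } catch (ExecutionException e) {
            return e.getCause();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Throwable failure = runAndCollectFailure(ProducerFailureSketch::failingProducer);
        System.out.println("producer failure: " + failure);
    }
}
```

The second half of the issue matters as much as the first: even with a broad catch, the failure stays invisible unless the submitting side inspects the returned result.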





[jira] [Created] (HUDI-2342) Optimize Bootstrap operator

2021-08-20 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2342:


 Summary: Optimize Bootstrap operator
 Key: HUDI-2342
 URL: https://issues.apache.org/jira/browse/HUDI-2342
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Move the index loading logic into initializeState.





[jira] [Created] (HUDI-2270) Remove corrupted clean action

2021-08-02 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2270:


 Summary: Remove corrupted clean action
 Key: HUDI-2270
 URL: https://issues.apache.org/jira/browse/HUDI-2270
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Commented] (HUDI-2087) Support Append only in Flink stream

2021-07-30 Thread yuzhaojing (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390452#comment-17390452
 ] 

yuzhaojing commented on HUDI-2087:
--

OK, this PR is not urgent.

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in Flink streaming, as the data lake 
> should be able to write log-type data as Parquet with high performance, 
> without merging.





[jira] [Created] (HUDI-2247) Filter file where length less than parquet MAGIC length

2021-07-28 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2247:


 Summary: Filter file where length less than parquet MAGIC length
 Key: HUDI-2247
 URL: https://issues.apache.org/jira/browse/HUDI-2247
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Created] (HUDI-2207) Support independent flink hudi clustering function

2021-07-22 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2207:


 Summary: Support independent flink hudi clustering function
 Key: HUDI-2207
 URL: https://issues.apache.org/jira/browse/HUDI-2207
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing








[jira] [Created] (HUDI-2193) Remove state in BootstrapFunction

2021-07-18 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2193:


 Summary: Remove state in BootstrapFunction
 Key: HUDI-2193
 URL: https://issues.apache.org/jira/browse/HUDI-2193
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Remove state in BootstrapFunction to support restarting a job without bootstrap.





[jira] [Created] (HUDI-2171) Add parallelism conf for bootstrap operator

2021-07-13 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2171:


 Summary: Add parallelism conf for bootstrap operator
 Key: HUDI-2171
 URL: https://issues.apache.org/jira/browse/HUDI-2171
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Add parallelism conf for bootstrap operator





[jira] [Created] (HUDI-2169) Remove keyby when write.operation is Insert

2021-07-13 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2169:


 Summary: Remove keyby when write.operation is Insert
 Key: HUDI-2169
 URL: https://issues.apache.org/jira/browse/HUDI-2169
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


When write.operation is Insert, the user can tolerate duplicate data or has 
data that does not need to be merged. In this case, the keyby is unnecessary.





[jira] [Updated] (HUDI-2087) Support Append only in Flink stream

2021-07-08 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2087:
-
Attachment: image-2021-07-08-22-04-40-018.png

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in Flink streaming, as the data lake 
> should be able to write log-type data as Parquet with high performance, 
> without merging.





[jira] [Commented] (HUDI-2087) Support Append only in Flink stream

2021-07-08 Thread yuzhaojing (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377417#comment-17377417
 ] 

yuzhaojing commented on HUDI-2087:
--

CI failed, but these test cases pass locally.

https://github.com/apache/hudi/pull/3174/checks?check_run_id=3016573136

!image-2021-07-08-22-04-40-018.png!

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in Flink streaming, as the data lake 
> should be able to write log-type data as Parquet with high performance, 
> without merging.





[jira] [Updated] (HUDI-2087) Support Append only in Flink stream

2021-07-08 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2087:
-
Attachment: image-2021-07-08-22-04-30-039.png

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in Flink streaming, as the data lake 
> should be able to write log-type data as Parquet with high performance, 
> without merging.





[jira] [Issue Comment Deleted] (HUDI-2087) Support Append only in Flink stream

2021-07-08 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2087:
-
Comment: was deleted

(was: [https://github.com/apache/hudi/pull/3174/checks?check_run_id=3016573136]

!171625752829_.pic_hd.jpg!)

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> It is necessary to support append mode in Flink streaming, as the data lake 
> should be able to write log-type data as Parquet with high performance, 
> without merging.





[jira] [Commented] (HUDI-2087) Support Append only in Flink stream

2021-07-08 Thread yuzhaojing (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377414#comment-17377414
 ] 

yuzhaojing commented on HUDI-2087:
--

[https://github.com/apache/hudi/pull/3174/checks?check_run_id=3016573136]

!171625752829_.pic_hd.jpg!

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> It is necessary to support append mode in Flink streaming, as the data lake 
> should be able to write log-type data as Parquet with high performance, 
> without merging.





[jira] [Created] (HUDI-2145) Create new bucket when NewFileAssignState filled

2021-07-08 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2145:


 Summary: Create new bucket when NewFileAssignState filled
 Key: HUDI-2145
 URL: https://issues.apache.org/jira/browse/HUDI-2145
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2141) Integration flink metric in flink stream

2021-07-07 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2141:
-
Summary: Integration flink metric in flink stream  (was: Integration flink 
metric)

> Integration flink metric in flink stream
> 
>
> Key: HUDI-2141
> URL: https://issues.apache.org/jira/browse/HUDI-2141
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Hoodie metrics currently do not work in Flink streaming jobs because they
> were designed for batch processing. This issue integrates Flink metrics
> into Flink streaming.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2141) Integration flink metric

2021-07-07 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2141:


 Summary: Integration flink metric
 Key: HUDI-2141
 URL: https://issues.apache.org/jira/browse/HUDI-2141
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Hoodie metrics currently do not work in Flink streaming jobs because they
were designed for batch processing. This issue integrates Flink metrics
into Flink streaming.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2103) Add rebalance before index bootstrap

2021-06-29 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2103:


 Summary: Add rebalance before index bootstrap
 Key: HUDI-2103
 URL: https://issues.apache.org/jira/browse/HUDI-2103
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


When using Flink SQL to upsert into Hudi, users often set the parallelism
higher than the number of Kafka partitions. The bootstrap operator currently
needs at least one element to trigger loading, so a subtask that receives no
input would never trigger it.
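
A minimal sketch of why a rebalance helps, written in plain Java rather than
Flink code. It assumes that a rebalance redistributes elements round-robin
over all subtasks; the class and method names are hypothetical and only model
how elements are spread, not the bootstrap operator itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; it models element distribution only, not Flink.
class RebalanceSketch {

    // Spreads elements over all subtasks in round-robin order.
    static List<List<String>> rebalance(List<String> elements, int parallelism) {
        List<List<String>> perSubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perSubtask.add(new ArrayList<>());
        }
        for (int i = 0; i < elements.size(); i++) {
            perSubtask.get(i % parallelism).add(elements.get(i));
        }
        return perSubtask;
    }

    // Counts the subtasks that received at least one element.
    static long subtasksWithInput(List<List<String>> perSubtask) {
        return perSubtask.stream().filter(s -> !s.isEmpty()).count();
    }
}
```

With four subtasks and at least four elements, every subtask receives input
after the round-robin step, so each one has an element that can trigger
loading.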



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2087) Support Append only in Flink stream

2021-06-28 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2087:


 Summary: Support Append only in Flink stream
 Key: HUDI-2087
 URL: https://issues.apache.org/jira/browse/HUDI-2087
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Flink streaming writes need an append-only mode: the data lake should be
able to write log-type data to Parquet with high performance, without a
merge step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2062) Catch FileNotFoundException in WriteProfiles #getCommitMetadataSafely

2021-06-23 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2062:
-
Summary: Catch FileNotFoundException in WriteProfiles 
#getCommitMetadataSafely  (was: Catch IOException in WriteProfiles 
#getCommitMetadataSafely)

> Catch FileNotFoundException in WriteProfiles #getCommitMetadataSafely
> -
>
> Key: HUDI-2062
> URL: https://issues.apache.org/jira/browse/HUDI-2062
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> WriteProfiles#getCommitMetadataSafely is expected to read an instant
> safely: if the instant has been deleted by the cleaner, it should be ignored.
> In that case, however, a FileNotFoundException escapes the try-catch block.
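
The intended behaviour can be sketched as follows. This is a minimal,
self-contained illustration that uses java.nio instead of the file system API
Hudi actually uses; SafeMetadataReadSketch and readSafely are hypothetical
names, and the real method may differ.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch; not the actual WriteProfiles implementation.
class SafeMetadataReadSketch {

    // Returns the file content, or an empty Optional if the file is gone.
    static Optional<byte[]> readSafely(Path instantFile) {
        try {
            // The read must sit inside the try block, otherwise a missing
            // file surfaces as an exception to the caller.
            return Optional.of(Files.readAllBytes(instantFile));
        } catch (FileNotFoundException | NoSuchFileException e) {
            // The instant was removed (for example by a cleaner): ignore it.
            return Optional.empty();
        } catch (IOException e) {
            // Any other I/O problem is still reported.
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the "file is missing" case is swallowed here; other I/O errors still
propagate, which matches the narrowing from IOException to
FileNotFoundException in the issue title.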



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely

2021-06-23 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2062:
-
Description: 
WriteProfiles#getCommitMetadataSafely is expected to read an instant safely:
if the instant has been deleted by the cleaner, it should be ignored.
In that case, however, a FileNotFoundException escapes the try-catch block.

  was: 


> Catch IOException in WriteProfiles #getCommitMetadataSafely
> ---
>
> Key: HUDI-2062
> URL: https://issues.apache.org/jira/browse/HUDI-2062
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> WriteProfiles#getCommitMetadataSafely is expected to read an instant
> safely: if the instant has been deleted by the cleaner, it should be ignored.
> In that case, however, a FileNotFoundException escapes the try-catch block.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely

2021-06-23 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2062:
-
Description:  

> Catch IOException in WriteProfiles #getCommitMetadataSafely
> ---
>
> Key: HUDI-2062
> URL: https://issues.apache.org/jira/browse/HUDI-2062
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely

2021-06-23 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2062:
-
Description: (was: Catch IOException in WriteProfiles 
#getCommitMetadataSafely)

> Catch IOException in WriteProfiles #getCommitMetadataSafely
> ---
>
> Key: HUDI-2062
> URL: https://issues.apache.org/jira/browse/HUDI-2062
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely

2021-06-23 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2062:


 Summary: Catch IOException in WriteProfiles 
#getCommitMetadataSafely
 Key: HUDI-2062
 URL: https://issues.apache.org/jira/browse/HUDI-2062
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Catch IOException in WriteProfiles #getCommitMetadataSafely



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2052) Support load logfile in BootstrapFunction

2021-06-21 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2052:


 Summary: Support load logfile in BootstrapFunction
 Key: HUDI-2052
 URL: https://issues.apache.org/jira/browse/HUDI-2052
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Support load logfile in BootstrapFunction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2038) Support rollback inflight compaction instances for CompactionPlanOperator

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2038:
-
Summary: Support rollback inflight compaction instances for 
CompactionPlanOperator  (was: Rollback pending compaction when schedule new 
compaction)

> Support rollback inflight compaction instances for CompactionPlanOperator
> -
>
> Key: HUDI-2038
> URL: https://issues.apache.org/jira/browse/HUDI-2038
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Roll back the pending compaction when scheduling a new compaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2038) Support rollback inflight compaction instances for CompactionPlanOperator

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2038:
-
Description: Support rollback inflight compaction instances for 
CompactionPlanOperator  (was: Rollback pending compaction when schedule new 
compaction)

> Support rollback inflight compaction instances for CompactionPlanOperator
> -
>
> Key: HUDI-2038
> URL: https://issues.apache.org/jira/browse/HUDI-2038
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Support rollback inflight compaction instances for CompactionPlanOperator



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Summary: Ignore FileNotFoundException in WriteProfiles 
#getWritePathsOfInstant  (was: Ignore IOException in WriteProfiles 
#getWritePathsOfInstant)

> Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant
> -
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> The instant may be cleaned at the same time.
> If the file has been deleted, it cannot be the latest version, so skip it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Ignore IOException in WriteProfiles #getWritePathsOfInstant

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Description: 
The instant may be cleaned at the same time.

If the file has been deleted, it cannot be the latest version, so skip it.

  was:
Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean.

In case this instant will be clean at same time.


> Ignore IOException in WriteProfiles #getWritePathsOfInstant
> ---
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> The instant may be cleaned at the same time.
> If the file has been deleted, it cannot be the latest version, so skip it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Ignore IOException in WriteProfiles #getWritePathsOfInstant

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Summary: Ignore IOException in WriteProfiles #getWritePathsOfInstant  (was: 
Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean)

> Ignore IOException in WriteProfiles #getWritePathsOfInstant
> ---
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Filter the hoodieTimeline instants in WriteProfile that the cleaner may clean.
> The instant may be cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Description: 
Filter the hoodieTimeline instants in WriteProfile that the cleaner may clean.

The instant may be cleaned at the same time.

  was:Filter hoodieTimeline instant in WriteProfile when the cleaner possibly 
clean, because the instant don't need to be reloaded. In case this instant will 
be clean at same time.


> Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
> -
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Filter the hoodieTimeline instants in WriteProfile that the cleaner may clean.
> The instant may be cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Description: Filter the hoodieTimeline instants in WriteProfile that the
cleaner may clean, because those instants do not need to be reloaded. The
instant may be cleaned at the same time.  (was: Filter hoodieTimeline when the
instant in metadataCache, because the instant don't need to be reloaded. In
case this instant will be clean at same time.)

> Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
> -
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Filter the hoodieTimeline instants in WriteProfile that the cleaner may
> clean, because those instants do not need to be reloaded. The instant may
> be cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Description: Filter the hoodieTimeline when the instant is already in
metadataCache, because the instant does not need to be reloaded. The instant
may be cleaned at the same time.  (was: Filter commit metadata when the instant in
metadataCache, because the instant don't need to be reloaded. In case this
instant will be clean at same time.)

> Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
> -
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Filter the hoodieTimeline when the instant is already in metadataCache,
> because the instant does not need to be reloaded. The instant may be
> cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Summary: Filter hoodieTimeline instant in WriteProfile when the cleaner 
possibly clean  (was: Filter hoodieTimeline)

> Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
> -
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Filter commit metadata when the instant is already in metadataCache,
> because the instant does not need to be reloaded. The instant may be
> cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Filter hoodieTimeline

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Summary: Filter hoodieTimeline  (was: Filter commit metadata when the 
instant is loaded)

> Filter hoodieTimeline
> -
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Filter commit metadata when the instant is already in metadataCache,
> because the instant does not need to be reloaded. The instant may be
> cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Filter commit metadata when the instant is loaded

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Summary: Filter commit metadata when the instant is loaded  (was: Filter 
commit metadata when the instant loaded)

> Filter commit metadata when the instant is loaded
> -
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Filter commit metadata when the instant is already in metadataCache,
> because the instant does not need to be reloaded. The instant may be
> cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2047) Filter commit metadata when the instant loaded

2021-06-21 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2047:
-
Summary: Filter commit metadata when the instant loaded  (was: Filter 
commit metadata when the instant in metadataCache)

> Filter commit metadata when the instant loaded
> --
>
> Key: HUDI-2047
> URL: https://issues.apache.org/jira/browse/HUDI-2047
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> Filter commit metadata when the instant is already in metadataCache,
> because the instant does not need to be reloaded. The instant may be
> cleaned at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2047) Filter commit metadata when the instant in metadataCache

2021-06-21 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2047:


 Summary: Filter commit metadata when the instant in metadataCache
 Key: HUDI-2047
 URL: https://issues.apache.org/jira/browse/HUDI-2047
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Filter commit metadata when the instant is already in metadataCache, because
the instant does not need to be reloaded. The instant may be cleaned at the
same time.
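
A minimal sketch of the idea, assuming the cache is a map keyed by instant
time. MetadataCacheSketch, refresh and load are hypothetical names; the load
method only stands in for reading commit metadata from storage.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual WriteProfile implementation.
class MetadataCacheSketch {
    private final Map<String, String> metadataCache = new HashMap<>();
    int loads = 0; // counts how often the (expensive) load runs

    // Stand-in for reading the commit metadata of one instant from storage.
    private String load(String instant) {
        loads++;
        return "metadata-of-" + instant;
    }

    // Loads metadata only for instants that are not cached yet.
    void refresh(List<String> timelineInstants) {
        for (String instant : timelineInstants) {
            if (!metadataCache.containsKey(instant)) {
                metadataCache.put(instant, load(instant));
            }
        }
    }

    String get(String instant) {
        return metadataCache.get(instant);
    }
}
```

Because cached instants are skipped, a file that was removed after its
metadata had been cached is never read again.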



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2038) Rollback pending compaction when schedule new compaction

2021-06-17 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2038:


 Summary: Rollback pending compaction when schedule new compaction
 Key: HUDI-2038
 URL: https://issues.apache.org/jira/browse/HUDI-2038
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Roll back the pending compaction when scheduling a new compaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2022) Release writer for append handle #close

2021-06-15 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2022:


 Summary: Release writer for append handle #close
 Key: HUDI-2022
 URL: https://issues.apache.org/jira/browse/HUDI-2022
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Common Core
Reporter: yuzhaojing
Assignee: yuzhaojing


The writer can be released eagerly to reduce the memory footprint.
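
A minimal sketch of releasing a writer eagerly in close(). AppendHandleSketch
is a hypothetical stand-in, and the StringBuilder only represents a writer
that holds buffers; it is not Hudi's append handle.

```java
// Hypothetical stand-in for an append handle; not Hudi's actual class.
class AppendHandleSketch implements AutoCloseable {
    // Stand-in for a writer that holds buffers while the handle is open.
    private StringBuilder writer = new StringBuilder();

    void append(String record) {
        if (writer == null) {
            throw new IllegalStateException("handle is closed");
        }
        writer.append(record).append('\n');
    }

    boolean isWriterReleased() {
        return writer == null;
    }

    @Override
    public void close() {
        // Drop the reference as soon as the handle closes, so the buffer can
        // be garbage collected even if the handle itself stays reachable.
        writer = null;
    }
}
```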



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2021) Use putAll instead default in TypedProperties constructor

2021-06-15 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2021:


 Summary: Use putAll instead default in TypedProperties constructor
 Key: HUDI-2021
 URL: https://issues.apache.org/jira/browse/HUDI-2021
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Common Core
Reporter: yuzhaojing
Assignee: yuzhaojing


Entries held as defaults in a Properties object are not carried over when it
is put into another Properties.
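
The underlying java.util.Properties behaviour can be shown with a small
example. The class, method and key names are hypothetical; only the
Properties calls are standard JDK API.

```java
import java.util.Properties;

// Hypothetical demo class; not part of Hudi.
class PropertiesDefaultsDemo {

    // Wraps the given properties as defaults of a new Properties object.
    static Properties wrapAsDefaults(Properties source) {
        return new Properties(source);
    }

    // Copies the given entries into the new object itself.
    static Properties copyWithPutAll(Properties source) {
        Properties copy = new Properties();
        copy.putAll(source);
        return copy;
    }

    // Simulates putting one Properties object into another.
    static Properties merge(Properties props) {
        Properties target = new Properties();
        target.putAll(props);
        return target;
    }

    public static void main(String[] args) {
        Properties source = new Properties();
        source.setProperty("hoodie.example.key", "value"); // hypothetical key

        Properties viaDefaults = wrapAsDefaults(source);
        Properties viaPutAll = copyWithPutAll(source);

        // getProperty falls back to the defaults, so this prints "value".
        System.out.println(viaDefaults.getProperty("hoodie.example.key"));
        // putAll copies only the object's own entries, so this prints "null".
        System.out.println(merge(viaDefaults).getProperty("hoodie.example.key"));
        // The putAll copy owns its entries, so this prints "value".
        System.out.println(merge(viaPutAll).getProperty("hoodie.example.key"));
    }
}
```

Copying with putAll in the constructor makes the entries part of the object
itself, so they survive any later putAll into another Properties.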



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2019) Update writeConfig in every client

2021-06-15 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2019:


 Summary: Update writeConfig in every client 
 Key: HUDI-2019
 URL: https://issues.apache.org/jira/browse/HUDI-2019
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Currently the writeConfig is updated only once per TaskManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2018) Skip creating marker files for flink append handle

2021-06-15 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2018:


 Summary: Skip creating marker files for flink append handle
 Key: HUDI-2018
 URL: https://issues.apache.org/jira/browse/HUDI-2018
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Skip creating the marker files for flink append handle to make it more robust.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2000) Release file writer for merge handle #close

2021-06-11 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2000:
-
Description: The file writer can be released eagerly to reduce the memory
footprint.  (was: The file writer can be cleaned eagerly to save the memory
footprint.)

> Release file writer for merge handle #close
> ---
>
> Key: HUDI-2000
> URL: https://issues.apache.org/jira/browse/HUDI-2000
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> The file writer can be released eagerly to reduce the memory footprint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2000) Release file writer for merge handle #close

2021-06-11 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-2000:
-
Summary: Release file writer for merge handle #close  (was: Release the new 
records map for merge handle #close)

> Release file writer for merge handle #close
> ---
>
> Key: HUDI-2000
> URL: https://issues.apache.org/jira/browse/HUDI-2000
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> The file writer can be cleaned eagerly to save the memory footprint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2000) Release the new records map for merge handle #close

2021-06-11 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-2000:


 Summary: Release the new records map for merge handle #close
 Key: HUDI-2000
 URL: https://issues.apache.org/jira/browse/HUDI-2000
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Common Core
Reporter: yuzhaojing
Assignee: yuzhaojing


The file writer can be cleaned eagerly to save the memory footprint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1990) Delete duplicate BootstrapFunction

2021-06-08 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-1990:


 Summary: Delete duplicate BootstrapFunction
 Key: HUDI-1990
 URL: https://issues.apache.org/jira/browse/HUDI-1990
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Delete the duplicate BootstrapFunction operator in HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1954) StreamWriterFunction only reset when flush success

2021-06-02 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-1954:


 Summary: StreamWriterFunction only reset when flush success
 Key: HUDI-1954
 URL: https://issues.apache.org/jira/browse/HUDI-1954
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Flushing a bucket in StreamWriterFunction is currently unsafe. When the
instant is null, flushBucket returns immediately and the bucket is then
reset, which results in data loss.
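
One way to make the reset safe is to let the flush report whether it wrote
anything and to reset only on success. The sketch below assumes that
approach; BucketSketch and its methods are hypothetical and do not mirror the
actual StreamWriterFunction code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a write bucket; not Hudi's actual class.
class BucketSketch {
    final List<String> records = new ArrayList<>(); // buffered records
    final List<String> written = new ArrayList<>(); // stands in for storage

    // Returns true only if the buffered records were actually written.
    boolean flush(String instant) {
        if (instant == null) {
            // No instant to write against: nothing is written.
            return false;
        }
        written.addAll(records);
        return true;
    }

    // Clears the buffer only after a successful flush, so a skipped flush
    // keeps the buffered records.
    void flushAndMaybeReset(String instant) {
        if (flush(instant)) {
            records.clear();
        }
    }
}
```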



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1938) Don't flush to disk before notifyCompleteCheckpoint

2021-06-01 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing closed HUDI-1938.

Resolution: Fixed

> Don't flush to disk before notifyCompleteCheckpoint
> ---
>
> Key: HUDI-1938
> URL: https://issues.apache.org/jira/browse/HUDI-1938
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>
> Currently, data may be flushed to disk after snapshotState and before
> notifyCompleteCheckpoint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1938) Don't flush to disk before notifyCompleteCheckpoint

2021-05-27 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-1938:


 Summary: Don't flush to disk before notifyCompleteCheckpoint
 Key: HUDI-1938
 URL: https://issues.apache.org/jira/browse/HUDI-1938
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Currently, data may be flushed to disk after snapshotState and before
notifyCompleteCheckpoint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1923) Add state in StreamWriteFunction to restore

2021-05-25 Thread yuzhaojing (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuzhaojing updated HUDI-1923:
-
Description: 
In Flink, notifyCheckpointComplete is not part of the checkpoint life cycle.
If a checkpoint succeeds but the commit action in notifyCheckpointComplete
fails, the elements belonging to that instant are discarded when we restore
from the latest checkpoint.

So we should store the commit state and restore it when Flink restarts.

  was:if coordinator notifyCheckpointComplete funtion execute failed, when we 
restore from the latest checkpoint, the element belong this instant will be 
discard.


> Add state in StreamWriteFunction to restore
> ---
>
> Key: HUDI-1923
> URL: https://issues.apache.org/jira/browse/HUDI-1923
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>
> In Flink, notifyCheckpointComplete is not part of the checkpoint life cycle.
> If a checkpoint succeeds but the commit action in notifyCheckpointComplete
> fails, the elements belonging to that instant are discarded when we restore
> from the latest checkpoint.
> So we should store the commit state and restore it when Flink restarts.
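
The idea can be sketched in plain Java without the Flink API. The sketch
assumes the pending instant is saved with the checkpoint and re-committed on
restore if it is missing from the timeline; PendingCommitSketch and all of
its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the write function's commit state; not Flink code.
class PendingCommitSketch {
    final List<String> committed = new ArrayList<>(); // stands in for the timeline
    private String pendingInstant;                    // written, not yet committed

    void write(String instant) {
        pendingInstant = instant;
    }

    // Taken with the checkpoint: the pending instant becomes part of the state.
    String snapshotState() {
        return pendingInstant;
    }

    // Runs after the checkpoint has succeeded; the commit itself may fail.
    void notifyCheckpointComplete(boolean commitSucceeds) {
        if (commitSucceeds && pendingInstant != null) {
            committed.add(pendingInstant);
            pendingInstant = null;
        }
    }

    // On restart: re-commit a restored instant that never reached the timeline.
    static PendingCommitSketch restore(String restoredPending, List<String> timeline) {
        PendingCommitSketch restored = new PendingCommitSketch();
        restored.committed.addAll(timeline);
        if (restoredPending != null && !timeline.contains(restoredPending)) {
            restored.committed.add(restoredPending);
        }
        return restored;
    }
}
```

Without the saved state, a restart after a failed commit has no record of the
pending instant, which is the data loss the description refers to.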



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1924) Support bootstrap operator to load index from hoodieTable

2021-05-23 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-1924:


 Summary: Support bootstrap operator to load index from hoodieTable 
 Key: HUDI-1924
 URL: https://issues.apache.org/jira/browse/HUDI-1924
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


Currently the index is loaded in BucketAssign, but the hoodieRecords in a
base file may belong to many tasks, so every BucketAssign task has to load
all files.

If we add an operator before BucketAssign and then key the index records by
key into BucketAssign, we can assign only part of the files to each task.
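
A simplified sketch of the two steps, in plain Java. It assumes files are
split across loader tasks by position and that index records are routed by a
hash of the record key; Flink's real keyBy uses its own key-group hashing, and
the class and method names here are hypothetical.

```java
// Hypothetical routing helpers; a simplification of the proposal, not Flink code.
class IndexBootstrapRoutingSketch {

    // Each loader task reads only the files whose position maps to its id.
    static boolean shouldLoad(int fileIndex, int parallelism, int taskId) {
        return fileIndex % parallelism == taskId;
    }

    // The same record key is always routed to the same assigner subtask.
    static int targetSubtask(String recordKey, int parallelism) {
        return Math.floorMod(recordKey.hashCode(), parallelism);
    }
}
```

Each file is loaded by exactly one task, and the keyed routing still delivers
every index record to the BucketAssign subtask responsible for that key.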



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1923) Add state in StreamWriteFunction to restore

2021-05-23 Thread yuzhaojing (Jira)
yuzhaojing created HUDI-1923:


 Summary: Add state in StreamWriteFunction to restore
 Key: HUDI-1923
 URL: https://issues.apache.org/jira/browse/HUDI-1923
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: yuzhaojing
Assignee: yuzhaojing


If the coordinator's notifyCheckpointComplete function fails, the elements
belonging to that instant are discarded when we restore from the latest
checkpoint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)