[jira] [Created] (HUDI-3477) Sync timeline for client from embedded timeline service instead of scanning the filesystem
yuzhaojing created HUDI-3477: Summary: Sync timeline for client from embedded timeline service instead of scanning the filesystem Key: HUDI-3477 URL: https://issues.apache.org/jira/browse/HUDI-3477 Project: Apache Hudi Issue Type: Improvement Components: flink, spark, timeline-server Reporter: yuzhaojing Assignee: yuzhaojing Currently, the Hudi meta client gets the timeline by scanning the filesystem. But as the number of tasks and jobs increases, the filesystem comes under enormous pressure, which affects system stability. In this issue, I hope to use the embedded timeline service instead of scanning the filesystem to solve this problem. -- This message was sent by Atlassian Jira (v8.20.1#820001)
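The idea behind HUDI-3477 can be sketched in plain Java. All names below are hypothetical illustrations, not Hudi's actual API: instead of every writer task listing the timeline directory on the (distributed) filesystem, a single server-side scan is cached by the embedded timeline server and served to all clients.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: some source that can answer "what instants are on the timeline?"
interface TimelineSource {
    List<String> getInstants();
}

// Baseline behaviour: each client scans the filesystem itself (expensive at scale).
class FileSystemScanSource implements TimelineSource {
    @Override
    public List<String> getInstants() {
        // stand-in for a real fs.listStatus(...) over the .hoodie directory
        return Arrays.asList("001.commit", "002.deltacommit");
    }
}

// Proposed behaviour: one scan performed on the server side, served from cache
// to every client, so per-request filesystem pressure disappears.
class TimelineServerSource implements TimelineSource {
    private final List<String> cached;

    TimelineServerSource(TimelineSource backing) {
        this.cached = backing.getInstants(); // single scan, shared by all clients
    }

    @Override
    public List<String> getInstants() {
        return cached; // answered from memory, no filesystem access
    }
}

public class TimelineSyncSketch {
    public static void main(String[] args) {
        TimelineSource server = new TimelineServerSource(new FileSystemScanSource());
        System.out.println(server.getInstants());
    }
}
```

The sketch only shows the caching shape of the proposal; the real change would also need cache invalidation when new instants are committed.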
[jira] [Created] (HUDI-3475) Support run compaction / clustering job in Service
yuzhaojing created HUDI-3475: Summary: Support run compaction / clustering job in Service Key: HUDI-3475 URL: https://issues.apache.org/jira/browse/HUDI-3475 Project: Apache Hudi Issue Type: New Feature Components: core Reporter: yuzhaojing This is an implementation of RFC-43. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3420) Remove duplicates type in HoodieClusteringGroup.avsc
[ https://issues.apache.org/jira/browse/HUDI-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-3420: - Summary: Remove duplicates type in HoodieClusteringGroup.avsc (was: Remove duplicates type in HoodieCleanMetadata.avsc) > Remove duplicates type in HoodieClusteringGroup.avsc > > > Key: HUDI-3420 > URL: https://issues.apache.org/jira/browse/HUDI-3420 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3420) Remove duplicates type in HoodieCleanMetadata.avsc
yuzhaojing created HUDI-3420: Summary: Remove duplicates type in HoodieCleanMetadata.avsc Key: HUDI-3420 URL: https://issues.apache.org/jira/browse/HUDI-3420 Project: Apache Hudi Issue Type: Improvement Components: core Reporter: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3418) Save timeout option for remote RemoteFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-3418: - Component/s: core > Save timeout option for remote RemoteFileSystemView > --- > > Key: HUDI-3418 > URL: https://issues.apache.org/jira/browse/HUDI-3418 > Project: Apache Hudi > Issue Type: Improvement > Components: core, flink >Reporter: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3418) Save timeout option for remote Embedded timeline server
[ https://issues.apache.org/jira/browse/HUDI-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-3418: - Summary: Save timeout option for remote Embedded timeline server (was: Save timeout option for Embedded timeline server in flink) > Save timeout option for remote Embedded timeline server > --- > > Key: HUDI-3418 > URL: https://issues.apache.org/jira/browse/HUDI-3418 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3418) Save timeout option for remote RemoteFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-3418: - Summary: Save timeout option for remote RemoteFileSystemView (was: Save timeout option for remote Embedded timeline server) > Save timeout option for remote RemoteFileSystemView > --- > > Key: HUDI-3418 > URL: https://issues.apache.org/jira/browse/HUDI-3418 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3418) Save timeout option for Embedded timeline server in flink
yuzhaojing created HUDI-3418: Summary: Save timeout option for Embedded timeline server in flink Key: HUDI-3418 URL: https://issues.apache.org/jira/browse/HUDI-3418 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3124) Bootstrap when the timeline has completed instants
yuzhaojing created HUDI-3124: Summary: Bootstrap when the timeline has completed instants Key: HUDI-3124 URL: https://issues.apache.org/jira/browse/HUDI-3124 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3120) Cache compactionPlan in buffer
[ https://issues.apache.org/jira/browse/HUDI-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-3120: - Summary: Cache compactionPlan in buffer (was: Clear buffer when compaction is failed) > Cache compactionPlan in buffer > -- > > Key: HUDI-3120 > URL: https://issues.apache.org/jira/browse/HUDI-3120 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3120) Clear buffer when compaction is failed
yuzhaojing created HUDI-3120: Summary: Clear buffer when compaction is failed Key: HUDI-3120 URL: https://issues.apache.org/jira/browse/HUDI-3120 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3086) Add retry in RemoteHoodieTableFileSystemView
yuzhaojing created HUDI-3086: Summary: Add retry in RemoteHoodieTableFileSystemView Key: HUDI-3086 URL: https://issues.apache.org/jira/browse/HUDI-3086 Project: Apache Hudi Issue Type: Improvement Components: Common Core Reporter: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
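The retry idea in HUDI-3086 is a standard pattern: wrap the remote filesystem-view call so a transient failure (for example, a briefly unavailable timeline server) is retried with backoff instead of failing the whole job. A minimal sketch, with illustrative names and limits rather than Hudi's actual implementation:

```java
import java.util.concurrent.Callable;

public class RetryHelper {
    // Run `call`, retrying up to maxAttempts times with exponential backoff.
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        Exception last = null;
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;                 // remember the failure and retry
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);  // simple exponential backoff
                    delay *= 2;
                }
            }
        }
        throw last;                       // all attempts exhausted: propagate
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice, then succeeds - mimics a timeline server that is briefly down.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("connection refused");
            return "ok";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real RemoteHoodieTableFileSystemView one would likely retry only on connection-level exceptions, not on every Exception as this sketch does.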
[jira] [Updated] (HUDI-3046) Claim RFC number for RFC for Compaction / Clustering Service
[ https://issues.apache.org/jira/browse/HUDI-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-3046: - Labels: pull-request-available (was: ) > Claim RFC number for RFC for Compaction / Clustering Service > > > Key: HUDI-3046 > URL: https://issues.apache.org/jira/browse/HUDI-3046 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: yuzhaojing >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3046) Claim RFC number for RFC for Compaction / Clustering Service
yuzhaojing created HUDI-3046: Summary: Claim RFC number for RFC for Compaction / Clustering Service Key: HUDI-3046 URL: https://issues.apache.org/jira/browse/HUDI-3046 Project: Apache Hudi Issue Type: Sub-task Reporter: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3016) [RFC-42] Implement Compaction/Clustering Service for Hudi
yuzhaojing created HUDI-3016: Summary: [RFC-42] Implement Compaction/Clustering Service for Hudi Key: HUDI-3016 URL: https://issues.apache.org/jira/browse/HUDI-3016 Project: Apache Hudi Issue Type: New Feature Components: Common Core, Compaction Reporter: yuzhaojing Assignee: yuzhaojing Fix For: 0.11.0 Implement a Compaction/Clustering Service to manage Hudi table compaction / clustering actions. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-2913) Disable auto clean in writer task
yuzhaojing created HUDI-2913: Summary: Disable auto clean in writer task Key: HUDI-2913 URL: https://issues.apache.org/jira/browse/HUDI-2913 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-2912) Fix CompactionPlanOperator typo
yuzhaojing created HUDI-2912: Summary: Fix CompactionPlanOperator typo Key: HUDI-2912 URL: https://issues.apache.org/jira/browse/HUDI-2912 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2817) Sync the configuration inference for HoodieFlinkStreamer
[ https://issues.apache.org/jira/browse/HUDI-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2817: Assignee: yuzhaojing > Sync the configuration inference for HoodieFlinkStreamer > > > Key: HUDI-2817 > URL: https://issues.apache.org/jira/browse/HUDI-2817 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-2817) Sync the configuration inference for HoodieFlinkStreamer
yuzhaojing created HUDI-2817: Summary: Sync the configuration inference for HoodieFlinkStreamer Key: HUDI-2817 URL: https://issues.apache.org/jira/browse/HUDI-2817 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-2377) Add rateLimiter in Bootstrap
[ https://issues.apache.org/jira/browse/HUDI-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing closed HUDI-2377. Resolution: Fixed > Add rateLimiter in Bootstrap > > > Key: HUDI-2377 > URL: https://issues.apache.org/jira/browse/HUDI-2377 > Project: Apache Hudi > Issue Type: Wish > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > Add rateLimiter in Bootstrap to avoid TaskManager heartbeat timeouts when the > CPU is overloaded -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-2738) Remove the bucketAssignFunction useless context
yuzhaojing created HUDI-2738: Summary: Remove the bucketAssignFunction useless context Key: HUDI-2738 URL: https://issues.apache.org/jira/browse/HUDI-2738 Project: Apache Hudi Issue Type: Task Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2730) Move EventTimeAvroPayload into hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2730: Assignee: yuzhaojing > Move EventTimeAvroPayload into hudi-common module > - > > Key: HUDI-2730 > URL: https://issues.apache.org/jira/browse/HUDI-2730 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Assignee: yuzhaojing >Priority: Major > Fix For: 0.10.0 > > > So that the reader jar can see that. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2698) Remove the table source options validation
[ https://issues.apache.org/jira/browse/HUDI-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2698: Assignee: yuzhaojing > Remove the table source options validation > -- > > Key: HUDI-2698 > URL: https://issues.apache.org/jira/browse/HUDI-2698 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Danny Chen >Assignee: yuzhaojing >Priority: Major > Fix For: 0.10.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2686) Process records after all bootstrap operators are ready
[ https://issues.apache.org/jira/browse/HUDI-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2686: - Summary: Process records after all bootstrap operators are ready (was: Add DFS based message queue for bootstrap operator) > Process records after all bootstrap operators are ready > -- > > Key: HUDI-2686 > URL: https://issues.apache.org/jira/browse/HUDI-2686 > Project: Apache Hudi > Issue Type: New Feature >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2686) Add DFS based message queue for bootstrap operator
[ https://issues.apache.org/jira/browse/HUDI-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2686: Assignee: yuzhaojing > Add DFS based message queue for bootstrap operator > -- > > Key: HUDI-2686 > URL: https://issues.apache.org/jira/browse/HUDI-2686 > Project: Apache Hudi > Issue Type: New Feature >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2686) Add DFS based message queue for bootstrap operator
[ https://issues.apache.org/jira/browse/HUDI-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2686: - Summary: Add DFS based message queue for bootstrap operator (was: Fix) > Add DFS based message queue for bootstrap operator > -- > > Key: HUDI-2686 > URL: https://issues.apache.org/jira/browse/HUDI-2686 > Project: Apache Hudi > Issue Type: New Feature >Reporter: yuzhaojing >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2686) Fix
yuzhaojing created HUDI-2686: Summary: Fix Key: HUDI-2686 URL: https://issues.apache.org/jira/browse/HUDI-2686 Project: Apache Hudi Issue Type: New Feature Reporter: yuzhaojing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2685) Support scheduling online compaction plan when there are no commit data
[ https://issues.apache.org/jira/browse/HUDI-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2685: Assignee: yuzhaojing (was: Danny Chen) > Support scheduling online compaction plan when there are no commit data > --- > > Key: HUDI-2685 > URL: https://issues.apache.org/jira/browse/HUDI-2685 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: yuzhaojing >Priority: Major > Fix For: 0.10.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2631) In CompactFunction, set up the write schema each time with the latest schema
[ https://issues.apache.org/jira/browse/HUDI-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2631: Assignee: yuzhaojing > In CompactFunction, set up the write schema each time with the latest schema > > > Key: HUDI-2631 > URL: https://issues.apache.org/jira/browse/HUDI-2631 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: yuzhaojing >Priority: Major > Fix For: 0.10.0 > > > To support schema evolution. > We only need to do that when flag {{asyncCompaction}} is true, because for > offline compaction, we already infer the schema from the latest data file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2651) Sync all the missing sql options for HoodieFlinkStreamer
[ https://issues.apache.org/jira/browse/HUDI-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2651: Assignee: yuzhaojing > Sync all the missing sql options for HoodieFlinkStreamer > > > Key: HUDI-2651 > URL: https://issues.apache.org/jira/browse/HUDI-2651 > Project: Apache Hudi > Issue Type: Task > Components: Flink Integration >Reporter: Danny Chen >Assignee: yuzhaojing >Priority: Major > Fix For: 0.10.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2624) Implement Non Index type for HUDI
[ https://issues.apache.org/jira/browse/HUDI-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing reassigned HUDI-2624: Assignee: yuzhaojing > Implement Non Index type for HUDI > - > > Key: HUDI-2624 > URL: https://issues.apache.org/jira/browse/HUDI-2624 > Project: Apache Hudi > Issue Type: New Feature > Components: Common Core >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Supports scenarios where the data does not have a primary key. At present, a > common practice in these scenarios is to use a UUID as the primary key, but this > generates unnecessary indexes and consumes more resources during compaction. > In order to better support this scenario, the idea of a non-index type is proposed. > This solution will not generate an index and can perform compaction faster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2624) Implement Non Index type for HUDI
yuzhaojing created HUDI-2624: Summary: Implement Non Index type for HUDI Key: HUDI-2624 URL: https://issues.apache.org/jira/browse/HUDI-2624 Project: Apache Hudi Issue Type: New Feature Components: Common Core Reporter: yuzhaojing Supports scenarios where the data does not have a primary key. At present, a common practice in these scenarios is to use a UUID as the primary key, but this generates unnecessary indexes and consumes more resources during compaction. In order to better support this scenario, the idea of a non-index type is proposed. This solution will not generate an index and can perform compaction faster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2547) Schedule Flink compaction in service
[ https://issues.apache.org/jira/browse/HUDI-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2547: - Summary: Schedule Flink compaction in service (was: Schedule Flink Compaction in service) > Schedule Flink compaction in service > > > Key: HUDI-2547 > URL: https://issues.apache.org/jira/browse/HUDI-2547 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Currently, running HoodieFlinkCompactor once only triggers one compaction; we can > add a CompactService to schedule compactions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2547) Schedule Flink Compaction in service
yuzhaojing created HUDI-2547: Summary: Schedule Flink Compaction in service Key: HUDI-2547 URL: https://issues.apache.org/jira/browse/HUDI-2547 Project: Apache Hudi Issue Type: New Feature Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Currently, running HoodieFlinkCompactor once only triggers one compaction; we can add a CompactService to schedule compactions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
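The service loop described in HUDI-2547 can be sketched with a plain `ScheduledExecutorService` (the `CompactServiceSketch` name and period are hypothetical, not Hudi's actual design): instead of a one-shot compactor run, a long-lived service triggers compaction rounds on a fixed schedule.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CompactServiceSketch {
    // Start a service that runs one compaction round, waits periodMs, and repeats.
    public static ScheduledExecutorService start(Runnable compactOnce, long periodMs) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // fixed *delay* (not rate): a slow compaction round never overlaps the next one
        pool.scheduleWithFixedDelay(compactOnce, 0, periodMs, TimeUnit.MILLISECONDS);
        return pool;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch rounds = new CountDownLatch(3);
        ScheduledExecutorService pool = start(rounds::countDown, 10);
        rounds.await();      // block until three compaction rounds have been triggered
        pool.shutdownNow();
        System.out.println("triggered 3 compaction rounds");
    }
}
```

`scheduleWithFixedDelay` is chosen over `scheduleAtFixedRate` deliberately: compaction round duration varies, and overlapping rounds are exactly what a scheduling service should avoid.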
[jira] [Created] (HUDI-2377) Add rateLimiter in Bootstrap
yuzhaojing created HUDI-2377: Summary: Add rateLimiter in Bootstrap Key: HUDI-2377 URL: https://issues.apache.org/jira/browse/HUDI-2377 Project: Apache Hudi Issue Type: Wish Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Add rateLimiter in Bootstrap to avoid TaskManager heartbeat timeouts when the CPU is overloaded -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2376) Add pipeline for Append mode
yuzhaojing created HUDI-2376: Summary: Add pipeline for Append mode Key: HUDI-2376 URL: https://issues.apache.org/jira/browse/HUDI-2376 Project: Apache Hudi Issue Type: Wish Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Add pipeline for Append mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2368) Catch Throwable in BoundedInMemoryExecutor
yuzhaojing created HUDI-2368: Summary: Catch Throwable in BoundedInMemoryExecutor Key: HUDI-2368 URL: https://issues.apache.org/jira/browse/HUDI-2368 Project: Apache Hudi Issue Type: Wish Components: Common Core Reporter: yuzhaojing Assignee: yuzhaojing Currently, BoundedInMemoryExecutor only catches Exception in the async produce thread and does not process the callback result. But an Error may be thrown in the produce thread, in which case we cannot identify the problem and do not know why data is missing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
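The failure mode in HUDI-2368 is easy to reproduce in a small sketch (the `ProducerSketch` names are illustrative, not Hudi's actual classes): `catch (Exception e)` in a background producer thread lets an `Error` such as `OutOfMemoryError` escape silently, so the consumer just sees missing data. Catching `Throwable` and surfacing it to the caller makes the failure visible.

```java
public class ProducerSketch {
    // Run `produce` on a background thread and return whatever it threw, or null.
    static Throwable runProducer(Runnable produce) {
        final Throwable[] failure = {null};
        Thread t = new Thread(() -> {
            try {
                produce.run();
            } catch (Throwable e) {  // Throwable, not Exception: also catches Error
                failure[0] = e;      // hand it back to the caller instead of losing it
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
        return failure[0];
    }

    public static void main(String[] args) {
        // With `catch (Exception)` this Error would never reach the caller.
        Throwable seen = runProducer(() -> { throw new AssertionError("boom"); });
        System.out.println("producer failed with: " + seen);
    }
}
```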
[jira] [Created] (HUDI-2342) Optimize Bootstrap operator
yuzhaojing created HUDI-2342: Summary: Optimize Bootstrap operator Key: HUDI-2342 URL: https://issues.apache.org/jira/browse/HUDI-2342 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Move load index logic in initializeState. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2270) Remove corrupted clean action
yuzhaojing created HUDI-2270: Summary: Remove corrupted clean action Key: HUDI-2270 URL: https://issues.apache.org/jira/browse/HUDI-2270 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2087) Support Append only in Flink stream
[ https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390452#comment-17390452 ] yuzhaojing commented on HUDI-2087: -- Ok, this PR is not urgent. > Support Append only in Flink stream > --- > > Key: HUDI-2087 > URL: https://issues.apache.org/jira/browse/HUDI-2087 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > Attachments: image-2021-07-08-22-04-30-039.png, > image-2021-07-08-22-04-40-018.png > > > It is necessary to support append mode in the Flink stream, as the data lake > should be able to write log-type data as Parquet with high performance and > without merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2247) Filter file where length less than parquet MAGIC length
yuzhaojing created HUDI-2247: Summary: Filter file where length less than parquet MAGIC length Key: HUDI-2247 URL: https://issues.apache.org/jira/browse/HUDI-2247 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2207) Support independent flink hudi clustering function
yuzhaojing created HUDI-2207: Summary: Support independent flink hudi clustering function Key: HUDI-2207 URL: https://issues.apache.org/jira/browse/HUDI-2207 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2193) Remove state in BootstrapFunction
yuzhaojing created HUDI-2193: Summary: Remove state in BootstrapFunction Key: HUDI-2193 URL: https://issues.apache.org/jira/browse/HUDI-2193 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Remove state in BootstrapFunction to support restarting the job without bootstrap -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2171) Add parallelism conf for bootstrap operator
yuzhaojing created HUDI-2171: Summary: Add parallelism conf for bootstrap operator Key: HUDI-2171 URL: https://issues.apache.org/jira/browse/HUDI-2171 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Add parallelism conf for bootstrap operator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2169) Remove keyBy when write.operation is Insert
yuzhaojing created HUDI-2169: Summary: Remove keyBy when write.operation is Insert Key: HUDI-2169 URL: https://issues.apache.org/jira/browse/HUDI-2169 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing When write.operation is Insert, the user can tolerate duplicated data or data that does not need to be merged. In this case, the keyBy is unnecessary. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2087) Support Append only in Flink stream
[ https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2087: - Attachment: image-2021-07-08-22-04-40-018.png > Support Append only in Flink stream > --- > > Key: HUDI-2087 > URL: https://issues.apache.org/jira/browse/HUDI-2087 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > Attachments: image-2021-07-08-22-04-30-039.png, > image-2021-07-08-22-04-40-018.png > > > It is necessary to support append mode in the Flink stream, as the data lake > should be able to write log-type data as Parquet with high performance and > without merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2087) Support Append only in Flink stream
[ https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377417#comment-17377417 ] yuzhaojing commented on HUDI-2087: -- CI failed, but these test cases succeed locally. https://github.com/apache/hudi/pull/3174/checks?check_run_id=3016573136 !image-2021-07-08-22-04-40-018.png! > Support Append only in Flink stream > --- > > Key: HUDI-2087 > URL: https://issues.apache.org/jira/browse/HUDI-2087 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > Attachments: image-2021-07-08-22-04-30-039.png, > image-2021-07-08-22-04-40-018.png > > > It is necessary to support append mode in the Flink stream, as the data lake > should be able to write log-type data as Parquet with high performance and > without merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2087) Support Append only in Flink stream
[ https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2087: - Attachment: image-2021-07-08-22-04-30-039.png > Support Append only in Flink stream > --- > > Key: HUDI-2087 > URL: https://issues.apache.org/jira/browse/HUDI-2087 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > Attachments: image-2021-07-08-22-04-30-039.png, > image-2021-07-08-22-04-40-018.png > > > It is necessary to support append mode in the Flink stream, as the data lake > should be able to write log-type data as Parquet with high performance and > without merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (HUDI-2087) Support Append only in Flink stream
[ https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2087: - Comment: was deleted (was: [https://github.com/apache/hudi/pull/3174/checks?check_run_id=3016573136] !171625752829_.pic_hd.jpg!) > Support Append only in Flink stream > --- > > Key: HUDI-2087 > URL: https://issues.apache.org/jira/browse/HUDI-2087 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > It is necessary to support append mode in the Flink stream, as the data lake > should be able to write log-type data as Parquet with high performance and > without merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2087) Support Append only in Flink stream
[ https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377414#comment-17377414 ] yuzhaojing commented on HUDI-2087: -- [https://github.com/apache/hudi/pull/3174/checks?check_run_id=3016573136] !171625752829_.pic_hd.jpg! > Support Append only in Flink stream > --- > > Key: HUDI-2087 > URL: https://issues.apache.org/jira/browse/HUDI-2087 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > It is necessary to support append mode in the Flink stream, as the data lake > should be able to write log-type data as Parquet with high performance and > without merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2145) Create new bucket when NewFileAssignState filled
yuzhaojing created HUDI-2145: Summary: Create new bucket when NewFileAssignState filled Key: HUDI-2145 URL: https://issues.apache.org/jira/browse/HUDI-2145 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2141) Integration flink metric in flink stream
[ https://issues.apache.org/jira/browse/HUDI-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2141: - Summary: Integration flink metric in flink stream (was: Integration flink metric) > Integration flink metric in flink stream > > > Key: HUDI-2141 > URL: https://issues.apache.org/jira/browse/HUDI-2141 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Currently, Hoodie metrics don't work in the Flink stream because they were > designed for batch processing; integrate the Flink metrics into the Flink stream. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2141) Integration flink metric
yuzhaojing created HUDI-2141: Summary: Integration flink metric Key: HUDI-2141 URL: https://issues.apache.org/jira/browse/HUDI-2141 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Currently, Hoodie metrics don't work in the Flink stream because they were designed for batch processing; integrate the Flink metrics into the Flink stream. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2103) Add rebalance before index bootstrap
yuzhaojing created HUDI-2103: Summary: Add rebalance before index bootstrap Key: HUDI-2103 URL: https://issues.apache.org/jira/browse/HUDI-2103 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing When using Flink SQL to upsert into Hudi, users often set the parallelism larger than the Kafka partition number. Currently, the bootstrap operator needs at least one element to trigger loading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2087) Support Append only in Flink stream
yuzhaojing created HUDI-2087: Summary: Support Append only in Flink stream Key: HUDI-2087 URL: https://issues.apache.org/jira/browse/HUDI-2087 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing It is necessary to support append mode in the Flink stream, as the data lake should be able to write log-type data as Parquet with high performance and without merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2062) Catch FileNotFoundException in WriteProfiles #getCommitMetadataSafely
[ https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2062: - Summary: Catch FileNotFoundException in WriteProfiles #getCommitMetadataSafely (was: Catch IOException in WriteProfiles #getCommitMetadataSafely) > Catch FileNotFoundException in WriteProfiles #getCommitMetadataSafely > - > > Key: HUDI-2062 > URL: https://issues.apache.org/jira/browse/HUDI-2062 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > The function WriteProfiles#getCommitMetadataSafely is expected to get the instant > safely: if the instant was deleted by the cleaner, it should be ignored. But in > this case, a FileNotFoundException can be thrown outside the try-catch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
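The intent behind HUDI-2062 can be shown with a small hedged sketch (the `SafeMetadataRead` and `MetadataReader` names are illustrative, not Hudi's actual classes): reading commit metadata for an instant that the cleaner may have deleted concurrently should return empty instead of failing, but only for `FileNotFoundException` — other `IOException`s still indicate real problems and must propagate.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Optional;

public class SafeMetadataRead {
    // Hypothetical stand-in for whatever actually reads the metadata file.
    interface MetadataReader {
        byte[] read(String instant) throws IOException;
    }

    static Optional<byte[]> getCommitMetadataSafely(MetadataReader reader, String instant)
            throws IOException {
        try {
            return Optional.of(reader.read(instant));
        } catch (FileNotFoundException e) {
            // The instant was removed by the cleaner between listing and reading:
            // this is expected under concurrency, so ignore it.
            return Optional.empty();
        }
        // Any other IOException falls through to the caller as a real failure.
    }

    public static void main(String[] args) throws IOException {
        Optional<byte[]> md = getCommitMetadataSafely(
            instant -> { throw new FileNotFoundException(instant); }, "001.commit");
        System.out.println("present=" + md.isPresent());
    }
}
```

The key point of the issue is making sure every code path that can raise `FileNotFoundException` sits inside that try block; if the read is performed lazily after the try, the exception escapes exactly as the report describes.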
[jira] [Updated] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely
[ https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2062: - Description: The function WriteProfiles#getCommitMetadataSafely is expected to fetch the instant metadata safely, ignoring instants deleted by the cleaner. But in that case, the FileNotFoundException is thrown outside the try/catch. was: > Catch IOException in WriteProfiles #getCommitMetadataSafely > --- > > Key: HUDI-2062 > URL: https://issues.apache.org/jira/browse/HUDI-2062 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > The function WriteProfiles#getCommitMetadataSafely is expected to fetch the > instant metadata safely, ignoring instants deleted by the cleaner. > But in that case, the FileNotFoundException is thrown outside the try/catch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely
[ https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2062: - Description: > Catch IOException in WriteProfiles #getCommitMetadataSafely > --- > > Key: HUDI-2062 > URL: https://issues.apache.org/jira/browse/HUDI-2062 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely
[ https://issues.apache.org/jira/browse/HUDI-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2062: - Description: (was: Catch IOException in WriteProfiles #getCommitMetadataSafely) > Catch IOException in WriteProfiles #getCommitMetadataSafely > --- > > Key: HUDI-2062 > URL: https://issues.apache.org/jira/browse/HUDI-2062 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2062) Catch IOException in WriteProfiles #getCommitMetadataSafely
yuzhaojing created HUDI-2062: Summary: Catch IOException in WriteProfiles #getCommitMetadataSafely Key: HUDI-2062 URL: https://issues.apache.org/jira/browse/HUDI-2062 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Catch IOException in WriteProfiles #getCommitMetadataSafely -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2052) Support load logfile in BootstrapFunction
yuzhaojing created HUDI-2052: Summary: Support load logfile in BootstrapFunction Key: HUDI-2052 URL: https://issues.apache.org/jira/browse/HUDI-2052 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Support load logfile in BootstrapFunction -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2038) Support rollback inflight compaction instances for CompactionPlanOperator
[ https://issues.apache.org/jira/browse/HUDI-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2038: - Summary: Support rollback inflight compaction instances for CompactionPlanOperator (was: Rollback pending compaction when schedule new compaction) > Support rollback inflight compaction instances for CompactionPlanOperator > - > > Key: HUDI-2038 > URL: https://issues.apache.org/jira/browse/HUDI-2038 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > Rollback pending compaction when schedule new compaction -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2038) Support rollback inflight compaction instances for CompactionPlanOperator
[ https://issues.apache.org/jira/browse/HUDI-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2038: - Description: Support rollback inflight compaction instances for CompactionPlanOperator (was: Rollback pending compaction when schedule new compaction) > Support rollback inflight compaction instances for CompactionPlanOperator > - > > Key: HUDI-2038 > URL: https://issues.apache.org/jira/browse/HUDI-2038 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > Support rollback inflight compaction instances for CompactionPlanOperator -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Summary: Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant (was: Ignore IOException in WriteProfiles #getWritePathsOfInstant) > Ignore FileNotFoundException in WriteProfiles #getWritePathsOfInstant > - > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > This covers the case where the instant is cleaned at the same time: > if the file has been deleted, it cannot be the latest version, so it is skipped. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Ignore IOException in WriteProfiles #getWritePathsOfInstant
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Description: This covers the case where the instant is cleaned at the same time: if the file has been deleted, it cannot be the latest version, so it is skipped. was: Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean. In case this instant will be clean at same time. > Ignore IOException in WriteProfiles #getWritePathsOfInstant > --- > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > This covers the case where the instant is cleaned at the same time: > if the file has been deleted, it cannot be the latest version, so it is skipped. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Ignore IOException in WriteProfiles #getWritePathsOfInstant
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Summary: Ignore IOException in WriteProfiles #getWritePathsOfInstant (was: Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean) > Ignore IOException in WriteProfiles #getWritePathsOfInstant > --- > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean. > In case this instant will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Description: Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean. In case this instant will be clean at same time. was:Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean, because the instant don't need to be reloaded. In case this instant will be clean at same time. > Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean > - > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean. > In case this instant will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Description: Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean, because the instant don't need to be reloaded. In case this instant will be clean at same time. (was: Filter hoodieTimeline when the instant in metadataCache, because the instant don't need to be reloaded. In case this instant will be clean at same time.) > Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean > - > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Filter hoodieTimeline instant in WriteProfile when the cleaner possibly > clean, because the instant don't need to be reloaded. In case this instant > will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Description: Filter hoodieTimeline when the instant in metadataCache, because the instant don't need to be reloaded. In case this instant will be clean at same time. (was: Filter commit metadata when the instant in metadataCache, because the instant don't need to be reloaded. In case this instant will be clean at same time.) > Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean > - > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Filter hoodieTimeline when the instant in metadataCache, because the instant > don't need to be reloaded. In case this instant will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Summary: Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean (was: Filter hoodieTimeline) > Filter hoodieTimeline instant in WriteProfile when the cleaner possibly clean > - > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Filter commit metadata when the instant in metadataCache, because the instant > don't need to be reloaded. In case this instant will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Filter hoodieTimeline
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Summary: Filter hoodieTimeline (was: Filter commit metadata when the instant is loaded) > Filter hoodieTimeline > - > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Filter commit metadata when the instant in metadataCache, because the instant > don't need to be reloaded. In case this instant will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Filter commit metadata when the instant is loaded
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Summary: Filter commit metadata when the instant is loaded (was: Filter commit metadata when the instant loaded) > Filter commit metadata when the instant is loaded > - > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Filter commit metadata when the instant in metadataCache, because the instant > don't need to be reloaded. In case this instant will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2047) Filter commit metadata when the instant loaded
[ https://issues.apache.org/jira/browse/HUDI-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2047: - Summary: Filter commit metadata when the instant loaded (was: Filter commit metadata when the instant in metadataCache) > Filter commit metadata when the instant loaded > -- > > Key: HUDI-2047 > URL: https://issues.apache.org/jira/browse/HUDI-2047 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > Filter commit metadata when the instant in metadataCache, because the instant > don't need to be reloaded. In case this instant will be clean at same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2047) Filter commit metadata when the instant in metadataCache
yuzhaojing created HUDI-2047: Summary: Filter commit metadata when the instant in metadataCache Key: HUDI-2047 URL: https://issues.apache.org/jira/browse/HUDI-2047 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Filter out commit metadata when the instant is already in the metadataCache, because such instants do not need to be reloaded. This also covers the case where the instant is cleaned at the same time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2038) Rollback pending compaction when schedule new compaction
yuzhaojing created HUDI-2038: Summary: Rollback pending compaction when schedule new compaction Key: HUDI-2038 URL: https://issues.apache.org/jira/browse/HUDI-2038 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Rollback pending compaction when schedule new compaction -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2022) Release writer for append handle #close
yuzhaojing created HUDI-2022: Summary: Release writer for append handle #close Key: HUDI-2022 URL: https://issues.apache.org/jira/browse/HUDI-2022 Project: Apache Hudi Issue Type: Improvement Components: Common Core Reporter: yuzhaojing Assignee: yuzhaojing The writer can be released eagerly to save the memory footprint. -- This message was sent by Atlassian Jira (v8.3.4#803005)
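The eager-release idea above (and in HUDI-2000 below for the merge handle) can be sketched like this. The class and field names are illustrative assumptions, not the actual Hudi handle code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class AppendHandle implements Closeable {
    // Hypothetical sketch: the handle holds a buffered writer. Nulling the
    // reference in close() lets the writer (and its buffers) be garbage
    // collected eagerly, instead of living as long as the handle object.
    private Writer writer = new StringWriter();

    public boolean writerReleased() {
        return writer == null;
    }

    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close();
            writer = null; // release eagerly to save memory footprint
        }
    }
}
```

This matters when handle objects outlive their writes, e.g. when they are kept in a map until the commit completes.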
[jira] [Created] (HUDI-2021) Use putAll instead default in TypedProperties constructor
yuzhaojing created HUDI-2021: Summary: Use putAll instead default in TypedProperties constructor Key: HUDI-2021 URL: https://issues.apache.org/jira/browse/HUDI-2021 Project: Apache Hudi Issue Type: Improvement Components: Common Core Reporter: yuzhaojing Assignee: yuzhaojing Entries stored in the defaults table of a java.util.Properties instance are not copied when that instance is put into another Properties object. -- This message was sent by Atlassian Jira (v8.3.4#803005)
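The `java.util.Properties` quirk behind this issue can be demonstrated in isolation. The property key below is illustrative; the behavior of `putAll` vs. the defaults table is standard JDK behavior:

```java
import java.util.Properties;

public class PropertiesDefaultsDemo {
    // Entries passed via the Properties(defaults) constructor live in a
    // separate defaults table: getProperty falls back to them, but putAll
    // copies only the instance's own entries, so the defaults are lost.
    public static String copyWithPutAll() {
        Properties defaults = new Properties();
        defaults.setProperty("hoodie.table.name", "t1");
        Properties source = new Properties(defaults); // entry lives in defaults only

        Properties copy = new Properties();
        copy.putAll(source);                          // defaults table is NOT copied
        return copy.getProperty("hoodie.table.name"); // null: the default was dropped
    }

    public static String copyFlattened() {
        Properties defaults = new Properties();
        defaults.setProperty("hoodie.table.name", "t1");
        Properties source = new Properties(defaults);

        Properties copy = new Properties();
        // stringPropertyNames walks the defaults chain, so flattening this
        // way (putAll of real entries, as the issue title suggests) keeps them.
        for (String name : source.stringPropertyNames()) {
            copy.setProperty(name, source.getProperty(name));
        }
        return copy.getProperty("hoodie.table.name");
    }
}
```

Flattening in the constructor makes the resulting TypedProperties safe to copy around with `putAll`.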
[jira] [Created] (HUDI-2019) Update writeConfig in every client
yuzhaojing created HUDI-2019: Summary: Update writeConfig in every client Key: HUDI-2019 URL: https://issues.apache.org/jira/browse/HUDI-2019 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Currently the writeConfig is only updated once per TaskManager; it should be updated in every client. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2018) Skip creating marker files for flink append handle
yuzhaojing created HUDI-2018: Summary: Skip creating marker files for flink append handle Key: HUDI-2018 URL: https://issues.apache.org/jira/browse/HUDI-2018 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Skip creating the marker files for flink append handle to make it more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2000) Release file writer for merge handle #close
[ https://issues.apache.org/jira/browse/HUDI-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2000: - Description: The file writer can be released eagerly to save the memory footprint. (was: The file writer can be cleaned eagerly to save the memory footprint.) > Release file writer for merge handle #close > --- > > Key: HUDI-2000 > URL: https://issues.apache.org/jira/browse/HUDI-2000 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > The file writer can be released eagerly to save the memory footprint. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2000) Release file writer for merge handle #close
[ https://issues.apache.org/jira/browse/HUDI-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-2000: - Summary: Release file writer for merge handle #close (was: Release the new records map for merge handle #close) > Release file writer for merge handle #close > --- > > Key: HUDI-2000 > URL: https://issues.apache.org/jira/browse/HUDI-2000 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > The file writer can be cleaned eagerly to save the memory footprint. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2000) Release the new records map for merge handle #close
yuzhaojing created HUDI-2000: Summary: Release the new records map for merge handle #close Key: HUDI-2000 URL: https://issues.apache.org/jira/browse/HUDI-2000 Project: Apache Hudi Issue Type: Improvement Components: Common Core Reporter: yuzhaojing Assignee: yuzhaojing The file writer can be cleaned eagerly to save the memory footprint. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1990) Delete duplicate BootstrapFunction
yuzhaojing created HUDI-1990: Summary: Delete duplicate BootstrapFunction Key: HUDI-1990 URL: https://issues.apache.org/jira/browse/HUDI-1990 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Delete the duplicate BootstrapFunction operator in HoodieTableSink. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1954) StreamWriterFunction only reset when flush success
yuzhaojing created HUDI-1954: Summary: StreamWriterFunction only reset when flush success Key: HUDI-1954 URL: https://issues.apache.org/jira/browse/HUDI-1954 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Bucket flushing in StreamWriterFunction is currently unsafe: when the instant is null, flushBucket returns immediately, yet the bucket is still reset afterwards, resulting in data loss. -- This message was sent by Atlassian Jira (v8.3.4#803005)
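The safer flush protocol this issue proposes can be sketched with a stripped-down bucket. The class shape and method names are assumptions for illustration, not the actual StreamWriterFunction code:

```java
import java.util.ArrayList;
import java.util.List;

public class DataBucket {
    // Hypothetical sketch: the buffer is cleared only after a flush actually
    // succeeds, so records buffered while the instant is still null are kept
    // instead of being silently dropped by an unconditional reset.
    private final List<String> buffer = new ArrayList<>();
    private String instant; // null until the coordinator starts an instant

    public void add(String record)   { buffer.add(record); }
    public void setInstant(String i) { instant = i; }
    public int bufferedCount()       { return buffer.size(); }

    public boolean flush() {
        if (instant == null) {
            return false; // no instant yet: keep the data buffered
        }
        // ... write buffer to storage under `instant` ...
        buffer.clear();   // reset only on success
        return true;
    }
}
```

The caller then resets nothing itself; it only reacts to the boolean result, which is the "only reset when flush success" contract in the summary.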
[jira] [Closed] (HUDI-1938) Don't flush to disk before notifyCompleteCheckpoint
[ https://issues.apache.org/jira/browse/HUDI-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing closed HUDI-1938. Resolution: Fixed > Don't flush to disk before notifyCompleteCheckpoint > --- > > Key: HUDI-1938 > URL: https://issues.apache.org/jira/browse/HUDI-1938 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > > Currently a flush to disk may happen after snapshotState and before > notifyCompleteCheckpoint. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1938) Don't flush to disk before notifyCompleteCheckpoint
yuzhaojing created HUDI-1938: Summary: Don't flush to disk before notifyCompleteCheckpoint Key: HUDI-1938 URL: https://issues.apache.org/jira/browse/HUDI-1938 Project: Apache Hudi Issue Type: Sub-task Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Currently a flush to disk may happen after snapshotState and before notifyCompleteCheckpoint. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1923) Add state in StreamWriteFunction to restore
[ https://issues.apache.org/jira/browse/HUDI-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuzhaojing updated HUDI-1923: - Description: In Flink, notifyCheckpointComplete is not part of the checkpoint lifecycle. If a checkpoint succeeds but the commit action in notifyCheckpointComplete fails, the elements belonging to that instant are discarded when we restore from the latest checkpoint. So we should store the commit state and restore it when Flink restarts. was:if coordinator notifyCheckpointComplete funtion execute failed, when we restore from the latest checkpoint, the element belong this instant will be discard. > Add state in StreamWriteFunction to restore > --- > > Key: HUDI-1923 > URL: https://issues.apache.org/jira/browse/HUDI-1923 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > > In Flink, notifyCheckpointComplete is not part of the checkpoint lifecycle. > If a checkpoint succeeds but the commit action in notifyCheckpointComplete > fails, the elements belonging to that instant are discarded when we restore > from the latest checkpoint. > So we should store the commit state and restore it when Flink restarts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1924) Support bootstrap operator to load index from hoodieTable
yuzhaojing created HUDI-1924: Summary: Support bootstrap operator to load index from hoodieTable Key: HUDI-1924 URL: https://issues.apache.org/jira/browse/HUDI-1924 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing Currently we load the index in BucketAssign, but the hoodieRecords in a base file may belong to many tasks, so every BucketAssign task has to load all files. If we add an operator before BucketAssign and key the index records into BucketAssign, we can assign a subset of the files to each task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
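The routing idea in the entry above boils down to deterministic key partitioning: once index records are keyed by record key, each BucketAssign subtask only ever sees its own share of keys, so it never needs the whole index. A minimal sketch of such a partitioner (the name and hash choice are illustrative, not Flink's or Hudi's actual implementation):

```java
public class IndexKeyPartitioner {
    // Hypothetical sketch: hash the record key into a subtask index. Because
    // the mapping is deterministic, all index records and incoming records
    // for a given key land on the same BucketAssign subtask, which therefore
    // only has to load the index entries for its own key range.
    public static int subtaskFor(String recordKey, int parallelism) {
        // floorMod keeps the result non-negative even for negative hashCodes.
        return Math.floorMod(recordKey.hashCode(), parallelism);
    }
}
```

This is the same property a Flink `keyBy` provides; the issue's operator-before-BucketAssign exploits it to shard the bootstrap load.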
[jira] [Created] (HUDI-1923) Add state in StreamWriteFunction to restore
yuzhaojing created HUDI-1923: Summary: Add state in StreamWriteFunction to restore Key: HUDI-1923 URL: https://issues.apache.org/jira/browse/HUDI-1923 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: yuzhaojing Assignee: yuzhaojing If the coordinator's notifyCheckpointComplete function fails, the elements belonging to that instant are discarded when we restore from the latest checkpoint. -- This message was sent by Atlassian Jira (v8.3.4#803005)