[jira] [Created] (HUDI-8074) Improve compaction operator shuffle rebalance

2024-08-13 Thread Danny Chen (Jira)
Danny Chen created HUDI-8074:


 Summary: Improve compaction operator shuffle rebalance
 Key: HUDI-8074
 URL: https://issues.apache.org/jira/browse/HUDI-8074
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Danny Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-8074) Improve compaction operator shuffle rebalance

2024-08-13 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-8074.

Resolution: Fixed

Fixed via master branch: 35c00daaf871a6c1b87d6a440832d60f9b26ee14

> Improve compaction operator shuffle rebalance
> --
>
> Key: HUDI-8074
> URL: https://issues.apache.org/jira/browse/HUDI-8074
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Danny Chen
>Priority: Major
>






[jira] [Created] (HUDI-8077) Fix the incremental cleaning to base on completion time

2024-08-13 Thread Danny Chen (Jira)
Danny Chen created HUDI-8077:


 Summary: Fix the incremental cleaning to base on completion time
 Key: HUDI-8077
 URL: https://issues.apache.org/jira/browse/HUDI-8077
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: Danny Chen
 Fix For: 1.0.0








[jira] [Updated] (HUDI-8077) Fix the incremental cleaning to base on completion time

2024-08-13 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-8077:
-
Description: 
Currently, incremental cleaning remembers a marker instant, the last one
retained, in the commit metadata. Both the marker and the filtering instant on
the fs view are start times (instant times). This is fine in most cases because
there is some buffer time for cleaning (30 commits retained by default). But if
the user sets up a very radical strategy, such as cleaning on every commit,
there might be issues in NB-CC mode:

An instant that starts very early but finishes recently might be skipped by the
cleaning table service.

> Fix the incremental cleaning to base on completion time
> ---
>
> Key: HUDI-8077
> URL: https://issues.apache.org/jira/browse/HUDI-8077
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>
> Currently, incremental cleaning remembers a marker instant, the last one
> retained, in the commit metadata. Both the marker and the filtering instant
> on the fs view are start times (instant times). This is fine in most cases
> because there is some buffer time for cleaning (30 commits retained by
> default). But if the user sets up a very radical strategy, such as cleaning
> on every commit, there might be issues in NB-CC mode:
> An instant that starts very early but finishes recently might be skipped by
> the cleaning table service.
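
To make the failure mode concrete, here is a minimal Java sketch under stated assumptions. The `Instant` class, the numeric times, and both filter methods are hypothetical and only model the description above; they are not Hudi's timeline or cleaner APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; these are not Hudi classes.
final class CleanMarkerSketch {

  /** An instant with a start (instant) time and a completion time. */
  static final class Instant {
    final String name;
    final long startTime;
    final long completionTime;

    Instant(String name, long startTime, long completionTime) {
      this.name = name;
      this.startTime = startTime;
      this.completionTime = completionTime;
    }
  }

  /** Names of instants whose start time is after the marker. */
  static List<String> newerByStartTime(List<Instant> timeline, long marker) {
    List<String> names = new ArrayList<>();
    for (Instant instant : timeline) {
      if (instant.startTime > marker) {
        names.add(instant.name);
      }
    }
    return names;
  }

  /** Names of instants whose completion time is after the marker. */
  static List<String> newerByCompletionTime(List<Instant> timeline, long marker) {
    List<String> names = new ArrayList<>();
    for (Instant instant : timeline) {
      if (instant.completionTime > marker) {
        names.add(instant.name);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<Instant> timeline = Arrays.asList(
        new Instant("A", 1, 10), // starts early, finishes after the last clean
        new Instant("B", 2, 3),
        new Instant("C", 4, 5)); // last instant seen by the previous clean

    // Marker stored as the start time of C: A is never picked up.
    System.out.println(newerByStartTime(timeline, 4));
    // Marker stored as the completion time of C: A is picked up.
    System.out.println(newerByCompletionTime(timeline, 5));
  }
}
```

In this toy timeline the start-time filter returns an empty list, so instant A is skipped, while the completion-time filter returns A.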





[jira] [Closed] (HUDI-7873) Remove getStorage method from HoodieReaderContext

2024-08-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7873.

Fix Version/s: 1.0.0
   Resolution: Fixed

Fixed via master branch: 64f546b8f0cae70793a6150170a649bad8e0e146

> Remove getStorage method from HoodieReaderContext
> -
>
> Key: HUDI-7873
> URL: https://issues.apache.org/jira/browse/HUDI-7873
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> All implementations of the method were the same, and it was only used by a
> test method because storage is passed as a param to the fg reader.
>  





[jira] [Updated] (HUDI-8097) Schema evolution setting from hudi-defaults.conf is ignored while altering column in Spark

2024-08-19 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-8097:
-
Fix Version/s: 1.0.0

> Schema evolution setting from hudi-defaults.conf is ignored while altering 
> column in Spark
> --
>
> Key: HUDI-8097
> URL: https://issues.apache.org/jira/browse/HUDI-8097
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Value of `hoodie.schema.on.read.enable` from externalized config 
> `hudi-defaults.conf` is not taken into consideration while processing 
> commands such as `alter table change column .. ` in Spark.
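
The expected behavior can be sketched in Java as a lookup that falls back to the externalized defaults when the session does not set the key. This is only an illustration built on `java.util.Properties`: the class and method names are hypothetical, the `key=value` file syntax and the `false` fallback are assumptions, and none of this is Hudi's or Spark's actual config resolution code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; not Hudi's or Spark's config resolution code.
final class ConfigFallbackSketch {

  static final String KEY = "hoodie.schema.on.read.enable";

  /**
   * Resolves the flag from session settings first, then from the content of a
   * defaults file (assumed here to use key=value lines), then falls back to false.
   */
  static boolean schemaOnReadEnabled(Properties sessionConf, String defaultsFileContent) {
    Properties defaults = new Properties();
    try {
      defaults.load(new StringReader(defaultsFileContent));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Properties created with a defaults table consult it when a key is missing.
    Properties effective = new Properties(defaults);
    for (String name : sessionConf.stringPropertyNames()) {
      effective.setProperty(name, sessionConf.getProperty(name));
    }
    return Boolean.parseBoolean(effective.getProperty(KEY, "false"));
  }

  public static void main(String[] args) {
    String defaultsFile = KEY + "=true\n";
    // The session does not set the key, so the value from the defaults file applies.
    System.out.println(schemaOnReadEnabled(new Properties(), defaultsFile));
  }
}
```

The bug described above corresponds to skipping the defaults lookup, so that only the session value (or the built-in default) is ever seen by the command.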





[jira] [Updated] (HUDI-8097) Schema evolution setting from hudi-defaults.conf is ignored while altering column in Spark

2024-08-19 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-8097:
-
Status: Open  (was: In Progress)

> Schema evolution setting from hudi-defaults.conf is ignored while altering 
> column in Spark
> --
>
> Key: HUDI-8097
> URL: https://issues.apache.org/jira/browse/HUDI-8097
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Value of `hoodie.schema.on.read.enable` from externalized config 
> `hudi-defaults.conf` is not taken into consideration while processing 
> commands such as `alter table change column .. ` in Spark.





[jira] [Closed] (HUDI-8097) Schema evolution setting from hudi-defaults.conf is ignored while altering column in Spark

2024-08-19 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-8097.

Resolution: Fixed

Fixed via master branch: e4e5bf0232093bfba25f1769fb63258e9f816a8f

> Schema evolution setting from hudi-defaults.conf is ignored while altering 
> column in Spark
> --
>
> Key: HUDI-8097
> URL: https://issues.apache.org/jira/browse/HUDI-8097
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Value of `hoodie.schema.on.read.enable` from externalized config 
> `hudi-defaults.conf` is not taken into consideration while processing 
> commands such as `alter table change column .. ` in Spark.





[jira] [Updated] (HUDI-8104) Hive on spark select mor rt table, org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to org.apache.hadoop.hive.shims.HadoopShimsSecure

2024-08-20 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-8104:
-
Fix Version/s: 1.0.0

> Hive on spark select mor rt table, 
> org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to 
> org.apache.hadoop.hive.shims.HadoopShimsSecure
> --
>
> Key: HUDI-8104
> URL: https://issues.apache.org/jira/browse/HUDI-8104
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: felixzh
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Hive on Spark select on a MOR rt table fails with ERROR:
> org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to
> org.apache.hadoop.hive.shims.HadoopShimsSecure
> The cause of the error is that hudi.hive.realtime is not set correctly on
> HoodieParquetRealtimeInputFormat's job.





[jira] [Closed] (HUDI-8104) Hive on spark select mor rt table, org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to org.apache.hadoop.hive.shims.HadoopShimsSecure

2024-08-20 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-8104.

Resolution: Fixed

Fixed via master branch: b6bad403fa97479c44112b741007a08102617ab3

> Hive on spark select mor rt table, 
> org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to 
> org.apache.hadoop.hive.shims.HadoopShimsSecure
> --
>
> Key: HUDI-8104
> URL: https://issues.apache.org/jira/browse/HUDI-8104
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: felixzh
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Hive on Spark select on a MOR rt table fails with ERROR:
> org.apache.hudi.hadoop.hive.HoodieCombineRealtimeFileSplit cannot be cast to
> org.apache.hadoop.hive.shims.HadoopShimsSecure
> The cause of the error is that hudi.hive.realtime is not set correctly on
> HoodieParquetRealtimeInputFormat's job.





[jira] [Closed] (HUDI-8112) Fix TestHoodieActiveTimeline#testTimelineGetOperations test logic error

2024-08-22 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-8112.

Resolution: Fixed

Fixed via master branch: e1f70fda5890cff44b3e44eb362c5a411f4174f3

> Fix TestHoodieActiveTimeline#testTimelineGetOperations test logic error
> ---
>
> Key: HUDI-8112
> URL: https://issues.apache.org/jira/browse/HUDI-8112
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-8112) Fix TestHoodieActiveTimeline#testTimelineGetOperations test logic error

2024-08-22 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-8112:
-
Fix Version/s: 1.0.0

> Fix TestHoodieActiveTimeline#testTimelineGetOperations test logic error
> ---
>
> Key: HUDI-8112
> URL: https://issues.apache.org/jira/browse/HUDI-8112
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-8070) Support Flink 1.19 in Hudi

2024-08-22 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-8070.

Resolution: Fixed

Fixed via master branch: d550b46d72488f92abfb234f919506e76900d69f

> Support Flink 1.19 in Hudi
> --
>
> Key: HUDI-8070
> URL: https://issues.apache.org/jira/browse/HUDI-8070
> Project: Apache Hudi
>  Issue Type: Task
>  Components: flink
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-8078) Persisting writestatus to optimize the writes

2024-08-24 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-8078.

Resolution: Fixed

Fixed via master branch: e0ef86421993c1664da5cbb3ab7de7e87f16cb49

> Persisting writestatus to optimize the writes 
> --
>
> Key: HUDI-8078
> URL: https://issues.apache.org/jira/browse/HUDI-8078
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Aditya Goenka
>Priority: Critical
>  Labels: pull-request-available, spark
> Fix For: 1.1.0, 0.15.1
>
>
> Details on this issue - [https://github.com/apache/hudi/issues/11741]





[jira] [Assigned] (HUDI-8078) Persisting writestatus to optimize the writes

2024-08-24 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-8078:


Assignee: Danny Chen

> Persisting writestatus to optimize the writes 
> --
>
> Key: HUDI-8078
> URL: https://issues.apache.org/jira/browse/HUDI-8078
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Aditya Goenka
>Assignee: Danny Chen
>Priority: Critical
>  Labels: pull-request-available, spark
> Fix For: 1.1.0, 0.15.1
>
>
> Details on this issue - [https://github.com/apache/hudi/issues/11741]





[jira] [Updated] (HUDI-6788) Integrate FileGroupReader with MergeOnReadInputFormat for Flink

2024-08-25 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6788:
-
Description: The existing 

> Integrate FileGroupReader with MergeOnReadInputFormat for Flink
> ---
>
> Key: HUDI-6788
> URL: https://issues.apache.org/jira/browse/HUDI-6788
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Zhenqiu Huang
>Priority: Blocker
> Fix For: 1.0.0
>
>
> The existing 





[jira] [Updated] (HUDI-6788) Integrate FileGroupReader with MergeOnReadInputFormat for Flink

2024-08-25 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6788:
-
Description: 
The existing MergeOnReadInputFormat implements different iterators for all
kinds of read modes: incremental read, read optimized view, snapshot view, etc.
For better performance and code evolution, we can integrate the new
FileGroupReader. The main difference is that the FileGroupReader encapsulates
the file slice log and parquet merging logic, so each iterator is relieved of
the redundant work of querying the fs view and composing the file slices.

We can integrate step by step for different read views: 1. snapshot queries 2.
read optimized queries 3. skip merge queries

For usability and smooth evolution, we should add a flag for the new reader;
the old code path should be kept for 1 or 2 releases.

The major action items include:

1. implement the HoodieFlinkRecord akin to the HoodieSparkRecord;
2. implement the Flink specific FileGroupReader with the HoodieFlinkRecord;
3. Flink implements the snapshot queries using the file group reader;
4. Flink implements the read optimized queries using the file group reader;
5. Flink implements the skip merge queries using the file group reader.

  was:The existing 


> Integrate FileGroupReader with MergeOnReadInputFormat for Flink
> ---
>
> Key: HUDI-6788
> URL: https://issues.apache.org/jira/browse/HUDI-6788
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Zhenqiu Huang
>Priority: Blocker
> Fix For: 1.0.0
>
>
> The existing MergeOnReadInputFormat implements different iterators for all
> kinds of read modes: incremental read, read optimized view, snapshot view,
> etc. For better performance and code evolution, we can integrate the new
> FileGroupReader. The main difference is that the FileGroupReader
> encapsulates the file slice log and parquet merging logic, so each iterator
> is relieved of the redundant work of querying the fs view and composing the
> file slices.
> We can integrate step by step for different read views: 1. snapshot queries
> 2. read optimized queries 3. skip merge queries
> For usability and smooth evolution, we should add a flag for the new reader;
> the old code path should be kept for 1 or 2 releases.
> The major action items include:
> 1. implement the HoodieFlinkRecord akin to the HoodieSparkRecord;
> 2. implement the Flink specific FileGroupReader with the HoodieFlinkRecord;
> 3. Flink implements the snapshot queries using the file group reader;
> 4. Flink implements the read optimized queries using the file group reader;
> 5. Flink implements the skip merge queries using the file group reader.





[jira] [Created] (HUDI-8118) Implement the Flink specific FileGroupReader with the HoodieFlinkRecord

2024-08-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-8118:


 Summary: Implement the Flink specific FileGroupReader with the 
HoodieFlinkRecord
 Key: HUDI-8118
 URL: https://issues.apache.org/jira/browse/HUDI-8118
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: flink
Reporter: Danny Chen
Assignee: Zhenqiu Huang
 Fix For: 1.0.0








[jira] [Created] (HUDI-8117) Implement the HoodieFlinkRecord akin to the HoodieSparkRecord

2024-08-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-8117:


 Summary: Implement the HoodieFlinkRecord akin to the
HoodieSparkRecord
 Key: HUDI-8117
 URL: https://issues.apache.org/jira/browse/HUDI-8117
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: flink
Reporter: Danny Chen
Assignee: Zhenqiu Huang
 Fix For: 1.0.0








[jira] [Created] (HUDI-8119) Flink implements the snapshot queries using the file group reader

2024-08-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-8119:


 Summary: Flink implements the snapshot queries using the file 
group reader
 Key: HUDI-8119
 URL: https://issues.apache.org/jira/browse/HUDI-8119
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: flink
Reporter: Danny Chen
Assignee: Zhenqiu Huang
 Fix For: 1.0.0








[jira] [Created] (HUDI-8120) Flink implements the read optimized queries using the file group reader

2024-08-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-8120:


 Summary: Flink implements the read optimized queries using the 
file group reader
 Key: HUDI-8120
 URL: https://issues.apache.org/jira/browse/HUDI-8120
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: flink
Reporter: Danny Chen
Assignee: Zhenqiu Huang
 Fix For: 1.0.0








[jira] [Created] (HUDI-8121) Flink implements the skip merge queries using the file group reader

2024-08-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-8121:


 Summary: Flink implements the skip merge queries using the file 
group reader
 Key: HUDI-8121
 URL: https://issues.apache.org/jira/browse/HUDI-8121
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: flink
Reporter: Danny Chen
Assignee: Zhenqiu Huang
 Fix For: 1.0.0








[jira] [Assigned] (HUDI-8136) New instant time generation for Flink streaming pipeline

2024-08-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-8136:


Assignee: Danny Chen

> New instant time generation for Flink streaming pipeline
> 
>
> Key: HUDI-8136
> URL: https://issues.apache.org/jira/browse/HUDI-8136
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Created] (HUDI-8136) New instant time generation for Flink streaming pipeline

2024-08-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-8136:


 Summary: New instant time generation for Flink streaming pipeline
 Key: HUDI-8136
 URL: https://issues.apache.org/jira/browse/HUDI-8136
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Danny Chen
 Fix For: 1.0.0








[jira] [Updated] (HUDI-8136) New instant time generation for Flink streaming pipeline

2024-08-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-8136:
-
Description: 
h2. Design

The new design moves instant time generation from being blocked on checkpoint
completion on the coordinator to the writers cheaply obtaining the instant time
to use before every file write, by sending a light-weight request to the
coordinator. The coordinator maintains a mapping from checkpoint barriers (ids)
to the instant times to be used by writers for files written during a
checkpoint. When a writer sends a new instant time request, the coordinator
returns an existing pending instant or a new one (based on a comparison of the
last finished checkpoint barrier from the writer and the existing barriers on
the coordinator). We allow at most 2 pending instants on the coordinator. When
an instant is committed, its pending mapping item should be removed from
memory. Note that new instant time generation/switching on checkpoint start and
instant time serving should happen in sequence (the same thread or an
in-process lock) in the coordinator.

The design assumes the following invariants:
 * Once a writer task starts writing a file with time {{tx}}, it will not write
any file with time {{ty}} such that {{ty < tx}}
 * Checkpoint {{i}} will be started on the coordinator only after checkpoint
{{j}} completes such that {{i < j}}

Overall flow is as follows:
 # During startup, the coordinator generates a new instant {{tx}} and requests
it on the timeline.
 ## This happens within a process-local lock shared by instant time generation.
 ## Any request for instant times from writer tasks will serve {{tx}}.
 # Writer tasks fetch an instant time to use for any file written, by issuing a
call to the coordinator.
 ## Writer tasks keep track of all files written by them between checkpoints.
 # When starting a checkpoint, the coordinator takes the lock again, generates
a new instant {{ty}}, and requests it on the timeline.
 ## This ensures all instant time fetches for {{tx}} are served before the
checkpoint starts, i.e., they will be reported back to the coordinator when the
checkpoint completes.
 # When they receive the event to checkpoint, the writer tasks flush all open
files and return the list of files written as a part of the checkpoint.
 # Once the coordinator receives the lists of files from all writer tasks, they
can include two types of files:
 ## files belonging to {{tx}}, which can now be committed since we know no
files with time {{tx}} could have been written between steps 3 & 5.
 ## files belonging to {{ty}}, which cannot be committed yet (there could be
files still being written for {{ty}}), but the coordinator needs to ensure these
files are ultimately included when {{ty}} is committed.

h4. The rollback of failed instants
The clean policy should always be configured as lazy. When a {{ck_n,
instant_m}} is committed to the Hudi timeline, all the instants that are earlier
than {{instant_m}} should be explicitly closed to shut down their heartbeat
threads, so that the async cleaner eventually rolls them back.
When a full task failover is triggered, all the pending instants should be
rolled back, but this task is handed over to the async cleaner.

  was:
h2. Design
The new design moves instant time generation from being blocked on checkpoint
completion on the coordinator to the writers cheaply obtaining the instant time
to use before every file write, by sending a light-weight request to the
coordinator. The coordinator maintains a mapping from checkpoint barriers (ids)
to the instant times to be used by writers for files written during a
checkpoint. When a writer sends a new instant time request, the coordinator
returns an existing pending instant or a new one (based on a comparison of the
last finished checkpoint barrier from the writer and the existing barriers on
the coordinator). We allow at most 2 pending instants on the coordinator. When
an instant is committed, its pending mapping item should be removed from
memory. Note that new instant time generation/switching on checkpoint start and
instant time serving should happen in sequence (the same thread or an
in-process lock) in the coordinator.

The design assumes the following invariants:
 * Once a writer task starts writing a file with time {{tx}}, it will not write
any file with time {{ty}} such that {{ty < tx}}
 * Checkpoint {{i}} will be started on the coordinator only after checkpoint
{{j}} completes such that {{i < j}}

Overall flow is as follows:
 # During startup, the coordinator generates a new instant {{tx}} and requests
it on the timeline.
 ## This happens within a process-local lock shared by instant time generation.
 ## Any request for instant times from writer tasks will serve {{tx}}.
 # Writer tasks fetch an instant time to use for any file written, by issuing a
call to the coordinator.
 ## wr

[jira] [Updated] (HUDI-8136) New instant time generation for Flink streaming pipeline

2024-08-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-8136:
-
Description: 
h2. Design
The new design moves instant time generation from being blocked on checkpoint
completion on the coordinator to the writers cheaply obtaining the instant time
to use before every file write, by sending a light-weight request to the
coordinator. The coordinator maintains a mapping from checkpoint barriers (ids)
to the instant times to be used by writers for files written during a
checkpoint. When a writer sends a new instant time request, the coordinator
returns an existing pending instant or a new one (based on a comparison of the
last finished checkpoint barrier from the writer and the existing barriers on
the coordinator). We allow at most 2 pending instants on the coordinator. When
an instant is committed, its pending mapping item should be removed from
memory. Note that new instant time generation/switching on checkpoint start and
instant time serving should happen in sequence (the same thread or an
in-process lock) in the coordinator.

The design assumes the following invariants:
 * Once a writer task starts writing a file with time {{tx}}, it will not write
any file with time {{ty}} such that {{ty < tx}}
 * Checkpoint {{i}} will be started on the coordinator only after checkpoint
{{j}} completes such that {{i < j}}

Overall flow is as follows:
 # During startup, the coordinator generates a new instant {{tx}} and requests
it on the timeline.
 ## This happens within a process-local lock shared by instant time generation.
 ## Any request for instant times from writer tasks will serve {{tx}}.
 # Writer tasks fetch an instant time to use for any file written, by issuing a
call to the coordinator.
 ## Writer tasks keep track of all files written by them between checkpoints.
 # When starting a checkpoint, the coordinator takes the lock again, generates
a new instant {{ty}}, and requests it on the timeline.
 ## This ensures all instant time fetches for {{tx}} are served before the
checkpoint starts, i.e., they will be reported back to the coordinator when the
checkpoint completes.
 # When they receive the event to checkpoint, the writer tasks flush all open
files and return the list of files written as a part of the checkpoint.
 # Once the coordinator receives the lists of files from all writer tasks, they
can include two types of files:
 ## files belonging to {{tx}}, which can now be committed since we know no
files with time {{tx}} could have been written between steps 3 & 5.
 ## files belonging to {{ty}}, which cannot be committed yet (there could be
files still being written for {{ty}}), but the coordinator needs to ensure these
files are ultimately included when {{ty}} is committed.
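
The coordinator-side bookkeeping described in the design can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: the class `InstantCoordinatorSketch`, its methods, the counter-based instant names, and the choice to key each pending instant by "last finished checkpoint id + 1" are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names and keying scheme are illustrative, not Hudi APIs.
final class InstantCoordinatorSketch {

  private static final int MAX_PENDING = 2;

  // Serializes instant generation/switching and instant serving.
  private final ReentrantLock lock = new ReentrantLock();
  // Checkpoint id -> instant time used for files written for that checkpoint.
  private final TreeMap<Long, String> pending = new TreeMap<>();
  // Stand-in for a real instant time generator.
  private long counter = 0;

  /** Serves an existing pending instant, or creates a new one if none applies. */
  String requestInstant(long lastFinishedCheckpointId) {
    lock.lock();
    try {
      // Reuse the pending instant of the next checkpoint the writer works on.
      Map.Entry<Long, String> next = pending.higherEntry(lastFinishedCheckpointId);
      if (next != null) {
        return next.getValue();
      }
      if (pending.size() >= MAX_PENDING) {
        throw new IllegalStateException("At most " + MAX_PENDING + " pending instants are allowed");
      }
      String instant = String.format("t%03d", ++counter);
      pending.put(lastFinishedCheckpointId + 1, instant);
      return instant;
    } finally {
      lock.unlock();
    }
  }

  /** Removes the mapping once the instant of the given checkpoint is committed. */
  void commit(long checkpointId) {
    lock.lock();
    try {
      pending.remove(checkpointId);
    } finally {
      lock.unlock();
    }
  }

  /** Number of instants that are requested but not yet committed. */
  int pendingCount() {
    lock.lock();
    try {
      return pending.size();
    } finally {
      lock.unlock();
    }
  }
}
```

A writer that has not finished any checkpoint calls `requestInstant(0)` and every such call gets the same instant; a writer that already passed checkpoint 1 calls `requestInstant(1)` and gets the next one. A third instant is refused until `commit(1)` removes the oldest mapping, which models the "at most 2 pending instants" rule.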

 

> New instant time generation for Flink streaming pipeline
> 
>
> Key: HUDI-8136
> URL: https://issues.apache.org/jira/browse/HUDI-8136
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>
> h2. Design
> The new design moves instant time generation from being blocked on
> checkpoint completion on the coordinator to the writers cheaply obtaining
> the instant time to use before every file write, by sending a light-weight
> request to the coordinator. The coordinator maintains a mapping from
> checkpoint barriers (ids) to the instant times to be used by writers for
> files written during a checkpoint. When a writer sends a new instant time
> request, the coordinator returns an existing pending instant or a new one
> (based on a comparison of the last finished checkpoint barrier from the
> writer and the existing barriers on the coordinator). We allow at most 2
> pending instants on the coordinator. When an instant is committed, its
> pending mapping item should be removed from memory. Note that new instant
> time generation/switching on checkpoint start and instant time serving should
> happen in sequence (the same thread or an in-process lock) in the coordinator.
> The design assumes the following invariants:
>  * Once a writer task starts writing a file with time {{tx}}, it will not
> write any file with time {{ty}} such that {{ty < tx}}
>  * Checkpoint {{i}} will be started on the coordinator only after checkpoint
> {{j}} completes such that {{i < j}}
>
> Overall flow is as follows:
>  # During startup, the coordinator generates a new instant {{tx}} and
> requests it on the timeline.
>  ## This happens within a process-local lock shared by instant time generation.
>  ## Any request for instant times from writer tasks will serve {{tx}}.
>  # Writer tasks fetch an instant time to use for any file written, by issu

[jira] [Created] (HUDI-2143) Tweak the default compaction target IO to 500GB when flink async compaction is off

2021-07-07 Thread Danny Chen (Jira)
Danny Chen created HUDI-2143:


 Summary: Tweak the default compaction target IO to 500GB when 
flink async compaction is off
 Key: HUDI-2143
 URL: https://issues.apache.org/jira/browse/HUDI-2143
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-11 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378949#comment-17378949
 ] 

Danny Chen commented on HUDI-2162:
--

You should set up the timeout correctly.

> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Committing an instant and getting an instant are asynchronous, so the
> instant can be null. The default waiting time is 0, but it must be greater
> than ckpTimeout; otherwise the exception shown below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once; this kind of usage is too weak in this
> context.
> Timeout(0ms) while waiting for instant null to commit
>  at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-13 Thread Danny Chen (Jira)
Danny Chen created HUDI-2170:


 Summary: Always choose the latest record for HoodieRecordPayload
 Key: HUDI-2170
 URL: https://issues.apache.org/jira/browse/HUDI-2170
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Common Core
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Currently, {{OverwriteWithLatestAvroPayload.preCombine}} still chooses the old 
record when the new record has the same preCombine field value as the old one. 
It is more natural to keep the new incoming record instead. The 
{{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does that.
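
The tie-breaking rule described above can be illustrated with a minimal, 
self-contained Java sketch. {{VersionedRecord}} and its fields are hypothetical 
stand-ins invented for this example, not Hudi's actual payload classes; the 
sketch only shows that on an equal ordering value the incoming record wins.

```java
// Hypothetical illustration only: these classes are NOT part of Hudi.
final class PreCombineDemo {
    public static void main(String[] args) {
        VersionedRecord existing = new VersionedRecord("id1", 5L, "old");
        VersionedRecord incoming = new VersionedRecord("id1", 5L, "new");
        // Same ordering value on both sides, so the incoming record is kept.
        System.out.println(VersionedRecord.preCombine(incoming, existing).value); // prints "new"
    }
}

final class VersionedRecord {
    final String key;
    final long orderingVal; // plays the role of the preCombine field
    final String value;

    VersionedRecord(String key, long orderingVal, String value) {
        this.key = key;
        this.orderingVal = orderingVal;
        this.value = value;
    }

    /** Returns the record to keep; '>=' means a tie goes to the incoming record. */
    static VersionedRecord preCombine(VersionedRecord incoming, VersionedRecord existing) {
        return incoming.orderingVal >= existing.orderingVal ? incoming : existing;
    }
}
```

Using a strict {{>}} comparison instead of {{>=}} would reproduce the older 
behavior, where the existing record survives a tie.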



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-07-13 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2170:
-
Description: 
Currently, {{OverwriteWithLatestAvroPayload.preCombine}} still chooses the old 
record when the new record has the same preCombine field value as the old one. 
It is more natural to keep the new incoming record instead. The 
{{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does that.

See issue: https://github.com/apache/hudi/issues/3266.

  was: Currently, {{OverwriteWithLatestAvroPayload.preCombine}} still chooses the 
old record when the new record has the same preCombine field value as the old one. 
It is more natural to keep the new incoming record instead. The 
{{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does that.


> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> Currently, {{OverwriteWithLatestAvroPayload.preCombine}} still chooses the old 
> record when the new record has the same preCombine field value as the old one. 
> It is more natural to keep the new incoming record instead. The 
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does 
> that.
> See issue: https://github.com/apache/hudi/issues/3266.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2171) Add parallelism conf for bootstrap operator

2021-07-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2171.
--
Fix Version/s: 0.9.0
   Resolution: Fixed

Fixed via master branch: 632bfd1a65f55deff60bd56e514738b9c8730140

> Add parallelism conf for bootstrap operator
> ---
>
> Key: HUDI-2171
> URL: https://issues.apache.org/jira/browse/HUDI-2171
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Add parallelism conf for bootstrap operator



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-15 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2153.
--
  Assignee: Danny Chen
Resolution: Fixed

Fixed via master branch: 23a4a96eb416c13d8b1f921cc51286f93d61d9b3

> BucketAssignFunction NullPointerException
> -
>
> Key: HUDI-2153
> URL: https://issues.apache.org/jira/browse/HUDI-2153
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: moran
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> java.lang.NullPointerException
>   at 
> org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198)
>   at 
> org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159)
>   at 
> org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83)
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191)
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
>   at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
>   at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
>   at java.lang.Thread.run(Thread.java:748)
> The error occurs at line 197 of the BucketAssignFunction class 
> (this.context.setCurrentKey(recordKey)).
> Why is this context null?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2181) Refine doc for FlinkCreateHandle

2021-07-15 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381287#comment-17381287
 ] 

Danny Chen commented on HUDI-2181:
--

Reasonable, contributions are welcome ~

> Refine doc for FlinkCreateHandle
> 
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent 
> mini-batches; instead, every insert batch creates a new file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2185) Remove the default parallelism of index bootstrap and bucket assigner

2021-07-15 Thread Danny Chen (Jira)
Danny Chen created HUDI-2185:


 Summary: Remove the default parallelism of index bootstrap and 
bucket assigner
 Key: HUDI-2185
 URL: https://issues.apache.org/jira/browse/HUDI-2185
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2185) Remove the default parallelism of index bootstrap and bucket assigner

2021-07-16 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2185.
--
Resolution: Fixed

Fixed via master branch: c8aaf00819b7af9c5575662868aa40f702a8aa1e

> Remove the default parallelism of index bootstrap and bucket assigner
> -
>
> Key: HUDI-2185
> URL: https://issues.apache.org/jira/browse/HUDI-2185
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-2087) Support Append only in Flink stream

2021-07-16 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reopened HUDI-2087:
--

Revert and reopen for more discussion.

> Support Append only in Flink stream
> ---
>
> Key: HUDI-2087
> URL: https://issues.apache.org/jira/browse/HUDI-2087
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-07-08-22-04-30-039.png, 
> image-2021-07-08-22-04-40-018.png
>
>
> It is necessary to support append mode in the Flink stream writer, as the data 
> lake should be able to write log-type data as parquet with high performance, 
> without merging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2191) Bump flink version to 1.13.1

2021-07-17 Thread Danny Chen (Jira)
Danny Chen created HUDI-2191:


 Summary: Bump flink version to 1.13.1
 Key: HUDI-2191
 URL: https://issues.apache.org/jira/browse/HUDI-2191
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Aims to use flink 1.13.1 for 0.9.0 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2193) Remove state in BootstrapFunction

2021-07-19 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2193.
--
Fix Version/s: 0.9.0
   Resolution: Fixed

Fixed via master branch: 2099bf41db76e9a6e946aa41c318b7c0e18be04d

> Remove state in BootstrapFunction
> -
>
> Key: HUDI-2193
> URL: https://issues.apache.org/jira/browse/HUDI-2193
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Remove state in BootstrapFunction to support restarting the job without bootstrap



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2198) Clean and reset the bootstrap events for coordinator when task failover

2021-07-19 Thread Danny Chen (Jira)
Danny Chen created HUDI-2198:


 Summary: Clean and reset the bootstrap events for coordinator when 
task failover
 Key: HUDI-2198
 URL: https://issues.apache.org/jira/browse/HUDI-2198
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2145) Create new bucket when NewFileAssignState filled

2021-07-20 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2145.
--
Fix Version/s: 0.9.0
   Resolution: Fixed

Fixed via master branch: 634163a990569aa4463b58830396f455dd15340c

> Create new bucket when NewFileAssignState filled
> 
>
> Key: HUDI-2145
> URL: https://issues.apache.org/jira/browse/HUDI-2145
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2198) Clean and reset the bootstrap events for coordinator when task failover

2021-07-20 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2198.
--
Fix Version/s: 0.9.0
   Resolution: Fixed

Fixed via master branch: 858e84b5b2f1c5fbd4266922b494ad0d16f5b92a

> Clean and reset the bootstrap events for coordinator when task failover
> ---
>
> Key: HUDI-2198
> URL: https://issues.apache.org/jira/browse/HUDI-2198
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2204) Add marker files for flink writer

2021-07-21 Thread Danny Chen (Jira)
Danny Chen created HUDI-2204:


 Summary: Add marker files for flink writer
 Key: HUDI-2204
 URL: https://issues.apache.org/jira/browse/HUDI-2204
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2205) Rollback inflight compaction for flink writer

2021-07-21 Thread Danny Chen (Jira)
Danny Chen created HUDI-2205:


 Summary: Rollback inflight compaction for flink writer
 Key: HUDI-2205
 URL: https://issues.apache.org/jira/browse/HUDI-2205
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2204) Add marker files for flink writer

2021-07-21 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2204.
--
Resolution: Fixed

Fixed via master branch: 2370a9facbe4418f994f29c426e9b2a255e3abb0

> Add marker files for flink writer
> -
>
> Key: HUDI-2204
> URL: https://issues.apache.org/jira/browse/HUDI-2204
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2206) Fix checkpoint blocked because getLastPendingInstant() action after than restoreWriteMetadata() action

2021-07-22 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2206.
--
  Assignee: Danny Chen
Resolution: Fixed

Fixed via master branch: fe5d2e7f53eb36d9e35acd8d944e59d988dba475

> Fix checkpoint blocked  because getLastPendingInstant() action after than 
> restoreWriteMetadata() action
> ---
>
> Key: HUDI-2206
> URL: https://issues.apache.org/jira/browse/HUDI-2206
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Fix the checkpoint being blocked because the getLastPendingInstant() action 
> runs after the restoreWriteMetadata() action, which causes a deadlock during 
> checkpointing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2209) Bulk insert for flink writer

2021-07-22 Thread Danny Chen (Jira)
Danny Chen created HUDI-2209:


 Summary: Bulk insert for flink writer
 Key: HUDI-2209
 URL: https://issues.apache.org/jira/browse/HUDI-2209
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2219) Fix NPE of HoodieConfig

2021-07-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-2219:


 Summary: Fix NPE of HoodieConfig
 Key: HUDI-2219
 URL: https://issues.apache.org/jira/browse/HUDI-2219
 Project: Apache Hudi
  Issue Type: Bug
  Components: Compaction
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2209) Bulk insert for flink writer

2021-07-26 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2209.
--
Resolution: Fixed

Fixed via master branch: 9d2a65a6a6ff9add81411147f1cddd03f7c08e6c

> Bulk insert for flink writer
> 
>
> Key: HUDI-2209
> URL: https://issues.apache.org/jira/browse/HUDI-2209
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2227) Only sync hive meta on successful commit for flink batch writer

2021-07-26 Thread Danny Chen (Jira)
Danny Chen created HUDI-2227:


 Summary: Only sync hive meta on successful commit for flink batch 
writer
 Key: HUDI-2227
 URL: https://issues.apache.org/jira/browse/HUDI-2227
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2219) Fix NPE of HoodieConfig

2021-07-27 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2219.
--
Resolution: Fixed

Fixed via master branch: ab2e0d0ba2697ba0750ec52fbb3b3a0187734a4b

> Fix NPE of HoodieConfig
> ---
>
> Key: HUDI-2219
> URL: https://issues.apache.org/jira/browse/HUDI-2219
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2227) Only sync hive meta on successful commit for flink batch writer

2021-07-27 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2227.
--
Resolution: Fixed

Fixed via master branch: 60758b36ea033a5985eabc53da5f6036ffaa5c0d

> Only sync hive meta on successful commit for flink batch writer
> ---
>
> Key: HUDI-2227
> URL: https://issues.apache.org/jira/browse/HUDI-2227
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2215) Add rateLimiter when Flink writes to hudi

2021-07-27 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2215.
--
Resolution: Fixed

Fixed via master branch: 00cd35f90ad0cd96a943edd148bf881fc4b7bb5b

> Add rateLimiter when Flink writes to hudi
> -
>
> Key: HUDI-2215
> URL: https://issues.apache.org/jira/browse/HUDI-2215
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: WangMinChao
>Assignee: WangMinChao
>Priority: Minor
>  Labels: flink, pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2241) Explicit parallelism for flink bulk insert

2021-07-27 Thread Danny Chen (Jira)
Danny Chen created HUDI-2241:


 Summary: Explicit parallelism for flink bulk insert
 Key: HUDI-2241
 URL: https://issues.apache.org/jira/browse/HUDI-2241
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2245) BucketAssigner generates the fileId evenly to avoid data skew

2021-07-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-2245:


 Summary: BucketAssigner generates the fileId evenly to avoid data 
skew
 Key: HUDI-2245
 URL: https://issues.apache.org/jira/browse/HUDI-2245
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2246) Time travel query for flink sql

2021-07-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-2246:


 Summary: Time travel query for flink sql
 Key: HUDI-2246
 URL: https://issues.apache.org/jira/browse/HUDI-2246
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2245) BucketAssigner generates the fileId evenly to avoid data skew

2021-07-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2245.
--
Resolution: Fixed

Fixed via master branch: 91c221341293e80c28cce19f1642199495a96f66

> BucketAssigner generates the fileId evenly to avoid data skew
> -
>
> Key: HUDI-2245
> URL: https://issues.apache.org/jira/browse/HUDI-2245
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2228) Add option 'hive_sync.mode' for flink writer

2021-07-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2228.
--
Resolution: Fixed

Fixed via master branch: 7739518879cf1c30576b938d705947192a2dd9ad

> Add option 'hive_sync.mode' for flink writer
> 
>
> Key: HUDI-2228
> URL: https://issues.apache.org/jira/browse/HUDI-2228
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> [Add option 'hive_sync.mode' for flink 
> writer|https://github.com/apache/hudi/pull/3352/commits/9ff4b7382c22745aed8dfc96e93bdc53fbd51a25].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2241) Explicit parallelism for flink bulk insert

2021-07-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2241.
--
Resolution: Fixed

Fixed via master branch: efbbb67420b7082a3960f3d32215dabd959f5525

> Explicit parallelism for flink bulk insert
> --
>
> Key: HUDI-2241
> URL: https://issues.apache.org/jira/browse/HUDI-2241
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread Danny Chen (Jira)
Danny Chen created HUDI-2254:


 Summary: Builtin sort operator for flink bulk insert
 Key: HUDI-2254
 URL: https://issues.apache.org/jira/browse/HUDI-2254
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2252) Default consumes from the latest instant for flink streaming reader

2021-07-29 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2252:
-
Summary: Default consumes from the latest instant for flink streaming 
reader  (was: Replace read full data with read  latest commit data in flink 
stream read)

> Default consumes from the latest instant for flink streaming reader
> ---
>
> Key: HUDI-2252
> URL: https://issues.apache.org/jira/browse/HUDI-2252
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2252) Replace read full data with read latest commit data in flink stream read

2021-07-29 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2252.
--
Resolution: Fixed

Fixed via master branch: 8b19ec9ca07070a9819502e82091dd14d559ef94

> Replace read full data with read  latest commit data in flink stream read
> -
>
> Key: HUDI-2252
> URL: https://issues.apache.org/jira/browse/HUDI-2252
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1949) Refactor BucketAssigner to make it more efficient

2021-05-31 Thread Danny Chen (Jira)
Danny Chen created HUDI-1949:


 Summary: Refactor BucketAssigner to make it more efficient
 Key: HUDI-1949
 URL: https://issues.apache.org/jira/browse/HUDI-1949
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Add a process-singleton class {{WriteProfile}}. Rebuilding the record and 
small-file profiles can be more efficient if the profile is reused within the 
same checkpoint id.
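
One way to picture the reuse-by-checkpoint-id idea is the following 
self-contained Java sketch. {{CheckpointScopedCache}} and its loader are 
hypothetical names made up for this example and do not reflect the actual 
{{WriteProfile}} implementation; the sketch only assumes that the expensive 
profile needs rebuilding when the checkpoint id changes.

```java
import java.util.function.LongFunction;

// Hypothetical illustration only: these classes are NOT part of Hudi.
final class CheckpointScopedProfileDemo {
    public static void main(String[] args) {
        CheckpointScopedCache<String> cache =
            new CheckpointScopedCache<>(id -> "profile@" + id);
        cache.get(1L); // first request for checkpoint 1: builds the profile
        cache.get(1L); // same checkpoint id: reuses the cached profile
        cache.get(2L); // new checkpoint id: rebuilds the profile
        System.out.println(cache.loadCount()); // prints 2
    }
}

final class CheckpointScopedCache<T> {
    private final LongFunction<T> loader; // stands in for the costly profile rebuild
    private boolean loaded;
    private long cachedCheckpointId;
    private T cached;
    private int loadCount;

    CheckpointScopedCache(LongFunction<T> loader) {
        this.loader = loader;
    }

    /** Rebuilds the value only when the checkpoint id differs from the cached one. */
    synchronized T get(long checkpointId) {
        if (!loaded || checkpointId != cachedCheckpointId) {
            cached = loader.apply(checkpointId);
            cachedCheckpointId = checkpointId;
            loaded = true;
            loadCount++;
        }
        return cached;
    }

    /** Number of times the loader actually ran. */
    synchronized int loadCount() {
        return loadCount;
    }
}
```

The design choice here is that all callers within one checkpoint interval share 
a single rebuilt profile instead of each triggering their own rebuild.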



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1952) Support hive3 meta sync for flink writer

2021-06-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-1952:


 Summary: Support hive3 meta sync for flink writer
 Key: HUDI-1952
 URL: https://issues.apache.org/jira/browse/HUDI-1952
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1956) BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread Danny Chen (Jira)
Danny Chen created HUDI-1956:


 Summary: BucketAssignFunction use ValueState instead of MapState
 Key: HUDI-1956
 URL: https://issues.apache.org/jira/browse/HUDI-1956
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Use the value state to reduce the memory footprint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1956) BucketAssignFunction use ValueState instead of MapState

2021-06-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-1956.

Resolution: Duplicate

> BucketAssignFunction use ValueState instead of MapState
> ---
>
> Key: HUDI-1956
> URL: https://issues.apache.org/jira/browse/HUDI-1956
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> Use the value state to reduce the memory footprint.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1961) Add a debezium json integration test case for flink

2021-06-03 Thread Danny Chen (Jira)
Danny Chen created HUDI-1961:


 Summary: Add a debezium json integration test case for flink
 Key: HUDI-1961
 URL: https://issues.apache.org/jira/browse/HUDI-1961
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1967) Fix the NPE for MOR Hive rt table query

2021-06-03 Thread Danny Chen (Jira)
Danny Chen created HUDI-1967:


 Summary: Fix the NPE for MOR Hive  rt table query
 Key: HUDI-1967
 URL: https://issues.apache.org/jira/browse/HUDI-1967
 Project: Apache Hudi
  Issue Type: Bug
  Components: Hive Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1967) Fix the NPE for MOR Hive rt table query

2021-06-03 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-1967:
-
Description: See the discussion here: 
https://github.com/apache/hudi/issues/2874

> Fix the NPE for MOR Hive  rt table query
> 
>
> Key: HUDI-1967
> URL: https://issues.apache.org/jira/browse/HUDI-1967
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> See the discussion here: https://github.com/apache/hudi/issues/2874



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1969) Support reading logs for MOR Hive rt table

2021-06-04 Thread Danny Chen (Jira)
Danny Chen created HUDI-1969:


 Summary: Support reading logs for MOR Hive rt table
 Key: HUDI-1969
 URL: https://issues.apache.org/jira/browse/HUDI-1969
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Hive Integration
Reporter: Danny Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1756) Assigns the buckets by record key for Flink writer

2021-06-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-1756.
--
Resolution: Duplicate

> Assigns the buckets by record key for Flink writer
> --
>
> Key: HUDI-1756
> URL: https://issues.apache.org/jira/browse/HUDI-1756
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1819) Remove legacy code for Flink writer

2021-06-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-1819.

  Assignee: Danny Chen
Resolution: Duplicate

> Remove legacy code for Flink writer
> ---
>
> Key: HUDI-1819
> URL: https://issues.apache.org/jira/browse/HUDI-1819
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> Removes the useless code to avoid confusion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1857) Shade google guava for hudi-flink-bundle jar

2021-06-07 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-1857.

Resolution: Fixed

> Shade google guava for hudi-flink-bundle jar
> 
>
> Key: HUDI-1857
> URL: https://issues.apache.org/jira/browse/HUDI-1857
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> Shade Google Guava to avoid dependency conflicts.





[jira] [Created] (HUDI-1986) Skip creating marker files for flink merge handle

2021-06-07 Thread Danny Chen (Jira)
Danny Chen created HUDI-1986:


 Summary: Skip creating marker files for flink merge handle
 Key: HUDI-1986
 URL: https://issues.apache.org/jira/browse/HUDI-1986
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen


Skip creating the marker files for flink merge handle to make it more robust.





[jira] [Created] (HUDI-1987) Fix non partition table hive meta sync for flink writer

2021-06-08 Thread Danny Chen (Jira)
Danny Chen created HUDI-1987:


 Summary: Fix non partition table hive meta sync for flink writer
 Key: HUDI-1987
 URL: https://issues.apache.org/jira/browse/HUDI-1987
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-1992) Release the new records map for merge handle #close

2021-06-08 Thread Danny Chen (Jira)
Danny Chen created HUDI-1992:


 Summary: Release the new records map for merge handle #close
 Key: HUDI-1992
 URL: https://issues.apache.org/jira/browse/HUDI-1992
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Common Core
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


The new records map can be released eagerly to reduce the memory footprint.





[jira] [Created] (HUDI-1994) [HUDI-1992] Release the new records iterator for append handle #close

2021-06-09 Thread Danny Chen (Jira)
Danny Chen created HUDI-1994:


 Summary: [HUDI-1992] Release the new records iterator for append 
handle #close
 Key: HUDI-1994
 URL: https://issues.apache.org/jira/browse/HUDI-1994
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Common Core
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


To reduce the memory footprint.





[jira] [Updated] (HUDI-1994) Release the new records iterator for append handle #close

2021-06-09 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-1994:
-
Summary: Release the new records iterator for append handle #close  (was: 
[HUDI-1992] Release the new records iterator for append handle #close)

> Release the new records iterator for append handle #close
> -
>
> Key: HUDI-1994
> URL: https://issues.apache.org/jira/browse/HUDI-1994
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> To reduce the memory footprint.





[jira] [Created] (HUDI-1999) Refresh the base file view cache for WriteProfile

2021-06-11 Thread Danny Chen (Jira)
Danny Chen created HUDI-1999:


 Summary: Refresh the base file view cache for WriteProfile
 Key: HUDI-1999
 URL: https://issues.apache.org/jira/browse/HUDI-1999
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Refresh the view to discover new small files.





[jira] [Created] (HUDI-2015) Fix flink operator uid to allow multiple pipelines in one job

2021-06-14 Thread Danny Chen (Jira)
Danny Chen created HUDI-2015:


 Summary: Fix flink operator uid to allow multiple pipelines in one 
job
 Key: HUDI-2015
 URL: https://issues.apache.org/jira/browse/HUDI-2015
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Fix the uid conflicts.





[jira] [Created] (HUDI-2030) Add metadata cache to WriteProfile to reduce IO

2021-06-15 Thread Danny Chen (Jira)
Danny Chen created HUDI-2030:


 Summary: Add metadata cache to WriteProfile to reduce IO
 Key: HUDI-2030
 URL: https://issues.apache.org/jira/browse/HUDI-2030
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Keeps the same number of instant metadata entries in the cache and refreshes 
the cache on new commits.





[jira] [Created] (HUDI-2036) Move the compaction plan scheduling out of flink writer coordinator

2021-06-17 Thread Danny Chen (Jira)
Danny Chen created HUDI-2036:


 Summary: Move the compaction plan scheduling out of flink writer 
coordinator
 Key: HUDI-2036
 URL: https://issues.apache.org/jira/browse/HUDI-2036
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Since HUDI-1955 was fixed, we can move the scheduling out of the coordinator to 
make the coordinator more lightweight.





[jira] [Created] (HUDI-2037) Move the compaction plan scheduling out of flink writer coordinator

2021-06-17 Thread Danny Chen (Jira)
Danny Chen created HUDI-2037:


 Summary: Move the compaction plan scheduling out of flink writer 
coordinator
 Key: HUDI-2037
 URL: https://issues.apache.org/jira/browse/HUDI-2037
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0


Since HUDI-1955 was fixed, we can move the scheduling out of the coordinator to 
make the coordinator more lightweight.





[jira] [Closed] (HUDI-2037) Move the compaction plan scheduling out of flink writer coordinator

2021-06-17 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-2037.

Resolution: Invalid

> Move the compaction plan scheduling out of flink writer coordinator
> ---
>
> Key: HUDI-2037
> URL: https://issues.apache.org/jira/browse/HUDI-2037
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>
> Since HUDI-1955 was fixed, we can move the scheduling out of the coordinator 
> to make the coordinator more lightweight.





[jira] [Created] (HUDI-2040) Make flink writer as exactly-once by default

2021-06-17 Thread Danny Chen (Jira)
Danny Chen created HUDI-2040:


 Summary: Make flink writer as exactly-once by default
 Key: HUDI-2040
 URL: https://issues.apache.org/jira/browse/HUDI-2040
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-2048) HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws NPE for file group that has only logs

2021-06-21 Thread Danny Chen (Jira)
Danny Chen created HUDI-2048:


 Summary: HoodieRealtimeInputFormatUtils#groupLogsByBaseFile throws 
NPE for file group that has only logs
 Key: HUDI-2048
 URL: https://issues.apache.org/jira/browse/HUDI-2048
 Project: Apache Hudi
  Issue Type: Bug
  Components: Hive Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Assigned] (HUDI-2049) StreamWriteFunction should wait for the next inflight instant time before flushing

2021-06-21 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-2049:


Assignee: Danny Chen

> StreamWriteFunction should wait for the next inflight instant time before 
> flushing
> ---
>
> Key: HUDI-2049
> URL: https://issues.apache.org/jira/browse/HUDI-2049
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>






[jira] [Created] (HUDI-2049) StreamWriteFunction should wait for the next inflight instant time before flushing

2021-06-21 Thread Danny Chen (Jira)
Danny Chen created HUDI-2049:


 Summary: StreamWriteFunction should wait for the next inflight 
instant time before flushing
 Key: HUDI-2049
 URL: https://issues.apache.org/jira/browse/HUDI-2049
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen








[jira] [Updated] (HUDI-2049) StreamWriteFunction should wait for the next inflight instant time before flushing

2021-06-21 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2049:
-
Fix Version/s: 0.9.0

> StreamWriteFunction should wait for the next inflight instant time before 
> flushing
> ---
>
> Key: HUDI-2049
> URL: https://issues.apache.org/jira/browse/HUDI-2049
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>






[jira] [Created] (HUDI-2054) Remove the duplicate name for flink write pipeline

2021-06-22 Thread Danny Chen (Jira)
Danny Chen created HUDI-2054:


 Summary: Remove the duplicate name for flink write pipeline
 Key: HUDI-2054
 URL: https://issues.apache.org/jira/browse/HUDI-2054
 Project: Apache Hudi
  Issue Type: Task
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-2067) Sync all the options of FlinkOptions to FlinkStreamerConfig

2021-06-23 Thread Danny Chen (Jira)
Danny Chen created HUDI-2067:


 Summary: Sync all the options of FlinkOptions to 
FlinkStreamerConfig
 Key: HUDI-2067
 URL: https://issues.apache.org/jira/browse/HUDI-2067
 Project: Apache Hudi
  Issue Type: Task
  Components: Flink Integration
Reporter: Danny Chen
 Fix For: 0.9.0


Sync the options so that the {{HoodieFlinkStreamer}} can have more config 
options.





[jira] [Created] (HUDI-2068) Skip the assign state for SmallFileAssign when the state can not assign initially

2021-06-24 Thread Danny Chen (Jira)
Danny Chen created HUDI-2068:


 Summary: Skip the assign state for SmallFileAssign when the state 
can not assign initially
 Key: HUDI-2068
 URL: https://issues.apache.org/jira/browse/HUDI-2068
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-2074) Use while loop instead of recursive call in MergeOnReadInputFormat#MergeIterator to avoid StackOverflow

2021-06-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-2074:


 Summary: Use while loop instead of recursive call in 
MergeOnReadInputFormat#MergeIterator to avoid StackOverflow
 Key: HUDI-2074
 URL: https://issues.apache.org/jira/browse/HUDI-2074
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-2084) Resend the uncommitted write metadata when start up

2021-06-27 Thread Danny Chen (Jira)
Danny Chen created HUDI-2084:


 Summary: Resend the uncommitted write metadata when start up
 Key: HUDI-2084
 URL: https://issues.apache.org/jira/browse/HUDI-2084
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Resolved] (HUDI-2084) Resend the uncommitted write metadata when start up

2021-06-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2084.
--
Resolution: Fixed

Resolved via: 37b7c65d8a3ede00ae16909a06e31c24f179998c

> Resend the uncommitted write metadata when start up
> ---
>
> Key: HUDI-2084
> URL: https://issues.apache.org/jira/browse/HUDI-2084
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Created] (HUDI-2094) Supports hive style partitioning for flink writer

2021-06-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-2094:


 Summary: Supports hive style partitioning for flink writer
 Key: HUDI-2094
 URL: https://issues.apache.org/jira/browse/HUDI-2094
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-2112) Support reading pure logs file group for flink batch reader

2021-07-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-2112:


 Summary: Support reading pure logs file group for flink batch 
reader
 Key: HUDI-2112
 URL: https://issues.apache.org/jira/browse/HUDI-2112
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Updated] (HUDI-2112) Support reading pure logs file group for flink batch reader after compaction

2021-07-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2112:
-
Summary: Support reading pure logs file group for flink batch reader after 
compaction  (was: Support reading pure logs file group for flink batch reader)

> Support reading pure logs file group for flink batch reader after compaction
> 
>
> Key: HUDI-2112
> URL: https://issues.apache.org/jira/browse/HUDI-2112
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>






[jira] [Created] (HUDI-2121) Add operator uid for flink stateful operators

2021-07-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-2121:


 Summary: Add operator uid for flink stateful operators
 Key: HUDI-2121
 URL: https://issues.apache.org/jira/browse/HUDI-2121
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-2126) The coordinator send events to write function when there are no data for the checkpoint

2021-07-03 Thread Danny Chen (Jira)
Danny Chen created HUDI-2126:


 Summary: The coordinator send events to write function when there 
are no data for the checkpoint
 Key: HUDI-2126
 URL: https://issues.apache.org/jira/browse/HUDI-2126
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0








[jira] [Created] (HUDI-2129) StreamerUtil.medianInstantTime should return a valid date time string

2021-07-04 Thread Danny Chen (Jira)
Danny Chen created HUDI-2129:


 Summary: StreamerUtil.medianInstantTime should return a valid date 
time string
 Key: HUDI-2129
 URL: https://issues.apache.org/jira/browse/HUDI-2129
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0







