[jira] [Created] (FLINK-35774) The cache of transform is not updated after process schema change event

2024-07-05 Thread Wenkai Qi (Jira)
Wenkai Qi created FLINK-35774:
-

 Summary: The cache of transform is not updated after process 
schema change event
 Key: FLINK-35774
 URL: https://issues.apache.org/jira/browse/FLINK-35774
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.1
Reporter: Wenkai Qi


The transform cache is not updated after a schema change event is processed.

For example, when an add-column event arrives, tableInfo is not updated in
TransformSchema and TransformData.
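A minimal sketch of the expected invalidation behavior (class and method names here are illustrative, not the actual CDC transform classes):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a per-table cache that must be invalidated when a
// schema change event arrives; otherwise lookups keep returning the
// pre-change schema, which is the bug described above.
public class TransformCache {
    private final Map<String, String> tableInfoByTable = new HashMap<>();
    private String schemaVersion = "v1";

    public String getTableInfo(String tableId) {
        return tableInfoByTable.computeIfAbsent(tableId, this::loadSchema);
    }

    // On any schema change event (e.g. an add-column event), drop the cached
    // entry so the next access reloads the updated schema.
    public void onSchemaChange(String tableId) {
        tableInfoByTable.remove(tableId);
    }

    public void applyAddColumn(String tableId) {
        schemaVersion = "v2";    // simulate the schema evolving
        onSchemaChange(tableId); // without this call, getTableInfo stays stale
    }

    private String loadSchema(String tableId) {
        return tableId + ":" + schemaVersion;
    }
}
```

The fix amounts to making sure the invalidation hook is actually called from the schema-change handling path.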



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Potential Kafka Connector FLIP: Large Message Handling

2024-07-05 Thread Kevin Lam
Hi all,

Writing to see if the community would be open to exploring a FLIP for the
Kafka Table Connectors. The FLIP would allow for storing Kafka Messages
beyond a Kafka cluster's message limit (1 MB by default) out of band in
cloud object storage or another backend.

During serialization the message would be replaced with a reference, and
during deserialization the reference would be used to fetch the large
message and pass it to Flink. Something like Option 1 in this blog post.

What do you think?

We can make it generic by allowing users to implement their own
LargeMessageSerializer/Deserializer interface for serializing and
deserializing records and for handling interactions with object storage or
some other backend.

The Kafka Connectors can be extended to support ConfigOptions to
specify the class to load, as well as some user-specified properties. For
example: `large-record-handling.class` and
`large-record-handling.properties.*` (where the user can specify any
properties, similar to how the Kafka Consumer and Producer properties are
handled).

In terms of call sites for the LargeMessage handling, I think we can
consider inside of DynamicKafkaDeserializationSchema and
DynamicKafkaRecordSerializationSchema, where the ConsumerRecord and
ProducerRecords are passed respectively.
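For concreteness, a rough sketch of what such a pluggable interface and an in-memory stand-in for the object-storage backend could look like (all names below are hypothetical, not an existing Flink or Kafka API):

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in point: replace an oversized payload with a small
// reference on the produce side, and resolve the reference back to the
// original payload on the consume side.
interface LargeMessageHandler {
    byte[] storeIfLarge(byte[] payload);   // may return a reference instead
    byte[] resolve(byte[] maybeReference); // returns the original payload
}

// Toy backend standing in for cloud object storage.
class InMemoryLargeMessageHandler implements LargeMessageHandler {
    private static final int LIMIT = 16; // stand-in for Kafka's ~1 MB limit
    private static final String PREFIX = "ref:";
    private final Map<String, byte[]> store = new HashMap<>();
    private int nextId = 0;

    @Override
    public byte[] storeIfLarge(byte[] payload) {
        if (payload.length <= LIMIT) {
            return payload; // small enough: pass through untouched
        }
        String key = PREFIX + (nextId++);
        store.put(key, payload);
        return key.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] resolve(byte[] maybeReference) {
        // Toy lookup: a real implementation would need an unambiguous way to
        // distinguish references from pass-through payloads.
        String s = new String(maybeReference, StandardCharsets.UTF_8);
        byte[] stored = store.get(s);
        return stored != null ? stored : maybeReference;
    }
}
```

The serializer would call storeIfLarge before handing bytes to the producer, and the deserializer would call resolve on every consumed record.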

If there's interest, I would be happy to help flesh out the proposal more.


[jira] [Created] (FLINK-35773) Document s5cmd

2024-07-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-35773:
--

 Summary: Document s5cmd
 Key: FLINK-35773
 URL: https://issues.apache.org/jira/browse/FLINK-35773
 Project: Flink
  Issue Type: Sub-task
Reporter: Piotr Nowojski








[jira] [Created] (FLINK-35772) Deprecate/remove DuplicatingFileSystem

2024-07-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-35772:
--

 Summary: Deprecate/remove DuplicatingFileSystem
 Key: FLINK-35772
 URL: https://issues.apache.org/jira/browse/FLINK-35772
 Project: Flink
  Issue Type: Sub-task
Reporter: Piotr Nowojski








[jira] [Created] (FLINK-35771) Limit s5cmd resource usage

2024-07-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-35771:
--

 Summary: Limit s5cmd resource usage
 Key: FLINK-35771
 URL: https://issues.apache.org/jira/browse/FLINK-35771
 Project: Flink
  Issue Type: Sub-task
Reporter: Piotr Nowojski








[jira] [Created] (FLINK-35770) Interrupt s5cmd call on cancellation

2024-07-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-35770:
--

 Summary: Interrupt s5cmd call on cancellation 
 Key: FLINK-35770
 URL: https://issues.apache.org/jira/browse/FLINK-35770
 Project: Flink
  Issue Type: Sub-task
Reporter: Piotr Nowojski








[jira] [Created] (FLINK-35769) State files might not be deleted on task cancellation

2024-07-05 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-35769:
-

 Summary: State files might not be deleted on task cancellation
 Key: FLINK-35769
 URL: https://issues.apache.org/jira/browse/FLINK-35769
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.19.1
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.20.0


We have a job in an infinite (fast) restart loop that keeps crashing with a 
serialization issue.
The issue is that each restart seems to leak state files (the ones from the 
previous run are not cleaned up):

{{/tmp/tm_10.56.9.147:6122-c560c5/tmp $ ls | grep KeyedProcessOperator | wc -l
7990}}
{{/tmp/tm_10.56.9.147:6122-c560c5/tmp $ ls | grep StreamingJoinOperator | wc -l
689}}
Eventually the TM will use too much disk space.

 

The problem is in 
[https://github.com/apache/flink/blob/64f745a5b1fc14a2cba1ddd977ab8e8db9cf45a4/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBStateDownloader.java#L75]
{code:java}
try {
    List<CompletableFuture<Void>> futures =
            transferAllStateDataToDirectoryAsync(downloadRequests, internalCloser)
                    .collect(Collectors.toList());
    // Wait until either all futures completed successfully or one failed
    // exceptionally.
    FutureUtils.completeAll(futures).get();
} catch (Exception e) {
    downloadRequests.stream()
            .map(StateHandleDownloadSpec::getDownloadDestination)
            .map(Path::toFile)
            .forEach(FileUtils::deleteDirectoryQuietly); {code}
Where {{FileUtils::deleteDirectoryQuietly}} will list the files and delete them.
But if {{completeAll}} is interrupted, a download runnable that is still 
running might re-create the directory and its files afterwards.
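The shape of this race can be reproduced outside Flink with a small deterministic toy (latches stand in for the real interrupt timing; all names are illustrative):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

// Toy reproduction of the cleanup race: "cleanup" deletes the download
// directory, but a still-running download thread re-creates a file in it
// afterwards, leaking state files exactly as described above.
public class CleanupRaceDemo {
    public static Path run() throws Exception {
        Path dir = Files.createTempDirectory("state-download");
        CountDownLatch cleanupDone = new CountDownLatch(1);

        Thread download = new Thread(() -> {
            try {
                cleanupDone.await();          // simulate finishing late
                Files.createDirectories(dir); // re-create the deleted dir
                Files.write(dir.resolve("sst-000001"), new byte[] {1, 2, 3});
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        download.start();

        deleteQuietly(dir);                   // cleanup runs first
        cleanupDone.countDown();              // then the download finishes
        download.join();
        return dir;                           // the leaked file lives here
    }

    private static void deleteQuietly(Path dir) throws IOException {
        if (Files.exists(dir)) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path f : files) {
                    Files.delete(f);
                }
            }
            Files.delete(dir);
        }
    }
}
```

In the real code the fix would have to make sure the download runnables are fully stopped before the cleanup deletes their destination directories.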





Re: [2.0] How to handle on-going feature development in Flink 2.0?

2024-07-05 Thread Matthias Pohl
Thanks for the feedback. I went ahead and added a list for new features [1]
to the end of the 2.0 release page. That enables us to document such
changes.
I hope that's ok for the 2.0 release manager. Feel free to revert my
changes if you have anything else in mind.

Best,
Matthias

[1]
https://cwiki.apache.org/confluence/display/FLINK/2.0+Release#id-2.0Release-NewFeatures

On Wed, Jun 26, 2024 at 11:09 AM Zakelly Lan  wrote:

> +1 for a preview before the formal release. It would help us find issues in
> advance.
>
>
> Best,
> Zakelly
>
> On Wed, Jun 26, 2024 at 4:44 PM Jingsong Li 
> wrote:
>
> > +1 to release a preview version.
> >
> > Best,
> > Jingsong
> >
> > On Wed, Jun 26, 2024 at 10:12 AM Jark Wu  wrote:
> > >
> > > I also think this should not block new feature development.
> > > Having "nice-to-have" and "must-have" tags on the FLIPs is a good
> > idea.
> > >
> > > For the downstream projects, I think we need to release a 2.0 preview
> > > version one or
> > > two months before the formal release. This can leave some time for the
> > > downstream
> > > projects to integrate and provide feedback. So we can fix the problems
> > > (e.g. unexpected
> > > breaking changes, Java versions) before 2.0.
> > >
> > > Best,
> > > Jark
> > >
> > > On Wed, 26 Jun 2024 at 09:39, Xintong Song 
> > wrote:
> > >
> > > > I also don't think we should block new feature development until 2.0.
> > From
> > > > my understanding, the new major release is no different from the
> > regular
> > > > minor releases for new features.
> > > >
> > > > I think tracking new features, either as nice-to-have items or in a
> > > > separate list, is necessary. It helps us understand what's going on
> in
> > the
> > > > release cycle, and what to announce and promote. Maybe we should
> start
> > a
> > > > discussion on updating the 2.0 item list, to 1) collect new items
> that
> > are
> > > > proposed / initiated after the original list being created and 2) to
> > remove
> > > > some items that are no longer suitable. I'll discuss this with the
> > other
> > > > release managers first.
> > > >
> > > > For the connectors and operators, I think it depends on whether they
> > depend
> > > > on any deprecated APIs or internal implementations of Flink. Ideally,
> > > > all @Public APIs and @PublicEvolving APIs that we plan to change /
> > remove
> > > > should have been deprecated in 1.19 and 1.20 respectively. That means
> > if
> > > > the connectors and operators only use non-deprecated @Public
> > > > and @PublicEvolving APIs in 1.20, hopefully there should not be any
> > > > problems upgrading to 2.0.
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Wed, Jun 26, 2024 at 5:20 AM Becket Qin 
> > wrote:
> > > >
> > > > > Thanks for the question, Matthias.
> > > > >
> > > > > My two cents, I don't think we are blocking new feature
> development.
> > My
> > > > > understanding is that the community will just prioritize removing
> > > > > deprecated APIs in the 2.0 dev cycle. Because of that, it is
> possible
> > > > that
> > > > > some new feature development may slow down a little bit since some
> > > > > contributors may be working on the must-have features for 2.0. But
> > policy
> > > > > wise, I don't see a reason to block the new feature development for
> > the
> > > > 2.0
> > > > > release feature plan[1].
> > > > >
> > > > > Process wise, I like your idea of adding the new features as
> > nice-to-have
> > > > > in the 2.0 feature list.
> > > > >
> > > > > Re: David,
> > > > > Given it is a major version bump, it is possible that some of the
> > > > > downstream projects (e.g. connectors, Paimon, etc) will have to see
> > if a
> > > > > major version bump is also needed there. And it is probably going
> to
> > be
> > > > > decisions made on a per-project basis.
> > > > > Regarding the Java version specifically, this is probably worth a
> > separate
> > > > > discussion. According to a recent report[2] on the state of Java,
> it
> > > > might
> > > > > be a little early to drop support for Java 11. We can discuss this
> > > > > separately.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > > > > [2]
> > > > >
> > > > >
> > > >
> >
> https://newrelic.com/sites/default/files/2024-04/new-relic-state-of-the-java-ecosystem-report-2024-04-30.pdf
> > > > >
> > > > > On Tue, Jun 25, 2024 at 4:58 AM David Radley <
> > david_rad...@uk.ibm.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > > I think this is a great question. I am not sure if this has been
> > > > covered
> > > > > > elsewhere, but it would be good to be clear how this affects the
> > > > > > connectors and operator repos, with potentially v1- and
> > > > > > v2-oriented new features. I suspect this will be a
> > > > > > connector-by-connector investigation. I am thinking
> > > > > > connectors with Hadoop eco-system 

[jira] [Created] (FLINK-35768) Use native file copy in RocksDBStateDownloader

2024-07-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-35768:
--

 Summary: Use native file copy in RocksDBStateDownloader
 Key: FLINK-35768
 URL: https://issues.apache.org/jira/browse/FLINK-35768
 Project: Flink
  Issue Type: Sub-task
Reporter: Piotr Nowojski








[jira] [Created] (FLINK-35767) Provide native file copy support for S3 using s5cmd

2024-07-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-35767:
--

 Summary: Provide native file copy support for S3 using s5cmd
 Key: FLINK-35767
 URL: https://issues.apache.org/jira/browse/FLINK-35767
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / FileSystem
Reporter: Piotr Nowojski








[jira] [Created] (FLINK-35766) When the job contains many YieldingOperatorFactory instances, compiling the JobGraph hangs

2024-07-05 Thread Junrui Li (Jira)
Junrui Li created FLINK-35766:
-

 Summary: When the job contains many YieldingOperatorFactory 
instances, compiling the JobGraph hangs
 Key: FLINK-35766
 URL: https://issues.apache.org/jira/browse/FLINK-35766
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Reporter: Junrui Li


When a job contains YieldingOperatorFactory instances, the time complexity of 
compiling the JobGraph is very high (with a complexity of O(N!)). This leads to 
the job compilation hanging while creating chains when there are many 
YieldingOperatorFactory instances (e.g., more than 30).

This is a very rare bug, but some users run SQL jobs in production that 
contain many LookupJoins, which use YieldingOperatorFactory. A simple 
reproducible case is as follows:
{code:java}
@Test
void test() {
    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(1);
    DataStream<Long> stream =
            env.fromSource(
                            new NumberSequenceSource(0, 10),
                            WatermarkStrategy.noWatermarks(),
                            "input")
                    .map((x) -> x);
    // add 32 YieldingOperatorFactory instances
    for (int i = 0; i < 32; i++) {
        stream =
                stream.transform(
                        "test",
                        BasicTypeInfo.LONG_TYPE_INFO,
                        new YieldingTestOperatorFactory<>());
    }

[jira] [Created] (FLINK-35765) Support Fine Grained Resource Specifications for Adaptive Scheduler

2024-07-05 Thread RocMarshal (Jira)
RocMarshal created FLINK-35765:
--

 Summary: Support Fine Grained Resource Specifications for Adaptive 
Scheduler
 Key: FLINK-35765
 URL: https://issues.apache.org/jira/browse/FLINK-35765
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Reporter: RocMarshal








[jira] [Created] (FLINK-35764) TimerGauge is incorrect when update is called during a measurement

2024-07-05 Thread Liu Liu (Jira)
Liu Liu created FLINK-35764:
---

 Summary: TimerGauge is incorrect when update is called during a 
measurement
 Key: FLINK-35764
 URL: https://issues.apache.org/jira/browse/FLINK-35764
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics
Affects Versions: 1.19.0, 1.18.0, 1.17.0, 1.16.0, 1.15.0
Reporter: Liu Liu


Currently in {{{}TimerGauge{}}}, the {{currentMeasurement}} in {{markEnd}} is 
incorrectly set to the time since the last {{{}markStart{}}}. When calling 
{{{}markStart -> update -> markEnd{}}}, this will result in the time between 
{{markStart}} and {{update}} being counted twice. A piece of test code that 
reflects this scenario:
{code:java}
@Test
void testUpdateBeforeMarkingEnd() {
    ManualClock clock = new ManualClock(42_000_000);
    // time span = 2 intervals
    TimerGauge gauge = new TimerGauge(clock, 2 * View.UPDATE_INTERVAL_SECONDS);

    // the event spans 2 intervals
    // interval 1
    gauge.markStart();
    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);
    gauge.update();
    // interval 2
    clock.advanceTime(SLEEP, TimeUnit.MILLISECONDS);
    gauge.markEnd();
    gauge.update();

    // expected: 2, actual: 3
    assertThat(gauge.getValue()).isEqualTo(SLEEP / View.UPDATE_INTERVAL_SECONDS);
}
{code}
Proposed changes:
 # Modify {{markEnd}} so that the updates to {{currentCount}} and 
{{accumulatedCount}} resemble those in {{{}update{}}}.
 # Add the test case to {{{}TimerGaugeTest{}}}.
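The double counting can be illustrated with a stripped-down model (a toy, not the actual TimerGauge code; time is passed in explicitly instead of read from a clock):

```java
// Toy model of the bug: the buggy markEnd measures from the original
// markStart even though update() already accounted for part of that span,
// so the time between markStart and update() is counted twice.
public class ToyTimerGauge {
    private long measurementStart; // set by markStart
    private long lastUpdate;       // advanced by update()
    private long accumulated;      // total time attributed so far
    private final boolean fixed;

    public ToyTimerGauge(boolean fixed) {
        this.fixed = fixed;
    }

    public void markStart(long now) {
        measurementStart = now;
        lastUpdate = now;
    }

    public void update(long now) {
        accumulated += now - lastUpdate; // flush the partial measurement
        lastUpdate = now;
    }

    public void markEnd(long now) {
        // Bug: measuring from measurementStart double-counts the span that
        // update() already flushed; the fix measures from lastUpdate.
        accumulated += now - (fixed ? lastUpdate : measurementStart);
        lastUpdate = now;
    }

    public long getAccumulated() {
        return accumulated;
    }
}
```

With markStart at t=0, update at t=1 and markEnd at t=2, the buggy variant reports 3 time units for a 2-unit span, matching the "expected: 2, actual: 3" in the test above.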





[jira] [Created] (FLINK-35763) Bump Java version

2024-07-05 Thread 911432 (Jira)
911432 created FLINK-35763:
--

 Summary: Bump Java version
 Key: FLINK-35763
 URL: https://issues.apache.org/jira/browse/FLINK-35763
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.20.0
Reporter: 911432


The documentation 
(https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/java_compatibility/)
 states that Java 8 is not available since Flink 1.15.


However, Java 8 is still in use in Flink 1.20 at the moment:
[#1] 
https://github.com/apache/flink/blob/6b08eb15d09d6c8debb609b8b3a2a241ca855155/tools/azure-pipelines/build-apache-repo.yml#L75
[#2] 
https://github.com/apache/flink/blob/6b08eb15d09d6c8debb609b8b3a2a241ca855155/tools/azure-pipelines/build-apache-repo.yml#L108
[#3] 
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
I hope you can fix this part.





Re: Re: [DISCUSS] FLIP-466: Introduce ProcessFunction Attribute in DataStream API V2

2024-07-05 Thread Yuxin Tan
Hi, Wencong,
Thanks for driving the FLIP.

+1 for this FLIP.

I believe these hints will improve the performance in many use cases.
I only have a minor question about the Idempotence annotation. When
this annotation is added, how does StreamTask optimize the frequency?
Does it ensure a single output, or does it merely reduce the frequency
of the outputs?

Best,
Yuxin


Wencong Liu  wrote on Mon, Jul 1, 2024 at 16:39:

> Hi, Jeyhun,
> Thanks for the reply.
>
>
> > Integrate these annotations with the execution plan.
> I believe DataStream is an Imperative API, which means
> that the actual execution plan is basically consistent with
> the computational logic expressed by the user with DataStream,
> and it is different from SQL, so the significance of supporting
> getExecutionPlan in the short term may not be great. If it is to
> be supported later, it is possible to consider the impact of Hints.
>
>
> > Check for misuse of attributes or ignore it.
> For illegal use (annotated on the inappropriate ProcessFunction),
> an exception will be thrown. For legal use, the framework can also
> choose to ignore it.
>
>
> > A framework to include attributes.
> Yes, we will provide a basic framework in the implementation
> to help developers for extension.
>
>
> Best,
> Wencong
>
>
> At 2024-06-28 02:06:37, "Jeyhun Karimov"  wrote:
> >Hi Wencong,
> >
> >Thanks for the FLIP. +1 for it.
> >
> >Providing hints to users will enable more optimization potential for DSv2.
> >I have a few questions.
> >
> >I think currently, DSv2 ExecutionEnvironment does not support getting
> >execution plan (getExecutionPlan()).
> >Do you plan to integrate these annotations with the execution plan?
> >
> >Any plans to check for misuse of attributes? Or any plans for a framework
> >to implicitly include attributes?
> >
> >Also, now that we make analogy with SQL hints, SQL query planners usually
> >ignore wrong hints and continue with its best plan.
> >Do we want to consider this approach? Or should we throw exception
> whenever
> >the hint (attribute in this case) is wrong?
> >
> >
> >Regards,
> >Jeyhun
> >
> >
> >On Thu, Jun 27, 2024 at 7:47 AM Xintong Song 
> wrote:
> >
> >> +1 for this FLIP.
> >>
> >> I think this is similar to SQL hints, where users can provide optional
> >> information to help the engine execute the workload more efficiently.
> >> Having a unified mechanism for such kind of hints should improve
> usability
> >> compared to introducing tons of configuration knobs.
> >>
> >> Best,
> >>
> >> Xintong
> >>
> >>
> >>
> >> On Wed, Jun 26, 2024 at 8:09 PM Wencong Liu 
> wrote:
> >>
> >> > Hi devs,
> >> >
> >> >
> >> > I'm proposing a new FLIP[1] to introduce the ProcessFunction
> Attribute in
> >> > the
> >> > DataStream API V2. The goal is to optimize job execution by leveraging
> >> > additional information about users' ProcessFunction logic. The
> proposal
> >> > includes
> >> > the following scenarios where the ProcessFunction Attribute can
> >> > significantly
> >> > enhance optimization:
> >> >
> >> >
> >> > Scenario 1: If the framework recognizes that the ProcessFunction
> outputs
> >> > data
> >> > only after all input is received, the downstream operators can be
> >> > scheduled until
> >> > the ProcessFunction is finished, which effectively reduces resource
> >> > consumption.
> >> > Ignoring this information could lead to premature scheduling of
> >> downstream
> >> > operators with no data to process. This scenario is addressed and
> >> > optimized by FLIP-331[2].
> >> >
> >> >
> >> > Scenario 2: For stream processing, where users are only interested in
> the
> >> > latest
> >> > result per key at the current time, the framework can optimize by
> >> > adjusting the
> >> > frequency of ProcessFunction outputs. This reduces shuffle data volume
> >> and
> >> > downstream operator workload. If this optimization is ignored, each
> new
> >> > input
> >> > would trigger a new output. This scenario is addressed and
> >> > optimized by FLIP-365[3].
> >> >
> >> >
> >> > Scenario 3: If a user's ProcessFunction neither caches inputs nor
> >> outputs,
> >> > recognizing this can enable object reuse for this data within the
> >> > OperatorChain,
> >> > enhancing performance. Without this optimization, data would be copied
> >> > before
> >> > being passed to the next operator. This scenario is addressed and
> >> > optimized by FLIP-329[4].
> >> >
> >> >
> >> > To unify the mechanism for utilizing additional information and
> >> optimizing
> >> > jobs,
> >> > we propose introducing the ProcessFunction Attribute represented by
> >> > Java annotations, which allow users to provide relevant information
> about
> >> > their
> >> > ProcessFunctions. The framework can then use this to optimize job
> >> > execution.
> >> > Importantly, regular job execution remains unaffected whether users
> use
> >> > this
> >> > attribute or not.
> >> >
> >> >
> >> > Looking forward to discussing this in the upcoming FLIP.
> >> >
> >> >
> >> > Best regards,
> >> > 

[jira] [Created] (FLINK-35762) Cache hashCode for immutable map key classes

2024-07-05 Thread yux (Jira)
yux created FLINK-35762:
---

 Summary: Cache hashCode for immutable map key classes
 Key: FLINK-35762
 URL: https://issues.apache.org/jira/browse/FLINK-35762
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: yux


As suggested by [~kunni], hash-code caching is a common optimization in the 
Java world (for example, java.lang.String caches its hash to avoid repeated 
computation, since String is a very common hashmap key). The same 
optimization could be applied in CDC to improve execution performance.
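A sketch of the String-style caching pattern as it could look for an immutable CDC key class (the class name and fields are illustrative):

```java
import java.util.Objects;

// Sketch of the optimization: an immutable key computes its hash once,
// lazily, and reuses the cached value on every subsequent hashCode() call.
public final class CachedKey {
    private final String tableId;
    private final String columnName;
    private int hash; // 0 means "not computed yet", as in java.lang.String

    public CachedKey(String tableId, String columnName) {
        this.tableId = tableId;
        this.columnName = columnName;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Objects.hash(tableId, columnName);
            if (h == 0) {
                h = 1; // avoid recomputing forever when the real hash is 0
            }
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CachedKey)) {
            return false;
        }
        CachedKey that = (CachedKey) o;
        return tableId.equals(that.tableId)
                && columnName.equals(that.columnName);
    }
}
```

This only pays off because the class is immutable: a mutable key would need its cached hash invalidated on every mutation, which defeats the purpose.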


