Re: [VOTE] FLIP-444: Native file copy support

2024-06-27 Thread Feifan Wang
+1 (non binding)




——

Best regards,

Feifan Wang




At 2024-06-25 16:58:22, "Piotr Nowojski"  wrote:
>Hi all,
>
>I would like to start a vote for the FLIP-444 [1]. The discussion thread is
>here [2].
>
>The vote will be open for at least 72 hours.
>
>Best,
>Piotrek
>
>[1] https://cwiki.apache.org/confluence/x/rAn9EQ
>[2] https://lists.apache.org/thread/lkwmyjt2bnmvgx4qpp82rldwmtd4516c


[jira] [Created] (FLINK-35719) [Source] NullPointerException caused by RecordsWithSplitIds.finishedSplits()

2024-06-27 Thread zyh (Jira)
zyh created FLINK-35719:
---

 Summary: [Source] NullPointerException caused by RecordsWithSplitIds.finishedSplits()
 Key: FLINK-35719
 URL: https://issues.apache.org/jira/browse/FLINK-35719
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC
Affects Versions: 1.17.2
Reporter: zyh
 Attachments: image-2024-06-28-11-59-13-193.png, 
image-2024-06-28-12-02-21-362.png

RecordsWithSplitIds.finishedSplits() returns a collection that contains only a 
null object.

The whole Flink job then fails and reports a NullPointerException.

See the screenshots below:

1. The first screenshot shows that the split was added successfully.

2. The second screenshot shows that the finished split was a null object.

!image-2024-06-28-11-59-13-193.png!

!image-2024-06-28-12-02-21-362.png!
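
A defensive guard on the reader side would keep a single bad entry from 
failing the whole job. A minimal sketch (illustrative placement only; 
onSplitFinished is a hypothetical callback, not the actual JDBC connector 
code):

{code:java}
// Sketch: skip null split ids instead of letting them propagate as an NPE.
Set<String> finishedSplits = recordsWithSplitIds.finishedSplits();
for (String splitId : finishedSplits) {
    if (splitId == null) {
        LOG.warn("Ignoring null split id reported by finishedSplits()");
        continue;
    }
    onSplitFinished(splitId);
}
{code}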



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-444: Native file copy support

2024-06-27 Thread yue ma
+1 (non-binding)

Piotr Nowojski  于2024年6月25日周二 16:59写道:

> Hi all,
>
> I would like to start a vote for the FLIP-444 [1]. The discussion thread is
> here [2].
>
> The vote will be open for at least 72 hours.
>
> Best,
> Piotrek
>
> [1] https://cwiki.apache.org/confluence/x/rAn9EQ
> [2] https://lists.apache.org/thread/lkwmyjt2bnmvgx4qpp82rldwmtd4516c
>


-- 
Best,
Yue


[jira] [Created] (FLINK-35718) Cherry-pick DBZ-5333 to fix frequent failovers caused by EOFException

2024-06-27 Thread LvYanquan (Jira)
LvYanquan created FLINK-35718:
-

 Summary: Cherry-pick DBZ-5333 to fix frequent failovers caused by EOFException
 Key: FLINK-35718
 URL: https://issues.apache.org/jira/browse/FLINK-35718
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: LvYanquan
 Fix For: cdc-3.2.0


This EOFException occurs occasionally, and Debezium provides a retry 
mechanism to avoid frequent failovers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35717) Allow DISTRIBUTED BY in CREATE TABLE AS (CTAS)

2024-06-27 Thread Jira
Sergio Peña created FLINK-35717:
---

 Summary: Allow DISTRIBUTED BY in CREATE TABLE AS (CTAS)
 Key: FLINK-35717
 URL: https://issues.apache.org/jira/browse/FLINK-35717
 Project: Flink
  Issue Type: Sub-task
Reporter: Sergio Peña






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-466: Introduce ProcessFunction Attribute in DataStream API V2

2024-06-27 Thread Jeyhun Karimov
Hi Wencong,

Thanks for the FLIP. +1 for it.

Providing hints to users will enable more optimization potential for DSv2.
I have a few questions.

I think that currently the DSv2 ExecutionEnvironment does not support getting
the execution plan (getExecutionPlan()).
Do you plan to integrate these annotations with the execution plan?

Any plans to check for misuse of attributes? Or any plans for a framework
to implicitly include attributes?

Also, since we draw an analogy with SQL hints: SQL query planners usually
ignore invalid hints and continue with their best plan.
Do we want to consider this approach? Or should we throw an exception
whenever the hint (the attribute in this case) is wrong?


Regards,
Jeyhun


On Thu, Jun 27, 2024 at 7:47 AM Xintong Song  wrote:

> +1 for this FLIP.
>
> I think this is similar to SQL hints, where users can provide optional
> information to help the engine execute the workload more efficiently.
> Having a unified mechanism for such kind of hints should improve usability
> compared to introducing tons of configuration knobs.
>
> Best,
>
> Xintong
>
>
>
> On Wed, Jun 26, 2024 at 8:09 PM Wencong Liu  wrote:
>
> > Hi devs,
> >
> >
> > I'm proposing a new FLIP[1] to introduce the ProcessFunction Attribute in
> > the
> > DataStream API V2. The goal is to optimize job execution by leveraging
> > additional information about users' ProcessFunction logic. The proposal
> > includes
> > the following scenarios where the ProcessFunction Attribute can
> > significantly
> > enhance optimization:
> >
> >
> > Scenario 1: If the framework recognizes that the ProcessFunction outputs
> > data only after all input is received, the scheduling of downstream
> > operators can be deferred until the ProcessFunction is finished, which
> > effectively reduces resource consumption. Ignoring this information could
> > lead to premature scheduling of downstream operators with no data to
> > process. This scenario is addressed and optimized by FLIP-331[2].
> >
> >
> > Scenario 2: For stream processing, where users are only interested in the
> > latest
> > result per key at the current time, the framework can optimize by
> > adjusting the
> > frequency of ProcessFunction outputs. This reduces shuffle data volume
> and
> > downstream operator workload. If this optimization is ignored, each new
> > input
> > would trigger a new output. This scenario is addressed and
> > optimized by FLIP-365[3].
> >
> >
> > Scenario 3: If a user's ProcessFunction neither caches inputs nor
> outputs,
> > recognizing this can enable object reuse for this data within the
> > OperatorChain,
> > enhancing performance. Without this optimization, data would be copied
> > before
> > being passed to the next operator. This scenario is addressed and
> > optimized by FLIP-329[4].
> >
> >
> > To unify the mechanism for utilizing additional information and
> optimizing
> > jobs,
> > we propose introducing the ProcessFunction Attribute represented by
> > Java annotations, which allow users to provide relevant information about
> > their
> > ProcessFunctions. The framework can then use this to optimize job
> > execution.
> > Importantly, regular job execution remains unaffected whether users use
> > this
> > attribute or not.
> >
> >
> > Looking forward to discussing this in the upcoming FLIP.
> >
> >
> > Best regards,
> > Wencong Liu
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-466%3A+Introduce+ProcessFunction+Attribute+in+DataStream+API+V2
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment
> > [3]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-365%3A+Introduce+flush+interval+to+adjust+the+interval+of+emitting+results+with+idempotent+semantics
> > [4]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-329%3A+Add+operator+attribute+to+specify+support+for+object-reuse
>
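
For illustration, a minimal sketch of what such attributes could look like as
Java annotations (the names below are assumptions for illustration, not the
annotations defined in FLIP-466):

{code:java}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical attribute for Scenario 1: the function emits records only
// after it has consumed all of its input.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface OutputOnlyAfterEndOfInput {}

// Hypothetical usage: the framework could defer scheduling of downstream
// operators for functions carrying this attribute.
@OutputOnlyAfterEndOfInput
class FullInputAggregationFunction {
    // user's ProcessFunction logic
}
{code}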


[jira] [Created] (FLINK-35716) Fix use of AsyncScalarFunction in join conditions

2024-06-27 Thread Alan Sheinberg (Jira)
Alan Sheinberg created FLINK-35716:
--

 Summary: Fix use of AsyncScalarFunction in join conditions
 Key: FLINK-35716
 URL: https://issues.apache.org/jira/browse/FLINK-35716
 Project: Flink
  Issue Type: Sub-task
Reporter: Alan Sheinberg






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Question about v2 committer guarantees

2024-06-27 Thread Scott Sandre
Thanks for the answers, Fabian! To follow up:

*1.* SGTM, thanks!

*2.* << If some of the RPCs are delayed or do not reach the manager, the
Committer will accumulate committables from multiple checkpoints. >>
So does that mean it's possible for the Committer to commit Committables for
checkpointIds=[2], then checkpointId=[1], then checkpointIds=[3, 4]?
That is:
  - (*2a*) the committable checkpointIds can be out of order, and
  - (*2b*) there can be committables for multiple checkpointIds in a given
Committer::commit call?

*3.* Imagine this scenario: the SinkWriters start writing parquet files.
The latest Delta table version they are aware of is version 10. The
SinkWriters create committables, and when the Committer tries to commit
them, it sees that the latest Delta table version is now 15 *and* someone
enabled Column Mapping at version 12. That means that, at and after version
12, all parquet files must be written using the physical column names, not
the logical column names. This means that the parquet files written by our
SinkWriters and referenced by the committables are *invalid*. In Spark, we
would fail the commit and would allow the writers to rebase, re-write the
data, and re-attempt the commit. I don't think this is possible in Flink?

*4.* << Can you explain more about the relation between your questions and
using the SupportsPreCommitTopology::addPreCommitTopology? It sounds like
you are not planning to use the Committer but a custom globalCommitter. >>

Yup, we do want a single committer (also called a global committer). Having
a preCommitTopology that routes all the committables using `.global()` will
cause all the committables to go to a single (global) committer, correct?
(And yes, if there is job parallelism of N, then there will be N-1
committers instantiated but never called).
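
For concreteness, a minimal sketch of that routing (assuming the Sink V2
SupportsPreCommitTopology interface; DeltaCommittable is a placeholder type):

{code:java}
@Override
public DataStream<CommittableMessage<DeltaCommittable>> addPreCommitTopology(
        DataStream<CommittableMessage<DeltaCommittable>> committables) {
    // Route every committable to a single parallel instance, so that one
    // "global" committer performs the table commit.
    return committables.global();
}
{code}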

The V1-to-V2 Sink Adapter maps the V1GlobalCommitter to a
V2-*post*-commit-topology
operator. When starting fresh with the V2 Sink API, this
post-commit-topology doesn't seem the most natural place to implement the
single committer. Further, we may want to do other optional optimizations
in the post-commit-topology. Having the commit done in the Committer and
optional optimizations done in a post-commit seems like the best choice.

Let me know what you think!

Thanks!

Scott

On Thu, Jun 27, 2024 at 2:52 AM Fabian Paul  wrote:

> Hi Scott,
>
> It's great to see further adoption of the Sink V2 architecture.
> Happy to answer your questions.
>
> 1. The sink architecture should ensure that Committer::commit is always
> called with all committables from a subtask for a given subtaskId. There is
> an open issue where users have reported a problem with that assumption [1],
> but we haven't been able to track the problem down. Behind the scenes, if
> you use a SinkWriter->Committer topology, the Committables are transferred
> via the checkpoint barrier channel to the committer (not transferring the
> committables results in failing the checkpoint) and checkpointed by the
> committer before Committer::commit is called. This means when
> Committer::commit is called, it reads the committables from the local state.
>
> 2. Committer::commit is called on what we call in Flink
> notifyCheckpointComplete, which is based on an RPC call that the JobManager
> makes to all TaskManagers when a checkpoint is finished. There is no
> guarantee when this will be called, only that it eventually will be. If some
> of the RPCs are delayed or do not reach the manager, the Committer will
> accumulate committables from multiple checkpoints.
>
> 3. I am not sure I fully understand that point. I see two different
> requirements. First, you could skip a committable if you do not want to
> commit it, which you could do by calling
> CommitRequest::signalAlreadyCommitted [2]. It's not the primary purpose of
> the method, but it should suffice. The second point is having a
> communication mechanism between SinkWriter and Committer, which at the
> moment does not exist. I would love to hear more details about why the
> rewrite is necessary; maybe we can model the sink differently to achieve
> that requirement.
>
> Can you explain more about the relation between your questions and using
> the SupportsPreCommitTopology::addPreCommitTopology? It sounds like you are
> not planning to use the Committer but a custom globalCommitter.
>
> Best,
> Fabian
>
> [1] https://issues.apache.org/jira/browse/FLINK-25920
> [2]
>
> https://github.com/apache/flink/blob/7fc3aac774f5deb9b48727ba5f916c78085b49b9/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java#L100
>


-- 
*Scott Sandre*
*Sr. Software Engineer*
*Delta Ecosystem Team*
*scott.san...@databricks.com *


Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-27 Thread Ahmed Hamdy
Congratulations,
Thanks for the efforts
Best Regards
Ahmed Hamdy


On Thu, 20 Jun 2024 at 12:57, Muhammet Orazov 
wrote:

> Great, thanks Qingsheng for your efforts!
>
> Best,
> Muhammet
>
> On 2024-06-18 15:50, Qingsheng Ren wrote:
> > The Apache Flink community is very happy to announce the release of
> > Apache
> > Flink CDC 3.1.1.
> >
> > Apache Flink CDC is a distributed data integration tool for real time
> > data
> > and batch data, bringing the simplicity and elegance of data
> > integration
> > via YAML to describe the data movement and transformation in a data
> > pipeline.
> >
> > Please check out the release blog post for an overview of the release:
> >
> https://flink.apache.org/2024/06/18/apache-flink-cdc-3.1.1-release-announcement/
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Maven artifacts for Flink CDC can be found at:
> > https://search.maven.org/search?q=g:org.apache.flink%20cdc
> >
> > The full release notes are available in Jira:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354763
> >
> > We would like to thank all contributors of the Apache Flink community
> > who
> > made this release possible!
> >
> > Regards,
> > Qingsheng Ren
>


Re: [VOTE] FLIP-444: Native file copy support

2024-06-27 Thread Ahmed Hamdy
+1 (non-binding)

Best Regards
Ahmed Hamdy


On Thu, 27 Jun 2024 at 17:29, Martijn Visser 
wrote:

> +1 (binding)
>
> On Thu, Jun 27, 2024 at 3:32 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> > +1(binding)
> >
> > Best,
> > Rui
> >
> > On Wed, Jun 26, 2024 at 10:22 PM Stefan Richter
> >  wrote:
> >
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Stefan
> > >
> > >
> > >
> > > > On 26. Jun 2024, at 16:14, Hong  wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > > Hong
> > > >
> > > >> On 26 Jun 2024, at 12:27, Keith Lee wrote:
> > > >>
> > > >> +1 (non binding)
> > > >>
> > > >> Best regards
> > > >> Keith Lee
> > > >>
> > > >>
> > > >>> On Wed, Jun 26, 2024 at 9:48 AM Zakelly Lan wrote:
> > > >>> +1 (binding)
> > > >>> Best,
> > > >>> Zakelly
> > >  On Wed, Jun 26, 2024 at 3:54 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > >  +1 (non-binding)
> > >  Best regards,
> > >  Yuepeng Pan
> > >  At 2024-06-26 15:27:17, "Piotr Nowojski" wrote:
> > > > Thanks for pointing this out Zakelly. After the discussion on the dev
> > > > mailing list, I have updated the `PathsCopyingFileSystem` to merge its
> > > > functionalities with `DuplicatingFileSystem`, but I've just forgotten to
> > > > mention that it will be removed/replaced with `PathsCopyingFileSystem`.
> > > > Vote can be resumed.
> > > > Best,
> > > > Piotrek
> > > > On Tue, 25 Jun 2024 at 18:57, Piotr Nowojski wrote:
> > > >> Oops, I must have forgotten to update the FLIP as we discussed. I will
> > > >> fix it tomorrow and the vote period will be extended.
> > > >> Best,
> > > >> Piotrek
> > > >> On Tue, 25 Jun 2024 at 13:56, Zakelly Lan wrote:
> > > >>> Hi Piotrek,
> > > >>> I don't see any statement about removing or renaming the
> > > >>> `DuplicatingFileSystem` in the FLIP, shall we do that as mentioned in
> > > >>> the discussion thread?
> > > >>> Best,
> > > >>> Zakelly
> > > >>> On Tue, Jun 25, 2024 at 4:58 PM Piotr Nowojski wrote:
> > > >>>> Hi all,
> > > >>>> I would like to start a vote for the FLIP-444 [1]. The discussion
> > > >>>> thread is here [2].
> > > >>>> The vote will be open for at least 72 hours.
> > > >>>> Best,
> > > >>>> Piotrek
> > > >>>> [1] https://cwiki.apache.org/confluence/x/rAn9EQ
> > > >>>> [2] https://lists.apache.org/thread/lkwmyjt2bnmvgx4qpp82rldwmtd4516c


Re: [VOTE] FLIP-444: Native file copy support

2024-06-27 Thread Martijn Visser
+1 (binding)

On Thu, Jun 27, 2024 at 3:32 AM Rui Fan <1996fan...@gmail.com> wrote:

> +1(binding)
>
> Best,
> Rui
>
> On Wed, Jun 26, 2024 at 10:22 PM Stefan Richter
>  wrote:
>
> >
> > +1 (binding)
> >
> > Best,
> > Stefan
> >
> >
> >
> > > On 26. Jun 2024, at 16:14, Hong  wrote:
> > >
> > > +1 (binding)
> > >
> > > Hong
> > >
> > >> On 26 Jun 2024, at 12:27, Keith Lee wrote:
> > >>
> > >> +1 (non binding)
> > >>
> > >> Best regards
> > >> Keith Lee
> > >>
> > >>
> > >>> On Wed, Jun 26, 2024 at 9:48 AM Zakelly Lan wrote:
> > >>> +1 (binding)
> > >>> Best,
> > >>> Zakelly
> >  On Wed, Jun 26, 2024 at 3:54 PM Yuepeng Pan wrote:
> >  +1 (non-binding)
> >  Best regards,
> >  Yuepeng Pan
> >  At 2024-06-26 15:27:17, "Piotr Nowojski" wrote:
> > > Thanks for pointing this out Zakelly. After the discussion on the dev
> > > mailing list, I have updated the `PathsCopyingFileSystem` to merge its
> > > functionalities with `DuplicatingFileSystem`, but I've just forgotten to
> > > mention that it will be removed/replaced with `PathsCopyingFileSystem`.
> > > Vote can be resumed.
> > > Best,
> > > Piotrek
> > > On Tue, 25 Jun 2024 at 18:57, Piotr Nowojski wrote:
> > >> Oops, I must have forgotten to update the FLIP as we discussed. I will
> > >> fix it tomorrow and the vote period will be extended.
> > >> Best,
> > >> Piotrek
> > >> On Tue, 25 Jun 2024 at 13:56, Zakelly Lan wrote:
> > >>> Hi Piotrek,
> > >>> I don't see any statement about removing or renaming the
> > >>> `DuplicatingFileSystem` in the FLIP, shall we do that as mentioned in
> > >>> the discussion thread?
> > >>> Best,
> > >>> Zakelly
> > >>> On Tue, Jun 25, 2024 at 4:58 PM Piotr Nowojski wrote:
> > >>>> Hi all,
> > >>>> I would like to start a vote for the FLIP-444 [1]. The discussion
> > >>>> thread is here [2].
> > >>>> The vote will be open for at least 72 hours.
> > >>>> Best,
> > >>>> Piotrek
> > >>>> [1] https://cwiki.apache.org/confluence/x/rAn9EQ
> > >>>> [2] https://lists.apache.org/thread/lkwmyjt2bnmvgx4qpp82rldwmtd4516c


Re: [DISCUSS] Release flink-shaded 19.0

2024-06-27 Thread Dawid Wysakowicz
I’ve seen the fixes for the blocker have been merged. Thank you! I’ll proceed
with my original plan and prepare a release tomorrow if there are no
further concerns.

On Thu, 27 Jun 2024 at 15:06, Zakelly Lan  wrote:

> Hi Sergey,
>
> FYI, we also ran into a regression in stringRead/stringWrite when upgrading
> JMH [1]. I guess it is caused by JIT. I'd suggest accepting it if JIT is so
> sensitive to 'unrelated' changes in the codebase, since this probably won't
> be a problem in a production environment.
>
> I recommend we do not overemphasize these results. How about going ahead
> with the upgrade first and testing the performance? Then we can discuss
> whether there is indeed a concern.
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-35629
>
> Best,
> Zakelly
>
> On Thu, Jun 27, 2024 at 6:21 PM Sergey Nuyanzin 
> wrote:
>
> > Thanks for starting this discussion Dawid
> >
> > In general I'm +1 for the release
> >
> > IIRC there is one blocker for flink-shaded [1], which was the reason for
> > downgrading it from 18 to 17 [2].
> >
> > I would suggest to resolve it first
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-34234
> > [2] https://issues.apache.org/jira/browse/FLINK-34148
> >
> > On Thu, Jun 27, 2024 at 11:42 AM Dawid Wysakowicz
> >  wrote:
> > >
> > > Hi,
> > > I'd like to propose releasing flink-shaded 19.0. I suggest it have a very
> > > limited scope and include just the shading of
> > > com.jayway.jsonpath:json-path [1].
> > >
> > > This would let us fix FLINK-35696[2]
> > >
> > > If there are no objections, I'd prepare a release.
> > >
> > > [1]
> > >
> >
> https://github.com/apache/flink-shaded/commit/b23e1a811fcacbc5f53993297304131246bb5d04
> > > [2] https://issues.apache.org/jira/browse/FLINK-35696
> > >
> > > Best,
> > > Dawid
> >
> >
> >
> > --
> > Best regards,
> > Sergey
> >
>


[jira] [Created] (FLINK-35715) MySQL Source: support schema cache to deserialize records

2024-06-27 Thread Hongshun Wang (Jira)
Hongshun Wang created FLINK-35715:
-

 Summary: MySQL Source: support schema cache to deserialize records
 Key: FLINK-35715
 URL: https://issues.apache.org/jira/browse/FLINK-35715
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.1
Reporter: Hongshun Wang
 Fix For: cdc-3.2.0


 

Currently, DebeziumEventDeserializationSchema deserializes each record with a
schema inferred from that record:

 
{code:java}
private RecordData extractDataRecord(Struct value, Schema valueSchema) throws Exception {
    DataType dataType = schemaDataTypeInference.infer(value, valueSchema);
    return (RecordData) getOrCreateConverter(dataType).convert(value, valueSchema);
}
{code}
There are some issues:
 # Inferring and creating a converter as soon as each record arrives incurs
additional cost.
 # Inferring from a record might not reflect the real table schema accurately.
For instance, a timestamp column with precision 6 in MySQL might hold a value
whose fractional-second part is 0; when inferred from that value, the type
will appear to have a precision of 0.

{code:java}
protected DataType inferString(Object value, Schema schema) {
    if (ZonedTimestamp.SCHEMA_NAME.equals(schema.name())) {
        int nano =
                Optional.ofNullable((String) value)
                        .map(s -> ZonedTimestamp.FORMATTER.parse(s, Instant::from))
                        .map(Instant::getNano)
                        .orElse(0);

        int precision;
        if (nano == 0) {
            precision = 0;
        } else if (nano % 1000 > 0) {
            precision = 9;
        } else if (nano % 1000_000 > 0) {
            precision = 6;
        } else if (nano % 1000_000_000 > 0) {
            precision = 3;
        } else {
            precision = 0;
        }
        return DataTypes.TIMESTAMP_LTZ(precision);
    }
    return DataTypes.STRING();
}
{code}
However, timestamps with different precisions have different binary layouts in
BinaryRecordData. Writing data with a timestamp of precision 0 and then
reading it with precision 6 results in an exception being thrown.

 
{code:java}
// org.apache.flink.cdc.common.data.binary.BinaryRecordData#getTimestamp
@Override
public TimestampData getTimestamp(int pos, int precision) {
    assertIndexIsValid(pos);

    if (TimestampData.isCompact(precision)) {
        return TimestampData.fromMillis(segments[0].getLong(getFieldOffset(pos)));
    }

    int fieldOffset = getFieldOffset(pos);
    final long offsetAndNanoOfMilli = segments[0].getLong(fieldOffset);
    return BinarySegmentUtils.readTimestampData(segments, offset, offsetAndNanoOfMilli);
}
{code}
Thus, I think we should cache the table schema in the source and only update
it on a SchemaChangeRecord. The schema used by SourceRecordEventDeserializer
then always stays in sync with the database.
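
A minimal sketch of this idea (illustrative only, not existing CDC code;
extractTableId and onSchemaChange are hypothetical, and schema change events
are assumed to carry the authoritative table type):

{code:java}
private final Map<TableId, DataType> schemaCache = new HashMap<>();

// Schema change records carry the authoritative table schema; use them to
// (re)populate the cache.
public void onSchemaChange(TableId tableId, DataType tableType) {
    schemaCache.put(tableId, tableType);
}

private RecordData extractDataRecord(Struct value, Schema valueSchema) throws Exception {
    DataType dataType = schemaCache.get(extractTableId(value)); // hypothetical helper
    if (dataType == null) {
        // Fallback for records seen before any schema event; still subject to
        // the precision-inference caveat described above.
        dataType = schemaDataTypeInference.infer(value, valueSchema);
    }
    return (RecordData) getOrCreateConverter(dataType).convert(value, valueSchema);
}
{code}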



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Release flink-shaded 19.0

2024-06-27 Thread Zakelly Lan
Hi Sergey,

FYI, we also ran into a regression in stringRead/stringWrite when upgrading
JMH [1]. I guess it is caused by JIT. I'd suggest accepting it if JIT is so
sensitive to 'unrelated' changes in the codebase, since this probably won't
be a problem in a production environment.

I recommend we do not overemphasize these results. How about going ahead
with the upgrade first and testing the performance? Then we can discuss
whether there is indeed a concern.


[1] https://issues.apache.org/jira/browse/FLINK-35629

Best,
Zakelly

On Thu, Jun 27, 2024 at 6:21 PM Sergey Nuyanzin  wrote:

> Thanks for starting this discussion Dawid
>
> In general I'm +1 for the release
>
> IIRC there is one blocker for flink-shaded [1], which was the reason for
> downgrading it from 18 to 17 [2].
>
> I would suggest to resolve it first
>
> [1] https://issues.apache.org/jira/browse/FLINK-34234
> [2] https://issues.apache.org/jira/browse/FLINK-34148
>
> On Thu, Jun 27, 2024 at 11:42 AM Dawid Wysakowicz
>  wrote:
> >
> > Hi,
> > I'd like to propose releasing flink-shaded 19.0. I suggest it have a very
> > limited scope and include just the shading of
> > com.jayway.jsonpath:json-path [1].
> >
> > This would let us fix FLINK-35696[2]
> >
> > If there are no objections, I'd prepare a release.
> >
> > [1]
> >
> https://github.com/apache/flink-shaded/commit/b23e1a811fcacbc5f53993297304131246bb5d04
> > [2] https://issues.apache.org/jira/browse/FLINK-35696
> >
> > Best,
> > Dawid
>
>
>
> --
> Best regards,
> Sergey
>


Re: [DISCUSS] Release flink-shaded 19.0

2024-06-27 Thread Sergey Nuyanzin
Thanks for starting this discussion Dawid

In general I'm +1 for the release

IIRC there is one blocker for flink-shaded [1], which was the reason for
downgrading it from 18 to 17 [2].

I would suggest to resolve it first

[1] https://issues.apache.org/jira/browse/FLINK-34234
[2] https://issues.apache.org/jira/browse/FLINK-34148

On Thu, Jun 27, 2024 at 11:42 AM Dawid Wysakowicz
 wrote:
>
> Hi,
> I'd like to propose releasing flink-shaded 19.0. I suggest it have a very
> limited scope and include just the shading of com.jayway.jsonpath:json-path [1].
>
> This would let us fix FLINK-35696[2]
>
> If there are no objections, I'd prepare a release.
>
> [1]
> https://github.com/apache/flink-shaded/commit/b23e1a811fcacbc5f53993297304131246bb5d04
> [2] https://issues.apache.org/jira/browse/FLINK-35696
>
> Best,
> Dawid



-- 
Best regards,
Sergey


Re: Question about v2 committer guarantees

2024-06-27 Thread Fabian Paul
Hi Scott,

It's great to see further adoption of the Sink V2 architecture.
Happy to answer your questions.

1. The sink architecture should ensure that Committer::commit is always
called with all committables from a subtask for a given subtaskId. There is
an open issue where users have reported a problem with that assumption [1],
but we haven't been able to track the problem down. Behind the scenes, if
you use a SinkWriter->Committer topology, the Committables are transferred
via the checkpoint barrier channel to the committer (not transferring the
committables results in failing the checkpoint) and checkpointed by the
committer before Committer::commit is called. This means when
Committer::commit is called, it reads the committables from the local state.

2. Committer::commit is called on what we call in Flink
notifyCheckpointComplete, which is based on an RPC call that the JobManager
makes to all TaskManagers when a checkpoint is finished. There is no
guarantee when this will be called, only that it eventually will be. If some
of the RPCs are delayed or do not reach the manager, the Committer will
accumulate committables from multiple checkpoints.

3. I am not sure I fully understand that point. I see two different
requirements. First, you could skip a committable if you do not want to
commit it, which you could do by calling
CommitRequest::signalAlreadyCommitted [2]. It's not the primary purpose of
the method, but it should suffice. The second point is having a
communication mechanism between SinkWriter and Committer, which at the
moment does not exist. I would love to hear more details about why the
rewrite is necessary; maybe we can model the sink differently to achieve
that requirement.
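
For illustration, a sketch of the skip path (alreadyCommitted and
commitToTable are hypothetical helpers standing in for a sink's idempotence
check and its actual commit logic):

{code:java}
@Override
public void commit(Collection<CommitRequest<DeltaCommittable>> requests)
        throws IOException, InterruptedException {
    for (CommitRequest<DeltaCommittable> request : requests) {
        if (alreadyCommitted(request.getCommittable())) {
            // Tell the framework this committable needs no further action.
            request.signalAlreadyCommitted();
        } else {
            commitToTable(request.getCommittable());
        }
    }
}
{code}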

Can you explain more about the relation between your questions and using
the SupportsPreCommitTopology::addPreCommitTopology? It sounds like you are
not planning to use the Committer but a custom globalCommitter.

Best,
Fabian

[1] https://issues.apache.org/jira/browse/FLINK-25920
[2]
https://github.com/apache/flink/blob/7fc3aac774f5deb9b48727ba5f916c78085b49b9/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java#L100


[DISCUSS] Release flink-shaded 19.0

2024-06-27 Thread Dawid Wysakowicz
Hi,
I'd like to propose releasing flink-shaded 19.0. I suggest it have a very
limited scope and include just the shading of com.jayway.jsonpath:json-path [1].

This would let us fix FLINK-35696[2]

If there are no objections, I'd prepare a release.

[1]
https://github.com/apache/flink-shaded/commit/b23e1a811fcacbc5f53993297304131246bb5d04
[2] https://issues.apache.org/jira/browse/FLINK-35696

Best,
Dawid


[DISCUSS] FLIP-XXX Add processing of unmatched events in Flink CEP

2024-06-27 Thread Anton Sidorov
Hello!

Our team works with the Flink CEP library for real-time event processing, but
we noticed that some non-matching events are lost. We propose some changes to
the Flink CEP library that provide the ability to handle non-matching events.

We propose adding a new interface, UnmatchedEventsHandler, similar to
TimedOutPartialMatchHandler, and adding state for storing all events in the
CepOperator. A sketch follows the FLIP link below.

More information in FLIP document
https://docs.google.com/document/d/1V5QldVFUOLRggIJzAoESLm8Orkk9th7f4o0Y0gsolh4/edit?usp=sharing
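
A minimal sketch of the shape such an interface could take, by analogy with
TimedOutPartialMatchHandler (an assumption for illustration; the authoritative
definition is in the FLIP document above):

{code:java}
// Hypothetical sketch mirroring TimedOutPartialMatchHandler.
public interface UnmatchedEventsHandler<IN> {

    // Called for events that did not contribute to any match.
    void processUnmatchedEvent(IN event, PatternProcessFunction.Context ctx)
            throws Exception;
}
{code}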


[jira] [Created] (FLINK-35714) Flink CDC pipeline sink: support automatic tag creation

2024-06-27 Thread melin (Jira)
melin created FLINK-35714:
-

 Summary: Flink CDC pipeline sink: support automatic tag creation
 Key: FLINK-35714
 URL: https://issues.apache.org/jira/browse/FLINK-35714
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Affects Versions: cdc-3.2.0
Reporter: melin


The Flink CDC pipeline sink should support automatic tag creation, as described for Paimon:

https://paimon.apache.org/docs/master/maintenance/manage-tags/#automatic-creation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)