[jira] [Created] (FLINK-35221) Support SQL 2011 reserved keywords as identifiers in Flink HiveParser

2024-04-23 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-35221:
---

 Summary: Support SQL 2011 reserved keywords as identifiers in 
Flink HiveParser 
 Key: FLINK-35221
 URL: https://issues.apache.org/jira/browse/FLINK-35221
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Affects Versions: 1.20.0
Reporter: Wencong Liu


According to the Hive user documentation[1], starting from version 0.13.0, Hive 
prohibits the use of reserved keywords as identifiers. However, versions 2.1.0 
and earlier still allow SQL11 reserved keywords as identifiers when 
{{hive.support.sql11.reserved.keywords=false}} is set in hive-site.xml. This 
compatibility option keeps jobs working that use keywords as identifiers.

Flink's HiveParser, which is based on Hive version 2.3.9, lacks such an option 
to treat SQL11 reserved keywords as identifiers. This poses a challenge for 
users migrating SQL from Hive 1.x to Flink SQL, as their queries may use 
keywords as identifiers. Addressing this issue is necessary to support such 
cases.
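
For illustration, a minimal sketch (not part of this issue's proposal) of the kind of Hive-dialect query that fails today because a SQL 2011 reserved keyword is used as an identifier; the catalog setup is assumed and the table/column names are made up:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

public class ReservedKeywordExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Assumes a HiveCatalog is already registered and selected, so that
        // the Hive dialect (HiveParser) is actually used.
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
        // "timestamp" is a SQL 2011 reserved keyword. Hive 1.x (and Hive
        // <= 2.1.0 with hive.support.sql11.reserved.keywords=false) accepts
        // it as a plain column name; Flink's HiveParser currently throws a
        // parse error, and quoting it as `timestamp` is the only workaround.
        tEnv.executeSql("SELECT timestamp FROM events").print();
    }
}
{code}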



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-04-23 Thread Gyula Fóra
Thank you for driving this effort

+1

Cheers
Gyula

On Wed, 24 Apr 2024 at 06:25, Yuan Mei  wrote:

> Hey Yue,
>
> Thanks for all the great efforts on significantly improving rescaling and
> upgrading RocksDB.
>
> +1 for this.
>
> Best
> Yuan
>
> On Wed, Apr 24, 2024 at 10:46 AM Zakelly Lan 
> wrote:
>
> > Hi Yue,
> >
> > Thanks for this proposal!
> >
> > Given the great improvement we could have, the slight regression in write
> > performance is a worthwhile trade-off, particularly as the mem-table
> > operations contribute only a minor part to the overall overhead. So +1
> for
> > this.
> >
> >
> > Best,
> > Zakelly
> >
> > On Tue, Apr 23, 2024 at 12:53 PM Yun Tang  wrote:
> >
> > > Hi Yue,
> > >
> > > Thanks for driving this work.
> > >
> > > It has been three years since the last major upgrade of FRocksDB, and
> > > this upgrade would be a great improvement to Flink's state backend. +1
> > > for this work.
> > >
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Yanfei Lei 
> > > Sent: Tuesday, April 23, 2024 12:50
> > > To: dev@flink.apache.org 
> > > Subject: Re: [DISCUSS] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0
> > >
> > > Hi Yue & Roman,
> > >
> > > Thanks for initiating this FLIP and all the efforts for the upgrade.
> > >
> > > 8.10.0 introduces new features that make it possible for Flink to
> > > implement some exciting new capabilities, and the upgrade also makes
> > > FRocksDB easier to maintain, so +1 for upgrading.
> > >
> > > I read the FLIP and have a minor comment: it would be better to add
> > > some description of the environment/configuration behind the Nexmark
> > > results.
> > >
> > > Roman Khachatryan  于2024年4月23日周二 12:07写道:
> > >
> > > >
> > > > Hi,
> > > >
> > > > Thanks for writing the proposal and preparing the upgrade.
> > > >
> > > > FRocksDB definitely needs to be kept in sync with the upstream, and
> > > > the new APIs are necessary for faster rescaling.
> > > > We're already using a similar version internally.
> > > >
> > > > I reviewed the FLIP and it looks good to me (disclaimer: I took part
> in
> > > > some steps of this effort).
> > > >
> > > >
> > > > Regards,
> > > > Roman
> > > >
> > > > On Mon, Apr 22, 2024, 08:11 yue ma  wrote:
> > > >
> > > > > Hi Flink devs,
> > > > >
> > > > > I would like to start a discussion on FLIP-447: Upgrade FRocksDB
> from
> > > > > 6.20.3 to 8.10.0
> > > > >
> > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> > > > >
> > > > > This FLIP proposes upgrading the version of FRocksDB in the Flink
> > > > > Project from 6.20.3 to 8.10.0.
> > > > > The FLIP introduces the main benefits of upgrading FRocksDB,
> > > > > including the use of IngestDB, which can improve rescaling
> > > > > performance by more than 10 times in certain scenarios, as well as
> > > > > other potential optimization points such as async_io, blob db, and
> > > > > tiered storage. The FLIP also presents test results based on RocksDB
> > > > > 8.10, including StateBenchmark and Nexmark tests.
> > > > > Overall, upgrading FRocksDB may result in a small regression of
> > > > > write performance (which is a very small part of the overall
> > > > > overhead), but it can bring many important performance benefits.
> > > > > So we hope to upgrade the version of FRocksDB through this FLIP.
> > > > >
> > > > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > > > --
> > > > > Best regards,
> > > > > Yue
> > > > >
> > >
> > >
> > >
> > > --
> > > Best,
> > > Yanfei
> > >
> >
>
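
As a concrete illustration of the optimization points mentioned in the proposal (blob db in this case), here is a rough sketch of how such RocksDB options are typically surfaced in Flink through a RocksDBOptionsFactory; the option values are illustrative assumptions, not part of the FLIP:

{code:java}
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Illustrative only: enables RocksDB's integrated BlobDB for large values.
public class BlobDbOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Store values above 1 KiB in blob files instead of the LSM tree.
        return currentOptions
                .setEnableBlobFiles(true)
                .setMinBlobSize(1024);
    }
}
{code}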


Re: [DISCUSS] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-04-23 Thread Yuan Mei
Hey Yue,

Thanks for all the great efforts on significantly improving rescaling and
upgrading RocksDB.

+1 for this.

Best
Yuan

On Wed, Apr 24, 2024 at 10:46 AM Zakelly Lan  wrote:

> Hi Yue,
>
> Thanks for this proposal!
>
> Given the great improvement we could have, the slight regression in write
> performance is a worthwhile trade-off, particularly as the mem-table
> operations contribute only a minor part to the overall overhead. So +1 for
> this.
>
>
> Best,
> Zakelly
>
> On Tue, Apr 23, 2024 at 12:53 PM Yun Tang  wrote:
>
> > Hi Yue,
> >
> > Thanks for driving this work.
> >
> > It has been three years since the last major upgrade of FRocksDB, and
> > this upgrade would be a great improvement to Flink's state backend. +1
> > for this work.
> >
> >
> > Best
> > Yun Tang
> > 
> > From: Yanfei Lei 
> > Sent: Tuesday, April 23, 2024 12:50
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0
> >
> > Hi Yue & Roman,
> >
> > Thanks for initiating this FLIP and all the efforts for the upgrade.
> >
> > 8.10.0 introduces new features that make it possible for Flink to
> > implement some exciting new capabilities, and the upgrade also makes
> > FRocksDB easier to maintain, so +1 for upgrading.
> >
> > I read the FLIP and have a minor comment: it would be better to add
> > some description of the environment/configuration behind the Nexmark
> > results.
> >
> > Roman Khachatryan  于2024年4月23日周二 12:07写道:
> >
> > >
> > > Hi,
> > >
> > > Thanks for writing the proposal and preparing the upgrade.
> > >
> > > FRocksDB definitely needs to be kept in sync with the upstream, and the
> > > new APIs are necessary for faster rescaling.
> > > We're already using a similar version internally.
> > >
> > > I reviewed the FLIP and it looks good to me (disclaimer: I took part in
> > > some steps of this effort).
> > >
> > >
> > > Regards,
> > > Roman
> > >
> > > On Mon, Apr 22, 2024, 08:11 yue ma  wrote:
> > >
> > > > Hi Flink devs,
> > > >
> > > > I would like to start a discussion on FLIP-447: Upgrade FRocksDB from
> > > > 6.20.3 to 8.10.0
> > > >
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> > > >
> > > > This FLIP proposes upgrading the version of FRocksDB in the Flink
> > > > Project from 6.20.3 to 8.10.0.
> > > > The FLIP introduces the main benefits of upgrading FRocksDB, including
> > > > the use of IngestDB, which can improve rescaling performance by more
> > > > than 10 times in certain scenarios, as well as other potential
> > > > optimization points such as async_io, blob db, and tiered storage. The
> > > > FLIP also presents test results based on RocksDB 8.10, including
> > > > StateBenchmark and Nexmark tests.
> > > > Overall, upgrading FRocksDB may result in a small regression of write
> > > > performance (which is a very small part of the overall overhead), but
> > > > it can bring many important performance benefits.
> > > > So we hope to upgrade the version of FRocksDB through this FLIP.
> > > >
> > > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > > --
> > > > Best regards,
> > > > Yue
> > > >
> >
> >
> >
> > --
> > Best,
> > Yanfei
> >
>


[jira] [Created] (FLINK-35220) the error of mysql cdc

2024-04-23 Thread xiaotouming (Jira)
xiaotouming created FLINK-35220:
---

 Summary: the error of mysql cdc 
 Key: FLINK-35220
 URL: https://issues.apache.org/jira/browse/FLINK-35220
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: xiaotouming


When we listen for incremental data on a table, an error occurred once.

The log is:

2024-04-15 04:52:36
java.lang.RuntimeException: One or more fetchers have encountered exception
    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:261)
    at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169)
    at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:131)
    at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:157)
    at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:419)
    at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
    at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788)
    at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: SplitFetcher thread 0 received 
unexpected exception while polling the records
    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:165)
    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:114)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: 
com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.ConnectException:
 An exception occurred in the change event producer. This connector will be 
stopped.
    at 
io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50)
    at 
com.ververica.cdc.connectors.mysql.debezium.task.context.MySqlErrorHandler.setProducerThrowable(MySqlErrorHandler.java:85)
    at 
io.debezium.connector.mysql.MySqlStreamingChangeEventSource$ReaderThreadLifecycleListener.onCommunicationFailure(MySqlStreamingChangeEventSource.java:1544)
    at 
com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1079)
    at 
com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:631)
    at 
com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:932)
    ... 1 more
Caused by: io.debezium.DebeziumException: Failed to deserialize data of 
EventHeaderV4{timestamp=171312635, eventType=EXT_UPDATE_ROWS, 
serverId=1655776775, headerLength=19, dataLength=3291, nextPosition=647993741, 
flags=0}
    at 
io.debezium.connector.mysql.MySqlStreamingChangeEventSource.wrap(MySqlStreamingChangeEventSource.java:1488)
    ... 5 more
Caused by: 
com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializationException:
 Failed to deserialize data of EventHeaderV4{timestamp=171312635, 
eventType=EXT_UPDATE_ROWS, serverId=1655776775, headerLength=19, 
dataLength=3291, nextPosition=647993741, flags=0}
    at 
com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.deserializeEventData(EventDeserializer.java:341)
    at 
com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.nextEvent(EventDeserializer.java:244)
    at 
io.debezium.connector.mysql.MySqlStreamingChangeEventSource$1.nextEvent(MySqlStreamingChangeEventSource.java:259)
    at 
com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1051)
    ... 3 more
Caused by: java.io.EOFException: Failed to read remaining 1 of 5 bytes from 
position 15847639. Block length: 173. Initial block length: 3287.
    at 
com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.fill(ByteArrayInputStream.java:115)
    at 
com.github.shyiko.mysql.binlog.io.ByteArray
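
One possible mitigation, assuming the job can tolerate skipping events that fail binlog deserialization, is Debezium's event.deserialization.failure.handling.mode pass-through option. A rough sketch against the Flink CDC 2.x builder API (connection details are placeholders):

{code:java}
import java.util.Properties;

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;

public class CdcSourceSketch {
    public static MySqlSource<String> build() {
        Properties debeziumProps = new Properties();
        // Log and skip binlog events that cannot be deserialized instead of
        // failing the connector; note this can silently drop changes.
        debeziumProps.setProperty("event.deserialization.failure.handling.mode", "warn");

        return MySqlSource.<String>builder()
                .hostname("localhost")          // placeholder
                .port(3306)
                .databaseList("mydb")           // placeholder
                .tableList("mydb.mytable")      // placeholder
                .username("user")
                .password("pass")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .debeziumProperties(debeziumProps)
                .build();
    }
}
{code}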

Re: [DISCUSS] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-04-23 Thread Zakelly Lan
Hi Yue,

Thanks for this proposal!

Given the great improvement we could have, the slight regression in write
performance is a worthwhile trade-off, particularly as the mem-table
operations contribute only a minor part to the overall overhead. So +1 for
this.


Best,
Zakelly

On Tue, Apr 23, 2024 at 12:53 PM Yun Tang  wrote:

> Hi Yue,
>
> Thanks for driving this work.
>
> It has been three years since the last major upgrade of FRocksDB, and this
> upgrade would be a great improvement to Flink's state backend. +1 for
> this work.
>
>
> Best
> Yun Tang
> 
> From: Yanfei Lei 
> Sent: Tuesday, April 23, 2024 12:50
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0
>
> Hi Yue & Roman,
>
> Thanks for initiating this FLIP and all the efforts for the upgrade.
>
> 8.10.0 introduces new features that make it possible for Flink to
> implement some exciting new capabilities, and the upgrade also makes
> FRocksDB easier to maintain, so +1 for upgrading.
>
> I read the FLIP and have a minor comment: it would be better to add
> some description of the environment/configuration behind the Nexmark
> results.
>
> Roman Khachatryan  于2024年4月23日周二 12:07写道:
>
> >
> > Hi,
> >
> > Thanks for writing the proposal and preparing the upgrade.
> >
> > FRocksDB definitely needs to be kept in sync with the upstream, and the
> > new APIs are necessary for faster rescaling.
> > We're already using a similar version internally.
> >
> > I reviewed the FLIP and it looks good to me (disclaimer: I took part in
> > some steps of this effort).
> >
> >
> > Regards,
> > Roman
> >
> > On Mon, Apr 22, 2024, 08:11 yue ma  wrote:
> >
> > > Hi Flink devs,
> > >
> > > I would like to start a discussion on FLIP-447: Upgrade FRocksDB from
> > > 6.20.3 to 8.10.0
> > >
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> > >
> > > This FLIP proposes upgrading the version of FRocksDB in the Flink
> > > Project from 6.20.3 to 8.10.0.
> > > The FLIP introduces the main benefits of upgrading FRocksDB, including
> > > the use of IngestDB, which can improve rescaling performance by more
> > > than 10 times in certain scenarios, as well as other potential
> > > optimization points such as async_io, blob db, and tiered storage. The
> > > FLIP also presents test results based on RocksDB 8.10, including
> > > StateBenchmark and Nexmark tests.
> > > Overall, upgrading FRocksDB may result in a small regression of write
> > > performance (which is a very small part of the overall overhead), but
> > > it can bring many important performance benefits.
> > > So we hope to upgrade the version of FRocksDB through this FLIP.
> > >
> > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > --
> > > Best regards,
> > > Yue
> > >
>
>
>
> --
> Best,
> Yanfei
>


[jira] [Created] (FLINK-35219) [Flink SQL]Support deserialize json string into Map

2024-04-23 Thread zhihao zhang (Jira)
zhihao zhang created FLINK-35219:


 Summary: [Flink SQL]Support deserialize json string into Map
 Key: FLINK-35219
 URL: https://issues.apache.org/jira/browse/FLINK-35219
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: zhihao zhang


Like Spark's `from_json` SQL function: 
https://spark.apache.org/docs/latest/api/sql/index.html#from_json
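
As a sketch of the requested behavior (the function name and signature below are hypothetical, mirroring Spark's from_json, and not an agreed design):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FromJsonSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Hypothetical built-in, analogous to Spark's from_json(jsonStr, schema):
        // deserializes a JSON string into a typed MAP value.
        tEnv.executeSql(
                "SELECT FROM_JSON('{\"a\": 1, \"b\": 2}', 'MAP<STRING, INT>')")
            .print();
    }
}
{code}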




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-23 Thread Martijn Visser
From a procedural point of view, we shouldn't make FLIPs sub-tasks for
existing FLIPs that have been voted/are released. That will only cause
confusion down the line. A new FLIP should take existing functionality
(like FLIP-304) into account, and propose how to improve on what that
original FLIP has introduced or how you're going to leverage what's already
there.

On Tue, Apr 23, 2024 at 11:42 AM ramkrishna vasudevan <
ramvasu.fl...@gmail.com> wrote:

> Hi Gyula and Ahmed,
>
> I totally agree that there is an overlap in the final goal that both
> FLIPs are achieving here, and in fact FLIP-304 is more comprehensive for
> job failures.
>
> But as a proposal to move forward, can we make Swathi's FLIP/JIRA a
> sub-task of FLIP-304 and continue with the PR, since the main aim is to
> get the cluster failure pushed to the termination log for K8s-based
> deployments? And once that is completed, we can extend FLIP-304 to
> support job failure propagation to the termination log.
>
> Regards
> Ram
>
> On Thu, Apr 18, 2024 at 10:07 PM Swathi C 
> wrote:
>
> > Hi Gyula and  Ahmed,
> >
> > Thanks for reviewing this.
> >
> > @gyula.f...@gmail.com  , since our aim as part of this FLIP was only to
> > fail the cluster when the job manager/Flink has issues such that the
> > cluster would no longer be usable, we proposed only changes related to
> > that.
> > You're right that it covers only job main class errors, job manager
> > runtime failures, and failures when the Job manager writes metadata to
> > another system (ABFS, S3, ...); regular job failures will not be
> > covered.
> >
> > FLIP-304 is mainly used to provide failure enrichers for job failures.
> > Since this FLIP is mainly for Flink Job manager failures, let us know if
> > we can leverage the goodness of both: extend FLIP-304 with our plugin
> > implementation to cover job-level issues (propagate this info to
> > /dev/termination-log so that the container status reports it for Flink
> > on K8s, by implementing the Failure Enricher interface and its
> > processFailure() method), and use this FLIP's proposal for generic Flink
> > cluster (Job manager/cluster) failures.
> >
> > Regards,
> > Swathi C
> >
> > On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy 
> wrote:
> >
> > > Hi Swathi!
> > > Thanks for the proposal.
> > > Could you please elaborate what this FLIP offers more than Flip-304[1]?
> > > Flip 304 proposes a Pluggable mechanism for enriching Job failures, If
> I
> > am
> > > not mistaken this proposal looks like a subset of it.
> > >
> > > 1-
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> > >
> > > Best Regards
> > > Ahmed Hamdy
> > >
> > >
> > > On Thu, 18 Apr 2024 at 08:23, Gyula Fóra  wrote:
> > >
> > > > Hi Swathi!
> > > >
> > > > Thank you for creating this proposal. I really like the general idea
> of
> > > > increasing the K8s native observability of Flink job errors.
> > > >
> > > > I took a quick look at your reference PR, the termination log related
> > > logic
> > > > is contained completely in the ClusterEntrypoint. What type of errors
> > > will
> > > > this actually cover?
> > > >
> > > > To me this seems to cover only:
> > > >  - Job main class errors (ie startup errors)
> > > >  - JobManager failures
> > > >
> > > > Would regular job errors (that cause only job failover but not JM
> > errors)
> > > > be reported somehow with this plugin?
> > > >
> > > > Thanks
> > > > Gyula
> > > >
> > > > On Tue, Apr 16, 2024 at 8:21 AM Swathi C 
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing
> > > Flink
> > > > > Failure Management in Kubernetes with Dynamic Termination Log
> > > > Integration.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> > > > >
> > > > >
> > > > > This FLIP proposes an improvement plugin and focuses mainly on
> > > > > Flink on K8S, but it can be used as a generic plugin and extended
> > > > > further.
> > > > >
> > > > > Looking forward to everyone's feedback and suggestions. Thank you
> !!
> > > > >
> > > > > Best Regards,
> > > > > Swathi Chandrashekar
> > > > >
> > > >
> > >
> >
>
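
To make the FLIP-304 option discussed above concrete, here is a rough sketch of a FailureEnricher (the interface FLIP-304 introduced) whose side effect writes the failure to the Kubernetes termination log; the output key and error handling here are assumptions, not an agreed design:

{code:java}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

import org.apache.flink.core.failure.FailureEnricher;

public class TerminationLogEnricher implements FailureEnricher {

    // Kubernetes surfaces this file's content in the pod's container status.
    private static final Path TERMINATION_LOG = Paths.get("/dev/termination-log");

    @Override
    public Set<String> getOutputKeys() {
        return Collections.singleton("termination.reason");
    }

    @Override
    public CompletableFuture<Map<String, String>> processFailure(
            Throwable cause, Context context) {
        String reason = String.valueOf(cause);
        try {
            Files.write(TERMINATION_LOG, reason.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        } catch (Exception e) {
            // Best effort only: never let the enricher break failure handling.
        }
        return CompletableFuture.completedFuture(
                Collections.singletonMap("termination.reason", reason));
    }
}
{code}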


Re: [DISCUSS][QUESTION] Drop jdk 8 support for Flink connector Opensearch

2024-04-23 Thread Sergey Nuyanzin
Thanks for your response, Danny.

Yeah, I think we could have two releases: one for v1 (JDK 8) and one for v2
(JDK 11).
I guess we could even use the same version for both, since the artifact
names are different anyway.

On Thu, Apr 18, 2024 at 4:01 PM Danny Cranmer 
wrote:

> Hey Sergey,
>
> Given the circumstances I think it is ok to drop JDK 8 support with the
> opensearch v2.0.0 connector. However, in parallel we should still support
> the 1.x line for Flink 1.x series with JDK 8. This would mean two releases:
> 1/ flink-connector-opensearch v2.0.0 for Flink 1.18/1.19, opensearch
> 1.x/2.x and JDK 11
> 2/ flink-connector-opensearch v1.2.0 (or maybe just 1.1.0-1.19) for Flink
> 1.18/1.19, opensearch 1.x and JDK 8
>
> What do you think?
>
> Thanks,
> Danny
>
> On Wed, Apr 17, 2024 at 10:07 AM Sergey Nuyanzin 
> wrote:
>
> > Hi everyone
> >
> > I'm working on support for Opensearch v2.x for Flink connector
> > Opensearch[1].
> > Unfortunately, after several breaking changes (e.g. [2], [3]) on the
> > Opensearch side, it is no longer possible to use the same connector
> > artifact for both Opensearch v1 and v2.
> > This forces us to go a similar way as for Elasticsearch 6/7 and build a
> > dedicated Opensearch v2 module.
> >
> > However, the main pain point here is that Opensearch 2.x is built with
> > JDK 11 and requires JDK 11 to build and use the Flink connector as well.
> > Also, the README[4] of most of the connectors mentions explicitly that
> > JDK 11 is required to build them.
> >
> > At the same time, it looks like we need to release a connector for
> > Opensearch v1 with JDK 8 and one for Opensearch v2 with JDK 11.
> >
> > The suggestion is to drop JDK 8 support for the Opensearch connector to
> > make the release/testing of both modules (for Opensearch v1 and
> > Opensearch v2) easier.
> >
> > Other opinions are welcome
> >
> > [1] https://github.com/apache/flink-connector-opensearch/pull/38
> > [2] opensearch-project/OpenSearch#9082
> > [3] opensearch-project/OpenSearch#5902
> > [4]
> >
> >
> https://github.com/apache/flink-connector-opensearch/blob/main/README.md?plain=1#L18
> >
> > --
> > Best regards,
> > Sergey
> >
>


-- 
Best regards,
Sergey


[jira] [Created] (FLINK-35218) Duplicated values caused by expired state TTL

2024-04-23 Thread garcia (Jira)
garcia created FLINK-35218:
--

 Summary: Duplicated values caused by expired state TTL   
 Key: FLINK-35218
 URL: https://issues.apache.org/jira/browse/FLINK-35218
 Project: Flink
  Issue Type: Bug
Reporter: garcia
 Attachments: image-2024-04-23-15-34-32-860.png

Hi,

We utilize the state TTL to clean our Flink input tables through the 
`table.exec.state.ttl` configuration. 
However, we encountered an issue when the TTL expires, as illustrated in our 
scenario:

Given this input table schema:
{code:java}

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "http://example.com/example.json",
  "type": "object",
  "title": "bet placed schema",
  "required": [
    "placement_date"
  ],
  "properties": {
    "bet_id": {
      "$id": "#/properties/bet_id",
      "type": "string"
    },
    "regulator": {
      "$id": "#/properties/regulator",
      "type": "string"
    },
    "match_id": {
      "$id": "#/properties/match_id",
      "type": "integer"
    },
    "combo_id": {
      "$id": "#/properties/combo_id",
      "type": "integer"
    },
    "is_live": {
      "$id": "#/properties/is_live",
      "type": "boolean"
    },
    "offer_catalog": {
      "$id": "#/properties/offer_catalog",
      "type": "string"
    },
    "combo_selection_nbr": {
      "$id": "#/properties/combo_selection_nbr",
      "type": "integer"
    }
  },
  "additionalProperties": true
} {code}
This configuration: 
{code:java}
import java.util.concurrent.TimeUnit

import org.apache.flink.streaming.api.environment.{StreamExecutionEnvironment => JavaStreamExecutionEnvironment}
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

// conf is a user-provided Configuration defined elsewhere
val streamEnv = new StreamExecutionEnvironment(
  JavaStreamExecutionEnvironment.getExecutionEnvironment(conf))
val tableEnv = StreamTableEnvironment.create(streamEnv)

tableEnv.getConfig.getConfiguration
  .setString("table.local-time-zone", "UTC")
tableEnv.getConfig.getConfiguration
  .setString("table.exec.mini-batch.enabled", "true")
tableEnv.getConfig.getConfiguration
  .setString("table.exec.mini-batch.allow-latency", "5 s")
tableEnv.getConfig.getConfiguration
  .setString("table.exec.mini-batch.size", "5000")
tableEnv.getConfig.getConfiguration
  .setString("table.exec.state.ttl",
    TimeUnit.MILLISECONDS.convert(1, TimeUnit.MINUTES).toString)
{code}
And this query (simplified query): 


{code:java}
WITH exploded_combos AS (
    select
        event_timestamp,
        CAST(JSON_VALUE(combos.combo, '$.match_id') AS BIGINT) as match_id,
        CAST(
            JSON_VALUE(combos.combo, '$.combo_selection_id') AS BIGINT
        ) as combo_id,
        CAST(JSON_VALUE(combos.combo, '$.is_live') AS BOOLEAN) as is_live,
        CAST(RegulatorToCatalog(regulator) AS VARCHAR) as offer_catalog,
        CARDINALITY(
            JsonToArray(JSON_QUERY(combos.combo, '$.bet_selections'))
        ) as combo_selections_nbr,
        combo_bet_selections
    from
        bet_placed_view
        CROSS JOIN UNNEST(JsonToArray(combo_bet_selections)) AS combos(combo)
),
agg_match AS (
    SELECT
        match_id,
        LOWER(offer_catalog) as offer_catalog,
        MAX(event_timestamp) AS last_event_time_utc,
        COUNT(*) AS bet_count
    FROM
        exploded_combos
    WHERE
        match_id IS NOT NULL
        AND combo_id IS NOT NULL
        AND offer_catalog IS NOT NULL
        AND combo_bet_selections IS NOT NULL
    GROUP BY
        match_id,
        LOWER(offer_catalog)
),
agg_combo AS (
    SELECT
        match_id,
        combo_id,
        combo_selections_nbr,
        is_live,
        LOWER(offer_catalog) AS offer_catalog,
        MAX(event_timestamp) AS last_event_time_utc,
        COUNT(*) as bet_count
    FROM
        exploded_combos
    WHERE
        match_id IS NOT NULL
        AND combo_id IS NOT NULL
        AND (
            combo_selections_nbr = 3
            OR combo_selections_nbr = 2
        )
        AND offer_catalog IS NOT NULL
    GROUP BY
        match_id,
        combo_id,
        combo_selections_nbr,
        is_live,
        LOWER(offer_catalog)
),
ranked_filtered_agg_combo_main_page AS (
    SELECT
        match_id,
        combo_id,
        offer_catalog,
        bet_count,
        ROW_NUMBER() OVER (
            PARTITION BY match_id,
            offer_catalog
            ORDER BY
                bet_count DESC,
                combo_id DESC
        ) AS rank_combo
    FROM
        agg_combo
    WHERE
        combo_selections_nbr = 3
),
joined_filtered_agg_match_main_page AS (
    SELECT
        ranked_filtered_agg_combo_main_page.match_id,
        ranked_filtered_agg_combo_main_page.offer_catalog,
        ranked_filtered_agg_combo_main_page.bet_count,
        ranked_filtered_agg_combo_main_page.combo_id,
        ROW_NUMBER() OVER (
            PARTITION BY agg_match.offer_catalog
            ORDER BY
                agg_match.bet_count DESC,
           

[jira] [Created] (FLINK-35217) Missing fsync in FileSystemCheckpointStorage

2024-04-23 Thread Marc Aurel Fritz (Jira)
Marc Aurel Fritz created FLINK-35217:


 Summary: Missing fsync in FileSystemCheckpointStorage
 Key: FLINK-35217
 URL: https://issues.apache.org/jira/browse/FLINK-35217
 Project: Flink
  Issue Type: Bug
  Components: FileSystems, Runtime / Checkpointing
Affects Versions: 1.19.0, 1.18.0, 1.17.0
Reporter: Marc Aurel Fritz


While running Flink on a system with an unstable power supply, checkpoints 
were regularly corrupted in the form of "_metadata" files with a file size of 
0 bytes. In all cases the previous checkpoint data had already been deleted, 
causing progress to be lost completely.

Further investigation revealed that the "FileSystemCheckpointStorage" doesn't 
perform "fsync" when writing a new checkpoint to disk. This means the old 
checkpoint gets removed without making sure that the new one is durably 
persisted on disk. "strace" on the jobmanager's process confirms this behavior:
 # The checkpoint chk-60's in-progress metadata is written at "openat"
 # The checkpoint chk-60's in-progress metadata is atomically renamed at 
"rename"
 # The old checkpoint chk-59 is deleted at "unlink"

For durable persistence an "fsync" call is missing before step 3.
Full "strace" log:
{code:java}
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
0x7fd2ad5fc970) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
0x7fd2ad5fca00) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc", 
{st_mode=S_IFDIR|0755, st_size=42, ...}) = 0
[pid 51618] 11:44:30 
mkdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 0777) = 0
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata", 
0x7fd2ad5fc860) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata", 
0x7fd2ad5fc740) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 0x7fd2ad5fc7d0) = -1 ENOENT (No such file or directory)
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51618] 11:44:30 openat(AT_FDCWD, 
"/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 O_WRONLY|O_CREAT|O_EXCL, 0666) = 168
[pid 51618] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 {st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
[pid 51618] 11:44:30 
rename("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/._metadata.inprogress.bf9518dc-2100-4524-9e67-e42913c2b8e8",
 "/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-60/_metadata") = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata", 
{st_mode=S_IFREG|0644, st_size=23378, ...}) = 0
[pid 51644] 11:44:30 
unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59/_metadata")
 = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51644] 11:44:30 openat(AT_FDCWD, 
"/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
[pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_EMPTY_PATH) = 0
[pid 51644] 11:44:30 
stat("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
{st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 51644] 11:44:30 openat(AT_FDCWD, 
"/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59", 
O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 168
[pid 51644] 11:44:30 newfstatat(168, "", {st_mode=S_IFDIR|0755, st_size=0, 
...}, AT_EMPTY_PATH) = 0
[pid 51644] 11:44:30 
unlink("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = -1 
EISDIR (Is a directory)
[pid 51644] 11:44:30 
rmdir("/opt/flink/statestore/e1c541c4568515e77df32d82727e20dc/chk-59") = 0 
{code}
To fix this I'm currently testing the following commit: 
[https://github.com/Planet-X/flink/commit/24196cc897533b654f44e2b612543ff023cdb123]
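
The gist of such a fix, shown here as a hedged sketch rather than the linked commit's actual code, is to force both the metadata file and its directory entry to disk before the old checkpoint is removed:

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableCheckpointCommit {

    static void commit(Path inProgressMetadata, Path finalMetadata, Path oldCheckpointDir)
            throws IOException {
        // 1. fsync the file contents before publishing them under the final name.
        try (FileChannel ch = FileChannel.open(inProgressMetadata, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
        // 2. Atomic rename, then fsync the parent directory so the rename
        //    entry itself survives power loss (directory fsync via a READ
        //    channel works on Linux, though it is platform-dependent).
        Files.move(inProgressMetadata, finalMetadata, StandardCopyOption.ATOMIC_MOVE);
        try (FileChannel dir = FileChannel.open(finalMetadata.getParent(), StandardOpenOption.READ)) {
            dir.force(true);
        }
        // 3. Only now is it safe to remove the previous checkpoint
        //    (matching the unlink + rmdir sequence seen in the strace log).
        Files.deleteIfExists(oldCheckpointDir.resolve("_metadata"));
        Files.deleteIfExists(oldCheckpointDir);
    }
}
{code}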

"strace" can confirm that "fsync" is now called before the previous checkpoint 
is removed at "unlink":
{code:java}
[pid 40393] 11:30:17 
stat("/opt

[jira] [Created] (FLINK-35216) Support for RETURNING clause of JSON_QUERY

2024-04-23 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-35216:


 Summary: Support for RETURNING clause of JSON_QUERY
 Key: FLINK-35216
 URL: https://issues.apache.org/jira/browse/FLINK-35216
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.20.0


The SQL standard says JSON_QUERY should support a RETURNING clause similar to 
JSON_VALUE's. Calcite already supports the clause for JSON_VALUE, but not for 
JSON_QUERY.

{code}
<JSON query> ::=
  JSON_QUERY <left paren>
  <JSON API common syntax>
  [ <JSON output clause> ]
  [ <JSON query wrapper behavior> WRAPPER ]
  [ <JSON query quotes behavior> QUOTES [ ON SCALAR STRING ] ]
  [ <JSON query empty behavior> ON EMPTY ]
  [ <JSON query error behavior> ON ERROR ]
  <right paren>

<JSON output clause> ::=
  RETURNING <data type>
  [ FORMAT <JSON representation> ]
{code}
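
For illustration, once supported, the clause could be used as below; the syntax is hypothetical, mirroring what Calcite already accepts for JSON_VALUE:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JsonQueryReturningSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Hypothetical: RETURNING lets the caller pick the result type
        // instead of always getting a JSON string back.
        tEnv.executeSql(
                "SELECT JSON_QUERY('{\"items\": [\"a\", \"b\"]}', "
                        + "'$.items' RETURNING ARRAY<STRING>)")
            .print();
    }
}
{code}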



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [SUMMARY] Flink 1.20 Release Sync 04/23/2024

2024-04-23 Thread Rui Fan
Hi Muhammet,

Thanks for your reminder.

Each meeting starts at 10am (UTC+2), i.e. 4pm (UTC+8).
The release sync happens bi-weekly at first and will be
adjusted to weekly as we approach the feature freeze date.

Best,
Rui

On Tue, Apr 23, 2024 at 5:34 PM Muhammet Orazov 
wrote:

> Hey Rui,
>
> Thanks for the summary!
>
> Could you please write what time you are meeting on the 5th of May?
>
> Thanks and best,
> Muhammet
>
> On 2024-04-23 09:13, Rui Fan wrote:
> > Dear devs,
> >
> > Today is the second meeting for the Flink 1.20 release cycle. I'd like
> > to share the information synced in the meeting.
> >
> > - Features:
> >   There are 8 weeks until the feature freeze date; so far we've had 6
> >   FLIPs/features, and the status looks good. Contributors are encouraged
> >   to continuously update the 1.20 wiki page[1].
> >
> > - Blockers:
> >   - [Closed] Change on getTransitivePredecessors breaks connectors
> > FLINK-35009
> >   - [Doing] 2 tests fail FLINK-35041, FLINK-34997
> >   - [Doing] 2 benchmarks performance regression FLINK-35040,
> > FLINK-35215
> >
> > - Sync meeting[2]:
> >  The next meeting is 05/07/2024, please feel free to join us.
> >
> > Lastly, we encourage attendees to fill out the topics to be discussed
> > at
> > the bottom of 1.20 wiki page[1] a day in advance, to make it easier for
> > everyone to understand the background of the topics.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
> > [2] https://meet.google.com/mtj-huez-apu
> >
> > Best,
> >
> > Weijie, Ufuk, Robert and Rui
>


Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration

2024-04-23 Thread ramkrishna vasudevan
Hi Gyula and Ahmed,

I totally agree that there is an overlap in the final goal that both
FLIPs are achieving here, and in fact FLIP-304 is more comprehensive for
job failures.

But as a proposal to move forward, can we make Swathi's FLIP/JIRA a
sub-task of FLIP-304 and continue with the PR, since the main aim is to
get the cluster failure pushed to the termination log for K8s-based
deployments? And once that is completed, we can extend FLIP-304 to
support job failure propagation to the termination log.

Regards
Ram

On Thu, Apr 18, 2024 at 10:07 PM Swathi C  wrote:

> Hi Gyula and  Ahmed,
>
> Thanks for reviewing this.
>
> @gyula.f...@gmail.com  , since our aim as part of this FLIP was only to
> fail the cluster when the job manager/Flink has issues such that the
> cluster would no longer be usable, we proposed only changes related to
> that.
> You're right that it covers only job main class errors, job manager
> runtime failures, and failures when the Job manager writes metadata to
> another system (ABFS, S3, ...); regular job failures will not be covered.
>
> FLIP-304 is mainly used to provide failure enrichers for job failures.
> Since this FLIP is mainly for Flink Job manager failures, let us know if
> we can leverage the goodness of both: extend FLIP-304 with our plugin
> implementation to cover job-level issues (propagate this info to
> /dev/termination-log so that the container status reports it for Flink
> on K8s, by implementing the Failure Enricher interface and its
> processFailure() method), and use this FLIP's proposal for generic Flink
> cluster (Job manager/cluster) failures.
>
> Regards,
> Swathi C
>
> On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy  wrote:
>
> > Hi Swathi!
> > Thanks for the proposal.
> > Could you please elaborate what this FLIP offers more than Flip-304[1]?
> > Flip 304 proposes a Pluggable mechanism for enriching Job failures, If I
> am
> > not mistaken this proposal looks like a subset of it.
> >
> > 1-
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> >
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Thu, 18 Apr 2024 at 08:23, Gyula Fóra  wrote:
> >
> > > Hi Swathi!
> > >
> > > Thank you for creating this proposal. I really like the general idea of
> > > increasing the K8s native observability of Flink job errors.
> > >
> > > I took a quick look at your reference PR, the termination log related
> > logic
> > > is contained completely in the ClusterEntrypoint. What type of errors
> > will
> > > this actually cover?
> > >
> > > To me this seems to cover only:
> > >  - Job main class errors (ie startup errors)
> > >  - JobManager failures
> > >
> > > Would regular job errors (that cause only job failover but not JM
> errors)
> > > be reported somehow with this plugin?
> > >
> > > Thanks
> > > Gyula
> > >
> > > On Tue, Apr 16, 2024 at 8:21 AM Swathi C 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing
> > Flink
> > > > Failure Management in Kubernetes with Dynamic Termination Log
> > > Integration.
> > > >
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> > > >
> > > >
> > > > This FLIP proposes an improvement plugin and focuses mainly on Flink
> > > > on K8S, but it can be used as a generic plugin and extended further.
> > > >
> > > > Looking forward to everyone's feedback and suggestions. Thank you !!
> > > >
> > > > Best Regards,
> > > > Swathi Chandrashekar
> > > >
> > >
> >
>


Re: [SUMMARY] Flink 1.20 Release Sync 04/23/2024

2024-04-23 Thread Muhammet Orazov

Hey Rui,

Thanks for the summary!

Could you please write what time you are meeting on the 5th of May?

Thanks and best,
Muhammet

On 2024-04-23 09:13, Rui Fan wrote:

Dear devs,

Today is the second meeting for the Flink 1.20 release cycle. I'd like to
share the information synced in the meeting.

- Features:
  There are 8 weeks until the feature freeze date; so far we've had 6
  FLIPs/features, and the status looks good. Contributors are encouraged
  to continuously update the 1.20 wiki page[1].

- Blockers:
  - [Closed] Change on getTransitivePredecessors breaks connectors
FLINK-35009
  - [Doing] 2 tests fail FLINK-35041, FLINK-34997
  - [Doing] 2 benchmarks performance regression FLINK-35040, 
FLINK-35215


- Sync meeting[2]:
 The next meeting is 05/07/2024, please feel free to join us.

Lastly, we encourage attendees to fill out the topics to be discussed at
the bottom of 1.20 wiki page[1] a day in advance, to make it easier for
everyone to understand the background of the topics.

[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
[2] https://meet.google.com/mtj-huez-apu

Best,

Weijie, Ufuk, Robert and Rui


Re: [SUMMARY] Flink 1.20 Release Sync 04/23/2024

2024-04-23 Thread weijie guo
Thanks Rui for the summary!

Best regards,

Weijie


Rui Fan <1996fan...@gmail.com> 于2024年4月23日周二 17:13写道:

> Dear devs,
>
> Today is the second meeting for the Flink 1.20 release cycle. I'd like to
> share the information synced in the meeting.
>
> - Features:
>   There are 8 weeks until the feature freeze date; so far we've had 6
>   FLIPs/features, and the status looks good. Contributors are encouraged
>   to continuously update the 1.20 wiki page[1].
>
> - Blockers:
>   - [Closed] Change on getTransitivePredecessors breaks connectors
> FLINK-35009
>   - [Doing] 2 tests fail FLINK-35041, FLINK-34997
>   - [Doing] 2 benchmarks performance regression FLINK-35040, FLINK-35215
>
> - Sync meeting[2]:
>  The next meeting is 05/07/2024, please feel free to join us.
>
> Lastly, we encourage attendees to fill out the topics to be discussed at
> the bottom of 1.20 wiki page[1] a day in advance, to make it easier for
> everyone to understand the background of the topics.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
> [2] https://meet.google.com/mtj-huez-apu
>
> Best,
>
> Weijie, Ufuk, Robert and Rui
>


[SUMMARY] Flink 1.20 Release Sync 04/23/2024

2024-04-23 Thread Rui Fan
Dear devs,

Today is the second meeting for the Flink 1.20 release cycle. I'd like to share
the information synced in the meeting.

- Features:
  There are 8 weeks until the feature freeze date; so far we've had 6
  FLIPs/features, and the status looks good. Contributors are encouraged to
  continuously update the 1.20 wiki page[1].

- Blockers:
  - [Closed] Change on getTransitivePredecessors breaks connectors
FLINK-35009
  - [Doing] 2 tests fail FLINK-35041, FLINK-34997
  - [Doing] 2 benchmarks performance regression FLINK-35040, FLINK-35215

- Sync meeting[2]:
 The next meeting is 05/07/2024, please feel free to join us.

Lastly, we encourage attendees to fill out the topics to be discussed at
the bottom of 1.20 wiki page[1] a day in advance, to make it easier for
everyone to understand the background of the topics.

[1] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
[2] https://meet.google.com/mtj-huez-apu

Best,

Weijie, Ufuk, Robert and Rui


Re: [DISCUSS] FLIP-446: Kubernetes Operator State Snapshot CRD

2024-04-23 Thread Robert Metzger
Thank you all for the great discussion.

> 2. I agree with you that "completed" is not very clear, but I would
> suggest the name "alreadyExists". WDYT?

I'm fine with "alreadyExists" and for 3. also with "backoffLimit".


> I really like the idea of having something like Pod Conditions, but I
> think it wouldn't add too much value here

Okay, let's leave it out.

I'm happy to start a VOTE and +1 the FLIP as is.

Looking forward to having this feature in the operator!
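
For readers following along, a minimal sketch of the retry semantics converged on above; all names ("backoffLimit", "failures", the state values) are assumptions taken from this thread, not the final CRD:

{code:java}
// All class, field and state names below are assumptions taken from this
// discussion; the final FlinkStateSnapshot CRD may differ.
public class SnapshotRetrySketch {

    enum State { TRIGGER_PENDING, IN_PROGRESS, COMPLETED, FAILED }

    static class SnapshotStatus {
        State state = State.TRIGGER_PENDING;
        int failures = 0;
        String error;
    }

    /** Mirrors Kubernetes Jobs' backoffLimit: retry until exhausted, then fail. */
    static void onSnapshotFailure(SnapshotStatus status, int backoffLimit, String error) {
        status.error = error;      // always keep the latest error
        status.failures++;
        status.state = status.failures > backoffLimit
                ? State.FAILED            // give up for good
                : State.TRIGGER_PENDING;  // operator picks it up and retries
    }
}
{code}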


On Sat, Apr 20, 2024 at 4:31 PM Mate Czagany  wrote:

> Hi,
>
> Thanks for your comments, Gyula, I really appreciate it!
>
> I have updated the following things in the FLIP, please comment on these
> changes if you have any suggestions or concerns:
> - Added path field to FlinkStateSnapshotReference
> - Added two examples at the bottom.
> - Added an error handling section and the associated new fields
> ("backoffLimit" and "failures") to the interfaces.
> - Renamed field "completed" to "alreadyExists".
>
> Regarding the separate resources, I don't think that any of the two
> solutions would bring too much (dis)advantage to the table, so I am still
> neutral, and waiting for others to chime in as well with their thoughts and
> feedback!
>
> Regards,
> Mate
>
>
> Gyula Fóra  ezt írta (időpont: 2024. ápr. 19., P,
> 21:43):
>
> > Hey!
> >
> > Regarding the question of initialSavepointPath and
> > flinkStateSnapshotReference new object, I think we could simply keep an
> > extra field as part of the flinkStateSnapshotReference object called
> path.
> >
> > Then the fields could be:
> > namespace, name, path
> >
> > If path is defined we would use that (to support the simple way also)
> > otherwise use the resource. I would still deprecate the
> > initialSavepointPath field in the jobSpec.
> >
> > Regarding the Savepoint/Checkpoint vs FlinkStateSnapshot.
> > What we need:
> >  1. Easy way to list all state snapshots (to select latest)
> >  2. Easy way to reference a savepoint/checkpoint from a jobspec
> >  3. Differentiate state snapshot types (in some cases users may prefer to
> > use checkpoint/savepoint for certain upgrades) -> we should add a label
> or
> > something for easy selection
> >  4. Be able to delete savepoints (and checkpoints maybe)
> >
> > I am personally slightly more in favor of having a single resource as
> that
> > ticks all the boxes, while having 2 separate resources will make both
> > listing and referencing harder. We would have to introduce state type in
> > the reference (name + namespace would not be enough to uniquely identify
> a
> > state snapshot)
> >
> > I wonder if I am missing any good argument against the single
> > FlinkStateSnapshot here.
> >
> > Cheers,
> > Gyula
> >
> >
> > On Fri, Apr 19, 2024 at 9:09 PM Mate Czagany  wrote:
> >
> >> Hi Robert and Thomas,
> >>
> >> Thank you for sharing your thoughts, I will try to address your
> questions
> >> and suggestions:
> >>
> >> 1. I would really love to hear others' inputs as well about separating
> >> the snapshot CRD into two different CRDs for savepoints and
> >> checkpoints instead. I think the main upside is that we would not need the
> >> mandatory savepoint or checkpoint field inside the spec. The two CRs
> could
> >> share the same status fields, and their specs would be different.
> >> I personally like both solutions, and would love to hear others'
> thoughts
> >> as well.
> >>
> >> 2. I agree with you that "completed" is not very clear, but I would
> >> suggest the name "alreadyExists". WDYT?
> >>
> >> 3. I think having a retry loop inside the operator does not add too much
> >> complexity to the FLIP. On failure, we check if we have reached the max
> >> retries. If we did, the state will be set to "FAILED", else it will be
> set
> >> to "TRIGGER_PENDING", causing the operator to retry the task. The
> "error"
> >> field will always be populated with the latest error. Kubernetes Jobs
> >> already have a similar field called "backoffLimit"; maybe we could use
> >> the same name, with the same logic applied, WDYT?
> >> About error events, I think we should keep the "error" field and clear
> >> it upon a successful snapshot. I will add to the FLIP that there will
> >> be an event generated for each unsuccessful snapshot.
> >>
> >> 4. I really like the idea of having something like Pod Conditions, but I
> >> think it wouldn't add too much value here, because the only 2 stages
> >> important to the user are "Triggered" and "Completed", and those
> timestamps
> >> will already be included in the status field. I think it would make more
> >> sense to implement this if there were more lifecycle stages.
> >>
> >> 5. There will be a new field in JobSpec called
> >> "flinkStateSnapshotReference" to reference a FlinkStateSnapshot to
> restore
> >> from.
> >>
> >> > How do you see potential effects on API server performance wrt.
> >> > number of objects vs mutations? Is the proposal more or less neutral
> >> > in that regard?
> >>
> >> While I am not an expert in Kuber