[jira] [Created] (FLINK-33073) Implement end-to-end tests for the Kinesis Streams Sink

2023-09-11 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33073:
---

 Summary: Implement end-to-end tests for the Kinesis Streams Sink
 Key: FLINK-33073
 URL: https://issues.apache.org/jira/browse/FLINK-33073
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / AWS
Reporter: Hong Liang Teoh
 Fix For: 2.0.0


*What*

Implement end-to-end tests for KinesisStreamsSink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33072) Implement end-to-end tests for AWS Kinesis Connectors

2023-09-11 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-33072:
---

 Summary: Implement end-to-end tests for AWS Kinesis Connectors
 Key: FLINK-33072
 URL: https://issues.apache.org/jira/browse/FLINK-33072
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / AWS
Reporter: Hong Liang Teoh
 Fix For: 2.0.0


*What*

We want to implement end-to-end tests that target real Kinesis Data Streams.

*Why*

This solidifies our testing and ensures we pick up any integration issues with 
the Kinesis Data Streams API.

We especially want to cover both happy-path and failure cases to ensure they 
are handled as expected by the KDS connector.
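
For illustration, a happy-path test against a real stream could look roughly like
the sketch below. The stream name, region, and the verification step (reading the
records back via the AWS SDK) are placeholders; the builder methods follow the
documented KinesisStreamsSink API.

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.aws.config.AWSConfigConstants;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Properties;

// Sketch of a happy-path end-to-end test: write a handful of records to a real
// stream, then assert (via the AWS SDK, omitted here) that they all arrived.
public class KinesisStreamsSinkE2ESketch {

    public static void main(String[] args) throws Exception {
        Properties clientProps = new Properties();
        clientProps.setProperty(AWSConfigConstants.AWS_REGION, "eu-west-1"); // region is illustrative

        KinesisStreamsSink<String> sink =
                KinesisStreamsSink.<String>builder()
                        .setKinesisClientProperties(clientProps)
                        .setStreamName("e2e-test-stream") // stream name is illustrative
                        .setSerializationSchema(new SimpleStringSchema())
                        .setPartitionKeyGenerator(element -> String.valueOf(element.hashCode()))
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("record-1", "record-2", "record-3").sinkTo(sink);
        env.execute("kinesis-sink-e2e");
    }
}

A failure-case variant would point the sink at a non-existent stream or revoke
permissions and assert that the job fails (or retries) as expected.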

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33071) Log checkpoint statistics

2023-09-11 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-33071:
--

 Summary: Log checkpoint statistics 
 Key: FLINK-33071
 URL: https://issues.apache.org/jira/browse/FLINK-33071
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Checkpointing, Runtime / Metrics
Affects Versions: 1.18.0
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski


This is a stopgap solution until we have a proper way of solving FLINK-23411.

The plan is to dump JSON-serialised checkpoint statistics into the Flink JM's log 
at {{DEBUG}} level. This could then be used to analyse what happened with a 
particular checkpoint in the past.
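
A rough sketch of the idea (the class and method names below are illustrative;
the real change would hook into the JobManager's checkpoint coordination code and
use Flink's existing statistics classes):

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class CheckpointStatsDebugLogger {

    private static final Logger LOG = LoggerFactory.getLogger(CheckpointStatsDebugLogger.class);
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Called once a checkpoint completes (or fails); guarded so the JSON is only
    // built when DEBUG logging is actually enabled.
    static void logCheckpointStats(Object checkpointStats) {
        if (LOG.isDebugEnabled()) {
            try {
                LOG.debug("Checkpoint statistics: {}", MAPPER.writeValueAsString(checkpointStats));
            } catch (JsonProcessingException e) {
                LOG.debug("Could not serialise checkpoint statistics", e);
            }
        }
    }
}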



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-09-11 Thread Jim Hughes
Hi Jing and Jark!

I can definitely appreciate the desire to have fewer configurations.

Do you have a suggested alternative for platform providers to limit or
restrict the hints that Bonnie is talking about?

As one possibility, maybe one configuration could be set to control all
hints.
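
For readers following along, this is the kind of hint in question, shown as a
minimal sketch (it assumes a Kafka-backed table named kafka_orders has already
been registered; the OPTIONS hint overrides connector options for this query only):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class QueryHintExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical pre-registered table; the hint changes the scan startup mode
        // for this query only, without touching the table's DDL.
        tEnv.executeSql(
                "SELECT * FROM kafka_orders "
                        + "/*+ OPTIONS('scan.startup.mode'='earliest-offset') */")
                .print();
    }
}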

Cheers,

Jim

On Sat, Sep 9, 2023 at 6:16 AM Jark Wu  wrote:

> I agree with Jing,
>
> My biggest concern is this makes the boundary of adding an option very
> unclear.
> It's not a strong reason to add a config just because it doesn't affect
> existing users. Does this mean that in the future we might add an option to disable
> each feature?
>
> Flink already has a very long list of configurations [1][2] and this is
> very scary and not easy to use. We should try to remove unnecessary
> configurations from the list in Flink 2.0. However, from my perspective,
> adding this option moves us further away from that direction.
>
> Best,
> Jark
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/
>
> On Sat, 9 Sept 2023 at 17:33, Jing Ge  wrote:
>
> > Hi,
> >
> > Thanks for bringing this to our attention. At the first glance, it looks
> > reasonable to offer a new configuration to enable/disable SQL hints
> > globally. However, IMHO, it is not the right timing to do it now, because
> > we should not only think as platform providers but also as end users (the
> > number of end users is much bigger than the number of platform providers):
> >
> > 1. Users don't need it because users have the choice to use hints or not,
> > just like Jark pointed out. With this configuration, there will be a fight
> > between platform providers and users, which will cause more confusion and
> > conflict. And users will probably win, IMHO, because they are the end
> > customers that use Flink to create business value.
> > 2. SQL hints could be considered as an additional feature for users to
> > control, to optimize the execution plan without touching the internal
> > logic, i.e. features for advanced use cases and i.e. don't use it if you
> > don't understand it.
> > 3. Before the system is smart enough to take over(where we are now,
> > fortunately and unfortunately :-))), there should be a way for users to
> do
> > such tuning, even if it is a temporary phase from a
> > long-term's perspective, i.e. just because it is a temporary solution,
> does
> > not mean it is not necessary for now.
> > 4. What if users write wrong hints? Well, the code review process is
> > recommended. Someone who truly understands hints should double check it
> > before hints are merged to the master or submitted to the production env.
> > Just like a common software development process.
> >
> > Just my two cents.
> >
> > Best regards,
> > Jing
> >
> > On Thu, Sep 7, 2023 at 10:02 PM Bonnie Arogyam Varghese
> >  wrote:
> >
> > > Hi Liu,
> > >  The default will be set to enabled which is the current behavior. The
> > > option will allow users/platform providers to disable it if they want
> to.
> > >
> > > On Wed, Sep 6, 2023 at 6:39 PM liu ron  wrote:
> > >
> > > > Hi, Boonie
> > > >
> > > > I'm with Jark on why disable hint is needed if it won't affect
> > security.
> > > If
> > > > users don't need to use hint, then they won't care about it and I
> don't
> > > > think it's going to be a nuisance. On top of that, Lookup Join Hint
> is
> > > very
> > > > useful for streaming jobs, and disabling the hint would result in
> users
> > > not
> > > > being able to use it.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > Bonnie Arogyam Varghese  wrote on Wednesday, September 6, 2023 at 23:52:
> > > >
> > > > > Hi Liu Ron,
> > > > >  To answer your question,
> > > > >Security might not be the main reason for disabling this option
> > but
> > > > > other arguments brought forward by Timo. Let me know if you have
> any
> > > > > further questions or concerns.
> > > > >
> > > > > On Tue, Sep 5, 2023 at 9:35 PM Bonnie Arogyam Varghese <
> > > > > bvargh...@confluent.io> wrote:
> > > > >
> > > > > > It looks like it will be nice to have a config to disable hints.
> > Any
> > > > > other
> > > > > > thoughts/concerns before we can close this discussion?
> > > > > >
> > > > > > On Fri, Aug 18, 2023 at 7:43 AM Timo Walther  >
> > > > wrote:
> > > > > >
> > > > > >>  > lots of the streaming SQL syntax are extensions of SQL
> standard
> > > > > >>
> > > > > >> That is true. But hints are kind of a special case because they
> > are
> > > > not
> > > > > >> even "part of Flink SQL" that's why they are written in a
> comment
> > > > > syntax.
> > > > > >>
> > > > > >> Anyway, I feel hints could be sometimes confusing for users
> > because
> > > > most
> > > > > >> of them have no effect for streaming and long-term we could also
> > set
> > > > > >> some hints via the CompiledPlan. And if you have multiple teams,
> > > > > >> non-skilled users should not play around with hints and leave

[jira] [Created] (FLINK-33070) Add doc for 'unnest'

2023-09-11 Thread Feng Jin (Jira)
Feng Jin created FLINK-33070:


 Summary: Add doc for 'unnest' 
 Key: FLINK-33070
 URL: https://issues.apache.org/jira/browse/FLINK-33070
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Reporter: Feng Jin


Transforming between rows and columns is a common requirement. In Flink SQL, we 
can use UNNEST for this purpose.

However, the usage and support of UNNEST are not explained in the documentation.

I think we can at least add it to the built-in functions section 
(https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/#scalar-functions),
or provide some examples.
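
As a starting point, a minimal, self-contained example of the sort of snippet the
docs could include (table and column names are made up):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UnnestExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // A tiny inline table with an ARRAY column.
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW Orders AS "
                        + "SELECT * FROM (VALUES ('order-1', ARRAY['flink', 'sql'])) AS t(order_id, tags)");

        // UNNEST expands each array element into its own row.
        tEnv.executeSql(
                "SELECT order_id, tag FROM Orders CROSS JOIN UNNEST(tags) AS u(tag)")
                .print();
    }
}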

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-hbase v3.0.0, release candidate 2

2023-09-11 Thread Danny Cranmer
Hey Martijn, thanks for picking this up.

+1 (binding)

- Release notes look good
- Sigs and checksums look good for source archive and Maven repo
- Verified there are no binaries in the source archive
- Tag exists in Github
- Reviewed website PR
- Contents of Maven repo look complete
- Source archive builds with Maven
- Reviewed NOTICE Files
  - Note: Copyright header year needs updating

Thanks,
Danny

On Tue, Sep 5, 2023 at 11:36 AM Ferenc Csaky 
wrote:

> Hi,
>
> Thanks Martijn for initiating the release!
>
> +1 (non-binding)
>
> - checked signatures and checksums
> - checked source has no binaries
> - checked LICENSE and NOTICE files
> - approved web PR
>
> Cheers,
> Ferenc
>
>
>
>
> --- Original Message ---
> On Monday, September 4th, 2023 at 12:54, Samrat Deb 
> wrote:
>
>
> >
> >
> > Hi,
> >
> > +1 (non-binding)
> >
> > Verified NOTICE files
> > Verified CheckSum and signatures
> > Glanced through PR[1] , Looks good to me
> >
> > Bests,
> > Samrat
> >
> > [1]https://github.com/apache/flink-web/pull/591
> >
> >
> > > On 04-Sep-2023, at 2:22 PM, Ahmed Hamdy hamdy10...@gmail.com wrote:
> > >
> > > Hi Martijn,
> > > +1 (non-binding)
> > >
> > > - verified Checksums and signatures
> > > - no binaries in source
> > > - Checked NOTICE files contains migrated artifacts
> > > - tag is correct
> > > - Approved Web PR
> > >
> > > Best Regards
> > > Ahmed Hamdy
> > >
> > > On Fri, 1 Sept 2023 at 15:35, Martijn Visser martijnvis...@apache.org
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the release candidate #2 for the version
> 3.0.0,
> > > > as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> dist.apache.org
> > > > [2],
> > > > which are signed with the key with fingerprint
> > > > A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > * source code tag v3.0.0-rc2 [5],
> > > > * website pull request listing the new release [6].
> > > >
> > > > This replaces the old, cancelled vote of RC1 [7]. This version is the
> > > > externalized version which is compatible with Flink 1.16 and 1.17.
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Release Manager
> > > >
> > > > [1]
> > > >
> > > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352578
> > > > [2]
> > > >
> > > >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-hbase-3.0.0-rc2
> > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1650
> > > > [5]
> > > >
> https://github.com/apache/flink-connector-hbase/releases/tag/v3.0.0-rc2
> > > > [6] https://github.com/apache/flink-web/pull/591
> > > > [7] https://lists.apache.org/thread/wbl6sc86q9s5mmz5slx4z09svh91cpr0
>


Re: [VOTE] Apache Flink Stateful Functions Release 3.3.0, release candidate #1

2023-09-11 Thread Danny Cranmer
Hey Martijn, thanks for driving this.

-1 (binding) due to the concerns highlighted below

- NOTICE files are present
  - Note: The copyright year is out of date (2020)
  - Concern: we bundle AnchorJS (MIT) v3.1.0 and this is not listed in the
NOTICE file
  - Concern: "statefun-sdk-java" bundles
"com.google.auto.service:auto-service-annotations:jar:1.0-rc6" but does not
declare it in the NOTICE
  - Concern: "statefun-flink-distribution"
- bundles "org.apache.kafka:kafka-clients:3.2.3" but declares
"org.apache.kafka:kafka-clients:2.4.1"
- bundles "com.github.luben:zstd-jni:1.5.2-1" but declares
"com.github.luben:zstd-jni:1.4.3-1"
- bundles "com.fasterxml.jackson.core:jackson-core:2.13.4" but declares
"com.fasterxml.jackson.core:jackson-core:2.12.1"
- bundles "com.fasterxml.jackson.core:jackson-annotations:2.13.4" but
declares "com.fasterxml.jackson.core:jackson-annotations:2.12.1"
- bundles "com.fasterxml.jackson.core:jackson-databind:2.13.4.2" but
declares "com.fasterxml.jackson.core:jackson-databind:2.12.1"
- bundles
"com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:2.13.4" but
declares "com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:2.12.1"
- bundles "commons-io:commons-io:jar:2.11.0" but declares
"commons-io:commons-io:jar:2.8.0"
- bundles "commons-codec:commons-codec:1.15" but declares
"commons-codec:commons-codec:1.13"
- bundles "com.esotericsoftware.minlog:minlog:1.2" but does not declare
it
- bundles "com.ibm.icu:icu4j:jar:67.1" but does not declare it
- bundles "org.objenesis:objenesis:jar:2.1" but does not declare it
- bundles "com.esotericsoftware.kryo:kryo:2.24.0" but does not declare
it
- bundles "commons-collections:commons-collections:3.2.2" but does not
declare it
- bundles "org.apache.commons:commons-compress:1.21" but does not
declare it
- Verified nexus repo contents are similar to 3.2.0
- Sigs and checksums of dist look good
- Verified there are no binaries in the source archive
- Maven build+tests complete successfully
- Reviewed the Docker PR
- Reviewed website PR
- Release notes look good
- Tag is present in Github
- LICENSE file is present
- Source files include Apache copyright header
  - Note: GitHub workflow configuration files do not; however, there is an
explicit exclude in the pom, so assuming this is intentional

Thanks,
Danny

On Fri, Sep 8, 2023 at 12:53 PM Martijn Visser 
wrote:

> Hi everyone,
>
> Please review [1] and vote on the release candidate #1 for the version
> 3.3.0 of Apache Flink Stateful Functions,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Stateful Functions canonical source distribution, to be deployed to the
> release repository at dist.apache.org
> b) Stateful Functions Python SDK distributions to be deployed to PyPI
> c) Maven artifacts to be deployed to the Maven Central Repository
> d) Dockerfiles for new images to be deployed to Docker Hub
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a) and b) can be found in the corresponding dev
> repository at dist.apache.org [2]
> * All artifacts for c) can be found at the Apache Nexus Repository [3]
> * PR for new Dockerfiles for this release [4]
>
> All artifacts are signed with the key with fingerprint
> A5F3BCE4CBE993573EC5966A65321B8382B219AF [5]
>
> Other links for your review:
> * JIRA release notes [6]
> * source code tag "release-3.3.0-rc1" [7]
> * PR to update the website Downloads page to include Stateful Functions
> links [8]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release manager
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Stateful+Functions+Release
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-statefun-3.3.0-rc1/
> [3]
> https://repository.apache.org/content/repositories/orgapacheflink-1652/
> [4] https://github.com/apache/flink-statefun-docker/pull/20
> [5] https://dist.apache.org/repos/dist/release/flink/KEYS
> [6]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351276
> [7] https://github.com/apache/flink-statefun/tree/release-3.3.0-rc1
> [8] https://github.com/apache/flink-web/pull/674
>


Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-11 Thread Jark Wu
Hi Dong,

Please see my comments inline.

>  As a result, the proposed job-level
> config will be applied only in the changelog stage. So there is no
> difference between these two approaches in this particular case, right?

How can the job-level config be applied ONLY in the changelog stage?
I think it is only possible if it is implemented by the CDC source itself,
because the framework doesn't know which stage the source is in.
Note that the CDC source may emit watermarks with a very small lag
in the snapshot stage, so the job-level config may turn the backlog
status into false.

> On the other hand, per-source config will be necessary if users want to
> apply different watermark lag thresholds for different sources in the same
> job.

We also have different watermark delay definitions for each source, so
I think it is also reasonable and necessary to have different watermark
lag thresholds per source.


> Each source can have its own rule that specifies when the backlog can be
true
> (e.g. MySql CDC says the backlog should be true during the snapshot
stage).
> And we can have a job-level config that specifies when the backlog should
> be true. Note that it is designed in such a way that none of these rules
> specify when the backlog should be false. That is why there is no conflict
> by definition.

IIUC, FLIP-309 provides `setIsProcessingBacklog` to specify when the backlog
is true and when it is false. This conflicts with the job-level config, as the
config will turn the status into true.
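
(For reference, the FLIP-309 mechanism under discussion is roughly the sketch
below, assuming setIsProcessingBacklog is exposed on SplitEnumeratorContext as
that FLIP describes; the helper class is purely illustrative.)

import org.apache.flink.api.connector.source.SplitEnumeratorContext;

final class BacklogStageReporter {

    // The source's enumerator declares backlog=true while it is still reading
    // the snapshot/backfill, and backlog=false once it reaches the changelog.
    static void enterSnapshotStage(SplitEnumeratorContext<?> context) {
        context.setIsProcessingBacklog(true);
    }

    static void enterChangelogStage(SplitEnumeratorContext<?> context) {
        context.setIsProcessingBacklog(false);
    }
}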

> If I understand your comments correctly, you mean that we might have a
> Flink SQL DDL with user-defined watermark expressions. And users also want
> to set the backlog to true if the watermark generated by that
> user-specified expression exceeds a threshold.

No. I mean the source may not support generating watermarks, so the
watermark expression is applied in a following operator (instead of in the
source operator). As a result, the watermark lag does not work in this case,
which will confuse users.

> You are right that this is a limitation. However, this is only a
short-term
> limitation which we added to make sure that we can focus on the capability
> to switch from backlog=true to backlog=false. In the future, we will
remove
> this limitation and also support switching from backlog=false to
> backlog=true.

I can understand it may be difficult to support switching the runtime mode back
and forth. However, I think this should be a limitation of FLIP-327, not
FLIP-328. IIUC, FLIP-309 doesn't have this limitation, right? I just don't
understand what the challenge is in switching a flag.

Best,
Jark


On Sun, 10 Sept 2023 at 19:44, Dong Lin  wrote:

> Hi Jark,
>
> Thanks for the comments. Please see my comments inline.
>
> On Sat, Sep 9, 2023 at 4:13 PM Jark Wu  wrote:
>
> > Hi Xuannan,
> >
> > I leave my comments inline.
> >
> > > In the case where a user wants to
> > > use a CDC source and also determine backlog status based on watermark
> > > lag, we still need to define the rule when that occurs
> >
> > This rule should be defined by the source itself (who knows backlog
> best),
> > not by the framework. In the case of CDC source, it reports
> isBacklog=true
> > during snapshot stage, and report isBacklog=false during changelog stage
> if
> > watermark-lag is within the threshold.
> >
>
> I am not sure I fully understand the difference between adding a job-level
> config vs. adding a per-source config.
>
> In the case of CDC, its watermark lag should be either undefined or
> really large in the snapshot stage. As a result, the proposed job-level
> config will be applied only in the changelog stage. So there is no
> difference between these two approaches in this particular case, right?
>
> There are two advantages of the job-level config over per-source config:
>
> 1) Configuration is simpler. For example, suppose a user has a Flink job
> that consumes records from multiple Kafka sources and wants to determine
> backlog status for these Kafka sources using the same watermark lag
> threshold, there is no need for users to repeatedly specify this threshold
> for each source.
>
> 2) There is a smaller number of public APIs overall. In particular, instead
> of repeatedly adding a setProcessingBacklogWatermarkLagThreshold() API for
> every source operator that has event-time watermark lag defined, we only
> need to add one job-level config. Less public API means better simplicity
> and maintainability in general.
>
> On the other hand, per-source config will be necessary if users want to
> apply different watermark lag thresholds for different sources in the same
> job. Personally, I find this a bit counter-intuitive for users to specify
> different watermark lag thresholds in the same job.
>
> Do you think there is any real-word use-case that requires this? Could you
> provide a specific use-case where per-source config can provide an
> advantage over the job-level config?
>
>
> > I think it's not intuitive to combine it with the logical OR operation.

Re: [DISCUSS] Flink annotation strategy/consensus

2023-09-11 Thread Jing Ge
Hi Shammon,

Thanks for asking. @PublicEvolving is actually a very useful design: developers
can offer APIs as publicly accessible while still having the chance to introduce
modifications between minor releases. In contrast, @Public APIs can only be
changed between major releases.
Beyond FLIP-197, mentioned by Becket, you might want to check
FLIP-196 [1] for further details.

Best regards,
Jing


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees


On Mon, Sep 11, 2023 at 9:46 AM Becket Qin  wrote:

> Hi Shammon,
>
> Thanks for the reply.
>
> Do we really need to have `@Internal` methods in an `@Public` interface or
> > class? In general, if a class or interface is marked as `@Public `, it is
> > better that their public methods should also be `@Public`, because even
> if
> > marked as `@Internal`, users are not aware of it when using it, which
> could
> > be strange.
>
> It is more like a legacy issue that the public and internal usage share the
> same concrete class, e.g. DataStream.getId() is for internal usage, but
> happens to be in DataStream, which is a public class. This should be avoided
> in the future. It is good practice to create separate interfaces for the
> users in this case.
>
> Regarding the API stability promotion, you may want to check the
> FLIP-197[1].
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
>
> On Mon, Sep 11, 2023 at 11:43 AM Shammon FY  wrote:
>
> > Thanks Jing for starting this discussion.
> >
> > For @Becket
> > > 1. All the public methods / classes / interfaces MUST be annotated with
> > one of the @Experimental / @PublicEvolving / @Public. In practice, all
> the
> > methods by default inherit the annotation from the containing class,
> unless
> > annotated otherwise. e.g. an @Internal method in a @Public class.
> >
> > Do we really need to have `@Internal` methods in an `@Public` interface
> or
> > class? In general, if a class or interface is marked as `@Public `, it is
> > better that their public methods should also be `@Public`, because even
> if
> > marked as `@Internal`, users are not aware of it when using it, which
> could
> > be strange.
> >
> > @Jing Besides `@Internal`, I have some cents about `@PublicEvolving` and
> > `@Public`. Currently when we add an interface which will be used by
> > external systems, we often annotate it as `@PublicEvolving`. But when
> > should this interface be marked as `@Public`? I find it is difficult to
> > determine this. Is `@PublicEvolving` really necessary? Should we directly
> > remove `@PublicEvolving` and use `@Public` instead? I think it would be
> > simpler.
> >
> > Best,
> > Shammon FY
> >
> >
> > On Mon, Sep 11, 2023 at 11:05 AM Becket Qin 
> wrote:
> >
> > > Hi Jing,
> > >
> > > Thanks for bringing up the discussion. My two cents:
> > >
> > > 1. All the public methods / classes / interfaces MUST be annotated with
> > one
> > > of the @Experimental / @PublicEvolving / @Public. In practice, all the
> > > methods by default inherit the annotation from the containing class,
> > unless
> > > annotated otherwise. e.g. an @Internal method in a @Public class.
> > >
> > > 2. I agree it would be too verbose to annotate every internal method /
> > > class / interface. Currently we already treat the methods / interfaces
> /
> > > classes without annotations as effectively @Internal.
> > >
> > > 3. Per our discussion in the other thread, @Deprecated SHOULD coexist
> > with
> > > one of the @Experimental / @PublicEvolving / @Public. In that
> > > case, @Deprecated overrides the other annotation, which means that
> public
> > > API will not evolve and will be removed according to the deprecation
> > > process.
> > >
> > > 4. The internal methods / classes / interfaces SHOULD NOT be marked as
> > > deprecated. Instead, an immediate refactor should be done to remove the
> > > "deprecated" internal methods / classes / interfaces, and migrate the
> > code
> > > to its successor. Otherwise, technical debts will build up.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
> > > On Sat, Sep 9, 2023 at 5:29 AM Jing Ge 
> > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > While I was joining the flink-avro enhancement and cleanup discussion
> > > > driven by Becket[1], I realized that there are some issues with the
> > > current
> > > > Flink API annotation usage in the source code.
> > > >
> > > > As far as I am concerned, Flink wants to control the
> access/visibility
> > of
> > > > APIs across modules and for downstreams. Since no OSGI is used(it
> > should
> > > > not be used because of its complexity, IMHO), Flink decided to use a
> > very
> > > > lightweight but manual solution: customized annotation like
> @Internal,
> > > > @Experimental, @PublicEvolving,
> > > > etc. This is a Flink only concept on top of JDK annotation and is
> > > 

Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-09-11 Thread Martijn Visser
Hi Samrat,

I'm still having doubts about the dependency on the JDBC connector. When a
user specifies 'read mode', it will use the JDBC connector under the hood.
Why not then integrate Redshift directly into the JDBC connector itself? It
removes the need for a dependency on the JDBC driver, especially keeping in
mind that this driver uses the old SourceFunction/SinkFunction interfaces
because it hasn't been migrated yet.

Best regards,

Martijn

On Mon, Sep 11, 2023 at 8:54 AM Samrat Deb  wrote:

> Hi Leonard,
>
> > Do we have to rely on the latest version of JDBC Connector here?
>
> No, there's no need for us to depend on the latest version of the JDBC
> Connector. Redshift has its dedicated JDBC driver [1], which includes
> custom modifications tailored to Redshift's specific implementation needs.
> This driver is the most suitable choice for our purposes.
>
>
> > Could you collect the APIs that Redshift generally needs to use?
>
> I am actively working on it and making progress towards creating the POC.
>
> Bests,
> Samrat
>
> [1]
>
> https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
>
> On Mon, Sep 11, 2023 at 12:02 PM Samrat Deb  wrote:
>
> > Hello Danny,
> >
> > I wanted to express my gratitude for your valuable feedback and
> insightful
> > suggestions.
> >
> > I will be revising the FLIP to incorporate all of your queries and review
> > suggestions. Additionally, I plan to provide a Proof of Concept (POC) for
> > the connector by the end of this week. This POC will address the points
> > you've raised and ensure that the FLIP aligns with your recommendations.
> >
> > Thank you once again for your input.
> >
> > Bests,
> > Samrat
> >
> > On Thu, Sep 7, 2023 at 10:21 PM Danny Cranmer 
> > wrote:
> >
> >> Hello Leonard,
> >>
> >> > Do we have to rely on the latest version of JDBC Connector here? I
> >> understand that as long as the version of flink minor is the same as the
> >> JDBC Connector, Could you collect the APIs that Redshift generally needs
> >> to
> >> use?
> >>
> >> I agree we do not necessarily need to rely on the latest patch version,
> >> only the same minor. The main issue for me is the dependency introduces
> a
> >> blocker following a new Flink version. For example, when Flink 1.18.0 is
> >> released we cannot release the AWS connectors until the JDBC is
> complete.
> >> But I think this is a good tradeoff.
> >>
> >> > Splitting a separate redshift repository does not solve this coupling
> >> problem
> >>
> >> Arguably it solves the AWS<>JDBC coupling problem, but creates a new,
> more
> >> complex one!
> >>
> >> Thanks,
> >>
> >> On Thu, Sep 7, 2023 at 5:26 PM Leonard Xu  wrote:
> >>
> >> > Thanks Samrat and  Danny for driving this FLIP.
> >> >
> >> > >> an effective approach is to utilize the latest version of
> >> > flink-connector-jdbc
> >> > > as a Maven dependency
> >> > >
> >> > > When we have stable source/sink APIs and the connector versions are
> >> > > decoupled from Flink this makes sense. But right now this would mean
> >> that
> >> > > the JDBC connector will block the AWS connector for each new Flink
> >> > version
> >> > > support release (1.18, 1.19, 1.20, 2.0 etc). That being said, I
> cannot
> >> > > think of a cleaner alternative, without pulling the core JDBC bits
> out
> >> > into
> >> > > a dedicated project that is decoupled from and released
> independently
> >> of
> >> > > Flink. Splitting flink-connector-redshift into a dedicated repo
> would
> >> > > decouple AWS/JDBC, but obviously introduce a new connector that is
> >> > blocked
> >> > > by both AWS and JDBC.
> >> >
> >> > Do we have to rely on the latest version of JDBC Connector here? I
> >> > understand that as long as the version of flink minor is the same as
> the
> >> > JDBC Connector, Could you collect the APIs that Redshift generally
> >> needs to
> >> > use?
> >> >
> >> > Assuming that AWS Connector(Redshift) depends on JDBC Connector and
> >> wants
> >> > a higher version of JDBC Connector, I understand that the correct
> >> approach
> >> > is to promote the release of JDBC Connector and looks like we have no
> >> more
> >> > options.
> >> >
> >> > Splitting a separate redshift repository does not solve this coupling
> >> > problem, from a user perspective, redshift should also be in the AWS
> >> > Connector repo.
> >> >
> >> > Best,
> >> > Leonard
> >>
> >
>


Re: [DISCUSS] Flink annotation strategy/consensus

2023-09-11 Thread Becket Qin
Hi Shammon,

Thanks for the reply.

Do we really need to have `@Internal` methods in an `@Public` interface or
> class? In general, if a class or interface is marked as `@Public `, it is
> better that their public methods should also be `@Public`, because even if
> marked as `@Internal`, users are not aware of it when using it, which could
> be strange.

It is more like a legacy issue that the public and internal usage share the
same concrete class, e.g. DataStream.getId() is for internal usage, but
happens to be in DataStream, which is a public class. This should be avoided
in the future. It is good practice to create separate interfaces for the
users in this case.
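
A minimal sketch of that pattern (the names below are hypothetical, not actual
Flink classes):

import org.apache.flink.annotation.Internal;
import org.apache.flink.annotation.Public;

import java.util.function.Function;

// User-facing contract: only the methods users should call live here.
@Public
public interface RecordStream<T> {
    <R> RecordStream<R> map(Function<T, R> mapper);
}

// Internal implementation: accessors like getTransformationId() stay off the
// @Public surface entirely instead of being annotated @Internal inside it.
@Internal
class RecordStreamImpl<T> implements RecordStream<T> {

    private final int transformationId;

    RecordStreamImpl(int transformationId) {
        this.transformationId = transformationId;
    }

    @Override
    public <R> RecordStream<R> map(Function<T, R> mapper) {
        return new RecordStreamImpl<>(transformationId + 1);
    }

    int getTransformationId() { // internal-only, not part of the public contract
        return transformationId;
    }
}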

Regarding the API stability promotion, you may want to check the
FLIP-197[1].

Thanks,

Jiangjie (Becket) Qin

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process

On Mon, Sep 11, 2023 at 11:43 AM Shammon FY  wrote:

> Thanks Jing for starting this discussion.
>
> For @Becket
> > 1. All the public methods / classes / interfaces MUST be annotated with
> one of the @Experimental / @PublicEvolving / @Public. In practice, all the
> methods by default inherit the annotation from the containing class, unless
> annotated otherwise. e.g. an @Internal method in a @Public class.
>
> Do we really need to have `@Internal` methods in an `@Public` interface or
> class? In general, if a class or interface is marked as `@Public `, it is
> better that their public methods should also be `@Public`, because even if
> marked as `@Internal`, users are not aware of it when using it, which could
> be strange.
>
> @Jing Besides `@Internal`, I have some cents about `@PublicEvolving` and
> `@Public`. Currently when we add an interface which will be used by
> external systems, we often annotate it as `@PublicEvolving`. But when
> should this interface be marked as `@Public`? I find it is difficult to
> determine this. Is `@PublicEvolving` really necessary? Should we directly
> remove `@PublicEvolving` and use `@Public` instead? I think it would be
> simpler.
>
> Best,
> Shammon FY
>
>
> On Mon, Sep 11, 2023 at 11:05 AM Becket Qin  wrote:
>
> > Hi Jing,
> >
> > Thanks for bringing up the discussion. My two cents:
> >
> > 1. All the public methods / classes / interfaces MUST be annotated with
> one
> > of the @Experimental / @PublicEvolving / @Public. In practice, all the
> > methods by default inherit the annotation from the containing class,
> unless
> > annotated otherwise. e.g. an @Internal method in a @Public class.
> >
> > 2. I agree it would be too verbose to annotate every internal method /
> > class / interface. Currently we already treat the methods / interfaces /
> > classes without annotations as effectively @Internal.
> >
> > 3. Per our discussion in the other thread, @Deprecated SHOULD coexist
> with
> > one of the @Experimental / @PublicEvolving / @Public. In that
> > case, @Deprecated overrides the other annotation, which means that public
> > API will not evolve and will be removed according to the deprecation
> > process.
> >
> > 4. The internal methods / classes / interfaces SHOULD NOT be marked as
> > deprecated. Instead, an immediate refactor should be done to remove the
> > "deprecated" internal methods / classes / interfaces, and migrate the
> code
> > to its successor. Otherwise, technical debts will build up.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Sat, Sep 9, 2023 at 5:29 AM Jing Ge 
> wrote:
> >
> > > Hi devs,
> > >
> > > While I was joining the flink-avro enhancement and cleanup discussion
> > > driven by Becket[1], I realized that there are some issues with the
> > current
> > > Flink API annotation usage in the source code.
> > >
> > > As far as I am concerned, Flink wants to control the access/visibility
> of
> > > APIs across modules and for downstreams. Since no OSGI is used(it
> should
> > > not be used because of its complexity, IMHO), Flink decided to use a
> very
> > > lightweight but manual solution: customized annotation like @Internal,
> > > @Experimental, @PublicEvolving,
> > > etc. This is a Flink only concept on top of JDK annotation and is
> > therefore
> > > orthogonal to @Deprecated or any other annotations offered by JDK.
> After
> > > this concept has been used, APIs without one of these annotations are
> in
> > > the kind of gray area which means they have no contract in the context
> of
> > > this new concept. Without any given metadata they could be considered
> > > as @Internal or @Experimental, because changes are allowed to be
> applied
> > at
> > > any time. But there is no clear definition and therefore different
> people
> > > will understand it differently.
> > >
> > > There are two options to improve it, as far as I could figure out:
> > >
> > > option 1: All APIs must have one of those annotations. We should put
> some
> > > effort into going through all source code and add missing annotations.
> > > There were 

Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-09-11 Thread Samrat Deb
Hi Leonard,

> Do we have to rely on the latest version of JDBC Connector here?

No, there's no need for us to depend on the latest version of the JDBC
Connector. Redshift has its dedicated JDBC driver [1], which includes
custom modifications tailored to Redshift's specific implementation needs.
This driver is the most suitable choice for our purposes.


> Could you collect the APIs that Redshift generally needs to use?

I am actively working on it and making progress towards creating the POC.

Bests,
Samrat

[1]
https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html

On Mon, Sep 11, 2023 at 12:02 PM Samrat Deb  wrote:

> Hello Danny,
>
> I wanted to express my gratitude for your valuable feedback and insightful
> suggestions.
>
> I will be revising the FLIP to incorporate all of your queries and review
> suggestions. Additionally, I plan to provide a Proof of Concept (POC) for
> the connector by the end of this week. This POC will address the points
> you've raised and ensure that the FLIP aligns with your recommendations.
>
> Thank you once again for your input.
>
> Bests,
> Samrat
>
> On Thu, Sep 7, 2023 at 10:21 PM Danny Cranmer 
> wrote:
>
>> Hello Leonard,
>>
>> > Do we have to rely on the latest version of JDBC Connector here? I
>> understand that as long as the version of flink minor is the same as the
>> JDBC Connector, Could you collect the APIs that Redshift generally needs
>> to
>> use?
>>
>> I agree we do not necessarily need to rely on the latest patch version,
>> only the same minor. The main issue for me is the dependency introduces a
>> blocker following a new Flink version. For example, when Flink 1.18.0 is
>> released we cannot release the AWS connectors until the JDBC is complete.
>> But I think this is a good tradeoff.
>>
>> > Splitting a separate redshift repository does not solve this coupling
>> problem
>>
>> Arguably it solves the AWS<>JDBC coupling problem, but creates a new, more
>> complex one!
>>
>> Thanks,
>>
>> On Thu, Sep 7, 2023 at 5:26 PM Leonard Xu  wrote:
>>
>> > Thanks Samrat and  Danny for driving this FLIP.
>> >
>> > >> an effective approach is to utilize the latest version of
>> > flink-connector-jdbc
>> > > as a Maven dependency
>> > >
>> > > When we have stable source/sink APIs and the connector versions are
>> > > decoupled from Flink this makes sense. But right now this would mean
>> that
>> > > the JDBC connector will block the AWS connector for each new Flink
>> > version
>> > > support release (1.18, 1.19, 1.20, 2.0 etc). That being said, I cannot
>> > > think of a cleaner alternative, without pulling the core JDBC bits out
>> > into
>> > > a dedicated project that is decoupled from and released independently
>> of
>> > > Flink. Splitting flink-connector-redshift into a dedicated repo would
>> > > decouple AWS/JDBC, but obviously introduce a new connector that is
>> > blocked
>> > > by both AWS and JDBC.
>> >
>> > Do we have to rely on the latest version of JDBC Connector here? I
>> > understand that as long as the version of flink minor is the same as the
>> > JDBC Connector, Could you collect the APIs that Redshift generally
>> needs to
>> > use?
>> >
>> > Assuming that AWS Connector(Redshift) depends on JDBC Connector and
>> wants
>> > a higher version of JDBC Connector, I understand that the correct
>> approach
>> > is to promote the release of JDBC Connector and looks like we have no
>> more
>> > options.
>> >
>> > Splitting a separate redshift repository does not solve this coupling
>> > problem, from a user perspective, redshift should also be in the AWS
>> > Connector repo.
>> >
>> > Best,
>> > Leonard
>>
>


Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-09-11 Thread Samrat Deb
Hello Danny,

I wanted to express my gratitude for your valuable feedback and insightful
suggestions.

I will be revising the FLIP to incorporate all of your queries and review
suggestions. Additionally, I plan to provide a Proof of Concept (POC) for
the connector by the end of this week. This POC will address the points
you've raised and ensure that the FLIP aligns with your recommendations.

Thank you once again for your input.

Bests,
Samrat

On Thu, Sep 7, 2023 at 10:21 PM Danny Cranmer 
wrote:

> Hello Leonard,
>
> > Do we have to rely on the latest version of JDBC Connector here? I
> understand that as long as the version of flink minor is the same as the
> JDBC Connector, Could you collect the APIs that Redshift generally needs to
> use?
>
> I agree we do not necessarily need to rely on the latest patch version,
> only the same minor. The main issue for me is the dependency introduces a
> blocker following a new Flink version. For example, when Flink 1.18.0 is
> released we cannot release the AWS connectors until the JDBC is complete.
> But I think this is a good tradeoff.
>
> > Splitting a separate redshift repository does not solve this coupling
> problem
>
> Arguably it solves the AWS<>JDBC coupling problem, but creates a new, more
> complex one!
>
> Thanks,
>
> On Thu, Sep 7, 2023 at 5:26 PM Leonard Xu  wrote:
>
> > Thanks Samrat and  Danny for driving this FLIP.
> >
> > >> an effective approach is to utilize the latest version of
> > flink-connector-jdbc
> > > as a Maven dependency
> > >
> > > When we have stable source/sink APIs and the connector versions are
> > > decoupled from Flink this makes sense. But right now this would mean
> that
> > > the JDBC connector will block the AWS connector for each new Flink
> > version
> > > support release (1.18, 1.19, 1.20, 2.0 etc). That being said, I cannot
> > > think of a cleaner alternative, without pulling the core JDBC bits out
> > into
> > > a dedicated project that is decoupled from and released independently
> of
> > > Flink. Splitting flink-connector-redshift into a dedicated repo would
> > > decouple AWS/JDBC, but obviously introduce a new connector that is
> > blocked
> > > by both AWS and JDBC.
> >
> > Do we have to rely on the latest version of JDBC Connector here? I
> > understand that as long as the version of flink minor is the same as the
> > JDBC Connector, Could you collect the APIs that Redshift generally needs
> to
> > use?
> >
> > Assuming that AWS Connector(Redshift) depends on JDBC Connector and wants
> > a higher version of JDBC Connector, I understand that the correct
> approach
> > is to promote the release of JDBC Connector and looks like we have no
> more
> > options.
> >
> > Splitting a separate redshift repository does not solve this coupling
> > problem, from a user perspective, redshift should also be in the AWS
> > Connector repo.
> >
> > Best,
> > Leonard
>