Re: [ANNOUNCE] Release 1.18.0, release candidate #1

2023-10-05 Thread Sergey Nuyanzin
Thanks for creating RC1

* Downloaded artifacts
* Built from sources
* Verified checksums and gpg signatures
* Verified versions in pom files
* Checked NOTICE, LICENSE files
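
For anyone reproducing these checks, a typical verification sequence for the
source artifacts (file names illustrative for this RC; these are the standard
Apache release-verification commands, nothing Flink-specific) is:

gpg --import KEYS
gpg --verify flink-1.18.0-src.tgz.asc flink-1.18.0-src.tgz
sha512sum -c flink-1.18.0-src.tgz.sha512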

The strange thing I faced is
CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished
fails on AZP [1]

which looks like it is related to [2] and [3], fixed in 1.18.0 (not 100%
sure).


[1] https://issues.apache.org/jira/browse/FLINK-33186
[2] https://issues.apache.org/jira/browse/FLINK-32996
[3] https://issues.apache.org/jira/browse/FLINK-32907

On Tue, Oct 3, 2023 at 2:53 PM Ferenc Csaky 
wrote:

> Thanks everyone for the efforts!
>
> Checked the following:
>
> - Downloaded artifacts
> - Built Flink from source
> - Verified checksums/signatures
> - Verified NOTICE, LICENSE files
> - Deployed dummy SELECT job via SQL gateway on standalone cluster, things
> seemed fine according to the log files
>
> +1 (non-binding)
>
> Best,
> Ferenc
>
>
> --- Original Message ---
> On Friday, September 29th, 2023 at 22:12, Gabor Somogyi <
> gabor.g.somo...@gmail.com> wrote:
>
>
> >
> >
> > Thanks for the efforts!
> >
> > +1 (non-binding)
> >
> > * Verified versions in the poms
> > * Built from source
> > * Verified checksums and signatures
> > * Started basic workloads with kubernetes operator
> > * Verified NOTICE and LICENSE files
> >
> > G
> >
> > On Fri, Sep 29, 2023, 18:16 Matthias Pohl matthias.p...@aiven.io.invalid
> >
> > wrote:
> >
> > > Thanks for creating RC1. I did the following checks:
> > >
> > > * Downloaded artifacts
> > > * Built Flink from sources
> > > * Verified SHA512 checksums and GPG signatures
> > > * Compared checkout with provided sources
> > > * Verified pom file versions
> > > * Went over NOTICE file/pom files changes without finding anything
> > > suspicious
> > > * Deployed standalone session cluster and ran WordCount example in
> batch
> > > and streaming: Nothing suspicious in log files found
> > >
> > > +1 (binding)
> > >
> > > On Fri, Sep 29, 2023 at 10:34 AM Etienne Chauchot echauc...@apache.org
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks to the team for this RC.
> > > >
> > > > I did a quick check of this RC against user pipelines [1] coded with
> > > > DataSet (even if deprecated and soon removed), DataStream and SQL
> > > > APIs.
> > > >
> > > > Based on the small scope of this test, LGTM.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > [1] https://github.com/echauchot/tpcds-benchmark-flink
> > > >
> > > > Best
> > > > Etienne
> > > >
> > > > On 28/09/2023 at 19:35, Jing Ge wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > The RC1 for Apache Flink 1.18.0 has been created. The related
> voting
> > > > > process will be triggered once the announcement is ready. The RC1
> has
> > > > > all
> > > > > the artifacts that we would typically have for a release, except
> for
> > > > > the
> > > > > release note and the website pull request for the release
> announcement.
> > > > >
> > > > > The following contents are available for your review:
> > > > >
> > > > > - Confirmation of no benchmark regression in the thread [1].
> > > > > - The preview source release and binary convenience releases [2],
> which
> > > > > are signed with the key with fingerprint 96AE0E32CBE6E0753CE6 [3].
> > > > > - all artifacts that would normally be deployed to the Maven
> > > > > Central Repository [4].
> > > > > - source code tag "release-1.18.0-rc1" [5]
> > > > >
> > > > > Your help testing the release will be greatly appreciated! And
> we'll
> > > > > create the rc1 release and the voting thread as soon as all the
> efforts
> > > > > are
> > > > > finished.
> > > > >
> > > > > [1] https://lists.apache.org/thread/yxyphglwwvq57wcqlfrnk3qo9t3sr2ro
> > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc1/
> > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1657
> > > > > [5] https://github.com/apache/flink/releases/tag/release-1.18.0-rc1
> > > > >
> > > > > Best regards,
> > > > > Qingsheng, Sergei, Konstantin and Jing
>


-- 
Best regards,
Sergey


Re: [ANNOUNCE] Release 1.18.0, release candidate #0

2023-10-05 Thread Martijn Visser
Hi David,

It’s a deliberate choice to decouple the connectors. We shouldn’t block
Flink 1.18 on connector statuses. There’s already work being done to fix
the Flink Kafka connector. Any Flink connector release comes after the new
minor version, similar to how it has been for all other connectors with
Flink 1.17.

Best regards,

Martijn Visser

On Thu, Oct 5, 2023 at 11:33 David Radley wrote:

> Hi Jing,
> Yes I agree that if we can get them resolved then that would be ideal.
>
> I guess the worry is that at 1.17, we had a released Flink core and Kafka
> connector.
> At 1.18 we will have a released Core Flink but no new Kafka connector. So
> the last released Kafka connector would now be
> https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka/3.0.0-1.17
> which should be the same as the Kafka connector in 1.17. I guess this is
> the combination that people would pick up to deploy in production – and I
> assume this has been tested.
>
> The issues with the nightly builds refer to the Kafka connector main
> branch. If they are not regressions, you are suggesting that pragmatically
> we go forward with the release; I think that makes sense to do, but do
> these issues affect 3.0.0-1.17?
>
> I suspect we should release a new Kafka connector asap, so we have a
> matching connector built outside of the Flink repo. We may want to not
> include the Flink core version in the connector – or we might end up
> wanting to release a Kafka connector when there are no changes just to have
> a match with the Flink core version.
>
> Kind regards, David.
>
>
>
> From: Jing Ge 
> Date: Wednesday, 4 October 2023 at 17:36
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [ANNOUNCE] Release 1.18.0, release candidate #0
> Hi David,
>
> First of all, we should have enough time to wait for those issues to
> be resolved. Secondly, it makes less sense to block upstream release by
> downstream build issues. In case, those issues might need more time, we
> should move forward with the Flink release without waiting for them. WDYT?
>
> Best regards,
> Jing
>
> On Wed, Oct 4, 2023 at 6:15 PM David Radley 
> wrote:
>
> > Hi,
> > As release 1.18 removes the Kafka connector from the core Flink
> > repository, I assume we will wait until the kafka connector nightly build
> > issues https://issues.apache.org/jira/browse/FLINK-33104  and
> > https://issues.apache.org/jira/browse/FLINK-33017  are resolved before
> > releasing 1.18?
> >
> >  Kind regards, David.
> >
> >
> > From: Jing Ge 
> > Date: Wednesday, 27 September 2023 at 15:11
> > To: dev@flink.apache.org 
> > Subject: [EXTERNAL] Re: [ANNOUNCE] Release 1.18.0, release candidate #0
> > Hi Folks,
> >
> > @Ryan FYI: CI passed and the PR has been merged. Thanks!
> >
> > If there are no more other concerns, I will start publishing 1.18-rc1.
> >
> > Best regards,
> > Jing
> >
> > On Mon, Sep 25, 2023 at 1:40 PM Jing Ge  wrote:
> >
> > > Hi Ryan,
> > >
> > > Thanks for reaching out. It is fine to include it but we need to wait
> > > until the CI passes. I am not sure how long it will take, since there
> > seems
> > > to be some infra issues.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Mon, Sep 25, 2023 at 11:34 AM Ryan Skraba
> > 
> > > wrote:
> > >
> > >> Hello!  There's a security fix that probably should be applied to 1.18
> > >> in the next RC1 : https://github.com/apache/flink/pull/23461   (bump
> to
> > >> snappy-java).  Do you think this would be possible to include?
> > >>
> > >> All my best, Ryan
> > >>
> > >> [1]: https://issues.apache.org/jira/browse/FLINK-33149   "Bump
> > >> snappy-java to 1.1.10.4"
> > >>
> > >>
> > >>
> > >> On Mon, Sep 25, 2023 at 3:54 PM Jing Ge 
> > >> wrote:
> > >> >
> > >> > Thanks Zakelly for the update! Appreciate it!
> > >> >
> > >> > @Piotr Nowojski  If you do not have any other
> > >> > concerns, I will move forward to create 1.18 rc1 and start voting.
> > WDYT?
> > >> >
> > >> > Best regards,
> > >> > Jing
> > >> >
> > >> > On Mon, Sep 25, 2023 at 2:20 AM Zakelly Lan 
> > >> wrote:
> > >> >
> > >> > > Hi Jing and everyone,
> > >> > >
> > >> > > I have conducted three rounds of benchmarking with Java11,
> comparing
> > >> > > release 1.18 (commit: deb07e99560[1]) with commit 6d62f9918ea[2].
> > The
> > >> > > results are attached[3]. Most of the tests show no obvious
> > regression.
> > >> > > However, I did observe significant change in several tests. Upon
> > >> > > reviewing the historical results from the previous pipeline, I
> also
> > >> > > discovered a substantial variance in those tests, as shown in the
> > >> > > timeline pictures included in the sheet[3]. I believe this
> variance
> > >> > > has existed for a long time and requires further investigation,
> and
> > >> > > fully measuring the variance requires more rounds (15 or more). I
> > >> > > think for now it is not a blocker for release 1.18. WDYT?
> > >> > >
> > >> > >
> > >> > > Best,
> > >> > > Zakelly
> > >> > >
> > >> > > [1]
> > >> > >
> > >>
> >
> 

Re: PyFlink MapState with Types.ROW() throws exception

2023-10-05 Thread Elkhan Dadashov
After digging into the flink-python code, it seems that if
`PYFLINK_GATEWAY_DISABLED` is set to false as an environment variable, then
using Types.LIST(Types.ROW([...])) works without issue once the Java
Gateway is launched.

It was unexpected that a local Flink run requires this flag to be set to
false explicitly.

This is a workaround for this issue:

def open(self, runtime_context: RuntimeContext):
    state_ttl_config = (
        StateTtlConfig.new_builder(Time.seconds(1))
        .set_update_type(StateTtlConfig.UpdateType.OnReadAndWrite)
        .disable_cleanup_in_background()
        .build()
    )
    import os
    os.environ["PYFLINK_GATEWAY_DISABLED"] = "0"
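
(Note that the variable has to be set before the first call that touches the
Java gateway, i.e. before runtime_context.get_map_state() reaches
get_gateway(), as the traceback below shows.)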

On Wed, Oct 4, 2023 at 1:48 PM Elkhan Dadashov 
wrote:

> Hi Flinkers,
>
> I'm trying to use MapState, where the value will be a list of <class 'pyflink.common.types.Row'> type elements.
>
> Wanted to check if anyone else faced the same issue while trying to use
> MapState in PyFlink with complex types.
>
> Here is the code:
>
> from pyflink.common import Time
> from pyflink.common.typeinfo import Types
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.datastream.functions import (
> KeyedCoProcessFunction,
> KeySelector,
> RuntimeContext,
> )
> from pyflink.datastream.state import (
> MapStateDescriptor,
> StateTtlConfig,
> ValueStateDescriptor,
> ListStateDescriptor
> )
> from pyflink.table import DataTypes, StreamTableEnvironment
>
>
> class MyKeyedCoProcessFunction(KeyedCoProcessFunction):
>     def __init__(self):
>         self.my_map_state = None
>
>     def open(self, runtime_context: RuntimeContext):
>         state_ttl_config = (
>             StateTtlConfig.new_builder(Time.seconds(1))
>             .set_update_type(StateTtlConfig.UpdateType.OnReadAndWrite)
>             .disable_cleanup_in_background()
>             .build()
>         )
>
>         my_map_state_descriptor = MapStateDescriptor(
>             "my_map_state",
>             Types.SQL_TIMESTAMP(),
>             Types.LIST(Types.ROW([
>                 Types.STRING(),
>                 Types.STRING(),
>                 Types.STRING(),
>                 Types.STRING(),
>                 Types.STRING(),
>                 Types.STRING(),
>                 Types.STRING(),
>                 Types.STRING(),
>                 Types.SQL_TIMESTAMP(),
>                 Types.SQL_TIMESTAMP(),
>                 Types.SQL_TIMESTAMP(),
>                 Types.BIG_INT()
>             ]))
>         )
>         my_map_state_descriptor.enable_time_to_live(state_ttl_config)
>         self.my_map_state = runtime_context.get_map_state(my_map_state_descriptor)
>
> But while running this code, it fails with this exception at job startup
> (at runtime_context.get_map_state(my_map_state_descriptor)), even without
> trying to add anything to the state.
>
> File "pyflink/fn_execution/beam/beam_operations_fast.pyx", line 249, in
> pyflink.fn_execution.beam.beam_operations_fast.StatefulFunctionOperation
> .__init__
> File "pyflink/fn_execution/beam/beam_operations_fast.pyx", line 132, in
> pyflink.fn_execution.beam.beam_operations_fast.FunctionOperation.__init__
> File
> "/usr/local/lib64/python3.9/site-packages/pyflink/fn_execution/datastream/process/operations.py",
> line 127, in open
> self.open_func()
> File
> "/usr/local/lib64/python3.9/site-packages/pyflink/fn_execution/datastream/process/operations.py",
> line 296, in open_func
> process_function.open(runtime_context)
> File "/tmp/ipykernel_83481/1603226134.py", line 57, in open
> File
> "/usr/local/lib64/python3.9/site-packages/pyflink/fn_execution/datastream/process/runtime_context.py",
> line 125, in get_map_state
> map_coder = from_type_info(state_descriptor.type_info) # type: MapCoder
> File
> "/usr/local/lib64/python3.9/site-packages/pyflink/fn_execution/coders.py",
> line 812, in from_type_info
> from_type_info(type_info._key_type_info),
> from_type_info(type_info._value_type_info))
> File
> "/usr/local/lib64/python3.9/site-packages/pyflink/fn_execution/coders.py",
> line 809, in from_type_info
> return GenericArrayCoder(from_type_info(type_info.elem_type))
> File
> "/usr/local/lib64/python3.9/site-packages/pyflink/fn_execution/coders.py",
> line 819, in from_type_info
> [f for f in type_info.get_field_names()])
> File "/usr/local/lib64/python3.9/site-packages/pyflink/common/typeinfo.py",
> line 377, in get_field_names
> j_field_names = self.get_java_type_info().getFieldNames()
> File "/usr/local/lib64/python3.9/site-packages/pyflink/common/typeinfo.py",
> line 391, in get_java_type_info
> j_types_array = get_gateway()\
> File "/usr/local/lib64/python3.9/site-packages/pyflink/java_gateway.py",
> line 62, in get_gateway
> _gateway = launch_gateway()
> File "/usr/local/lib64/python3.9/site-packages/pyflink/java_gateway.py",
> line 86, in launch_gateway
> raise Exception("It's launching the PythonGatewayServer during Python UDF
> execution "
> Exception: It's launching the 

Re: [DISCUSS] Java Record support

2023-10-05 Thread Gyula Fóra
Hi,
I have opened a draft PR [1] that shows the minimal required changes and a
suggested unit test setup for Java version specific tests.
There is still some work to be done (run all benchmarks, add more tests for
compatibility/migration).

If you have time please review / comment on the approach here or on Github.

Cheers,
Gyula

[1] https://github.com/apache/flink/pull/23490
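
For context on the mechanics discussed further down the thread: since Java 16,
records can be detected and constructed purely via reflection. A minimal
standalone sketch (illustrative only, not the code from the PR):

import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;

public final class RecordReflectionSketch {

    record Point(int x, int y) {}

    // Instantiates a record through its canonical constructor, with field
    // values given in record-component (declaration) order.
    static <T> T instantiate(Class<T> clazz, Object... fieldValues) throws Exception {
        if (!clazz.isRecord()) {
            throw new IllegalArgumentException(clazz + " is not a record");
        }
        // getRecordComponents() reports components in declaration order,
        // which matches the canonical constructor's parameter order.
        Class<?>[] paramTypes = Arrays.stream(clazz.getRecordComponents())
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        Constructor<T> canonical = clazz.getDeclaredConstructor(paramTypes);
        return canonical.newInstance(fieldValues);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(instantiate(Point.class, 1, 2)); // Point[x=1, y=2]
    }
}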

On Wed, Oct 4, 2023 at 7:09 PM Peter Huang 
wrote:

> +1 for the convenience of users.
>
> On Wed, Oct 4, 2023 at 8:05 AM Matthias Pohl  .invalid>
> wrote:
>
> > +1 Sounds like a good idea.
> >
> > On Wed, Oct 4, 2023 at 5:04 PM Gyula Fóra  wrote:
> >
> > > I will share my initial implementation soon, it seems to be pretty
> > > straightforward.
> > >
> > > Biggest challenge so far is setting tests so we can still compile
> against
> > > older versions but have tests for records. But I have a working proposal
> > > for that as well.
> > >
> > > Gyula
> > >
> > > On Wed, 4 Oct 2023 at 16:45, Chesnay Schepler 
> > wrote:
> > >
> > > > Kryo isn't required for this; newer versions do support records but
> we
> > > > want something like a PojoSerializer for records to be performant.
> > > >
> > > > The core challenges are
> > > > a) detecting records during type extraction
> > > > b) ensuring parameters are passed to the constructor in the right
> > order.
> > > >
> > > > From what I remember from my own experiments this shouldn't exactly be
> > > > /difficult/, but just a bit tedious to integrate into the Type
> > > > extraction stack.
> > > >
> > > > On 04/10/2023 16:14, Őrhidi Mátyás wrote:
> > > > > +1 This would be great
> > > > >
> > > > > On Wed, Oct 4, 2023 at 7:04 AM Gyula Fóra 
> > > wrote:
> > > > >
> > > > > Hi All!
> > > > >
> > > > > Flink 1.18 contains experimental Java 17 support but it misses
> > out
> > > > > on Java
> > > > > records which can be one of the nice benefits of actually using
> > > > > newer java
> > > > > versions.
> > > > >
> > > > > There is already a Jira to track this feature [1] but I am not
> > > > > aware of any
> > > > > previous efforts so far.
> > > > >
> > > > > Since records have pretty strong guarantees and many users
> would
> > > > > probably
> > > > > want to migrate from their POJOs, I think we should enhance the
> > > > > current
> > > > > Pojo TypeInfo/Serializer to accommodate for the records.
> > > > >
> > > > > I experimented with this locally and the changes are not huge
> as
> > > > > we only
> > > > > need to allow instantiating records through the constructor
> > instead
> > > > of
> > > > > setters. This would mean that the serialization format is
> > basically
> > > > > equivalent to the same non-record pojo, giving us backward
> > > > > compatibility
> > > > > and all the features of the Pojo serializer for basically free.
> > > > >
> > > > > We should make sure to not introduce any performance regression
> > in
> > > > the
> > > > > PojoSerializer but I am happy to open a preview PR if there is
> > > > > interest.
> > > > >
> > > > > There were mentions of upgrading Kryo to support this but I
> think
> > > > that
> > > > > would add unnecessary complexity.
> > > > >
> > > > > What do you all think?
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-32380
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27

2023-10-05 Thread João Boto
If there are no more questions or concerns, I will start the voting thread
tomorrow.
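
For anyone catching up on the context: the FLIP-27 design centers on a split
enumerator that runs on the coordinator and source readers that run in the
tasks. Abridged for illustration (the authoritative definition is on the
FLIP-27 wiki page linked below), the central interface has roughly this shape:

import org.apache.flink.api.connector.source.Boundedness;
import org.apache.flink.api.connector.source.SourceReader;
import org.apache.flink.api.connector.source.SourceReaderContext;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.core.io.SimpleVersionedSerializer;

// Abridged sketch of the FLIP-27 Source contract: the enumerator discovers
// splits centrally, readers consume them in parallel, and both sides
// checkpoint their state through versioned serializers.
public interface SourceSketch<T, SplitT extends SourceSplit, EnumChkT> {

    Boundedness getBoundedness();

    SourceReader<T, SplitT> createReader(SourceReaderContext ctx) throws Exception;

    SplitEnumerator<SplitT, EnumChkT> createEnumerator(
            SplitEnumeratorContext<SplitT> ctx) throws Exception;

    SplitEnumerator<SplitT, EnumChkT> restoreEnumerator(
            SplitEnumeratorContext<SplitT> ctx, EnumChkT checkpoint) throws Exception;

    SimpleVersionedSerializer<SplitT> getSplitSerializer();

    SimpleVersionedSerializer<EnumChkT> getEnumeratorCheckpointSerializer();
}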


On 2022/06/27 13:09:51 Roc Marshal wrote:
> Hi, all,
> 
> 
> 
> 
> I would like to open a discussion on porting the JDBC Source to the new
> Source API (FLIP-27 [1]).
> 
> Martijn Visser, Jing Ge and I had a preliminary discussion on the JIRA
> FLINK-25420 [2] and planned to start the discussion about the source part first.
> 
> 
> 
> Please let me know:
> 
> - Any issues you encountered with the old JDBC source;
> - The new features or design you want;
> - More suggestions from other dimensions...
> 
> 
> 
> You could find more details in FLIP-239[3].
> 
> Looking forward to your feedback.
> 
> 
> 
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> 
> [2] https://issues.apache.org/jira/browse/FLINK-25420
> 
> [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> 
> 
> 
> 
> Best regards,
> 
> Roc Marshal


[DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-10-05 Thread Péter Váry
Hi Team,

In my previous email [1] I described our challenges migrating the
existing Iceberg SinkFunction-based implementation to the new SinkV2-based
implementation.

As a result of the discussion around that topic, I have created the
FLIP-371 [2] to address the Committer related changes, and now I have
created a companion FLIP-372 [3] to address the WithPreCommitTopology
related issues.

FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the
type of the Committable

The main goal of FLIP-372 is to extend the currently existing
TwoPhaseCommittingSink API by adding the possibility to have a
PreCommitTopology where the input and the output types of the pre-commit
transformation are different.

Here is the FLIP: FLIP-372: Allow TwoPhaseCommittingSink
WithPreCommitTopology to alter the type of the Committable
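
To make this concrete, below is a rough illustration of the shape such an
extension could take. These are hypothetical names and signatures, not the
API from the FLIP (today's WithPreCommitTopology fixes both sides of the
transformation to a single committable type):

import org.apache.flink.api.connector.sink2.Committer;
import org.apache.flink.streaming.api.connector.sink2.CommittableMessage;
import org.apache.flink.streaming.api.datastream.DataStream;

// Hypothetical sketch: the pre-commit topology consumes the writer's
// results (WriterResultT) and may emit a different committable type
// (CommittableT), which the committer then receives.
public interface WithPreCommitTopologySketch<WriterResultT, CommittableT> {

    // The committer consumes the *output* type of the pre-commit topology.
    Committer<CommittableT> createCommitter() throws java.io.IOException;

    // The transformation is free to change the committable type, e.g. to
    // aggregate many writer results into one committable.
    DataStream<CommittableMessage<CommittableT>> addPreCommitTopology(
            DataStream<CommittableMessage<WriterResultT>> writerResults);
}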


Please use this thread to discuss the FLIP related questions, proposals,
and feedback.

Thanks,
Peter

- [1] https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
- [2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
- [3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable




Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-05 Thread Gabor Somogyi
Thanks for the efforts Peter!

I've just analyzed it and I think it's a useful feature.

+1 from my side.

G


On Thu, Oct 5, 2023 at 12:35 PM Péter Váry 
wrote:

> For the record, after the rename, the new FLIP link is:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
>
> Thanks,
> Peter
>
> Péter Váry wrote (on Thu, Oct 5, 2023 at 11:02):
>
> > Thanks Gordon for the comments!
> >
> >1. I have changed the FLIP name to the one proposed by you.
> >2. In the Iceberg sink we need access only to the Flink metrics. We do
> >not specifically need the job ID in the Committer after the SinkV2
> >migration (more about that later). This is the reason why I have
> stated in
> >the FLIP, that *"We should also provide similar context information as
> >we do in the Writer’s case, which should be discussed further on the
> >mailing list."*. What I did is: I have just copied most of the
> >org.apache.flink.api.connector.sink2.InitContext [1] properties to the
> >CommitterInitContext and removed the ones which seemed excessive to
> me.
> >The API in the FLIP is only a draft, and I am open to any suggestions.
> >
> > In the new Iceberg Sink we need unique ids for the Committables, but we
> > generate them in the SinkV2Aggregator [2] which is placed into the
> > PreCommitTopology. The aggregator has access to the job ID, operator ID
> and
> > checkpoint ID. So no new info is needed on the Committer side there.
> >
> > Thanks,
> > Peter
> >
> > - [1]
> >
> https://github.com/apache/flink/blob/cd95b560d0c11a64b42bf6b98107314d32a4de86/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java#L95-L131
> > - [2]
> >
> https://github.com/apache/iceberg/pull/8653/files#diff-bfd6a564485ec60b2d53f7aa800451548d4713c5f797e76bcff95d2c8ae05ed1R77-R81
> >
> > Tzu-Li (Gordon) Tai wrote (on Thu, Oct 5, 2023 at 8:16):
> >
> >> Thanks Peter for starting the FLIP.
> >>
> >> Overall, this seems pretty straightforward and overdue, +1.
> >>
> >> Two quick question / comments:
> >>
> >>1. Can you rename the FLIP to something less generic? Perhaps
> "Provide
> >>initialization context for Committer creation in
> >> TwoPhaseCommittingSink"?
> >>2. Can you describe why the job ID is needed to be exposed? Although
> >>it's out of scope for this FLIP, I'm wondering if there's a need to
> do
> >> the
> >>same for the sink writer InitContext.
> >>
> >> Thanks,
> >> Gordon
> >>
> >> On Wed, Oct 4, 2023 at 11:20 AM Martijn Visser <
> martijnvis...@apache.org>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > Peter, Marton, Gordon and I had an offline sync on SinkV2 and I'm
> >> > happy with this first FLIP on the topic. +1
> >> >
> >> > Best regards,
> >> >
> >> > Martijn
> >> >
> >> > On Wed, Oct 4, 2023 at 5:48 PM Márton Balassi <
> balassi.mar...@gmail.com
> >> >
> >> > wrote:
> >> > >
> >> > > Thanks, Peter. I agree that this is needed for Iceberg and
> beneficial
> >> for
> >> > > other connectors too.
> >> > >
> >> > > +1
> >> > >
> >> > > On Wed, Oct 4, 2023 at 3:56 PM Péter Váry <
> >> peter.vary.apa...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Team,
> >> > > >
> >> > > > In my previous email[1] I have described our challenges migrating
> >> the
> >> > > > existing Iceberg SinkFunction based implementation, to the new
> >> SinkV2
> >> > based
> >> > > > implementation.
> >> > > >
> >> > > > As a result of the discussion around that topic, I have created
> the
> >> > first
> >> > > > [2] of the FLIP-s addressing the missing features there.
> >> > > >
> >> > > > The main goal of the FLIP-371 is to extend the currently existing
> >> > Committer
> >> > > > API by providing an initial context on Committer creation. This
> >> context
> >> > > > will contain - among other, less interesting data -
> >> > > > the SinkCommitterMetricGroup which could be used to store the
> >> generic
> >> > > > commit related metrics, and also provide a way for the Committer
> to
> >> > publish
> >> > > > its own metrics.
> >> > > >
> >> > > > The feature has already been requested through FLINK-25857 [3].
> >> > > >
> >> > > > Please use this thread to discuss the FLIP related questions,
> >> > proposals,
> >> > > > feedback.
> >> > > >
> >> > > > Thanks,
> >> > > > Peter
> >> > > >
> >> > > > - [1]
> >> https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
> >> > > > - [2]
> >> > > >
> >> > > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+SinkV2+Committer+imporvements
> >> > > > - [3] https://issues.apache.org/jira/browse/FLINK-25857
> >> > > >
> >> >
> >>
> >
>


Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-05 Thread Péter Váry
For the record, after the rename, the new FLIP link is:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink

Thanks,
Peter

Péter Váry wrote (on Thu, Oct 5, 2023 at 11:02):

> Thanks Gordon for the comments!
>
>1. I have changed the FLIP name to the one proposed by you.
>2. In the Iceberg sink we need access only to the Flink metrics. We do
>not specifically need the job ID in the Committer after the SinkV2
>migration (more about that later). This is the reason why I have stated in
>the FLIP, that *"We should also provide similar context information as
>we do in the Writer’s case, which should be discussed further on the
>mailing list."*. What I did is: I have just copied most of the
>org.apache.flink.api.connector.sink2.InitContext [1] properties to the
>CommitterInitContext and removed the ones which seemed excessive to me.
>The API in the FLIP is only a draft, and I am open to any suggestions.
>
> In the new Iceberg Sink we need unique ids for the Committables, but we
> generate them in the SinkV2Aggregator [2] which is placed into the
> PreCommitTopology. The aggregator has access to the job ID, operator ID and
> checkpoint ID. So no new info is needed on the Committer side there.
>
> Thanks,
> Peter
>
> - [1]
> https://github.com/apache/flink/blob/cd95b560d0c11a64b42bf6b98107314d32a4de86/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java#L95-L131
> - [2]
> https://github.com/apache/iceberg/pull/8653/files#diff-bfd6a564485ec60b2d53f7aa800451548d4713c5f797e76bcff95d2c8ae05ed1R77-R81
>
> Tzu-Li (Gordon) Tai wrote (on Thu, Oct 5, 2023 at 8:16):
>
>> Thanks Peter for starting the FLIP.
>>
>> Overall, this seems pretty straightforward and overdue, +1.
>>
>> Two quick question / comments:
>>
>>1. Can you rename the FLIP to something less generic? Perhaps "Provide
>>initialization context for Committer creation in
>> TwoPhaseCommittingSink"?
>>2. Can you describe why the job ID is needed to be exposed? Although
>>it's out of scope for this FLIP, I'm wondering if there's a need to do
>> the
>>same for the sink writer InitContext.
>>
>> Thanks,
>> Gordon
>>
>> On Wed, Oct 4, 2023 at 11:20 AM Martijn Visser 
>> wrote:
>>
>> > Hi all,
>> >
>> > Peter, Marton, Gordon and I had an offline sync on SinkV2 and I'm
>> > happy with this first FLIP on the topic. +1
>> >
>> > Best regards,
>> >
>> > Martijn
>> >
>> > On Wed, Oct 4, 2023 at 5:48 PM Márton Balassi > >
>> > wrote:
>> > >
>> > > Thanks, Peter. I agree that this is needed for Iceberg and beneficial
>> for
>> > > other connectors too.
>> > >
>> > > +1
>> > >
>> > > On Wed, Oct 4, 2023 at 3:56 PM Péter Váry <
>> peter.vary.apa...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Team,
>> > > >
>> > > > In my previous email[1] I have described our challenges migrating
>> the
>> > > > existing Iceberg SinkFunction based implementation, to the new
>> SinkV2
>> > based
>> > > > implementation.
>> > > >
>> > > > As a result of the discussion around that topic, I have created the
>> > first
>> > > > [2] of the FLIP-s addressing the missing features there.
>> > > >
>> > > > The main goal of the FLIP-371 is to extend the currently existing
>> > Committer
>> > > > API by providing an initial context on Committer creation. This
>> context
>> > > > will contain - among other, less interesting data -
>> > > > the SinkCommitterMetricGroup which could be used to store the
>> generic
>> > > > commit related metrics, and also provide a way for the Committer to
>> > publish
>> > > > its own metrics.
>> > > >
>> > > > The feature has already been requested through FLINK-25857 [3].
>> > > >
>> > > > Please use this thread to discuss the FLIP related questions,
>> > proposals,
>> > > > feedback.
>> > > >
>> > > > Thanks,
>> > > > Peter
>> > > >
>> > > > - [1]
>> https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
>> > > > - [2]
>> > > >
>> > > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+SinkV2+Committer+imporvements
>> > > > - [3] https://issues.apache.org/jira/browse/FLINK-25857
>> > > >
>> >
>>
>


Re: [DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

2023-10-05 Thread Yuepeng Pan
Hi, Zhu Zhu,

Thanks for your feedback!

> I think we can introduce a new config option
> `taskmanager.load-balance.mode`,
> which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> can be superseded by the "Slots" mode and get deprecated. In the future
> it can support more mode, e.g. "CpuCores", to work better for jobs with
> fine-grained resources. The proposed config option
> `slot.request.max-interval`
> then can be renamed to `taskmanager.load-balance.request-stablizing-timeout`
> to show its relation with the feature. The proposed `slot.sharing-strategy`
> is not needed, because the configured "Tasks" mode will do the work.

The new proposed configuration option sounds good to me. 

I have a small question. If we set the configuration value to 'Tasks', it will
initiate two processes: balancing the allocation of task quantities at the slot
level and balancing the number of tasks across TaskManagers (TMs).
Alternatively, if we configure it as 'Slots', the system will employ the
LocalPreferred allocation policy (which is the default) when assigning tasks to
slots, and it will ensure that the number of slots used across TMs is balanced.
Does this configuration essentially combine a balanced selection strategy
across the two dimensions into fixed configuration items?
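
For illustration, with the proposed (still under discussion, not yet released)
option names, the 'Tasks' mode would be configured roughly as:

taskmanager.load-balance.mode: Tasks
taskmanager.load-balance.request-stablizing-timeout: 30 s

where the timeout value is a placeholder.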

I would appreciate it if you could correct me if I've made any errors.

Best,
Yuepeng.


RE: [ANNOUNCE] Release 1.18.0, release candidate #0

2023-10-05 Thread David Radley
Hi Jing,
Yes I agree that if we can get them resolved then that would be ideal.

I guess the worry is that at 1.17, we had a released Flink core and Kafka 
connector.
At 1.18 we will have a released Core Flink but no new Kafka connector. So the 
last released Kafka connector would now be 
https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka/3.0.0-1.17
 which should be the same as the Kafka connector in 1.17. I guess this is the 
combination that people would pick up to deploy in production – and I assume 
this has been tested.
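
(The externalized connector version scheme is <connector version>-<Flink
version>, so 3.0.0-1.17 means connector release 3.0.0 built against Flink
1.17.)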

The issues with the nightly builds refer to the Kafka connector main branch. If
they are not regressions, you are suggesting that pragmatically we go forward
with the release; I think that makes sense to do, but do these issues affect
3.0.0-1.17?

I suspect we should release a new Kafka connector asap, so we have a matching 
connector built outside of the Flink repo. We may want to not include the Flink 
core version in the connector – or we might end up wanting to release a Kafka 
connector when there are no changes just to have a match with the Flink core 
version.

Kind regards, David.



From: Jing Ge 
Date: Wednesday, 4 October 2023 at 17:36
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [ANNOUNCE] Release 1.18.0, release candidate #0
Hi David,

First of all, we should have enough time to wait for those issues to
be resolved. Secondly, it makes less sense to block upstream release by
downstream build issues. In case, those issues might need more time, we
should move forward with the Flink release without waiting for them. WDYT?

Best regards,
Jing

On Wed, Oct 4, 2023 at 6:15 PM David Radley  wrote:

> Hi,
> As release 1.18 removes the Kafka connector from the core Flink
> repository, I assume we will wait until the kafka connector nightly build
> issues https://issues.apache.org/jira/browse/FLINK-33104  and
> https://issues.apache.org/jira/browse/FLINK-33017  are resolved before
> releasing 1.18?
>
>  Kind regards, David.
>
>
> From: Jing Ge 
> Date: Wednesday, 27 September 2023 at 15:11
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Re: [ANNOUNCE] Release 1.18.0, release candidate #0
> Hi Folks,
>
> @Ryan FYI: CI passed and the PR has been merged. Thanks!
>
> If there are no more other concerns, I will start publishing 1.18-rc1.
>
> Best regards,
> Jing
>
> On Mon, Sep 25, 2023 at 1:40 PM Jing Ge  wrote:
>
> > Hi Ryan,
> >
> > Thanks for reaching out. It is fine to include it but we need to wait
> > until the CI passes. I am not sure how long it will take, since there
> seems
> > to be some infra issues.
> >
> > Best regards,
> > Jing
> >
> > On Mon, Sep 25, 2023 at 11:34 AM Ryan Skraba
> 
> > wrote:
> >
> >> Hello!  There's a security fix that probably should be applied to 1.18
> >> in the next RC1 : https://github.com/apache/flink/pull/23461   (bump to
> >> snappy-java).  Do you think this would be possible to include?
> >>
> >> All my best, Ryan
> >>
> >> [1]: https://issues.apache.org/jira/browse/FLINK-33149   "Bump
> >> snappy-java to 1.1.10.4"
> >>
> >>
> >>
> >> On Mon, Sep 25, 2023 at 3:54 PM Jing Ge 
> >> wrote:
> >> >
> >> > Thanks Zakelly for the update! Appreciate it!
> >> >
> >> > @Piotr Nowojski  If you do not have any other
> >> > concerns, I will move forward to create 1.18 rc1 and start voting.
> WDYT?
> >> >
> >> > Best regards,
> >> > Jing
> >> >
> >> > On Mon, Sep 25, 2023 at 2:20 AM Zakelly Lan 
> >> wrote:
> >> >
> >> > > Hi Jing and everyone,
> >> > >
> >> > > I have conducted three rounds of benchmarking with Java11, comparing
> >> > > release 1.18 (commit: deb07e99560[1]) with commit 6d62f9918ea[2].
> The
> >> > > results are attached[3]. Most of the tests show no obvious
> regression.
> >> > > However, I did observe significant change in several tests. Upon
> >> > > reviewing the historical results from the previous pipeline, I also
> >> > > discovered a substantial variance in those tests, as shown in the
> >> > > timeline pictures included in the sheet[3]. I believe this variance
> >> > > has existed for a long time and requires further investigation, and
> >> > > fully measuring the variance requires more rounds (15 or more). I
> >> > > think for now it is not a blocker for release 1.18. WDYT?
> >> > >
> >> > >
> >> > > Best,
> >> > > Zakelly
> >> > >
> >> > > [1] https://github.com/apache/flink/commit/deb07e99560b45033a629afc3f90666ad0a32feb
> >> > > [2] https://github.com/apache/flink/commit/6d62f9918ea2cbb8a10c705a25a4ff6deab60711
> >> > > [3] https://docs.google.com/spreadsheets/d/1V0-duzNTgu7H6R7kioF-TAPhlqWl7Co6Q9ikTBuaULo/edit?usp=sharing
> >> > >
> >> > > On Sun, Sep 24, 2023 at 11:29 AM ConradJam 
> >> wrote:
> >> > > >
> >> > > > +1 for testing with Java 17
> >> > > >
> >> > > > Jing Ge wrote (on Sun, Sep 24, 2023 at 09:40):
> >> > > >
> >> > > > > +1 for testing with Java 17 too. Thanks Zakelly for your effort!
> >> > > > >
> >> > > > > Best regards,
> >> > > > > 

[jira] [Created] (FLINK-33195) ElasticSearch Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-10-05 Thread Jing Ge (Jira)
Jing Ge created FLINK-33195:
---

 Summary: ElasticSearch Connector should directly depend on 
3rd-party libs instead of flink-shaded repo
 Key: FLINK-33195
 URL: https://issues.apache.org/jira/browse/FLINK-33195
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / ElasticSearch
Affects Versions: 1.18.0
Reporter: Jing Ge






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-05 Thread Péter Váry
Thanks Gordon for the comments!

   1. I have changed the FLIP name to the one proposed by you.
   2. In the Iceberg sink we need access only to the Flink metrics. We do
   not specifically need the job ID in the Committer after the SinkV2
   migration (more about that later). This is the reason why I have stated in
   the FLIP, that *"We should also provide similar context information as
   we do in the Writer’s case, which should be discussed further on the
   mailing list."*. What I did is: I have just copied most of the
   org.apache.flink.api.connector.sink2.InitContext [1] properties to the
   CommitterInitContext and removed the ones which seemed excessive to me.
   The API in the FLIP is only a draft, and I am open to any suggestions.
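
As a rough illustration of that direction (placeholder names, not the final
API, since the draft is explicitly open for suggestions; SinkCommitterMetricGroup
is the type named in the FLIP):

import org.apache.flink.metrics.groups.SinkCommitterMetricGroup;

// Placeholder sketch of a committer init context: a trimmed-down subset of
// org.apache.flink.api.connector.sink2.Sink.InitContext properties, plus the
// metric group named in the FLIP. Not the final API.
public interface CommitterInitContextSketch {

    // Which parallel subtask this committer instance runs in.
    int getSubtaskId();

    int getNumberOfParallelSubtasks();

    // Group for standard commit metrics and committer-defined metrics.
    SinkCommitterMetricGroup metricGroup();
}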

In the new Iceberg Sink we need unique ids for the Committables, but we
generate them in the SinkV2Aggregator [2] which is placed into the
PreCommitTopology. The aggregator has access to the job ID, operator ID and
checkpoint ID. So no new info is needed on the Committer side there.

Thanks,
Peter

- [1]
https://github.com/apache/flink/blob/cd95b560d0c11a64b42bf6b98107314d32a4de86/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java#L95-L131
- [2]
https://github.com/apache/iceberg/pull/8653/files#diff-bfd6a564485ec60b2d53f7aa800451548d4713c5f797e76bcff95d2c8ae05ed1R77-R81

Tzu-Li (Gordon) Tai wrote (on Thu, Oct 5, 2023 at 8:16):

> Thanks Peter for starting the FLIP.
>
> Overall, this seems pretty straightforward and overdue, +1.
>
> Two quick question / comments:
>
>1. Can you rename the FLIP to something less generic? Perhaps "Provide
>initialization context for Committer creation in
> TwoPhaseCommittingSink"?
>2. Can you describe why the job ID is needed to be exposed? Although
>it's out of scope for this FLIP, I'm wondering if there's a need to do
> the
>same for the sink writer InitContext.
>
> Thanks,
> Gordon
>
> On Wed, Oct 4, 2023 at 11:20 AM Martijn Visser 
> wrote:
>
> > Hi all,
> >
> > Peter, Marton, Gordon and I had an offline sync on SinkV2 and I'm
> > happy with this first FLIP on the topic. +1
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, Oct 4, 2023 at 5:48 PM Márton Balassi 
> > wrote:
> > >
> > > Thanks, Peter. I agree that this is needed for Iceberg and beneficial
> for
> > > other connectors too.
> > >
> > > +1
> > >
> > > On Wed, Oct 4, 2023 at 3:56 PM Péter Váry  >
> > > wrote:
> > >
> > > > Hi Team,
> > > >
> > > > In my previous email[1] I have described our challenges migrating the
> > > > existing Iceberg SinkFunction based implementation, to the new SinkV2
> > based
> > > > implementation.
> > > >
> > > > As a result of the discussion around that topic, I have created the
> > first
> > > > [2] of the FLIP-s addressing the missing features there.
> > > >
> > > > The main goal of the FLIP-371 is to extend the currently existing
> > Committer
> > > > API by providing an initial context on Committer creation. This
> context
> > > > will contain - among other, less interesting data -
> > > > the SinkCommitterMetricGroup which could be used to store the generic
> > > > commit related metrics, and also provide a way for the Committer to
> > publish
> > > > its own metrics.
> > > >
> > > > The feature has already been requested through FLINK-25857 [3].
> > > >
> > > > Please use this thread to discuss the FLIP related questions,
> > proposals,
> > > > feedback.
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > > - [1]
> https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
> > > > - [2]
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+SinkV2+Committer+imporvements
> > > > - [3] https://issues.apache.org/jira/browse/FLINK-25857
> > > >
> >
>


Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-10-05 Thread Jing Ge
Hi Dawid,

Please don't get me wrong. I just described the facts, shared different
opinions, and tried to make sure we are on the same page. My intention is
clearly not to block your effort. If you, after hearing all the different
opinions, still think your solution is the right approach, please go ahead.

For me, ignoring and throwing exceptions are very different in this case.
Just like the names tell us, ignoring means ignoring the hints; everything
is valid, including the hints. But throwing exceptions means something is
wrong: either the platform should do something, or the exception should be
shown to users in a professional manner to make them aware of it.
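
For context, a query hint here means something like
SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id
(table names illustrative): under an "ignore" semantic the hint is silently
dropped and the query still runs, while under a "disable/throw" semantic the
whole statement is rejected.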

Again, if I am not mistaken, I didn't see any consensus during the
discussion about throwing exceptions, and I already answered your
elaboration that the platform provider wants to control it in my previous
reply: it is up to users, not platform providers. Hopefully you have already
read it. Anyway, I just tried to share my thoughts, not to block you.
Apologies if it caused any misunderstanding.

Best regards,
Jing

On Thu, Oct 5, 2023 at 8:03 AM Dawid Wysakowicz 
wrote:

> Hey Jing,
>
> If you went through the discussion, you would see it has never been
> shifted towards "ignore". The only concern in the discussion was we'd
> have too many options and that lookup joins require them. It was never
> questioned we should not throw an exception that was suggested in the
> first message:
>
> Description: Enable or disable the QUERY hint, if disabled, an
> exception would be thrown if any QUERY hints are specified
> Note: The default value will be set to true.
>
> > until you commented on the PR, which confused me. Oracle's
> > OPTIMIZER_IGNORE_HINTS was shown as an example of a system that does let
> > you disable hints. It was never said let's support the same behaviour.
> > Just that, yeah, I am now fine with adding the option. Let's name it
> > the same way.
>
> On one
> hand, there is no clear reason why we should disable(throwing
> exception) it
> globally, and on the other hand, some functionalities, e.g. lookup join
> pointed out by Ron, are depending on it.
>
> What's the difference between ignore and throw in this case? They
> wouldn't work in either case.
>
> Would you like to elaborate the
> must-have requirement for the "disabled" scenario?
>
> As platform provider we'd like to take as much control of the query as
> possible. Query hints expose internals which we would like to take
> control of. Moreover we don't suggest to disable them by default. We
> just propose to have that possibility if you need it. If you don't want
> to, you don't need to use it. I also don't get the argument we'd have
> too many options. We already have many detailed options and on the other
> hand query hints themselves make the system more complicated to operate.
>
> I think my proposal to support both THROW and IGNORE semantics should
> satisfy both my requirements and your concerns, so to be honest I am
> disappointed that you don't want to reach a compromise, but you just
> block the effort.
>
> Best,
>
> Dawid
>
>
> On 05/10/2023 05:08, Jing Ge wrote:
> > Hi Dawid,
> >
> > Thanks for the clarification. If you could go through the discussion, you
> > would be aware that the focus has been moved from "disable" to "ignore".
> > There was an alignment only on "ignore hints". Your suggestion bypassed
> the
> > alignment and mixed everything together. That confused me a bit. On one
> > hand, there is no clear reason why we should disable(throwing exception)
> it
> > globally, and on the other hand, some functionalities, e.g. lookup join
> > pointed out by Ron, are depending on it. Would you like to elaborate the
> > must-have requirement for the "disabled" scenario? Thanks!
> >
> > Best regards,
> > Jing
> >
> > On Thu, Oct 5, 2023 at 12:23 AM Sergey Nuyanzin
> wrote:
> >
> >> Hi Dawid,
> >>
> >> Thanks for bringing this.
> >> I would agree with enum approach
> >> ignored option would allow to follow Oracle's behavior as well
> >>
> >>> table.optimizer.query-options = ENABLED/DISABLED/IGNORED
> >> nit: Can we have "hint" in config option name
> >> e.g. table.optimizer.query-options-hints ?
> >>
> >>
> >> On Tue, Oct 3, 2023 at 5:58 PM Dawid Wysakowicz
> >> wrote:
> >>
> >>> Hey all,
> >>> My understanding was that from the first message we were discussing
> >>> throwing an exception. Oracle was only shown as an example of a system
> >>> that have a flag for hints behaviour.
> >>>
> >>> Let's get back to the discussion and agree on the behavior. My
> >>> suggestion is to introduce an enum instead of a boolean flag since
> >>> apparently there are different requirements. My take is that it is
> worth
> >>> to have an option to throw an exception if hints are disabled and are
> >>> provided in the SQL query. This is the same behavior as disabling
> >>> OPTIONS hint we already have[1]
> >>>
> >>> Since you both @Jing and 

Re: Flink and Flink shaded dependency

2023-10-05 Thread Jing Ge
Hi Chesnay,

Thanks for joining this discussion and sharing your thoughts!


> Connectors shouldn't depend on flink-shaded.
>

Perfect! We are on the same page. If you could read through the discussion,
you would realize that, currently, there are many connectors that depend on
flink-shaded.


> Connectors are small enough in scope that depending directly on
> guava/jackson/etc. is a fine approach, and they have plenty of other
> dependencies that they need to manage anyway; let's treat these the same
> way.
>

It is even better if we could do that. Jira tickets [1] are created.
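
For illustration only (coordinates, versions, and the relocation package below
are placeholders, not taken from the tickets): a connector depending directly
on Guava would typically shade and relocate it in its own pom, roughly like:

<!-- direct dependency instead of flink-shaded-guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.2-jre</version>
</dependency>

<!-- inside the maven-shade-plugin configuration -->
<relocation>
    <pattern>com.google.common</pattern>
    <shadedPattern>org.apache.flink.connector.example.shaded.com.google.common</shadedPattern>
</relocation>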


> As for class-loading, there has been a long-standing goal of each
> connector being loaded in their own classloader. That still is the north
> star and the only reasonable way to ensure that multiple connectors can
> be safely used with SQL.
>

What is the current status? Do we have any Jira ticket for that?

Best regards,
Jing

[1] https://issues.apache.org/jira/browse/FLINK-33190

On Wed, Oct 4, 2023 at 4:43 PM Chesnay Schepler  wrote:

> There is no "monolithic" flink-shaded dependency.
> Connectors shouldn't depend on anything that Flink provides, but be
> self-contained as Martijn pointed out.
>
>

> Connectors shouldn't depend on flink-shaded.
>


> The overhead and/or risks of doing/supporting that right now far
> outweigh the benefits.
> ( Because we either have to encode the full version for all dependencies
> into the package, or accept the risk of minor/patch dependency clashes)
>


> Connectors are small enough in scope that depending directly on
> guava/jackson/etc. is a fine approach, and they have plenty of other
> dependencies that they need to manage anyway; let's treat these the same
> way.
>


> Naturally this is also an argument against flink-shaded-connectors; on
> top of that we already experience repo creep and managing releases is
> difficult enough as-is.
>
>

> As for class-loading, there has been a long-standing goal of each
> connector being loaded in their own classloader. That still is the north
> star and the only reasonable way to ensure that multiple connectors can
> be safely used with SQL.
>


> On 02/10/2023 18:32, Jing Ge wrote:
> > Hi Sergey,
> >
> > Thanks for sharing your thoughts. It could somehow help but didn't get to
> > the root of this issue.
> >
> > According to the documentation, Flink shaded is used to provide a single
> > instance of a shaded dependency across sub-modules in Flink repo. Shaded
> > namespaces should be used where shaded dependencies are configured. After
> > connectors have been externalized, it ends up with more repos depending
> on
> > one shaded jar, e.g. guava. This is a "monolithic" dependency setup that
> > makes it difficult to change the root(flink-shade), because any changes
> of
> > the root have to be propagated to all downstream repos. Even worse is
> that
> > not every downstream repo is known while modifying the root.
> >
> > Since all externalized connectors have their own repos and are not
> > sub-modules of Flink anymore, I would suggest the following upgrade:
> >
> > 1. Connectors should use their own classloader instead of Flink's
> > classloader. This will break the monolithic dependency. Connectors and
> > Flink can use different versions of flink-shaded.
> > 2. [optional] It would be even better that all connector repos depend on
> > their own individual shaded repo, e.g. flink-connector-shaded.
> flink-shaded
> > should only be used by Flink.
> >
> > WDYT?
> >
> > Best regards,
> > Jing
> >
> >
> > On Thu, Sep 14, 2023 at 11:28 PM Sergey Nuyanzin 
> > wrote:
> >
> >> Yes, that's a reasonable question, thanks for raising it.
> >>
> >> I think this is not only about flink-shaded, rather about dependencies
> in
> >> general
> >>
> >> I guess there is no rule of thumb, or at least I'm not aware of
> >> Here are my thoughts
> >> 1. If bumping dependency doesn't require breaking changes and passes
> >> existing tests then just bump it
> >> 2. In case there are breaking changes we could consider doing this
> within
> >> next major release
> >> for minor release
> >>  a. try to answer a question whether it impacts Flink or not
> >>  b. in case it impacts Flink and fix itself is relatively small
> then to
> >> avoid breaking change
> >> we could copy classes with solutions to Flink repo like it
> usually
> >> happens with Calcite related fixes.
> >> The problem of this approach is that I guess it will not work
> for
> >> non jvm deps like e.g. RocksDB
> >>  c. In case no way to do it without breaking changes for minor
> release
> >> then probably need sort of announcement motivating to move to another
> major
> >> version where the issue is fixed
> >>
> >> looking forward to seeing other opinions about that
> >>
> >> On Wed, Sep 13, 2023 at 9:47 PM Jing Ge 
> >> wrote:
> >>
> >>> Hi Sergey,
> >>>
> >>> Thanks for doing the analysis and providing the great insight. I did my
> >> own
> >>> analysis and got the same 

Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-05 Thread Tzu-Li (Gordon) Tai
Thanks Peter for starting the FLIP.

Overall, this seems pretty straightforward and overdue, +1.

Two quick question / comments:

   1. Can you rename the FLIP to something less generic? Perhaps "Provide
   initialization context for Committer creation in TwoPhaseCommittingSink"?
   2. Can you describe why the job ID is needed to be exposed? Although
   it's out of scope for this FLIP, I'm wondering if there's a need to do the
   same for the sink writer InitContext.

Thanks,
Gordon

On Wed, Oct 4, 2023 at 11:20 AM Martijn Visser 
wrote:

> Hi all,
>
> Peter, Marton, Gordon and I had an offline sync on SinkV2 and I'm
> happy with this first FLIP on the topic. +1
>
> Best regards,
>
> Martijn
>
> On Wed, Oct 4, 2023 at 5:48 PM Márton Balassi 
> wrote:
> >
> > Thanks, Peter. I agree that this is needed for Iceberg and beneficial for
> > other connectors too.
> >
> > +1
> >
> > On Wed, Oct 4, 2023 at 3:56 PM Péter Váry 
> > wrote:
> >
> > > Hi Team,
> > >
> > > In my previous email[1] I have described our challenges migrating the
> > > existing Iceberg SinkFunction based implementation, to the new SinkV2
> based
> > > implementation.
> > >
> > > As a result of the discussion around that topic, I have created the
> first
> > > [2] of the FLIP-s addressing the missing features there.
> > >
> > > The main goal of the FLIP-371 is to extend the currently existing
> Committer
> > > API by providing an initial context on Committer creation. This context
> > > will contain - among other, less interesting data -
> > > the SinkCommitterMetricGroup which could be used to store the generic
> > > commit related metrics, and also provide a way for the Committer to
> publish
> > > its own metrics.
> > >
> > > The feature has already been requested through FLINK-25857 [3].
> > >
> > > Please use this thread to discuss the FLIP related questions,
> proposals,
> > > feedback.
> > >
> > > Thanks,
> > > Peter
> > >
> > > - [1] https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
> > > - [2]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+SinkV2+Committer+imporvements
> > > - [3] https://issues.apache.org/jira/browse/FLINK-25857
> > >
>


[jira] [Created] (FLINK-33194) AWS Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-10-05 Thread Jing Ge (Jira)
Jing Ge created FLINK-33194:
---

 Summary: AWS Connector should directly depend on 3rd-party libs 
instead of flink-shaded repo
 Key: FLINK-33194
 URL: https://issues.apache.org/jira/browse/FLINK-33194
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / AWS
Affects Versions: 1.18.0
Reporter: Jing Ge






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33193) JDBC Connector should directly depend on 3rd-party libs instead of flink-shaded repo

2023-10-05 Thread Jing Ge (Jira)
Jing Ge created FLINK-33193:
---

 Summary: JDBC Connector should directly depend on 3rd-party libs 
instead of flink-shaded repo
 Key: FLINK-33193
 URL: https://issues.apache.org/jira/browse/FLINK-33193
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Affects Versions: 1.18.0
Reporter: Jing Ge






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-10-05 Thread Dawid Wysakowicz

Hey Jing,

If you went through the discussion, you would see it has never been 
shifted towards "ignore". The only concern in the discussion was we'd 
have too many options and that lookup joins require them. It was never 
questioned we should not throw an exception that was suggested in the 
first message:


   Description: Enable or disable the QUERY hint, if disabled, an
   exception would be thrown if any QUERY hints are specified
   Note: The default value will be set to true.

until you commented on the PR, which confused me. Oracle's
OPTIMIZER_IGNORE_HINTS was shown as an example of a system that does let
you disable hints. It was never said let's support the same behaviour.
Just that, yeah, I am now fine with adding the option. Let's name it
the same way.


   On one
   hand, there is no clear reason why we should disable(throwing exception) it
   globally, and on the other hand, some functionalities, e.g. lookup join
   pointed out by Ron, are depending on it.

What's the difference between ignore and throw in this case? They 
wouldn't work in either case.


   Would you like to elaborate the
   must-have requirement for the "disabled" scenario?

As a platform provider we'd like to take as much control of the query as
possible. Query hints expose internals which we would like to take 
control of. Moreover we don't suggest to disable them by default. We 
just propose to have that possibility if you need it. If you don't want 
to, you don't need to use it. I also don't get the argument we'd have 
too many options. We already have many detailed options and on the other 
hand query hints themselves make the system more complicated to operate.


I think my proposal to support both THROW and IGNORE semantics should 
satisfy both my requirements and your concerns, so to be honest I am
disappointed that you don't want to reach a compromise, but you just 
block the effort.


Best,

Dawid


On 05/10/2023 05:08, Jing Ge wrote:

Hi Dawid,

Thanks for the clarification. If you could go through the discussion, you
would be aware that the focus has been moved from "disable" to "ignore".
There was an alignment only on "ignore hints". Your suggestion bypassed the
alignment and mixed everything together. That confused me a bit. On one
hand, there is no clear reason why we should disable(throwing exception) it
globally, and on the other hand, some functionalities, e.g. lookup join
pointed out by Ron, are depending on it. Would you like to elaborate the
must-have requirement for the "disabled" scenario? Thanks!

Best regards,
Jing

On Thu, Oct 5, 2023 at 12:23 AM Sergey Nuyanzin  wrote:


Hi Dawid,

Thanks for bringing this.
I would agree with enum approach
ignored option would allow to follow Oracle's behavior as well


table.optimizer.query-options = ENABLED/DISABLED/IGNORED

nit: Can we have "hint" in config option name
e.g. table.optimizer.query-options-hints ?


On Tue, Oct 3, 2023 at 5:58 PM Dawid Wysakowicz
wrote:


Hey all,
My understanding was that from the first message we were discussing
throwing an exception. Oracle was only shown as an example of a system
that have a flag for hints behaviour.

Let's get back to the discussion and agree on the behavior. My
suggestion is to introduce an enum instead of a boolean flag since
apparently there are different requirements. My take is that it is worth
to have an option to throw an exception if hints are disabled and are
provided in the SQL query. This is the same behavior as disabling
OPTIONS hint we already have[1]

Since you both @Jing and @Sergey would rather like to have an option to
ignore them we can introduce

table.optimizer.query-options = ENABLED/DISABLED/IGNORED

ENABLED: hints just work

DISABLED: throw an exception

IGNORED: ignore hints

Are you two fine with that option @Jing @Sergey?

Since this thread had a few misunderstandings already, I'd suggest to
convert it to a FLIP and follow with a VOTE shortly. @Bonnie would you
like to help with that?

Best,

Dawid

[1]



https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-dynamic-table-options-enabled

On 02/10/2023 18:18, Jing Ge wrote:

Hi,

I have the same concern as Sergey and would like to make sure the purpose
of this discussion is to globally ignore hints without changing any other
behaviours, if I am not mistaken. Thanks!

Best regards,
Jing

On Mon, Oct 2, 2023 at 3:40 PM Sergey Nuyanzin

wrote:

Hi Bonnie,

I think changing it to something like .enabled
could lead to misunderstanding
for instance when we set this flag to false what should it mean?
1. Just ignore hints and execute a query like the same query however with
removed hints
2. Fail on validation because hints are disabled
3. something else

I tend to think that we are talking about just ignoring hints, so option 1
(and Oracle follows option 1 as well, as mentioned at [1]).
So I would suggest keeping "ignore" in the property name to make it more
clear

Please let me know if I miss