[jira] [Created] (FLINK-32593) DelimitedInputFormat causes record loss for a multi-byte delimiter when the delimiter is split across two input splits

2023-07-14 Thread Zhaofu Liu (Jira)
Zhaofu Liu created FLINK-32593:
--

 Summary: DelimitedInputFormat causes record loss for a multi-byte
delimiter when the delimiter is split across two input splits
 Key: FLINK-32593
 URL: https://issues.apache.org/jira/browse/FLINK-32593
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.17.1, 1.16.2, 1.16.1
Reporter: Zhaofu Liu
 Attachments: 5parallel.dat, image-2023-07-15-10-30-03-740.png

Run the following test to reproduce this bug.
{code:java}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.io.DelimitedInputFormat;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.junit.Test;

import javax.xml.bind.DatatypeConverter;
import java.io.IOException;

public class MyTest {

  @Test
  public void myTest() throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(5);

    String path = MyTest.class.getClassLoader().getResource("5parallel.dat").getPath();

    final DelimitedInputFormat<byte[]> inputFormat = new TestInputFormat();
    // The delimiter is "B87E7E7E" (0xB8 = 184, 0x7E = 126)
    inputFormat.setDelimiter(new byte[]{(byte) 184, (byte) 126, (byte) 126, (byte) 126});
    // Set the buffer size below the default of 1 MB to make debugging easier
    inputFormat.setBufferSize(128);

    DataStreamSource<byte[]> source = env.readFile(inputFormat, path);

    source.map(new MapFunction<byte[], byte[]>() {
      @Override
      public byte[] map(byte[] value) throws Exception {
        System.out.println(DatatypeConverter.printHexBinary(value));
        return value;
      }
    }).setParallelism(1);

    env.execute();
  }

  // Static so the format is serializable without the enclosing test class.
  private static class TestInputFormat extends DelimitedInputFormat<byte[]> {
    @Override
    public byte[] readRecord(byte[] reuse, byte[] bytes, int offset, int numBytes) throws IOException {
      final int delimiterLen = this.getDelimiter().length;

      if (numBytes > 0) {
        // Prepend the delimiter so every printed record starts with B87E7E7E.
        byte[] record = new byte[delimiterLen + numBytes];
        System.arraycopy(this.getDelimiter(), 0, record, 0, delimiterLen);
        System.arraycopy(bytes, offset, record, delimiterLen, numBytes);
        return record;
      }

      return new byte[0];
    }
  }
}
{code}
 

The actual output is:
{code}
B87E7E7E1A00EB900A4EDC6850160070F6BED4631321ADDC6F06DC137C221E99
B87E7E7E1A00EB900A4EDC6A51160070F61A8AFE022A3EC67718002A217C2181
B87E7E7E1A00EB900A4EDC6D5516 {code}
 

The expected output should be:
{code}
B87E7E7E1A00EB900A4EDC6850160070F6BED4631321ADDC6F06DC137C221E99
B87E7E7E1A00EB900A4EDC6B52150070F6BE468EFD20BEEEB756E03FD7F653D0
B87E7E7E1A00EB900A4EDC6D5516
B87E7E7E1A00EB900A4EDC6A51160070F61A8AFE022A3EC67718002A217C2181 {code}
The view of a delimiter split across two input splits (the tail of line 2 and
the head of line 3):

!image-2023-07-15-10-30-03-740.png!
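
(For illustration only; this is not Flink's actual split-handling code. The following self-contained sketch shows why a multi-byte delimiter that is bisected by a split boundary cannot be found when each split is scanned independently. The byte values and the split point are made up for the example.)
{code:java}
import java.util.Arrays;

public class SplitBoundaryIllustration {

  // Returns the index of the first complete delimiter in data, or -1 if none is found.
  static int indexOfDelimiter(byte[] data, byte[] delimiter) {
    outer:
    for (int i = 0; i + delimiter.length <= data.length; i++) {
      for (int j = 0; j < delimiter.length; j++) {
        if (data[i + j] != delimiter[j]) {
          continue outer;
        }
      }
      return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] delimiter = {(byte) 0xB8, 0x7E, 0x7E, 0x7E};
    // Two records separated by the delimiter: [01 02] B8 7E 7E 7E [03 04]
    byte[] data = {0x01, 0x02, (byte) 0xB8, 0x7E, 0x7E, 0x7E, 0x03, 0x04};

    // A split boundary that bisects the delimiter (two delimiter bytes in each split).
    int splitPoint = 4;
    byte[] split1 = Arrays.copyOfRange(data, 0, splitPoint);
    byte[] split2 = Arrays.copyOfRange(data, splitPoint, data.length);

    // Neither half contains the complete 4-byte delimiter, so a reader that
    // only matches the delimiter within a single split sees no record boundary
    // in either half, and the record after the delimiter can be lost.
    System.out.println("delimiter in split 1:   " + indexOfDelimiter(split1, delimiter)); // -1
    System.out.println("delimiter in split 2:   " + indexOfDelimiter(split2, delimiter)); // -1
    System.out.println("delimiter in full data: " + indexOfDelimiter(data, delimiter));   // 2
  }
}
{code}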



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-309: Enable operators to trigger checkpoints dynamically

2023-07-14 Thread Piotr Nowojski
Hi All,

We had a lot of off-line discussions. As a result I would suggest dropping
the idea of introducing an end-to-end-latency concept, until
we can properly implement it, which will require more designing and
experimenting. I would suggest starting with a more manual solution,
where the user needs to configure concrete parameters, like
`execution.checkpointing.max-interval` or `execution.flush-interval`.

FLIP-309 looks good to me, I would just rename
`execution.checkpointing.interval-during-backlog` to
`execution.checkpointing.max-interval`.

I would also reference, as future work, that a solution allowing sources
like Kafka to set `isProcessingBacklog` will be introduced via FLIP-328 [1].

Best,
Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-328%3A+Allow+source+operators+to+determine+isProcessingBacklog+based+on+watermark+lag
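
This is not part of the FLIP text; just a rough sketch of how a user might end up combining the two intervals, assuming the proposed option is exposed through the regular configuration mechanism. The key name `execution.checkpointing.max-interval` follows the renaming suggested above and is not final.
{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointIntervalSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Regular checkpoint interval, used while sources report isProcessingBacklog = false.
    conf.setString("execution.checkpointing.interval", "30 s");
    // Proposed longer interval used while the job is processing backlog
    // (hypothetical key, pending the outcome of this discussion).
    conf.setString("execution.checkpointing.max-interval", "5 min");

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.fromElements(1, 2, 3).print();
    env.execute("checkpoint-interval-sketch");
  }
}
{code}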

On Wed, Jul 12, 2023 at 03:49 Dong Lin  wrote:

> Hi Piotr,
>
> I think I understand your motivation for suggesting
> execution.slow-end-to-end-latency now. Please see my followup comments
> (after the previous email) inline.
>
> On Wed, Jul 12, 2023 at 12:32 AM Piotr Nowojski 
> wrote:
>
> > Hi Dong,
> >
> > Thanks for the updates, a couple of comments:
> >
> > > If a record is generated by a source when the source's
> > isProcessingBacklog is true, or some of the records used to
> > > derive this record (by an operator) has isBacklog = true, then this
> > record should have isBacklog = true. Otherwise,
> > > this record should have isBacklog = false.
> >
> > nit:
> > I think this conflicts with "Rule of thumb for non-source operators to
> set
> > isBacklog = true for the records it emits:"
> > section later on, when it comes to a case if an operator has mixed
> > isBacklog = false and isBacklog = true inputs.
> >
> > > execution.checkpointing.interval-during-backlog
> >
> > Do we need to define this as an interval config parameter? Won't that add
> > an option that will be almost instantly deprecated
> > because what we actually would like to have is:
> > execution.slow-end-to-end-latency and execution.end-to-end-latency
> >
>
> I guess you are suggesting that we should allow users to specify a higher
> end-to-end latency budget for records emitted by a two-phase commit sink
> than for records emitted by a non-two-phase commit sink.
>
> My concern with this approach is that it will increase the complexity of
> the definition of "processing latency requirement", as well as the
> complexity of the Flink runtime code that handles it. Currently, the
> FLIP-325 defines end-to-end latency as an attribute of the records that is
> statically assigned when the record is generated at the source, regardless
> of how it will be emitted later in the topology. If we make the changes
> proposed above, we would need to define the latency requirement w.r.t. the
> attribute of the operators that it travels through before its result is
> emitted, which is less intuitive and more complex.
>
> For now, it is not clear whether it is necessary to have two categories of
> latency requirement for the same job. Maybe it is reasonable to assume that
> if a job has a two-phase commit sink and the user is OK with emitting some
> results at a 1-minute interval, then more likely than not the user is also
> OK with emitting all results at a 1-minute interval, including those that
> go through a non-two-phase commit sink?
>
> If we do want to support different end-to-end latencies depending on whether
> the record is emitted by a two-phase commit sink, I would prefer to still
> use execution.checkpointing.interval-during-backlog instead of
> execution.slow-end-to-end-latency. This allows us to keep the concept of
> end-to-end latency simple. Also, by explicitly including "checkpointing
> interval" in the name of the config that directly affects checkpointing
> interval, we can make it easier and more intuitive for users to understand
> the impact and set proper values for such configs.
>
> What do you think?
>
> Best,
> Dong
>
>
> > Maybe we can introduce only `execution.slow-end-to-end-latency` (or maybe
> > a better name), and for the time being
> > use it as the checkpoint interval value during backlog?
>
>
> > Or do you envision that in the future users will be configuring only:
> > - execution.end-to-end-latency
> > and only optionally:
> > - execution.checkpointing.interval-during-backlog
> > ?
> >
> > Best Piotrek
> >
> > PS, I will read the summary that you have just published later, but I
> think
> > we don't need to block this FLIP on the
> > existence of that high level summary.
> >
> > On Tue, Jul 11, 2023 at 17:49 Dong Lin  wrote:
> >
> > > Hi Piotr and everyone,
> > >
> > > I have documented the vision with a summary of the existing work in
> this
> > > doc. Please feel free to review/comment/edit this doc. Looking forward
> to
> > > working with you together in this line of work.
> > >
> > >
> > >
> >
> 

Re: [VOTE] Graduate the FileSink to @PublicEvolving

2023-07-14 Thread Jing Ge
Hi,

After talking with Dong offline, it turns out that @PublicEvolving APIs are
intended for public use and should have stable behavior. FileSink is not
ready for that, given the data loss issue raised in the discussion [1].

This vote is cancelled.

Best regards,
Jing

[1] https://lists.apache.org/thread/wxoo7py5pqqlz37l4w8jrq6qdvsdq5wc


On Tue, Jul 11, 2023 at 9:56 PM Jing Ge  wrote:

> Hi,
>
> Sorry for the typo. The title is correct. The VOTE is for graduating the
> "FileSink" to @PublicEvolving.
>
> Best regards,
> Jing
>
>
> On Mon, Jul 10, 2023 at 1:10 PM Jing Ge  wrote:
>
>> Hi,
>>
>> I'd like to start the VOTE for graduating the FlinkSink
>> to @PublicEvolving. The discussion thread can be found at [1]
>>
>> The vote will be open until at least July 13 12pm GMT(72 hours) unless
>> there is an objection or insufficient votes.
>>
>> Thanks,
>>
>> Jing Ge
>>
>> [1] https://lists.apache.org/thread/wxoo7py5pqqlz37l4w8jrq6qdvsdq5wc
>>
>


[DISCUSS] Connectors, Formats, and even User Code should also be pluggable.

2023-07-14 Thread zhiqiang li
Hi devs,

I have observed in [1] that connectors and formats are intended to become
pluggable, allowing user code to be easily integrated. The advantages of
pluggable connectors are evident, as it helps avoid conflicts between different
versions of jar packages. If classloader isolation is not used, shading becomes
necessary to resolve conflicts, resulting in a significant waste of development
time. I understand that implementing this change may require numerous API
modifications, so I would like to discuss it in this thread.

> Plugins cannot access classes from other plugins or from Flink that have not 
> been specifically whitelisted.
> This strict isolation allows plugins to contain conflicting versions of the 
> same library without the need to relocate classes or to converge to common 
> versions.
> Currently, file systems and metric reporters are pluggable but, in the 
> future, connectors, formats, and even user code should also be pluggable.

My preliminary idea is outlined in [2].

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
[2] 
https://docs.google.com/document/d/1XP2fBpcntK0YIdQ_Ax7JV2MhNdebvkFxSiNJRp6WQ24/edit?usp=sharing


Best,
Zhiqiang



Issue with flink 1.16 and hive dialect

2023-07-14 Thread ramkrishna vasudevan
Hi All,

I am not sure if this was already discussed in this forum.
In our set up with 1.16.0 flink we have ensured that the setup has all the
necessary things for Hive catalog to work.

The flink dialect works fine functionally (with some issues will come to
that later).

But when I follow the steps in
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/hive-compatibility/hive-dialect/queries/overview/#examples
I get an exception once I switch to the Hive dialect:
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:161)
[flink-sql-client-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
Caused by: java.lang.ClassCastException: class
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader
and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
at
org.apache.hadoop.hive.ql.session.SessionState.(SessionState.java:413)
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
at
org.apache.hadoop.hive.ql.session.SessionState.(SessionState.java:389)
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
at
org.apache.flink.table.planner.delegation.hive.HiveSessionState.(HiveSessionState.java:80)
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
at
org.apache.flink.table.planner.delegation.hive.HiveSessionState.startSessionState(HiveSessionState.java:128)
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
at
org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:210)
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
at
org.apache.flink.table.client.gateway.local.LocalExecutor.parseStatement(LocalExecutor.java:172)
~[flink-sql-client-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]

I have ensured that the dialect-related steps are followed, including
https://issues.apache.org/jira/browse/FLINK-25128
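
In case it helps narrow this down, here is a minimal sketch of switching to the Hive dialect programmatically instead of through the SQL Client; the catalog name, database, and Hive conf path below are placeholders, and the Hive connector jar is assumed to be on the classpath:
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveDialectSketch {
  public static void main(String[] args) {
    TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

    // Placeholder catalog name, default database, and Hive conf directory.
    HiveCatalog hiveCatalog = new HiveCatalog("myhive", "default", "/path/to/hive-conf");
    tableEnv.registerCatalog("myhive", hiveCatalog);
    tableEnv.useCatalog("myhive");

    // Switch the parser to the Hive dialect before running Hive-syntax statements.
    tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);

    tableEnv.executeSql("SHOW TABLES").print();
  }
}
{code}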

In the Flink catalog, if we create a table:
> CREATE TABLE testsource(
>
>  `date` STRING,
>  `geo_altitude` FLOAT
> )
> PARTITIONED by ( `date`)
>
> WITH (
>
> 'connector' = 'hive',
> 'sink.partition-commit.delay'='1 s',
> 'sink.partition-commit.policy.kind'='metastore,success-file'
> );

The partition always gets created on the last set of columns and not on the
columns that we specify. Is this a known bug?

Regards
Ram


Re: [DISCUSS] FLIP-330: Support specifying record timestamp requirement

2023-07-14 Thread Yunfeng Zhou
Hi Matt,

1. I tried adding the tag serialization process back to my POC
code and ran the benchmark again; this time the performance
improvement is roughly halved. It seems that both the serialization
and the judgement process contribute significantly to the overhead
reduction in this specific scenario, but in a production environment
where a distributed cluster is deployed, I believe the reduction in
serialization would be the more significant reason for the
performance improvement.

2. According to the latency-tracking section in the Flink documentation [1],
it seems that users would only enable latency markers for debugging
purposes, instead of using them in production. Could you please
illustrate a bit more the scenarios that would be limited when
latency markers are disabled?

3. I plan to benchmark the performance of this POC against TPC-DS and
hope that it covers the common use cases that you are concerned
about. I believe there would still be a performance improvement when the
size of each StreamRecord increases, though the improvement will not
be as obvious as what is currently shown in the FLIP.

Best regards,
Yunfeng

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#end-to-end-latency-tracking
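
To make the framing concrete: below is a purely illustrative sketch of the per-record layout being discussed, a 1-byte tag plus an optional 8-byte timestamp in front of the serialized value, compared with a value-only layout. This is not the actual StreamElementSerializer code and the helper names are made up.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RecordFramingSketch {

  // Simplified framing: [1-byte tag][8-byte timestamp][value bytes].
  static byte[] serializeWithTimestamp(byte[] value, long timestamp) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream data = new DataOutputStream(out);
    data.writeByte(0);          // tag: record with timestamp
    data.writeLong(timestamp);  // 8 extra bytes per record
    data.write(value);
    return out.toByteArray();
  }

  // What the FLIP aims at: when timestamps, watermarks, and latency markers
  // are not needed, the tag and timestamp can be skipped entirely.
  static byte[] serializeValueOnly(byte[] value) {
    return value;
  }

  public static void main(String[] args) throws IOException {
    byte[] value = new byte[8]; // e.g. a serialized long payload
    System.out.println("tag + timestamp + value: " + serializeWithTimestamp(value, 42L).length + " bytes");
    System.out.println("value only:              " + serializeValueOnly(value).length + " bytes");
  }
}
{code}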

On Thu, Jul 13, 2023 at 5:51 PM Matt Wang  wrote:
>
> Hi Yunfeng,
>
> Thanks for the proposal. The POC showed a performance improvement of 20%,
> which is very exciting. But I have some questions:
> 1. Is the performance improvement here mainly due to the reduction of
> serialization, or is it due to the judgment overhead caused by tags?
> 2. Watermarks are not needed in some scenarios, but the latency marker is a
> useful function. If the latency marker cannot be used, it will greatly limit
> the usage scenarios. Can the solution design retain the capability of the
> latency marker?
> 3. The data of the POC test is of long type. I would like to see how much
> benefit it would have if the value were a string with a length of 100 B or 1 KB.
>
>
> --
>
> Best,
> Matt Wang
>
>
>  Replied Message 
> | From | Yunfeng Zhou |
> | Date | 07/13/2023 14:52 |
> | To |  |
> | Subject | Re: [DISCUSS] FLIP-330: Support specifying record timestamp 
> requirement |
> Hi Jing,
>
> Thanks for reviewing this FLIP.
>
> 1. I did change the names of some APIs in the FLIP compared with the
> original version against which I implemented the POC. As the core
> optimization logic remains the same and the POC's performance can
> still reflect the current FLIP's expected improvement, I have not
> updated the POC code since then. I'll add a note to the benchmark
> section of the FLIP saying that the names in the POC code might be
> outdated, and that the FLIP is still the source of truth for our proposed
> design.
>
> 2. This FLIP could bring a fixed reduction in the workload of the
> per-record serialization path in Flink, so the lower the absolute time
> cost of the non-optimized components, the more obvious the performance
> improvement of this FLIP would be. That's why I chose to
> enable object reuse and to transmit Boolean values in serialization.
> If it would be more widely regarded as acceptable for a benchmark to
> adopt more commonly-applied behavior (for object reuse, I believe
> disabling it is more common), I would be glad to update the benchmark
> results with object reuse disabled.
>
> Best regards,
> Yunfeng
>
>
> On Thu, Jul 13, 2023 at 6:37 AM Jing Ge  wrote:
>
> Hi Yunfeng,
>
> Thanks for the proposal. It makes sense to offer the optimization. I got
> some NIT questions.
>
> 1. I guess you changed your thoughts while coding the POC; I found
> pipeline.enable-operator-timestamp in the code, but
> pipeline.force-timestamp-support is defined in the FLIP.
> 2. About the benchmark example: why did you enable object reuse? Since it
> is an optimization of serde, would the benchmark be better if it were
> disabled?
>
> Best regards,
> Jing
>
> On Mon, Jul 10, 2023 at 11:54 AM Yunfeng Zhou 
> wrote:
>
> Hi all,
>
> Dong(cc'ed) and I are opening this thread to discuss our proposal to
> support optimizing StreamRecord's serialization performance.
>
> Currently, a StreamRecord would be converted into a 1-byte tag (+
> 8-byte timestamp) + N-byte serialized value during the serialization
> process. In scenarios where timestamps and watermarks are not needed
> and latency tracking is not enabled, this process would include
> unnecessary information in the serialized byte array. This FLIP aims
> to avoid such overhead and increase the Flink job's performance during
> serialization.
>
> Please refer to the FLIP document for more details about the proposed
> design and implementation. We welcome any feedback and opinions on
> this proposal.
>
> Best regards, Dong and Yunfeng
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-330%3A+Support+specifying+record+timestamp+requirement
>


[jira] [Created] (FLINK-32592) Mixed-up job execution on concurrent job submission

2023-07-14 Thread Fabio Wanner (Jira)
Fabio Wanner created FLINK-32592:


 Summary: Mixed-up job execution on concurrent job submission
 Key: FLINK-32592
 URL: https://issues.apache.org/jira/browse/FLINK-32592
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Affects Versions: 1.17.1, 1.15.4, 1.18.0
Reporter: Fabio Wanner


*Context*

We are using the flink-k8s-operator to deploy multiple jobs (up to 32) to a 
single session cluster. The job submissions done by the operator happen 
concurrently, basically at the same time.

Operator version: 1.5.0

Flink versions: 1.15.4, 1.17.1, 1.18 (master@f37d41cf)

*Problem*

Rarely (~once every 50 deployments), one of the jobs will not be executed. In
the following incident, 4 jobs are deployed at the same time:
 * gorner-task-staging-e5730831
 * gorner-facility-staging-e5730831
 * gorner-aepp-staging-e5730831
 * gorner-session-staging-e5730831

 
The operator submits the jobs; they all get a reasonable jobID:
{code:java}
2023-07-14 10:25:35,295 o.a.f.k.o.s.AbstractFlinkService [INFO 
][aelps-staging/gorner-task-staging-e5730831] Submitting job: 
4968b186061e44390002 to session cluster.
2023-07-14 10:25:35,297 o.a.f.k.o.s.AbstractFlinkService [INFO 
][aelps-staging/gorner-facility-staging-e5730831] Submitting job: 
91a5260d916c4dff0002 to session cluster.
2023-07-14 10:25:35,301 o.a.f.k.o.s.AbstractFlinkService [INFO 
][aelps-staging/gorner-aepp-staging-e5730831] Submitting job: 
103c0446e14749a10002 to session cluster.
2023-07-14 10:25:35,302 o.a.f.k.o.s.AbstractFlinkService [INFO 
][aelps-staging/gorner-session-staging-e5730831] Submitting job: 
de59304d370b4b8e0002 to session cluster.
{code}
In the cluster, the JarRunHandler's handleRequest() method receives the requests;
all 4 jobIDs are present (all args, etc. are also correct):
{code:java}
2023-07-14 10:25:35,320 WARN  
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler   [] - handleRequest 
- requestBody.jobId: 4968b186061e44390002
2023-07-14 10:25:35,321 WARN  
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler   [] - handleRequest 
- requestBody.jobId: de59304d370b4b8e0002
2023-07-14 10:25:35,321 WARN  
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler   [] - handleRequest 
- requestBody.jobId: 91a5260d916c4dff0002
2023-07-14 10:25:35,321 WARN  
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler   [] - handleRequest 
- requestBody.jobId: 103c0446e14749a10002
{code}
But once the EmbeddedExecutor's submitAndGetJobClientFuture() method is called,
instead of one call per jobID we see 4 calls, with one of the jobIDs appearing twice:
{code:java}
2023-07-14 10:25:35,616 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - optJobId: Optional[4968b186061e44390002]
2023-07-14 10:25:35,616 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - optJobId: Optional[103c0446e14749a10002]
2023-07-14 10:25:35,616 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - optJobId: Optional[de59304d370b4b8e0002]
2023-07-14 10:25:35,721 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - optJobId: Optional[de59304d370b4b8e0002]
{code}
In case this is important: the jobGraph obtained does not match the jobID. We get
de59304d370b4b8e0002 twice, but the jobGraph for this jobID is never
returned by getJobGraph() in EmbeddedExecutor.submitAndGetJobClientFuture().

This then leads to the job already existing:
{code:java}
2023-07-14 10:25:35,616 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - submittedJobIds: []
2023-07-14 10:25:35,616 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - submittedJobIds: []
2023-07-14 10:25:35,616 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - submittedJobIds: []
2023-07-14 10:25:35,721 WARN  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
execute - submittedJobIds: [de59304d370b4b8e0002]
{code}
But since the jobs are completely different, the execution will fail, depending
on the timing, with one of the following exceptions:
 * RestHandlerException: No jobs included in application
 * ClassNotFoundException: 
io.dectris.aelps.pipelines.gorner.facility.FacilityEventProcessor

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-322 Cooldown period for adaptive scheduler. Second vote.

2023-07-14 Thread Prabhu Joseph
*+1 (non-binding)*

Thanks for working on this. We have seen a good improvement during the cooldown
period with this feature.
Below are details on the test results from one of our clusters:

On a scale-out operation, 8 new nodes were added one by one with a gap of
~30 seconds. There were 8 restarts within 4 minutes with the default
behaviour, whereas there was only one with this feature (cooldown period of 4 minutes).

The number of records processed by the job with this feature during the
restart window is higher (2909764), whereas it is only 1323960 with the
default behaviour due to multiple restarts, where the job spends most of the
time recovering, and whatever progress the tasks made after the last
successfully completed checkpoint is lost.

Metric              | Default Adaptive Scheduler                         | Adaptive Scheduler with Cooldown Period
NumRecordsProcessed | 1323960                                            | 2909764
Job Parallelism     | 13 -> 20 -> 27 -> 34 -> 41 -> 48 -> 55 -> 62 -> 69 | 13 -> 69
NumRestarts         | 8                                                  | 1

Remarks:
1. The NumRecordsProcessed metric indicates the difference the cooldown period
makes: when the job is doing multiple restarts, the tasks spend most of their
time recovering, and the progress the tasks made is lost during each restart.
2. There is only one restart with the cooldown period, and it happened when the
8th node was added back.








On Wed, Jul 12, 2023 at 8:03 PM Etienne Chauchot 
wrote:

> Hi all,
>
> I'm going on vacation tonight for 3 weeks.
>
> Even if the vote is not finished, as the implementation is rather quick
> and the design discussion had settled, I preferred to implement
> FLIP-322 [1] to allow people to take a look while I'm off.
>
> [1] https://github.com/apache/flink/pull/22985
>
> Best
>
> Etienne
>
> > On 12/07/2023 at 09:56, Etienne Chauchot wrote:
> >
> > Hi all,
> >
> > Would you mind casting your vote to this second vote thread (opened
> > after new discussions) so that the subject can move forward ?
> >
> > @David, @Chesnay, @Robert you took part to the discussions, can you
> > please sent your vote ?
> >
> > Thank you very much
> >
> > Best
> >
> > Etienne
> >
> > On 06/07/2023 at 13:02, Etienne Chauchot wrote:
> >>
> >> Hi all,
> >>
> >> Thanks for your feedback about the FLIP-322: Cooldown period for
> >> adaptive scheduler [1].
> >>
> >> This FLIP was discussed in [2].
> >>
> >> I'd like to start a vote for it. The vote will be open for at least 72
> >> hours (until July 9th 15:00 GMT) unless there is an objection or
> >> insufficient votes.
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler
> >> [2] https://lists.apache.org/thread/qvgxzhbp9rhlsqrybxdy51h05zwxfns6
> >>
> >> Best,
> >>
> >> Etienne


[jira] [Created] (FLINK-32591) Update document of Kafka Source: Enable Dynamic Partition Discovery by Default in Kafka Source

2023-07-14 Thread Hongshun Wang (Jira)
Hongshun Wang created FLINK-32591:
-

 Summary: Update document of Kafka Source: Enable Dynamic Partition 
Discovery by Default in Kafka Source
 Key: FLINK-32591
 URL: https://issues.apache.org/jira/browse/FLINK-32591
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: Hongshun Wang


Based on FLIP-288, dynamic partition discovery is now enabled by default in the
Kafka Source. The corresponding documentation in Chinese and English should be
updated (a usage sketch follows the list below):
 * "Partition discovery is *disabled* by default. You need to explicitly set
the partition discovery interval to enable this feature" in
[https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/]
 * scan.topic-partition-discovery.interval is listed with default (none) in
[https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka/]
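
For reference, a minimal usage sketch of explicitly setting the partition discovery interval on a KafkaSource; the broker address, topic, and group id are placeholders:
{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaPartitionDiscoverySketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")   // placeholder broker address
        .setTopics("input-topic")             // placeholder topic
        .setGroupId("my-group")               // placeholder group id
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        // Explicitly set the discovery interval; per this ticket the feature
        // is now enabled by default, so the docs quoted above are outdated.
        .setProperty("partition.discovery.interval.ms", "30000")
        .build();

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source").print();
    env.execute("partition-discovery-sketch");
  }
}
{code}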



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Deprecating Public API for 2.0 requiring FLIPs

2023-07-14 Thread Matthias Pohl
>
> I'm not aware of any decision that has already been made by the community
> regarding after which 1.x minor release will we ship the 2.0 major release.
> I personally don't think it's feasible to avoid 1.19, and even 1.20
> depending on the progress.


Ok, thanks. That clarification helped. I had the same impression but wasn't
sure whether I missed something.

Best,
Matthias

On Fri, Jul 14, 2023 at 5:09 AM Xintong Song  wrote:

> Hi Matthias,
>
> I'm not aware of any decision that has already been made by the community
> regarding after which 1.x minor release will we ship the 2.0 major release.
> I personally don't think it's feasible to avoid 1.19, and even 1.20
> depending on the progress.
>
> I also don't think we should push all the deprecation work into 1.18. In the
> "Deprecate multiple APIs in 1.18" thread [1], I only listed APIs that give
> the impression of already being deprecated but are actually not fully
> deprecated (either in code or in documentation), in order to clarify their
> status and to properly deprecate the ones that should be. We should not
> decide when to deprecate an existing API based on whether we would have a
> 1.19 or 1.20 minor release. Deciding to deprecate / remove an API definitely
> deserves thorough discussion and a FLIP, which takes time, and I don't think
> we should compromise that for any reason.
>
> I think the potential conflict is between not being able to deprecate APIs
> very soon (needs more discussions, the new replacing API is not ready,
> etc.) and the wish to ship 2.0 on time. Assuming at some point we want to
> ship the 2.0 release, and there are some deprecated APIs that we want to
> remove but have not fulfilled the migration period required by FLIP-321
> [2], I see 3 options to move forward.
> 1. Not removing the deprecated APIs in 2.0, carrying them until 3.0. This
> might be suitable if there are not a lot of deprecated APIs that we need to
> carry and the maintenance overhead is acceptable.
> 2. Postpone the 2.0 release for another minor release. This might be
> suitable if there are a lot of deprecated APIs.
> 3. Considering such APIs as exceptions to FLIP-321. This might be suitable
> if there are only a few of such APIs but have significant maintenance
> overhead. As discussed in FLIP-321, the introduction of the migration
> period effectively means we need to plan for the deprecation / removal of
> APIs early. As it is newly introduced and we haven't given developers the
> chance to plan things ahead, it's probably fair to make exceptions for API
> removal in the 2.0 version bump.
>
> My options are slightly different from what Chesnay has proposed. But I do
> agree that none of these options are great. I guess that's the price for
> not having the deprecation process earlier. Trying to deprecate things
> early is still helpful, because it reduces the number of APIs we need to
> consider when deciding between the options. However, that must not come at
> the price of rushed decisions. I'd suggest developers design / discuss /
> vote on the breaking changes at their own pace, and we evaluate the status
> and choose between the options later (maybe around the time of releasing 1.19).
>
> If there are some contributors who think it makes sense, I will raise it in
> > the 1.18 release channel to postpone 1.18 feature freeze again.
> >
> I don't think postponing 1.18 would help a lot in this case, unless we
> postponed it for another couple of months. I don't think all the API
> changing plans can be finalized in a couple of weeks.
>
> Best,
>
> Xintong
>
>
> [1] https://lists.apache.org/thread/3dw4f8frlg8hzlv324ql7n2755bzs9hy
> [2] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>
> On Thu, Jul 13, 2023 at 9:41 PM Jing Ge 
> wrote:
>
> > Hi,
> >
> > Thanks Matthias for starting this discussion and thanks Chesnay for the
> > clarification.
> >
> > I don't want to hijack this discussion but I would suggest postponing the
> > 1.18 feature freeze over postponing the deprecations to 1.19. If there
> are
> > some contributors who think it makes sense, I will raise it in the 1.18
> > release channel to postpone 1.18 feature freeze again.
> >
> > Speaking of the FLIP rules, if we follow them exactly as defined, we
> > should also write FLIPs when graduating @PublicEvolving APIs to @Public,
> > especially for those APIs that will replace some deprecated APIs. Doing
> > that is to guarantee that new public APIs will cover all the
> > functionality (including being easy enough to implement) that the
> > deprecated APIs offer, so that migrations can be executed smoothly. With
> > this in mind, we will avoid the big issue we are facing now wrt the new
> > Source and Sink API [1].
> >
> > Best regards,
> > Jing
> >
> > [1] https://lists.apache.org/thread/734zhkvs59w2o4d1rsnozr1bfqlr6rgm
> >
> > On Thu, Jul 13, 2023 at 3:01 PM Chesnay Schepler 
> > wrote:
> >
> > > The issue isn't avoiding 1.19.
> > >
> > > The issue is that if things aren't