date:20230309

Re: StreamingFileSink's DefaultRollingPolicy

2023-03-09 Thread yuxia

Hi, sorry for late reply.

1: If the exisiting OnCheckpointRollingPolicy can't meet your requirement, you
can customize a RollingPolicy extending CheckpointRollingPolicy.

2: > Why DefaultRollingPolicy don't always roll inProgressParts on checkpoint?
I'm also not the creator of `DefaultRollingPolicy`. But I would like to share
my thoughts. In whatever filesystem, we hate small files.
If default rolling policy will always roll inProgressParts on checkpoint, it'll
be like to produce many small files which is harmful.
And if don't think it's a problem and still want to roll file on checkpoint,
you can still customize your rolling policy.

Btw, more exactly, for row-encoded sink output, it'll will use
DefaultRollingPolicy by default, for bulk-encoded sink output, it'll use
OnCheckpointRollingPolicy.

Best regards,
Yuxia

- 原始邮件 -
发件人: "Lee, Keith"
收件人: "dev"
发送时间: 星期二, 2023年 3 月 07日 下午 10:39:44
主题: StreamingFileSink's DefaultRollingPolicy

Hi,

StreamingFileSink’s DefaultRollingPolicy does not always roll in progress parts
to pending parts on checkpoints. Is this by design? It seems counter intuitive
to me that part files are not finished on checkpoint by default.

https://github.com/ueshin/apache-flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.java#L70-L72

1. This has impact on S3 implementation in that inProgressPart files are
MultiPartUploads which can
expire.
When the files do expire, jobs can no longer be started from savepoints as they
will run into missing FileNotFoundException on the in progress part file. This
leaves users no options but to restart job without savepoint.
2. Having reference to inProgressPart files within savepoints also prevents
users from restarting job from earlier savepoints, should the user deem it
appropriate to replay the stream and rewrite to output. The exception should be
clearer if the intention is to prevent user from starting from earlier
savepoint to avoid them accidentally replaying stream (therefore violating
end-to-end exactly once).

May I propose that we change DefaultRollingPolicy to always roll
inProgressParts to pending on checkpoint?

Thank you
Keith

[jira] [Created] (FLINK-31395) AbstractPartitionDiscoverer.discoverPartitions calls remove on immutable collection

2023-03-09 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-31395:
-

 Summary: AbstractPartitionDiscoverer.discoverPartitions calls 
remove on immutable collection
 Key: FLINK-31395
 URL: https://issues.apache.org/jira/browse/FLINK-31395
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.16.1
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=47009=logs=aa18c3f6-13b8-5f58-86bb-c1cffb239496=502fb6c0-30a2-5e49-c5c2-a00fa3acb203=8459

{{FlinkKafkaConsumerBaseTest.testClosePartitionDiscovererWithCancellation}} 
failed because of that.

{code}
[...]
Mar 10 01:48:27 Caused by: java.lang.RuntimeException: 
java.lang.UnsupportedOperationException
Mar 10 01:48:27 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.runWithPartitionDiscovery(FlinkKafkaConsumerBase.java:846)
Mar 10 01:48:27 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:828)
Mar 10 01:48:27 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBaseTest.lambda$testNormalConsumerLifecycle$9(FlinkKafkaConsumerBaseTest.java:695)
Mar 10 01:48:27 at 
org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:49)
Mar 10 01:48:27 at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
Mar 10 01:48:27 at 
java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632)
Mar 10 01:48:27 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
Mar 10 01:48:27 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
Mar 10 01:48:27 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
Mar 10 01:48:27 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Mar 10 01:48:27 Caused by: java.lang.UnsupportedOperationException
Mar 10 01:48:27 at java.util.Collections$1.remove(Collections.java:4686)
Mar 10 01:48:27 at 
org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:165)
Mar 10 01:48:27 at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.lambda$createAndStartDiscoveryLoop$2(FlinkKafkaConsumerBase.java:880)
Mar 10 01:48:27 at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-09 Thread David Morávek

+1 (binding)

Best,
D.

On Fri, Mar 10, 2023 at 4:49 AM Yuxin Tan  wrote:

> Thanks, Weihua!
> +1 (non-binding)
>
> Best,
> Yuxin
>
>
> weijie guo  于2023年3月10日周五 11:29写道：
>
> > +1 (binding)
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Shammon FY  于2023年3月10日周五 11:02写道：
> >
> > > Thanks weihua, +1 (non-binding)
> > >
> > > Best,
> > > Shammon
> > >
> > > On Fri, Mar 10, 2023 at 10:32 AM Xintong Song 
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Thu, Mar 9, 2023 at 1:28 PM Weihua Hu 
> > wrote:
> > > >
> > > > > Hi Everyone,
> > > > >
> > > > > I would like to start the vote on FLIP-298: Unifying the
> > Implementation
> > > > > of SlotManager [1]. The FLIP was discussed in this thread [2].
> > > > >
> > > > > This FLIP aims to unify the implementation of SlotManager in
> > > > > order to reduce maintenance costs.
> > > > >
> > > > > The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
> > > > > unless there is an objection or insufficient votes. Thank you all.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> > > > > [2]
> https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9
> > > > >
> > > > > Best,
> > > > > Weihua
> > > > >
> > > >
> > >
> >
>

[jira] [Created] (FLINK-31394) Fix spark jar name in the create release script for table store

2023-03-09 Thread zhuangchong (Jira)

zhuangchong created FLINK-31394:
---

 Summary: Fix spark jar name in the create release script for table 
store
 Key: FLINK-31394
 URL: https://issues.apache.org/jira/browse/FLINK-31394
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Affects Versions: table-store-0.4.0
Reporter: zhuangchong






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31393) HsFileDataManager use an incorrect default timeout

2023-03-09 Thread Weijie Guo (Jira)

Weijie Guo created FLINK-31393:
--

 Summary: HsFileDataManager use an incorrect default timeout
 Key: FLINK-31393
 URL: https://issues.apache.org/jira/browse/FLINK-31393
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.16.1, 1.17.0
Reporter: Weijie Guo
Assignee: Weijie Guo


For batch shuffle(i.e. hybrid shuffle & sort-merge shuffle), If there is a 
fierce contention of the batch shuffle read memory, it will throw a 
{{TimeoutException}} to fail downstream task to release memory. But for hybrid 
shuffle, It uses an incorrect default timeout(5ms), this will make the job very 
easy to fail.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31392) Refactor classes code of full-compaction

2023-03-09 Thread Jingsong Lee (Jira)

Jingsong Lee created FLINK-31392:


 Summary: Refactor classes code of full-compaction
 Key: FLINK-31392
 URL: https://issues.apache.org/jira/browse/FLINK-31392
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0


Refactor classes code of full-compaction, this is to prepare some shared codes 
for lookup changelog producer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31391) Introduce lookup changelog producer

2023-03-09 Thread Jingsong Lee (Jira)

Jingsong Lee created FLINK-31391:


 Summary: Introduce lookup changelog producer
 Key: FLINK-31391
 URL: https://issues.apache.org/jira/browse/FLINK-31391
 Project: Flink
  Issue Type: New Feature
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0


Currently, only full-compaction can produce changelog, some merge-engine must 
have changelog producing, for example, partial-update and aggregation. But 
full-compaction is very heavy, write amplification is big huge...

We should introduce a new changelog producer, supports lower latency to produce 
changelog.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-09 Thread Weihua Hu

Thanks Yuxin for your explanation.

That sounds reasonable. Looking forward to the new shuffle.


Best,
Weihua


On Fri, Mar 10, 2023 at 11:48 AM Yuxin Tan  wrote:

> Hi, Weihua,
> Thanks for the questions and the ideas.
>
> > 1. How many performance regressions would there be if we only
> used remote storage?
>
> The new architecture can support to use remote storage only, but this
> FLIP target is to improve job stability. And the change in the FLIP has
> been significantly complex and the goal of the first version is to update
> Hybrid Shuffle to the new architecture and support remote storage as
> a supplement. The performance of this version is not the first priority,
> so we haven’t tested the performance of using only remote storage.
> If there are indeed regressions, we will keep optimizing the performance
> of the remote storages and improve it until only remote storage is
> available in the production environment.
>
> > 2. Shall we move the local data to remote storage if the producer is
> finished for a long time?
>
> I agree that it is a good idea, which can release task manager resources
> more timely. But moving data from TM local disk to remote storage needs
> more detailed discussion and design, and it is easier to implement it based
> on the new architecture. Considering the complexity, the target focus, and
> the iteration cycle of the FLIP, we decide that the details are not
> included
> in the first version. We will extend and implement them in the subsequent
> versions.
>
> Best,
> Yuxin
>
>
> Weihua Hu  于2023年3月9日周四 11:22写道：
>
> > Hi, Yuxin
> >
> > Thanks for driving this FLIP.
> >
> > The remote storage shuffle could improve the stability of Batch jobs.
> >
> > In our internal scenario, we use a hybrid cluster to run both
> > Streaming(high priority)
> > and Batch jobs(low priority). When there is not enough resources(such as
> > cpu usage
> > reaches a threshold), the batch containers will be evicted. So this will
> > cause some re-run
> > of batch tasks.
> >
> > It would be a great help if the remote storage could address this. So I
> > have a few questions.
> >
> > 1. How many performance regressions would there be if we only used remote
> > storage?
> >
> > 2. In current design, the shuffle data segment will write to one kind of
> > storage tier.
> > Shall we move the local data to remote storage if the producer is
> finished
> > for a long time?
> > So we can release the idle task manager with no shuffle data on it. This
> > may help to reduce
> > the resource usage when producer parallelism is larger than consume.
> >
> > Best,
> > Weihua
> >
> >
> > On Thu, Mar 9, 2023 at 10:38 AM Yuxin Tan 
> wrote:
> >
> > > Hi, Junrui,
> > > Thanks for the suggestions and ideas.
> > >
> > > > If they are fixed, I suggest that FLIP could provide clearer
> > > explanations.
> > > I have updated the FLIP and described the segment size more clearly.
> > >
> > > > can we provide configuration options for users to manually adjust the
> > > sizes?
> > > The segment size can be configured if necessary. But considering that
> if
> > we
> > > exposed these parameters prematurely, it may be difficult to modify the
> > > implementation later because the user has already used the configs. We
> > > can make these internal configs or fixed values when implementing the
> > first
> > > version, I think we can use either of these two ways, because they are
> > > internal and do not affect the public APIs.
> > >
> > > Best,
> > > Yuxin
> > >
> > >
> > > Junrui Lee  于2023年3月8日周三 00:24写道：
> > >
> > > > Hi Yuxin,
> > > >
> > > > This FLIP looks quite reasonable. Flink can solve the problem of
> Batch
> > > > shuffle by
> > > > combining local and remote storage, and can use fixed local disks for
> > > > better performance
> > > >  in most scenarios, while using remote storage as a supplement when
> > local
> > > > disks are not
> > > >  sufficient, avoiding wasteful costs and poor job stability.
> Moreover,
> > > the
> > > > solution also
> > > > considers the issue of dynamic switching, which can automatically
> > switch
> > > to
> > > > remote
> > > > storage when the local disk is full, saving costs, and automatically
> > > switch
> > > > back when
> > > > there is available space on the local disk.
> > > >
> > > > As Wencong Liu stated, an appropriate segment size is essential, as
> it
> > > can
> > > > significantly
> > > > affect shuffle performance. I also agree that the first version
> should
> > > > focus mainly on the
> > > > design and implementation. However, I have a small question about
> > FLIP. I
> > > > did not see
> > > > any information regarding the segment size of memory, local disk, and
> > > > remote storage
> > > > in this FLIP. Are these three values fixed at present? If they are
> > > fixed, I
> > > > suggest that FLIP
> > > > could provide clearer explanations. Moreover, although a dynamic
> > segment
> > > > size
> > > > mechanism is not necessary at the moment, can we provide
>

Re: [Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-09 Thread Yuxin Tan

Thanks, Weihua!
+1 (non-binding)

Best,
Yuxin


weijie guo  于2023年3月10日周五 11:29写道：

> +1 (binding)
>
> Best regards,
>
> Weijie
>
>
> Shammon FY  于2023年3月10日周五 11:02写道：
>
> > Thanks weihua, +1 (non-binding)
> >
> > Best,
> > Shammon
> >
> > On Fri, Mar 10, 2023 at 10:32 AM Xintong Song 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Thu, Mar 9, 2023 at 1:28 PM Weihua Hu 
> wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > I would like to start the vote on FLIP-298: Unifying the
> Implementation
> > > > of SlotManager [1]. The FLIP was discussed in this thread [2].
> > > >
> > > > This FLIP aims to unify the implementation of SlotManager in
> > > > order to reduce maintenance costs.
> > > >
> > > > The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
> > > > unless there is an objection or insufficient votes. Thank you all.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> > > > [2]https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9
> > > >
> > > > Best,
> > > > Weihua
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

2023-03-09 Thread Yuxin Tan

Hi, Weihua,
Thanks for the questions and the ideas.

> 1. How many performance regressions would there be if we only
used remote storage?

The new architecture can support to use remote storage only, but this
FLIP target is to improve job stability. And the change in the FLIP has
been significantly complex and the goal of the first version is to update
Hybrid Shuffle to the new architecture and support remote storage as
a supplement. The performance of this version is not the first priority,
so we haven’t tested the performance of using only remote storage.
If there are indeed regressions, we will keep optimizing the performance
of the remote storages and improve it until only remote storage is
available in the production environment.

> 2. Shall we move the local data to remote storage if the producer is
finished for a long time?

I agree that it is a good idea, which can release task manager resources
more timely. But moving data from TM local disk to remote storage needs
more detailed discussion and design, and it is easier to implement it based
on the new architecture. Considering the complexity, the target focus, and
the iteration cycle of the FLIP, we decide that the details are not
included
in the first version. We will extend and implement them in the subsequent
versions.

Best,
Yuxin


Weihua Hu  于2023年3月9日周四 11:22写道：

> Hi, Yuxin
>
> Thanks for driving this FLIP.
>
> The remote storage shuffle could improve the stability of Batch jobs.
>
> In our internal scenario, we use a hybrid cluster to run both
> Streaming(high priority)
> and Batch jobs(low priority). When there is not enough resources(such as
> cpu usage
> reaches a threshold), the batch containers will be evicted. So this will
> cause some re-run
> of batch tasks.
>
> It would be a great help if the remote storage could address this. So I
> have a few questions.
>
> 1. How many performance regressions would there be if we only used remote
> storage?
>
> 2. In current design, the shuffle data segment will write to one kind of
> storage tier.
> Shall we move the local data to remote storage if the producer is finished
> for a long time?
> So we can release the idle task manager with no shuffle data on it. This
> may help to reduce
> the resource usage when producer parallelism is larger than consume.
>
> Best,
> Weihua
>
>
> On Thu, Mar 9, 2023 at 10:38 AM Yuxin Tan  wrote:
>
> > Hi, Junrui,
> > Thanks for the suggestions and ideas.
> >
> > > If they are fixed, I suggest that FLIP could provide clearer
> > explanations.
> > I have updated the FLIP and described the segment size more clearly.
> >
> > > can we provide configuration options for users to manually adjust the
> > sizes?
> > The segment size can be configured if necessary. But considering that if
> we
> > exposed these parameters prematurely, it may be difficult to modify the
> > implementation later because the user has already used the configs. We
> > can make these internal configs or fixed values when implementing the
> first
> > version, I think we can use either of these two ways, because they are
> > internal and do not affect the public APIs.
> >
> > Best,
> > Yuxin
> >
> >
> > Junrui Lee  于2023年3月8日周三 00:24写道：
> >
> > > Hi Yuxin,
> > >
> > > This FLIP looks quite reasonable. Flink can solve the problem of Batch
> > > shuffle by
> > > combining local and remote storage, and can use fixed local disks for
> > > better performance
> > >  in most scenarios, while using remote storage as a supplement when
> local
> > > disks are not
> > >  sufficient, avoiding wasteful costs and poor job stability. Moreover,
> > the
> > > solution also
> > > considers the issue of dynamic switching, which can automatically
> switch
> > to
> > > remote
> > > storage when the local disk is full, saving costs, and automatically
> > switch
> > > back when
> > > there is available space on the local disk.
> > >
> > > As Wencong Liu stated, an appropriate segment size is essential, as it
> > can
> > > significantly
> > > affect shuffle performance. I also agree that the first version should
> > > focus mainly on the
> > > design and implementation. However, I have a small question about
> FLIP. I
> > > did not see
> > > any information regarding the segment size of memory, local disk, and
> > > remote storage
> > > in this FLIP. Are these three values fixed at present? If they are
> > fixed, I
> > > suggest that FLIP
> > > could provide clearer explanations. Moreover, although a dynamic
> segment
> > > size
> > > mechanism is not necessary at the moment, can we provide configuration
> > > options for users
> > >  to manually adjust these sizes? I think it might be useful.
> > >
> > > Best,
> > > Junrui.
> > >
> > > Yuxin Tan  于2023年3月7日周二 20:14写道：
> > >
> > > > Thanks for joining the discussion.
> > > >
> > > > @weijie guo
> > > > > 1. How to optimize the broadcast result partition?
> > > > For the partitions with multi-consumers, e.g., broadcast result
> > > partition,
> > > > partition reuse,
> >

Re: [Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-09 Thread weijie guo

+1 (binding)

Best regards,

Weijie


Shammon FY  于2023年3月10日周五 11:02写道：

> Thanks weihua, +1 (non-binding)
>
> Best,
> Shammon
>
> On Fri, Mar 10, 2023 at 10:32 AM Xintong Song 
> wrote:
>
> > +1 (binding)
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Thu, Mar 9, 2023 at 1:28 PM Weihua Hu  wrote:
> >
> > > Hi Everyone,
> > >
> > > I would like to start the vote on FLIP-298: Unifying the Implementation
> > > of SlotManager [1]. The FLIP was discussed in this thread [2].
> > >
> > > This FLIP aims to unify the implementation of SlotManager in
> > > order to reduce maintenance costs.
> > >
> > > The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
> > > unless there is an objection or insufficient votes. Thank you all.
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> > > [2]https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9
> > >
> > > Best,
> > > Weihua
> > >
> >
>

Re: [Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-09 Thread John Roesler

Thanks, Weihua!

I’m +1 (non-binding)

-John

On Thu, Mar 9, 2023, at 21:02, Shammon FY wrote:
> Thanks weihua, +1 (non-binding)
>
> Best,
> Shammon
>
> On Fri, Mar 10, 2023 at 10:32 AM Xintong Song  wrote:
>
>> +1 (binding)
>>
>> Best,
>>
>> Xintong
>>
>>
>>
>> On Thu, Mar 9, 2023 at 1:28 PM Weihua Hu  wrote:
>>
>> > Hi Everyone,
>> >
>> > I would like to start the vote on FLIP-298: Unifying the Implementation
>> > of SlotManager [1]. The FLIP was discussed in this thread [2].
>> >
>> > This FLIP aims to unify the implementation of SlotManager in
>> > order to reduce maintenance costs.
>> >
>> > The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
>> > unless there is an objection or insufficient votes. Thank you all.
>> >
>> > [1]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
>> > [2]https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9
>> >
>> > Best,
>> > Weihua
>> >
>>

[jira] [Created] (FLINK-31390) Optimize the FlinkChangelogModeInferenceProgram by avoiding unnecessary traversals.

2023-03-09 Thread Aitozi (Jira)

Aitozi created FLINK-31390:
--

 Summary: Optimize the FlinkChangelogModeInferenceProgram by 
avoiding unnecessary traversals.
 Key: FLINK-31390
 URL: https://issues.apache.org/jira/browse/FLINK-31390
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: Aitozi


We can avoid the unnecessary traversals of the RelNode tree, since we are only 
interested in the first satisfied plan.

 

FlinkChangelogModeInferenceProgram
{code:java}
val updateKindTraitVisitor = new SatisfyUpdateKindTraitVisitor(context)
val finalRoot = requiredUpdateKindTraits.flatMap {
  requiredUpdateKindTrait =>
updateKindTraitVisitor.visit(rootWithModifyKindSet, 
requiredUpdateKindTrait)
}
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-09 Thread Shammon FY

Thanks weihua, +1 (non-binding)

Best,
Shammon

On Fri, Mar 10, 2023 at 10:32 AM Xintong Song  wrote:

> +1 (binding)
>
> Best,
>
> Xintong
>
>
>
> On Thu, Mar 9, 2023 at 1:28 PM Weihua Hu  wrote:
>
> > Hi Everyone,
> >
> > I would like to start the vote on FLIP-298: Unifying the Implementation
> > of SlotManager [1]. The FLIP was discussed in this thread [2].
> >
> > This FLIP aims to unify the implementation of SlotManager in
> > order to reduce maintenance costs.
> >
> > The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
> > unless there is an objection or insufficient votes. Thank you all.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> > [2]https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9
> >
> > Best,
> > Weihua
> >
>

Re: [Vote] FLIP-298: Unifying the Implementation of SlotManager

2023-03-09 Thread Xintong Song

+1 (binding)

Best,

Xintong



On Thu, Mar 9, 2023 at 1:28 PM Weihua Hu  wrote:

> Hi Everyone,
>
> I would like to start the vote on FLIP-298: Unifying the Implementation
> of SlotManager [1]. The FLIP was discussed in this thread [2].
>
> This FLIP aims to unify the implementation of SlotManager in
> order to reduce maintenance costs.
>
> The vote will last for at least 72 hours (03/14, 15:00 UTC+8)
> unless there is an objection or insufficient votes. Thank you all.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager
> [2]https://lists.apache.org/thread/ocssfxglpc8z7cto3k8p44mrjxwr67r9
>
> Best,
> Weihua
>

[jira] [Created] (FLINK-31389) Fix spark jar name in docs for table store

2023-03-09 Thread zhuangchong (Jira)

zhuangchong created FLINK-31389:
---

 Summary: Fix spark jar name in docs for table store
 Key: FLINK-31389
 URL: https://issues.apache.org/jira/browse/FLINK-31389
 Project: Flink
  Issue Type: Bug
  Components: Table Store
Affects Versions: table-store-0.4.0
Reporter: zhuangchong
 Fix For: table-store-0.4.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: Flink metric collection

2023-03-09 Thread Atul Lal

Hi everyone,

I also think the reporter won't work for my use case, it could work but
then I am trying to modify the placement of operators on taskmanagers
manually based on my some decisions made from metric collection, reporter
makes it harder to do so. Right now, I have REST APIs, but I was thinking
if in some way I could create a new MetricFetcher object and use that for
collecting metrics instead of REST APIs.

Is it possible to use MetricFetcher?

Thanks a lot for the help,
Atul

On Thu, Mar 9, 2023 at 7:39 PM Mason Chen  wrote:

> Hi all,
>
> Metric reporter may be useful if you only need per component level metrics
> like jobmanager and each taskmanager since the metric reporter runs in each
> component. However, for an aggregated job level view of metrics, there is
> no better out-of-the-box/user-facing way to get metrics than the REST API.
> A good example in using the REST API is the k8s operator autoscaler which
> needs to scrape metrics:
>
> https://github.com/apache/flink-kubernetes-operator/tree/main/flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler
> .
>
> Best,
> Mason
>
> On Thu, Mar 9, 2023 at 4:34 AM Hang Ruan  wrote:
>
> > Hi, Atul,
> >
> > I think the metric reporter[1] will be helpful for you.
> >
> > Best,
> > Hang
> >
> > [1]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/metric_reporters/
> >
> > Atul Lal  于2023年3月9日周四 17:49写道：
> >
> > > Hi everyone,
> > >
> > > I am trying to do some experiments with Flink. I am trying to modify
> the
> > > source code of Flink for this experiment, and I am starting a process
> > > thread from JobMaster.java constructor whenever a new job is started in
> > > Flink. In this thread, I want to monitor a few metrics related to the
> job
> > > and make some decisions based on it.
> > >
> > > Is there any way to collect metrics related to a job without using REST
> > > endpoints? Because I think using REST endpoints here is pointless as
> this
> > > is internal code running on JobMaster. If there is no other way than
> > using
> > > REST endpoints, is there any easy way to serialize or parse JSON
> > responses
> > > from those endpoints as the response structures are already defined in
> > the
> > > classes.
> > >
> > > I would really appreciate it if someone could help me with this.
> > >
> > > Thank you,
> > > Atul
> > >
> >
>

Re: Flink metric collection

2023-03-09 Thread Mason Chen

Hi all,

Metric reporter may be useful if you only need per component level metrics
like jobmanager and each taskmanager since the metric reporter runs in each
component. However, for an aggregated job level view of metrics, there is
no better out-of-the-box/user-facing way to get metrics than the REST API.
A good example in using the REST API is the k8s operator autoscaler which
needs to scrape metrics:
https://github.com/apache/flink-kubernetes-operator/tree/main/flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler
.

Best,
Mason

On Thu, Mar 9, 2023 at 4:34 AM Hang Ruan  wrote:

> Hi, Atul,
>
> I think the metric reporter[1] will be helpful for you.
>
> Best,
> Hang
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/metric_reporters/
>
> Atul Lal  于2023年3月9日周四 17:49写道：
>
> > Hi everyone,
> >
> > I am trying to do some experiments with Flink. I am trying to modify the
> > source code of Flink for this experiment, and I am starting a process
> > thread from JobMaster.java constructor whenever a new job is started in
> > Flink. In this thread, I want to monitor a few metrics related to the job
> > and make some decisions based on it.
> >
> > Is there any way to collect metrics related to a job without using REST
> > endpoints? Because I think using REST endpoints here is pointless as this
> > is internal code running on JobMaster. If there is no other way than
> using
> > REST endpoints, is there any easy way to serialize or parse JSON
> responses
> > from those endpoints as the response structures are already defined in
> the
> > classes.
> >
> > I would really appreciate it if someone could help me with this.
> >
> > Thank you,
> > Atul
> >
>

[jira] [Created] (FLINK-31388) restart from savepoint fails with "userVisibleTail should not be larger than offset. This is a bug."

2023-03-09 Thread David Anderson (Jira)

David Anderson created FLINK-31388:
--

 Summary: restart from savepoint fails with "userVisibleTail should 
not be larger than offset. This is a bug."
 Key: FLINK-31388
 URL: https://issues.apache.org/jira/browse/FLINK-31388
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.16.1
Reporter: David Anderson


I took a savepoint, then used 
{code:java}
SET 'execution.savepoint.path' = ...
{code}
to set the savepoint path, and then re-executed the query that had been running 
before the stop-with-savepoint.

It was not an INSERT INTO job, but rather a "collect" job running a SELECT 
query.

It then failed with
{code:java}
userVisibleTail should not be larger than offset. This is a bug.
{code}

Perhaps there is an unstated requirement that using the sql-client to restart 
from a savepoint only works with INSERT INTO jobs?





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-300: Add targetColumns to DynamicTableSink#Context to solve the null overwrite problem of partial-insert

2023-03-09 Thread Jing Ge

Hi Lincoln,

sounds good to me. Thanks!

Best,
Jing

On Wed, Mar 8, 2023 at 2:01 PM Lincoln Lee  wrote:

> Hi Jing,
> Agree with you that using formal terms can be easier to users, I've updated
> the FLIP[1], since this is only one of the application scenarios for
> partial insert, our java doc for the corresponding interface will describe
> the partial insert message itself from a generic point of view, WDTY?
>
> @Jacky thanks for your feedback!
> here are my thoughts for the two questions:
> for this scenario, I don't think the planner should report an error. We
> cannot assume that such usage will necessarily result in errors or that
> users are unaware of potential risks (just like in a database, similar
> operations are not prompted with errors). In the streaming scenario,
> regarding the risks associated with the multi-insert operation with
> overlapping fields, we may consider expanding the plan advice (FLIP-280 has
> just added possibilities to support this) to prompt users instead of
> rejecting the operation with an error.
> > 1. if the two insert into with same columns, the result is not
> nondeterminism. will it check in planner and throw exception
>
> yes, not all connectors support partial insert. Therefore, the introduction
> of this interface is only intended as additional information for the
> connectors that need it. The new `targetColumns` only provide the column
> list information corresponding to the statement according to the SQL
> standard, and existing connectors do not need to make any passive changes
> by default.
> > 2. some sink connectors can not supports it like queue such as kafka
> compacted topic. will also it check in planner  and throw exception
>
> welcome your feedback!
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240885081
>
>
> Best,
> Lincoln Lee
>
>
> Jacky Lau  于2023年3月8日周三 20:11写道：
>
> > Thanks for bringing this up. this is a good feature. but i have two
> > questions:
> > 1. if the two insert into with same columns, the result is
> > not  nondeterminism. will it check in planner and throw exception
> > 2. some sink connectors can not supports it like queue such as kafka
> > compacted topic. will also it check in planner  and throw exception
> >
> > Lincoln Lee  于2023年3月7日周二 14:53写道：
> >
> > > Hi Aitozi,
> > >
> > > Thanks for your feedback!  Yes, including HBase and JDBC connector,
> they
> > > can be considered for support in the next step (JDBC as as a standard
> > > protocol supported not only in traditional databases, but also in more
> > and
> > > more new types of storage). Considering the ongoing externalizing of
> > > connectors and the release cycles of the connectors are decoupled with
> > the
> > > release cycle of Flink, we can initiate corresponding support issues
> for
> > > specific connectors to follow up on support after finalizing the API
> > > changes, WDYT?
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Hang Ruan  于2023年3月7日周二 12:14写道：
> > >
> > > > Hi, Lincoln,
> > > >
> > > > Thanks for bringing this up. It looks good to me. I also agree with
> > > > Jingsong's suggestion.
> > > >
> > > > Best,
> > > > Hang
> > > >
> > > > Jingsong Li  于2023年3月7日周二 11:15写道：
> > > >
> > > > > Wow, we have 300 FLIPs...
> > > > >
> > > > > Thanks Lincoln,
> > > > >
> > > > > Have you considered returning an Optional?
> > > > >
> > > > > Empty array looks a little weird to me.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Tue, Mar 7, 2023 at 10:32 AM Aitozi 
> wrote:
> > > > > >
> > > > > > Hi Lincoln,
> > > > > > Thank you for sharing this FLIP. Overall, it looks good to
> me.
> > I
> > > > have
> > > > > > one question: with the introduction of this interface,
> > > > > > will any existing Flink connectors need to be updated in order to
> > > take
> > > > > > advantage of its capabilities? For example, HBase.
> > > > > >
> > > > > > yuxia  于2023年3月7日周二 10:01写道：
> > > > > >
> > > > > > > Thanks. It makes sense to me.
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Yuxia
> > > > > > >
> > > > > > > - 原始邮件 -
> > > > > > > 发件人: "Lincoln Lee" 
> > > > > > > 收件人: "dev" 
> > > > > > > 发送时间: 星期一, 2023年 3 月 06日 下午 10:26:26
> > > > > > > 主题: Re: [DISCUSS] FLIP-300: Add targetColumns to
> > > > > DynamicTableSink#Context
> > > > > > > to solve the null overwrite problem of partial-insert
> > > > > > >
> > > > > > > hi yuxia,
> > > > > > >
> > > > > > > Thanks for your feedback and tracking the issue of update
> > > statement!
> > > > > I've
> > > > > > > updated the FLIP[1] and also the poc[2].
> > > > > > > Since the bug and flip are orthogonal, we can focus on
> finalizing
> > > the
> > > > > api
> > > > > > > changes first, and then work on the flip implementation and
> > bugfix
> > > > > > > separately, WDYT?
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240885081
> > > > > > > [2]

Re: [ANNOUNCE] Release 1.17.0, release candidate #1

2023-03-09 Thread Martijn Visser

Hi Yingjie,

Thanks for the test and identifying the issue, this is super helpful!

To all others, please continue your testing on this RC so that if there are
more blockers to be found, we can fix them with the next RC and have
(hopefully) a successful vote on it.

Best regards,

Martijn

On Thu, Mar 9, 2023 at 4:54 PM Yingjie Cao  wrote:

> Hi community and release managers:
>
> When testing the release candidate #1 for batch scenario, I found a
> potential deadlock issue of blocking shuffle. I have created a ticket [1]
> for it and marked it as blocker. I will fix it no later than tomorrow.
>
> [1] https://issues.apache.org/jira/browse/FLINK-31386
>
> Best regards,
> Yingjie
>
> Qingsheng Ren  于2023年3月9日周四 13:51写道：
>
> > Hi everyone,
> >
> > The RC1 for Apache Flink 1.17.0 has been created. This RC currently is
> for
> > preview only to facilitate the integrated testing since the release
> > announcement is still under review. The voting process will be triggered
> > once the announcement is ready. It has all the artifacts that we would
> > typically have for a release, except for the release note and the website
> > pull request for the release announcement.
> >
> > The following contents are available for your review:
> >
> > - The preview source release and binary convenience releases [1], which
> > are signed with the key with fingerprint A1BD477F79D036D2C30C [2].
> > - all artifacts that would normally be deployed to the Maven
> > Central Repository [3].
> > - source code tag "release-1.17.0-rc1" [4]
> >
> > Your help testing the release will be greatly appreciated! And we'll
> > create the voting thread as soon as all the efforts are finished.
> >
> > [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.17.0-rc1
> > [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [3]
> https://repository.apache.org/content/repositories/orgapacheflink-1591
> > [4] https://github.com/apache/flink/releases/tag/release-1.17.0-rc1
> >
> > Best regards,
> > Qingsheng, Leonard, Matthias and Martijn
> >
>

[jira] [Created] (FLINK-31387) StreamTaskCancellationTest.testCancelTaskShouldPreventAdditionalProcessingTimeTimersFromBeingFired failed with an assertion

2023-03-09 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-31387:
-

 Summary: 
StreamTaskCancellationTest.testCancelTaskShouldPreventAdditionalProcessingTimeTimersFromBeingFired
 failed with an assertion
 Key: FLINK-31387
 URL: https://issues.apache.org/jira/browse/FLINK-31387
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.17.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46994=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9253

{code}
Mar 09 14:04:42 [ERROR] 
org.apache.flink.streaming.runtime.tasks.StreamTaskCancellationTest.testCancelTaskShouldPreventAdditionalProcessingTimeTimersFromBeingFired
  Time elapsed: 0.018 s  <<< FAILURE!
Mar 09 14:04:42 java.lang.AssertionError: 
Mar 09 14:04:42 
Mar 09 14:04:42 Expecting AtomicInteger(0) to have value:
Mar 09 14:04:42   10
Mar 09 14:04:42 but did not.
Mar 09 14:04:42 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskCancellationTest.testCancelTaskShouldPreventAdditionalTimersFromBeingFired(StreamTaskCancellationTest.java:305)
Mar 09 14:04:42 at 
org.apache.flink.streaming.runtime.tasks.StreamTaskCancellationTest.testCancelTaskShouldPreventAdditionalProcessingTimeTimersFromBeingFired(StreamTaskCancellationTest.java:281)
Mar 09 14:04:42 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
[...]
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [ANNOUNCE] Release 1.17.0, release candidate #1

2023-03-09 Thread Yingjie Cao

Hi community and release managers:

When testing the release candidate #1 for batch scenario, I found a
potential deadlock issue of blocking shuffle. I have created a ticket [1]
for it and marked it as blocker. I will fix it no later than tomorrow.

[1] https://issues.apache.org/jira/browse/FLINK-31386

Best regards,
Yingjie

Qingsheng Ren  于2023年3月9日周四 13:51写道：

> Hi everyone,
>
> The RC1 for Apache Flink 1.17.0 has been created. This RC currently is for
> preview only to facilitate the integrated testing since the release
> announcement is still under review. The voting process will be triggered
> once the announcement is ready. It has all the artifacts that we would
> typically have for a release, except for the release note and the website
> pull request for the release announcement.
>
> The following contents are available for your review:
>
> - The preview source release and binary convenience releases [1], which
> are signed with the key with fingerprint A1BD477F79D036D2C30C [2].
> - all artifacts that would normally be deployed to the Maven
> Central Repository [3].
> - source code tag "release-1.17.0-rc1" [4]
>
> Your help testing the release will be greatly appreciated! And we'll
> create the voting thread as soon as all the efforts are finished.
>
> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.17.0-rc1
> [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> [3] https://repository.apache.org/content/repositories/orgapacheflink-1591
> [4] https://github.com/apache/flink/releases/tag/release-1.17.0-rc1
>
> Best regards,
> Qingsheng, Leonard, Matthias and Martijn
>

[jira] [Created] (FLINK-31386) Fix the potential deadlock issue of blocking shuffle

2023-03-09 Thread Yingjie Cao (Jira)

Yingjie Cao created FLINK-31386:
---

 Summary: Fix the potential deadlock issue of blocking shuffle
 Key: FLINK-31386
 URL: https://issues.apache.org/jira/browse/FLINK-31386
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Reporter: Yingjie Cao
 Fix For: 1.17.0


Currently, the SortMergeResultPartition may allocate more network buffers than 
the guaranteed size of the LocalBufferPool. As a result, some result partitions 
may need to wait other result partitions to release the over-allocated network 
buffers to continue. However, the result partitions which have allocated more 
than guaranteed buffers relies on the processing of input data to trigger data 
spilling and buffer recycling. The input data further relies on batch reading 
buffers used by the SortMergeResultPartitionReadScheduler which may already 
taken by those blocked result partitions which are waiting for buffers. Then 
deadlock occurs. We can easily fix this deadlock by reserving the guaranteed 
buffers on initializing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager

2023-03-09 Thread Matthias Pohl

Thanks for your clarification. I have nothing else to add to the
discussion. +1 from my side to proceed

On Wed, Mar 8, 2023 at 4:16 AM Weihua Hu  wrote:

> Thanks Yangze for your attention, this would be a great help.
>
> And thanks Matthias too.
>
> FLIP-156 [1] mentions some incompatibility between fine-grained resource
> > management and reactive mode. I assume that this is independent of the
> > SlotManager and replacing the DSM with the FGSM wouldn't affect reactive
> > mode?
>
> Yes. This incompatibility is independent of SlotManager. That means the
> AdpativeScheduler will always ignore the resource requirement set by
> slotSharingGroup and declare Unknown ResourceProfile to SlotManager.
> So, using FGSM as default will not affect reactive mode.
>
> About the heterogeneous TaskManager: This is a feature that's also not
> > supported in the DSM right now, is it? We should state that fact in the
> > FLIP if we mentioned that we don't want to implement it for the FSGM.
>
> Yes, both DSM and FGSM do not support request heterogeneous
> TaskManager right now. Heterogeneous will make the resource allocation
> logic more complicated, such as the resource deadlock if request A
> allocated
> the bigger slot B and then request B could not allocate the small slot A.
> We
> need to think more before starting to support the heterogeneous task
> manager.
> So, we don't want to implement heterogeneity in this FLIP.
>
> Best,
> Weihua
>
>
> On Wed, Mar 8, 2023 at 12:44 AM Matthias Pohl
>  wrote:
>
> > Thanks for updating the FLIP and adding more context to it. Additionally,
> > thanks to Xintong and Yangze for offering your expertise here as
> > contributors to the initial FineGrainedSlotManager implementation.
> >
> > The remark on cutting out functionality was only based on some
> superficial
> > initial code reading. I cannot come up with a better code structure
> myself.
> > Therefore, I'm fine with not refactoring the code as part of this FLIP.
> >
> > The strategies that were proposed around making sure that the refactoring
> > is properly backed by tests sound reasonable. My initial concern was
> based
> > on the fact that we might have unit test scenarios for the DSM that are
> not
> > covered in the unit tests of the FSGM. In that case, swapping the DSM
> with
> > the FSGM might not be good enough. Going over the GSM tests to make sure
> > that we're not accidentally deleting test scenarios sounds good to me.
> > Thanks, Weihua.
> >
> > FLIP-156 [1] mentions some incompatibility between fine-grained resource
> > management and reactive mode. I assume that this is independent of the
> > SlotManager and replacing the DSM with the FGSM wouldn't affect reactive
> > mode?
> >
> > About the heterogeneous TaskManager: This is a feature that's also not
> > supported in the DSM right now, is it? We should state that fact in the
> > FLIP if we mentioned that we don't want to implement it for the FSGM.
> >
> > Best,
> > Matthias
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >
> > On Tue, Mar 7, 2023 at 8:58 AM Yangze Guo  wrote:
> >
> > > Hi Weihua,
> > >
> > > Thanks for driving this. As Xintong mentioned, this was a technical
> > > debt from FLIP-56.
> > >
> > > The latest version of FLIP sounds good, +1 from my side. As a
> > > contributor to this component, I'm willing to assist with the review
> > > process. Feel free to reach me if you need help.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Mar 7, 2023 at 1:47 PM Weihua Hu 
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > @David @Matthias
> > > > There are a few days after hearing your thoughts. I would like to
> know
> > if
> > > > there are any other concerns about this FLIP.
> > > >
> > > >
> > > > Best,
> > > > Weihua
> > > >
> > > >
> > > > On Mon, Mar 6, 2023 at 7:53 PM Weihua Hu 
> > wrote:
> > > >
> > > > >
> > > > > Thanks Shammon,
> > > > >
> > > > > I've updated FLIP to add this redundant Task Manager limitation.
> > > > >
> > > > >
> > > > > Best,
> > > > > Weihua
> > > > >
> > > > >
> > > > > On Mon, Mar 6, 2023 at 5:07 PM Shammon FY 
> wrote:
> > > > >
> > > > >> Hi weihua
> > > > >>
> > > > >> Can you add content related to `heterogeneous resources` to this
> > > FLIP? We
> > > > >> can record it and consider it in the future. It may be useful for
> > some
> > > > >> scenarios, such as the combination of streaming and ML.
> > > > >>
> > > > >> Best,
> > > > >> Shammon
> > > > >>
> > > > >>
> > > > >> On Mon, Mar 6, 2023 at 1:39 PM weijie guo <
> > guoweijieres...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Weihua,
> > > > >> >
> > > > >> > Thanks for your clarification, SGTM.
> > > > >> >
> > > > >> > Best regards,
> > > > >> >
> > > > >> > Weijie
> > > > >> >
> > > > >> >
> > > > >> > Weihua Hu  于2023年3月6日周一 11:43写道：
> > > > >> >
> > > > >> > > Thanks Weijie.
> > > > >> > >
> > > > >> > > Heterogeneous task managers will not be

[jira] [Created] (FLINK-31385) Introduce extended Assertj Matchers for completable futures

2023-03-09 Thread Jira

David Morávek created FLINK-31385:
-

 Summary: Introduce extended Assertj Matchers for completable 
futures
 Key: FLINK-31385
 URL: https://issues.apache.org/jira/browse/FLINK-31385
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: David Morávek
Assignee: David Morávek


Introduce extended Assertj Matchers for completable futures that don't rely on 
timeouts.

In general, we want to avoid relying on timeouts in the Flink test suite to get 
additional context (thread dump) in case something gets stuck.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31384) If you don't pass a function type to the builders WithSpec method it doesn't error within the Go SDK but the function isn't registered within Flink and doesn't run corre

2023-03-09 Thread Nicolas Pocock (Jira)

Nicolas Pocock created FLINK-31384:
--

 Summary: If you don't pass a function type to the builders 
WithSpec method it doesn't error within the Go SDK but the function isn't 
registered within Flink and doesn't run correctly.
 Key: FLINK-31384
 URL: https://issues.apache.org/jira/browse/FLINK-31384
 Project: Flink
  Issue Type: Bug
  Components: Stateful Functions
Reporter: Nicolas Pocock


If you don't pass a function type to the builders `WithSpec` method it doesn't 
error within the Go SDK but the function isn't registered within Flink and 
doesn't run correctly. 

You can see the error within the Flink UI but the Go SDK just logs

`"registering Stateful Function nil"`

And then carries on as usual



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31383) Add support for documenting additionProperties of the REST API payloads.

2023-03-09 Thread Jira

David Morávek created FLINK-31383:
-

 Summary: Add support for documenting additionProperties of the 
REST API payloads.
 Key: FLINK-31383
 URL: https://issues.apache.org/jira/browse/FLINK-31383
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Runtime / REST
Reporter: David Morávek
Assignee: David Morávek


For implementing the request and response body of the resource requirements 
endpoint, we need to be able to document "additionalProperties" because these 
payloads have only top-level dynamic properties of the same type.

 

An example of what we want to be able to document is:
{code:java}
@JsonAnySetter
@JsonAnyGetter
@JsonSerialize(keyUsing = JobVertexIDKeySerializer.class)
@JsonDeserialize(keyUsing = JobVertexIDKeyDeserializer.class)
private final Map 
jobVertexResourceRequirements;{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-300: Add targetColumns to DynamicTableSink#Context to solve the null overwrite problem of partial-insert

2023-03-09 Thread Lincoln Lee

Hi Timo,

Thank you for the suggestion, I've updated the java doc (to make it clear
that a statement without specifying a column list will return {@link
Optional#empty()}) in the FLIP[1] and also the poc[2].

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240885081
[2] https://github.com/apache/flink/pull/22041

Best,
Lincoln Lee


Timo Walther  于2023年3月9日周四 16:36写道：

> Hi Lincoln,
>
> thanks for proposing the FLIP. The general idea to expose the target
> columns in DynamicTableSink#Context sounds good to me.
>
> In the FLIP I found the JavaDoc a bit confusing:
>
> ```
> The column list will be empty for 'insert into target select ...'.
> ```
>
> This could mean both optional empty or array empty. Maybe you can
> rephrase that a bit in the implementation.
>
> Otherwise +1.
>
> Timo
>
>
> On 08.03.23 14:00, Lincoln Lee wrote:
> > Hi Jing,
> > Agree with you that using formal terms can be easier to users, I've
> updated
> > the FLIP[1], since this is only one of the application scenarios for
> > partial insert, our java doc for the corresponding interface will
> describe
> > the partial insert message itself from a generic point of view, WDTY?
> >
> > @Jacky thanks for your feedback!
> > here are my thoughts for the two questions:
> > for this scenario, I don't think the planner should report an error. We
> > cannot assume that such usage will necessarily result in errors or that
> > users are unaware of potential risks (just like in a database, similar
> > operations are not prompted with errors). In the streaming scenario,
> > regarding the risks associated with the multi-insert operation with
> > overlapping fields, we may consider expanding the plan advice (FLIP-280
> has
> > just added possibilities to support this) to prompt users instead of
> > rejecting the operation with an error.
> >> 1. if the two insert into with same columns, the result is not
> > nondeterminism. will it check in planner and throw exception
> >
> > yes, not all connectors support partial insert. Therefore, the
> introduction
> > of this interface is only intended as additional information for the
> > connectors that need it. The new `targetColumns` only provide the column
> > list information corresponding to the statement according to the SQL
> > standard, and existing connectors do not need to make any passive changes
> > by default.
> >> 2. some sink connectors can not supports it like queue such as kafka
> > compacted topic. will also it check in planner  and throw exception
> >
> > welcome your feedback!
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240885081
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jacky Lau  于2023年3月8日周三 20:11写道：
> >
> >> Thanks for bringing this up. this is a good feature. but i have two
> >> questions:
> >> 1. if the two insert into with same columns, the result is
> >> not  nondeterminism. will it check in planner and throw exception
> >> 2. some sink connectors can not supports it like queue such as kafka
> >> compacted topic. will also it check in planner  and throw exception
> >>
> >> Lincoln Lee  于2023年3月7日周二 14:53写道：
> >>
> >>> Hi Aitozi,
> >>>
> >>> Thanks for your feedback!  Yes, including HBase and JDBC connector,
> they
> >>> can be considered for support in the next step (JDBC as as a standard
> >>> protocol supported not only in traditional databases, but also in more
> >> and
> >>> more new types of storage). Considering the ongoing externalizing of
> >>> connectors and the release cycles of the connectors are decoupled with
> >> the
> >>> release cycle of Flink, we can initiate corresponding support issues
> for
> >>> specific connectors to follow up on support after finalizing the API
> >>> changes, WDYT?
> >>>
> >>> Best,
> >>> Lincoln Lee
> >>>
> >>>
> >>> Hang Ruan  于2023年3月7日周二 12:14写道：
> >>>
>  Hi, Lincoln,
> 
>  Thanks for bringing this up. It looks good to me. I also agree with
>  Jingsong's suggestion.
> 
>  Best,
>  Hang
> 
>  Jingsong Li  于2023年3月7日周二 11:15写道：
> 
> > Wow, we have 300 FLIPs...
> >
> > Thanks Lincoln,
> >
> > Have you considered returning an Optional?
> >
> > Empty array looks a little weird to me.
> >
> > Best,
> > Jingsong
> >
> > On Tue, Mar 7, 2023 at 10:32 AM Aitozi  wrote:
> >>
> >> Hi Lincoln,
> >>  Thank you for sharing this FLIP. Overall, it looks good to me.
> >> I
>  have
> >> one question: with the introduction of this interface,
> >> will any existing Flink connectors need to be updated in order to
> >>> take
> >> advantage of its capabilities? For example, HBase.
> >>
> >> yuxia  于2023年3月7日周二 10:01写道：
> >>
> >>> Thanks. It makes sense to me.
> >>>
> >>> Best regards,
> >>> Yuxia
> >>>
> >>> - 原始邮件 -
> >>> 发件人: "Lincoln Lee" 
> >>> 收件人: "dev" 
> >>> 发送时间: 星期一, 2023年 3 月 06日 下午 10:26:26
> >>> 主题: Re:

Re: Flink metric collection

2023-03-09 Thread Hang Ruan

Hi, Atul,

I think the metric reporter[1] will be helpful for you.

Best,
Hang

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/metric_reporters/

Atul Lal  于2023年3月9日周四 17:49写道：

> Hi everyone,
>
> I am trying to do some experiments with Flink. I am trying to modify the
> source code of Flink for this experiment, and I am starting a process
> thread from JobMaster.java constructor whenever a new job is started in
> Flink. In this thread, I want to monitor a few metrics related to the job
> and make some decisions based on it.
>
> Is there any way to collect metrics related to a job without using REST
> endpoints? Because I think using REST endpoints here is pointless as this
> is internal code running on JobMaster. If there is no other way than using
> REST endpoints, is there any easy way to serialize or parse JSON responses
> from those endpoints as the response structures are already defined in the
> classes.
>
> I would really appreciate it if someone could help me with this.
>
> Thank you,
> Atul
>

[VOTE] Release 1.15.4, release candidate #2

2023-03-09 Thread Danny Cranmer

Hi everyone,
Please review and vote on the release candidate #2 for the version 1.15.4,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 125FD8DB [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.15.4-rc2" [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours (excluding weekends 2023-03-14
13:00). It is adopted by majority approval, with at least 3 PMC affirmative
votes.

Thanks,
Danny

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12352526
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.4-rc2/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]
https://repository.apache.org/content/repositories/orgapacheflink-1594/org/apache/flink/
[5] https://github.com/apache/flink/releases/tag/release-1.15.4-rc2
[6] https://github.com/apache/flink-web/pull/611

[jira] [Created] (FLINK-31381) UnsupportedOperationException: Unsupported type when convertTypeToSpec: MAP

2023-03-09 Thread jackylau (Jira)

jackylau created FLINK-31381:


 Summary: UnsupportedOperationException: Unsupported type when 
convertTypeToSpec: MAP
 Key: FLINK-31381
 URL: https://issues.apache.org/jira/browse/FLINK-31381
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.18.0
Reporter: jackylau
 Fix For: 1.18.0


when i fix this https://issues.apache.org/jira/browse/FLINK-31377, and find 
another bug.

which is not fixed completely
{code:java}
SELECT array_contains(ARRAY[CAST(null AS MAP), MAP[1, 2]], MAP[1, 
2]); {code}
{code:java}
Caused by: java.lang.UnsupportedOperationException: Unsupported type when 
convertTypeToSpec: MAPat 
org.apache.calcite.sql.type.SqlTypeUtil.convertTypeToSpec(SqlTypeUtil.java:1069)
at 
org.apache.calcite.sql.type.SqlTypeUtil.convertTypeToSpec(SqlTypeUtil.java:1091)
at 
org.apache.flink.table.planner.functions.utils.SqlValidatorUtils.castTo(SqlValidatorUtils.java:82)
at 
org.apache.flink.table.planner.functions.utils.SqlValidatorUtils.adjustTypeForMultisetConstructor(SqlValidatorUtils.java:74)
at 
org.apache.flink.table.planner.functions.utils.SqlValidatorUtils.adjustTypeForArrayConstructor(SqlValidatorUtils.java:39)
at 
org.apache.flink.table.planner.functions.sql.SqlArrayConstructor.inferReturnType(SqlArrayConstructor.java:44)
at 
org.apache.calcite.sql.SqlOperator.validateOperands(SqlOperator.java:504)at 
org.apache.calcite.sql.SqlOperator.deriveType(SqlOperator.java:605)at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6218)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:6203)
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1861)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1852)
at 
org.apache.flink.table.planner.functions.inference.CallBindingCallContext$1.get(CallBindingCallContext.java:74)
at 
org.apache.flink.table.planner.functions.inference.CallBindingCallContext$1.get(CallBindingCallContext.java:69)
at 
org.apache.flink.table.types.inference.strategies.RootArgumentTypeStrategy.inferArgumentType(RootArgumentTypeStrategy.java:58)
at 
org.apache.flink.table.types.inference.strategies.SequenceInputTypeStrategy.inferInputTypes(SequenceInputTypeStrategy.java:76)
at 
org.apache.flink.table.planner.functions.inference.TypeInferenceOperandInference.inferOperandTypesOrError(TypeInferenceOperandInference.java:91)
at org.apache.flink.table. {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31382) Add select query ILIKE predicate support

2023-03-09 Thread Ran Tao (Jira)

Ran Tao created FLINK-31382:
---

 Summary: Add select query ILIKE predicate support
 Key: FLINK-31382
 URL: https://issues.apache.org/jira/browse/FLINK-31382
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Ran Tao


As FLIP discussed, we will add ILIKE support for select query in order to make 
consistency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-31380) Add filter support for ShowOperations

2023-03-09 Thread Ran Tao (Jira)

Ran Tao created FLINK-31380:
---

 Summary: Add filter support for ShowOperations 
 Key: FLINK-31380
 URL: https://issues.apache.org/jira/browse/FLINK-31380
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Ran Tao


As FLIP discussed. We will support new syntax for these operations.


|show catalogs|SHOW CATALOGS|SHOW CATALOGS [ [NOT] (LIKE \| ILIKE) 
 ]| |
|show databases|SHOW DATABASES|SHOW DATABASES [ ( FROM \| IN ) catalog_name] [ 
[NOT] (LIKE \| ILIKE)  ]| |
|show tables|SHOW TABLES [ ( FROM \| IN ) [catalog_name.]database_name ] [ 
[NOT] LIKE  ]|SHOW TABLES [ ( FROM \| IN ) 
[catalog_name.]database_name ] [ [NOT] (LIKE \| ILIKE)  ]| |
|show columns|SHOW COLUMNS ( FROM \| IN ) 
[[catalog_name.]database.] [ [NOT] LIKE ]|SHOW 
COLUMNS ( FROM \| IN ) [[catalog_name.]database.] [ [NOT] (LIKE \| 
ILIKE) ]| |
|show functions|SHOW [USER] FUNCTIONS|SHOW [USER] FUNCTIONS [ ( FROM \| IN ) 
[catalog_name.]database_name ] [ [NOT] (LIKE \| ILIKE)  ]| |
|show views|SHOW VIEWS|SHOW VIEWS [ ( FROM \| IN ) [catalog_name.]database_name 
] [ [NOT] (LIKE \| ILIKE)  ]| |
|show modules|SHOW [FULL] MODULES|SHOW [FULL] MODULES [ [NOT] (LIKE \| ILIKE) 
 ]| |
|show jars|SHOW JARS|SHOW JARS [ [NOT] (LIKE \| ILIKE)  
]|only work in [SQL 
CLI|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sqlclient/]
 or [SQL 
Gateway|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/overview/].|
|show jobs|SHOW JOBS|SHOW JOBS [ [NOT] (LIKE \| ILIKE)  
]|only work in [SQL 
CLI|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sqlclient/]
 or [SQL 
Gateway|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql-gateway/overview/].|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[RESULT][VOTE] FLIP-297: Improve Auxiliary Sql Statements

2023-03-09 Thread Ran Tao

Hi all,

I am happy to announce that FLIP-297: Improve Auxiliary Sql Statements[1]
has been accepted.

There are 8 approving votes, 4 of which are binding:
- ConradJam
- Hang Ruan
- Jacky Lau
- Jing Ge (binding)
- Jark Wu (binding)
- yuxia
- Sergey Nuyanzin (binding)
- Timo Walther (binding)

There are no disapproving votes.

Thanks everyone for participating!

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements


Best Regards,
Ran Tao

[jira] [Created] (FLINK-31379) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers runs into timeout

2023-03-09 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-31379:
-

 Summary: 
ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
 runs into timeout
 Key: FLINK-31379
 URL: https://issues.apache.org/jira/browse/FLINK-31379
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.16.1
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46843=logs=a57e0635-3fad-5b08-57c7-a4142d7d6fa9=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7=9655

{code}
Mar 06 12:16:45 "ForkJoinPool-51-worker-25" #645 daemon prio=5 os_prio=0 
tid=0x7fe20f633000 nid=0xdd4 waiting on condition [0x7fe0342c5000]
Mar 06 12:16:45java.lang.Thread.State: WAITING (parking)
Mar 06 12:16:45 at sun.misc.Unsafe.park(Native Method)
Mar 06 12:16:45 - parking to wait for  <0xd213d1f8> (a 
java.util.concurrent.CompletableFuture$Signaller)
Mar 06 12:16:45 at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
Mar 06 12:16:45 at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
Mar 06 12:16:45 at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
Mar 06 12:16:45 at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
Mar 06 12:16:45 at 
java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
Mar 06 12:16:45 at 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
Mar 06 12:16:45 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
[...]
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [VOTE] FLIP-297: Improve Auxiliary Sql Statements

2023-03-09 Thread Ran Tao

Thanks, everyone, I'm closing this vote now. I'll follow up with the result

in another email.

Best Regards,
Ran Tao


Ran Tao  于2023年3月7日周二 15:59写道：

> thanks Lau.
>
> The vote will last for at least 72 hours (03/09, 19:30 UTC+8).
> It needs consensus approval, requiring 3 binding +1 votes and no
> binding vetoes.
>
>
> Best Regards,
> Ran Tao
>
>
> Jacky Lau  于2023年3月7日周二 15:11写道：
>
>> Thanks Ran.
>> +1 (non-binding)
>>
>> Regards,
>> Jacky Lau
>>
>> Ran Tao  于2023年3月6日周一 19:32写道：
>>
>> > Hi Everyone,
>> >
>> >
>> > I want to start the vote on FLIP-297: Improve Auxiliary Sql Statements
>> [1].
>> > The FLIP was discussed in this thread [2].
>> >
>> >
>> > The goal of the FLIP is to improve flink auxiliary sql
>> statements(compared
>> > with sql standard or other mature engines).
>> >
>> > The vote will last for at least 72 hours (03/09, 19:30 UTC+8)
>> > unless there is an objection or insufficient votes. Thank you all.
>> >
>> > [1]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements
>> > [2] https://lists.apache.org/thread/54fyd27m8on1cf3hn6dz564zqmkobjyd
>> >
>> > Best Regards,
>> > Ran Tao
>> > https://github.com/chucheng92
>> >
>>
>

[jira] [Created] (FLINK-31378) Documentation fails to build due to lack of package

2023-03-09 Thread Hongshun Wang (Jira)

Hongshun Wang created FLINK-31378:
-

 Summary: Documentation fails to build due to lack of package
 Key: FLINK-31378
 URL: https://issues.apache.org/jira/browse/FLINK-31378
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Hongshun Wang


In [Project Configuration 
Section|[https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/configuration/overview/#running-and-packaging],]
 it shows that "If you want to run your job by simply executing the main class, 
you will need {{flink-runtime}} in your classpath". 

However, when I just add flink-runtime in my classPath, an error is thrown like 
this:"
No ExecutorFactory found to execute the application".

It seems that flink-clients is also needed to supply an excutor through Java 
Service Load.

Could you please add this in official article for beginners like me?

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: StreamingFileSink's DefaultRollingPolicy

2023-03-09 Thread Lee, Keith

Hi Ram,

You are right, the effect of DefaultRollingPolicy not always rolling on 
checkpoint leads to unusable non-latest savepoints.

My question is, should DefaultRollingPolicy be changed so that it always roll 
on checkpoint to provide a better user experience?
I'd imagine most StreamingFileSink user starts off with DefaultRollingPolicy. 
Users finding that they cannot restore from earlier savepoint when they intend 
to replay events and consequently is a suboptimal experience.

Of course, changing default behaviour may have huge impact, this is also why 
I'd like to understand: what's the motivation behind making 
DefaultRollingPolicy not always roll on checkpoint?
It's worth considering the trade-offs as current default behaviour make 
non-latest savepoints unrestorable .

Thank you
Keith

Flink metric collection

2023-03-09 Thread Atul Lal

Hi everyone,

I am trying to do some experiments with Flink. I am trying to modify the
source code of Flink for this experiment, and I am starting a process
thread from JobMaster.java constructor whenever a new job is started in
Flink. In this thread, I want to monitor a few metrics related to the job
and make some decisions based on it.

Is there any way to collect metrics related to a job without using REST
endpoints? Because I think using REST endpoints here is pointless as this
is internal code running on JobMaster. If there is no other way than using
REST endpoints, is there any easy way to serialize or parse JSON responses
from those endpoints as the response structures are already defined in the
classes.

I would really appreciate it if someone could help me with this.

Thank you,
Atul

Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL

2023-03-09 Thread Timo Walther

> I was wondering why you, exceptionally, suggested 'scan.idle-timeout' 
instead of 'scan.watermark.idle-timeout'. I must miss something here.

@Jing: You are right. That was just a copy paste mistake. It should be 
`scan.watermark.idle-timeout`.

@Kui: Can you fix that in the FLIP? Sorry, for this typo.

> users might feel confused after using those OPTIONs, since they might 
not be aware of what happens underneath

I agree that those options will not make the life easier for most SQL 
users. I would consider those options intended for power users. Usually, 
the data platform team should come up with well-defined watermark 
semantics before exposing tables to SQL users.

Regards,
Timo

On 07.03.23 13:15, Kui Yuan wrote:

Hi Jing,

Thanks for your advice. In upcoming PR, I will explain all these things
(Including flip-182,flip-217,etc.) clearly in the flink doc to make sure
the user can understand the behavior behind it.

Best,
Kui

Jing Ge  于2023年3月7日周二 19:42写道：

Hi Kui,

Thanks for adding it into the Flip. There is one more thing wrt this topic
you might want to pay attention to, a little bit off-topic, is that Flink
SQL users might not be familiar with use cases of low level Datastream API.
IMHO, it is highly recommended(mandatory) to write the dependency you just
described in your last email and all related information in Flink doc
during developing this FLIP within the upcoming PR. Without those
guidelines in doc, users might feel confused after using those OPTIONs,
since they might not be aware of what happens underneath, and therefore
don't know why it does not work even if they did everything right at Flink
SQL level.

Best regards,
Jing

On Tue, Mar 7, 2023 at 6:36 AM Kui Yuan  wrote:

Hi Jing,

Thanks for the reminder. The aim of this flip is letting the sql users to
use those features in the Datastream API, we don't intend to extend
flip-217. In my opinion, the watermark alignment with only one source can
be configured by the options given in flip, and if the source connector
does not implement flip-217, the task will run with an error, reminding

the

user to use `pipeline.watermark-alignment.allow-

unaligned-source-splits`,

but maybe these behaviors are not understood by everyone, I will add some
tips about flip-217 in the flip to let users understand the behavior in

the

case of source splits.

Best,

Kui Yuan

Jing Ge  于2023年3月7日周二 04:23写道：

Hi Kui,

Thanks for pointing that out. I knew FLIP-217 which was done by an
engineer working in my team.  As far as I am concerned, your FLIP

should

answer the following questions:

1. How to enable the watermark alignment of a source splits with Flink

SQL?

e.g. which options should be used if only one source is used?

2. Default behaviour. i.e. Flink SQL users should be aware that

watermark

alignment of source split will only work for sources that implement
FLIP-217 properly. Should users take care of
`pipeline.watermark-alignment.allow-unaligned-source-splits`
while using Flink SQL?

Best regards,
Jing

On Fri, Mar 3, 2023 at 8:46 AM Kui Yuan  wrote:

Hi all,

Thanks for all. There are more questions and I will answer one by

one.

@Jark Thanks for your tips. For the first question, I will add more

details

in the flip, and give a POC[1] so that pepole can know how I'm

currently

implementing these features.

IIRC, this is the first time we introduce the framework-level

connector

options that the option is not recognized and handled by

connectors.

The FLIP should cover how framework filters the watermark related

options

to avoid discover connector factory failed, and what happens if the
connector already supported the conflict options

For the second question, We know that the default strategy is

'on-periodic'

in SQL layer, and the default interval is 200ms. The reason for

emiting

watermark periodically is that the time advancement of consecutive

events

may be very small, we don't need to calculate watermark for each

event.

Same for 'on-event' strategy, so my idea is that we can set a fixed

gap

for

'on-event' strategy.

I'm not sure about the usage scenarios of event gap emit strategy.

Do

you have any specific use case of this strategy? I'm confused why

no

one

requested this strategy before no matter in DataStream or SQL, but

maybe

I missed something. I'm not against to add this option, but just

want

to

be

careful when adding new API because it's hard to remove in the

future.

As @Timo said, There is no default features like 'on-event-gap' in
DataStream API, but the users can achieve the 'on-event-gap' feature

by

using `WatermarkGenerator` interface, just like the implemention in

my

POC[1]. However, If we don't provide it  in SQL layer, there is no

way

for

users to use similar features.

Jark raised a very good point. I thought we only expose what is
contained in DataStream API already. If this strategy is not part

of

DataStream API, would like to exclude it from the FLIP. We need to

[jira] [Created] (FLINK-31377) BinaryArrayData getArray/getMap should Handle null correctly AssertionError: valueArraySize (-6) should >= 0

2023-03-09 Thread jackylau (Jira)

jackylau created FLINK-31377:


 Summary: BinaryArrayData getArray/getMap should Handle null 
correctly AssertionError: valueArraySize (-6) should >= 0 
 Key: FLINK-31377
 URL: https://issues.apache.org/jira/browse/FLINK-31377
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.18.0
Reporter: jackylau
 Fix For: 1.18.0


{code:java}
// code placeholder
when i use , if the element has map which is null
Object getElementOrNull(ArrayData array, int pos);{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-299 Pub/Sub Lite Connector

2023-03-09 Thread Konstantin Knauf

Hi Daniel,

I think, it would be great to have a PubSub Lite Connector in Flink. Before
you put this proposal up for a vote, though, we need feedback from a
Committer who would review and help maintain it going forward. Ideally,
this Committer would guide one or more contributors from Google to
Committership so that Google could step up and maintain Flink's PubSub and
PubSub Lite Connector in the future. For this, it would be good to
understand how you envision the involvement of the PubSub Lite team at
Google.

I am specifically sensitive on this topic, because the PubSub connector has
lacked attention and maintenance for a long time. There was also a very
short-lived interested by Google in the past to contribute a Google PubSub
Connector [1].

Best,

Konstantin

[1] https://issues.apache.org/jira/browse/FLINK-22380

Am Mi., 8. März 2023 um 14:45 Uhr schrieb Etienne Chauchot <
echauc...@apache.org>:

> Hi,
>
> I agree with Ryan, even if clients might be totally different the
> backend technologies are the same so hosting them in the same repo makes
> sense. Similar thinking made us put all the Cassandra related connectors
> in the same cassandra repo.
>
> Etienne
>
> Le 02/03/2023 à 14:43, Daniel Collins a écrit :
> > Hello Ryan,
> >
> > Unfortunately there's not much shared logic between the two- the clients
> > have to look fundamentally different since the Pub/Sub Lite client
> exposes
> > partitions to the split level for repeatable reads.
> >
> > I have no objection to this living in the same repo as the Pub/Sub
> > connector, if this is an easier way forward than setting up a new repo,
> > sounds good to me. The Pub/Sub team is organizationally close to us, and
> is
> > looking into providing more support for the flink connector in the near
> > future.
> >
> > -Daniel
> >
> > On Thu, Mar 2, 2023 at 3:26 AM Ryan Skraba  >
> > wrote:
> >
> >> Hello Daniel!  Quite a while ago, I started porting the Pub/Sub
> connector
> >> (from an existing PR) to the new source API in the new
> >> flink-connector-gcp-pubsub repository [PR2].  As Martijn mentioned,
> there
> >> hasn't been a lot of attention on this connector; any community
> involvement
> >> would be appreciated!
> >>
> >> Instead of considering this a new connector, is there an opportunity
> here
> >> to offer the two variants (Pub/Sub and Pub/Sub Lite) as different
> artifacts
> >> in that same repo?  Is there much common logic that can be shared
> between
> >> the two?  I'm not as familiar as I should be with Lite, but I do recall
> >> that they share many concepts and _some_ dependencies.
> >>
> >> All my best, Ryan
> >>
> >>
> >> On Wed, Mar 1, 2023 at 11:21 PM Daniel Collins
> >> 
> >> wrote:
> >>
> >>> Hello all,
> >>>
> >>> I'd like to start an official discuss thread for adding a Pub/Sub Lite
> >>> Connector to Flink. We've had requests from our users to add flink
> >> support,
> >>> and are willing to maintain and support this connector long term from
> the
> >>> product team.
> >>>
> >>> The proposal is https://cwiki.apache.org/confluence/x/P51bDg, what
> would
> >>> be
> >>> people's thoughts on adding this connector?
> >>>
> >>> -Daniel
> >>>
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk

Re: [DISCUSS] FLIP-300: Add targetColumns to DynamicTableSink#Context to solve the null overwrite problem of partial-insert

2023-03-09 Thread Timo Walther


Hi Lincoln,

thanks for proposing the FLIP. The general idea to expose the target 
columns in DynamicTableSink#Context sounds good to me.


In the FLIP I found the JavaDoc a bit confusing:

```
The column list will be empty for 'insert into target select ...'.
```

This could mean both optional empty or array empty. Maybe you can 
rephrase that a bit in the implementation.


Otherwise +1.

Timo


On 08.03.23 14:00, Lincoln Lee wrote:

Hi Jing,
Agree with you that using formal terms can be easier to users, I've updated
the FLIP[1], since this is only one of the application scenarios for
partial insert, our java doc for the corresponding interface will describe
the partial insert message itself from a generic point of view, WDTY?

@Jacky thanks for your feedback!
here are my thoughts for the two questions:
for this scenario, I don't think the planner should report an error. We
cannot assume that such usage will necessarily result in errors or that
users are unaware of potential risks (just like in a database, similar
operations are not prompted with errors). In the streaming scenario,
regarding the risks associated with the multi-insert operation with
overlapping fields, we may consider expanding the plan advice (FLIP-280 has
just added possibilities to support this) to prompt users instead of
rejecting the operation with an error.

1. if the two insert into with same columns, the result is not

nondeterminism. will it check in planner and throw exception

yes, not all connectors support partial insert. Therefore, the introduction
of this interface is only intended as additional information for the
connectors that need it. The new `targetColumns` only provide the column
list information corresponding to the statement according to the SQL
standard, and existing connectors do not need to make any passive changes
by default.

2. some sink connectors can not supports it like queue such as kafka

compacted topic. will also it check in planner  and throw exception

welcome your feedback!


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240885081


Best,
Lincoln Lee


Jacky Lau  于2023年3月8日周三 20:11写道：


Thanks for bringing this up. this is a good feature. but i have two
questions:
1. if the two insert into with same columns, the result is
not  nondeterminism. will it check in planner and throw exception
2. some sink connectors can not supports it like queue such as kafka
compacted topic. will also it check in planner  and throw exception

Lincoln Lee  于2023年3月7日周二 14:53写道：


Hi Aitozi,

Thanks for your feedback!  Yes, including HBase and JDBC connector, they
can be considered for support in the next step (JDBC as as a standard
protocol supported not only in traditional databases, but also in more

and

more new types of storage). Considering the ongoing externalizing of
connectors and the release cycles of the connectors are decoupled with

the

release cycle of Flink, we can initiate corresponding support issues for
specific connectors to follow up on support after finalizing the API
changes, WDYT?

Best,
Lincoln Lee


Hang Ruan  于2023年3月7日周二 12:14写道：


Hi, Lincoln,

Thanks for bringing this up. It looks good to me. I also agree with
Jingsong's suggestion.

Best,
Hang

Jingsong Li  于2023年3月7日周二 11:15写道：


Wow, we have 300 FLIPs...

Thanks Lincoln,

Have you considered returning an Optional?

Empty array looks a little weird to me.

Best,
Jingsong

On Tue, Mar 7, 2023 at 10:32 AM Aitozi  wrote:


Hi Lincoln,
 Thank you for sharing this FLIP. Overall, it looks good to me.

I

have

one question: with the introduction of this interface,
will any existing Flink connectors need to be updated in order to

take

advantage of its capabilities? For example, HBase.

yuxia  于2023年3月7日周二 10:01写道：


Thanks. It makes sense to me.

Best regards,
Yuxia

- 原始邮件 -
发件人: "Lincoln Lee" 
收件人: "dev" 
发送时间: 星期一, 2023年 3 月 06日 下午 10:26:26
主题: Re: [DISCUSS] FLIP-300: Add targetColumns to

DynamicTableSink#Context

to solve the null overwrite problem of partial-insert

hi yuxia,

Thanks for your feedback and tracking the issue of update

statement!

I've

updated the FLIP[1] and also the poc[2].
Since the bug and flip are orthogonal, we can focus on finalizing

the

api

changes first, and then work on the flip implementation and

bugfix

separately, WDYT?

[1]








https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240885081

[2] https://github.com/apache/flink/pull/22041

Best,
Lincoln Lee


yuxia  于2023年3月6日周一 21:21写道：


Hi, Lincoln.
Thanks for bringing this up. +1 for this FLIP, it's helpful for

external

storage system to implement partial update.
The FLIP looks good to me. I only want to add one comment,

update

statement also doesn't support updating nested column, I have

created

FLINK-31344[1] to track it.
Maybe we also need to explain it in this FLIP.

[1] https://issues.apache.org/jira/browse/FLINK-31344

Best regards,
Yuxia

- 原始邮件 -
发件人: "Lincoln Lee" 
收件人: "dev"

Re: [VOTE] FLIP-297: Improve Auxiliary Sql Statements

2023-03-09 Thread Timo Walther


+1 (binding)

Thanks,
Timo


On 08.03.23 14:59, Sergey Nuyanzin wrote:

+1
(binding) i guess based on [1], please correct me if i am wrong
[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Actions


On Wed, Mar 8, 2023 at 8:04 AM yuxia  wrote:


+1 (non-binding)
Thanks Ran Tao for driving it.

Best regards,
Yuxia

- 原始邮件 -
发件人: "Jark Wu" 
收件人: "dev" 
发送时间: 星期二, 2023年 3 月 07日 下午 8:50:05
主题: Re: [VOTE] FLIP-297: Improve Auxiliary Sql Statements

+1 (binding)

Best,
Jark


2023年3月7日 17:07，Jing Ge  写道：

+1
Thanks!

Best regards,
Jing

On Tue, Mar 7, 2023 at 9:51 AM ConradJam  wrote:


+1 (non-binding).

Hang Ruan  于2023年3月7日周二 16:14写道：


Thanks for Ran's work.
+1 (non-binding).

Best,
Hang

Ran Tao  于2023年3月7日周二 15:59写道：


thanks Lau.

The vote will last for at least 72 hours (03/09, 19:30 UTC+8).
It needs consensus approval, requiring 3 binding +1 votes and no
binding vetoes.


Best Regards,
Ran Tao


Jacky Lau  于2023年3月7日周二 15:11写道：


Thanks Ran.
+1 (non-binding)

Regards,
Jacky Lau

Ran Tao  于2023年3月6日周一 19:32写道：


Hi Everyone,


I want to start the vote on FLIP-297: Improve Auxiliary Sql

Statements

[1].

The FLIP was discussed in this thread [2].


The goal of the FLIP is to improve flink auxiliary sql

statements(compared

with sql standard or other mature engines).

The vote will last for at least 72 hours (03/09, 19:30 UTC+8)
unless there is an objection or insufficient votes. Thank you all.

[1]











https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements

[2]

https://lists.apache.org/thread/54fyd27m8on1cf3hn6dz564zqmkobjyd


Best Regards,
Ran Tao
https://github.com/chucheng92










--
Best

ConradJam