Re: [Discuss] CRD for flink sql gateway in the flink k8s operator

2023-09-16 Thread Yangze Guo
> There would be many different ways of doing this. One gateway per session 
> cluster, one gateway shared across different clusters...

Currently, the SQL gateway cannot be shared across multiple clusters.

> understand the tradeoff and the simplest way of accomplishing this.

I'm not familiar with the Flink operator codebase, so I would appreciate
it if you could elaborate on the cost of adding this feature. I agree
that deploying a gateway with a native Kubernetes Deployment is a simple
and straightforward approach for users. However, integrating it into the
operator can provide additional benefits and be more user-friendly,
especially for users who are less familiar with Kubernetes. With the
operator, users get version management that stays consistent with the
session cluster, as well as upgrade capabilities.
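
For reference, a standalone gateway along the lines Gyula describes could
be deployed with a plain Kubernetes Deployment roughly like the sketch
below. This is a minimal, hypothetical manifest: the image tag, the session
cluster's REST service name, and the port are assumptions.

# Hypothetical standalone SQL gateway Deployment (names and image are
# assumptions; the Flink version must match the session cluster's).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-sql-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink-sql-gateway
  template:
    metadata:
      labels:
        app: flink-sql-gateway
    spec:
      containers:
        - name: sql-gateway
          image: flink:1.17
          command: ["/opt/flink/bin/sql-gateway.sh", "start-foreground"]
          args:
            # Point the gateway's Flink client at the session cluster's
            # REST endpoint (service name is an assumption).
            - -Drest.address=flink-session-cluster-rest
            - -Dsql-gateway.endpoint.rest.address=0.0.0.0
          ports:
            - containerPort: 8083   # default sql-gateway REST port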


Best,
Yangze Guo

On Fri, Sep 15, 2023 at 5:38 PM Gyula Fóra  wrote:
>
> There would be many different ways of doing this. One gateway per session
> cluster, one gateway shared across different clusters...
> I would not rush to add anything anywhere until we understand the tradeoff
> and the simplest way of accomplishing this.
>
> The operator already supports ingresses for session clusters so we could
> have a gateway sitting somewhere else simply using it.
>
> Gyula
>
> On Fri, Sep 15, 2023 at 10:18 AM Yangze Guo  wrote:
>
> > Thanks for bringing this up, Dongwoo. Flink SQL Gateway is also a key
> > component for OLAP scenarios.
> >
> > @Gyula
> > How about adding the SQL gateway as an optional component of session
> > cluster deployments? Users could specify the resources, instance count,
> > and ports of the SQL gateway. I think that would help a lot for OLAP and
> > batch users.
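> >
> > For illustration, such an optional component might look roughly like
> > this inside the session cluster CR (the sqlGateway block and its field
> > names are purely hypothetical):
> >
> > apiVersion: flink.apache.org/v1beta1
> > kind: FlinkDeployment
> > metadata:
> >   name: session-cluster-with-gateway
> > spec:
> >   flinkVersion: v1_17
> >   # ... usual session cluster spec (no job section) ...
> >   sqlGateway:            # hypothetical optional component
> >     replicas: 2
> >     port: 8083           # default sql-gateway REST port
> >     resource:
> >       memory: "1024m"
> >       cpu: 0.5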
> >
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Sep 15, 2023 at 3:19 PM ConradJam  wrote:
> > >
> > > If we start from the CRD direction, I think this mode is more like a
> > > sidecar of the session cluster: SQL commands are sent to the SQL
> > > gateway, which submits them to the session cluster. I don't know if my
> > > statement is accurate.
> > >
On Fri, Sep 15, 2023 at 13:27, Xiaolong Wang wrote:
> > >
> > > > Hi, Dongwoo,
> > > >
> > > > Since the Flink SQL gateway runs on top of a Flink session cluster,
> > > > I think it'd be easier to add more fields to the CRD of
> > > > `FlinkSessionJob`.
> > > >
> > > > e.g.
> > > >
> > > > apiVersion: flink.apache.org/v1beta1
> > > > kind: FlinkSessionJob
> > > > metadata:
> > > >   name: sql-gateway
> > > > spec:
> > > >   sqlGateway:
> > > >     endpoint: "hiveserver2"
> > > >     mode: "streaming"
> > > >     hiveConf:
> > > >       configMap:
> > > >         name: hive-config
> > > >         items:
> > > >           - key: hive-site.xml
> > > >             path: hive-site.xml
> > > >
> > > >
> > > > On Fri, Sep 15, 2023 at 12:56 PM Dongwoo Kim 
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > *@Gyula*
> > > > > Thanks for the consideration, Gyula. My initial idea for the CR
> > > > > was roughly like below.
> > > > > I focused on simplifying the setup in a k8s environment, but I agree
> > > > > with your opinion that for the SQL gateway we don't need custom
> > > > > operator logic, and most of the requirements can be met by existing
> > > > > k8s resources. So maybe a Helm chart that bundles all the needed
> > > > > resources would be enough.
> > > > >
> > > > > apiVersion: flink.apache.org/v1beta1
> > > > > kind: FlinkSqlGateway
> > > > > metadata:
> > > > >   name: flink-sql-gateway-example
> > > > >   namespace: default
> > > > > spec:
> > > > >   clusterName: flink-session-cluster-example
> > > > >   exposeServiceType: LoadBalancer
> > > > >   flinkSqlGatewayConfiguration:
> > > > >     sql-gateway.endpoint.type: "hiveserver2"
> > > > >     sql-gateway.endpoint.hiveserver2.catalog.name: "hive"
> > > > >   hiveConf:
> > > > >     configMap:
> > > > >       name: hive-config
> > > > >       items:
> > > > >         - key: hive-site.xml
> > > > >           path: hive-site.xml
> > > > >
> > > > >
> > > > > *@xiaolong, @Shammon*
> > > > > Hi Xiaolong and Shammon,
> > > > > Thanks for taking the time to share.
> > > > > I'd also like to add my experience with setting up the Flink SQL
> > > > > gateway on k8s.
> > > > > Without building a new Docker image, I added a separate container to
> > > > > the existing JobManager pod and started the SQL gateway using the
> > > > > "sql-gateway.sh start-foreground" command.
> > > > > I haven't explored deploying the SQL gateway as an independent
> > > > > Deployment yet, but that's something I'm considering after pointing
> > > > > the JM address at the desired session cluster.
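> > > > >
> > > > > For anyone wanting to try the same approach, the sidecar amounts to
> > > > > roughly this fragment of the JobManager pod template (image and
> > > > > port are assumptions; the image should match the cluster's Flink
> > > > > version):
> > > > >
> > > > > spec:
> > > > >   containers:
> > > > >     - name: jobmanager
> > > > >       image: flink:1.17
> > > > >       args: ["jobmanager"]
> > > > >     - name: sql-gateway     # extra container, no new image needed
> > > > >       image: flink:1.17
> > > > >       command: ["/opt/flink/bin/sql-gateway.sh", "start-foreground"]
> > > > >       ports:
> > > > >         - containerPort: 8083   # default sql-gateway REST port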
> > > > >
> > > > > Thanks all
> > > > >
> > > > > Best
> > > > > Dongwoo
> > > > >
> > > > > On Fri, Sep 15, 2023 at 11:55 AM, Xiaolong Wang
> > > > > wrote:
> > > > >
> > > > > > Hi, Shammon,
> > > > > >
> > > > > > Yes, I want to create a Flink SQL-gateway in a job-manager.
> > > > > >
> > > > > > Currently, the above 

Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-16 Thread Feng Jin
Hi, Zhanghao

Thank you for proposing this FLIP, it is a very meaningful feature.

I agree that, for now, we may only consider the parallelism setting of the
source itself. Considering the parallelism settings of other operators would
make the entire design more complex.

Regarding the situation where the parallelism of the source differs from
that of downstream tasks, I did not find a more detailed description in the
FLIP.

By default, if the parallelism of two connected operators differs, the
rebalance partitioner is used.
But in the SQL scenario, I believe we should keep the parallelism-setting
behavior consistent with that of the sink:

1. When the source only generates insert-only data and there is a mismatch
in parallelism between the source and downstream operators, rebalance is
used by default.

2. When the source generates update and delete data, we should require the
source to declare a primary key and then build a hash partitioner based on
that primary key (see the sketch below).
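
To make item 2 concrete, a hypothetical DDL under the proposed
`scan.parallelism` option could look like this (the table, connector, and
values are invented for illustration):

-- Hypothetical DDL assuming the proposed 'scan.parallelism' option.
CREATE TABLE orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  -- The primary key would let the planner hash-partition update/delete
  -- records by key when source and downstream parallelism differ.
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',      -- any changelog-producing source
  'scan.parallelism' = '4'
);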

WDYT ?


Best,
Feng


On Sat, Sep 16, 2023 at 5:58 PM Jane Chan  wrote:

> Hi Zhanghao,
>
> Thanks for the explanation.
>
> For Q1, I think the key lies in determining the boundary where the chain
> should be broken. However, this boundary is ultimately determined by the
> specific requirements of each user query.
>
> The most straightforward approach is breaking the chain after the source
> operator, even though it involves a tradeoff. This is because there may be
> instances of `StreamExecWatermarkAssigner`, `StreamExecMiniBatchAssigner`,
> or `StreamExecChangelogNormalize` occurring before the `StreamExecCalc`
> node, and it would be complex and challenging to enumerate all possible
> match patterns.
>
> A more complex workaround would be to provide an entry point for users to
> configure the specific operator that should serve as the breakpoint.
> Meanwhile, this would further increase the complexity of this FLIP.
>
> However, if the parallelism of each operator can be configured (in the
> future), then this problem would not exist (it might be beyond the scope of
> discussion for this FLIP).
>
> I personally lean towards keeping the FLIP concise and focused by choosing
> the most straightforward approach. I would also like to hear others'
> opinions.
>
> Best,
> Jane
>
> On Sat, Sep 16, 2023 at 10:21 AM Yun Tang  wrote:
>
> > Hi Zhanghao,
> >
> > Certainly, I think we should keep this FLIP focused on setting the
> > source parallelism via DDL properties. I just want to clarify that, in
> > my experience, setting parallelism for individual operators is also
> > valuable, though it is only touched on briefly in your FLIP.
> >
> > @ConradJam BTW, compared with a SQL hint, I think using
> > `scan.parallelism` is better, to align with the current
> > `sink.parallelism`. And once we introduce such an option, we can also use
> > the dynamic table options SQL hint[1] to configure the source
> > parallelism.
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options
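> >
> > For example, with the hint, that combination might look like the
> > following (the option and values are hypothetical until the FLIP is
> > accepted):
> >
> > SELECT order_id, amount
> > FROM orders /*+ OPTIONS('scan.parallelism' = '8') */;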
> >
> >
> > Best
> > Yun Tang
> > 
> > From: ConradJam 
> > Sent: Friday, September 15, 2023 22:52
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for
> Table/SQL
> > Sources
> >
> > + 1 Thanks for the FLIP and the discussion. I would like to ask whether
> to
> > use SQL Hint syntax to set this parallelism?
> >
> > Martijn Visser  于2023年9月15日周五 20:52写道:
> >
> > > Hi everyone,
> > >
> > > Thanks for the FLIP and the discussion. I find it exciting. Thanks for
> > > pushing for this.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Fri, Sep 15, 2023 at 2:25 PM Chen Zhanghao <
> zhanghao.c...@outlook.com
> > >
> > > wrote:
> > >
> > > > Hi Jane,
> > > >
> > > > Thanks for the valuable suggestions.
> > > >
> > > > For Q1, it's indeed an issue. Some possible ideas include
> > > > introducing a fake transformation after the source that takes the
> > > > global default parallelism, or simply making exec nodes take the
> > > > global default parallelism, but both ways prevent potential chaining
> > > > opportunities and I'm not sure if that's good to go. We'll need to
> > > > give it deeper thought and polish our proposal. We're also more than
> > > > glad to hear your input on it.
> > > >
> > > > For Q2, scan.parallelism will take higher precedence, as the more
> > > > specific config should take priority.
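> > > >
> > > > As an illustration of that precedence (values invented; the
> > > > 'scan.parallelism' option is the one proposed in this FLIP):
> > > >
> > > > SET 'table.exec.resource.default-parallelism' = '2';
> > > >
> > > > CREATE TABLE t (id BIGINT) WITH (
> > > >   'connector' = 'datagen',
> > > >   'scan.parallelism' = '8'
> > > > );
> > > > -- The source of t would run with parallelism 8, not 2.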
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > 
> > > > From: Jane Chan 
> > > > Sent: September 15, 2023 11:56
> > > > To: dev@flink.apache.org 
> > > > Cc: dewe...@outlook.com 
> > > > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> > > > Sources
> > > >
> > > > Hi, Zhanghao, Dewei,
> > > >
> > > > Thanks for initiating this discussion. This feature is valuable in
> > > > providing more flexibility for performance tuning for SQL pipelines.
> > > >
> > > > Here are 

Re: [DISCUSS] FLIP-307: Flink connector Redshift

2023-09-16 Thread Samrat Deb
Hi ,

I've updated the FLIP[1], incorporating changes so that it no longer relies
on the Flink JDBC connector. This decision was based on the following
reasons:

- AWS Redshift uses its own specialized JDBC driver[2]. Given that this
driver may evolve over time, building on the Flink JDBC connector might
run into compatibility issues.
- The dependency issues mentioned earlier in this thread.
- A Proof of Concept (POC) has been implemented for a DynamicSink for
Redshift[3]. It shows that reusing flink-connector-jdbc for Redshift
doesn't provide significant benefits.
- The JDBC mode offers more flexibility by allowing direct use of the
Redshift JDBC driver, enabling the Flink Redshift connector to evolve
independently.
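
As a rough illustration of how the two proposed modes might surface to
users, a table definition could look like the following (the option names
and values here are invented for illustration; the final option set is
defined in the FLIP):

-- Hypothetical Redshift table with a selectable read mode.
CREATE TABLE redshift_orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector' = 'redshift',
  'hostname' = 'cluster.example.redshift.amazonaws.com',
  'database-name' = 'dev',
  'table-name' = 'orders',
  'read.mode' = 'unload'   -- or 'jdbc'; both modes are proposed
);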

[1] -
https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
[2] -
https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
[3] -
https://github.com/Samrat002/flink-connector-aws/tree/redshift-connector

Bests,
Samrat




On Sat, Sep 16, 2023 at 10:46 AM Samrat Deb  wrote:

> Hello Martijn,
>
> I apologize for the delay in responding.
>
> Regarding your question about integrating Redshift directly into the JDBC
> connector, we are planning to offer two modes: JDBC and UNLOAD. Through our
> internal benchmarking, we have observed good performance in the UNLOAD
> flow. Additionally, there is a need for both flows based on different user
> use cases.
>
> If we were to explicitly add the JDBC mode to the flink-connector-jdbc, we
> would have two options:
>
> 1. Include flink-connector-redshift in flink-connector-jdbc: This would
> involve incorporating the Redshift connector into the JDBC connector. Since
> Redshift is an AWS proprietary product, some authentication utilities can
> be utilized from flink-connector-aws-base. If additional utilities are
> required from the Redshift connector, they could be added to
> flink-connector-aws-base. In my opinion, this approach is favorable as it
> keeps everything related to AWS in flink-connector-aws.
>
> 2. Implement JDBC mode for Redshift sink in flink-connector-jdbc and
> UNLOAD in flink-connector-aws: This alternative is not advisable as it
> could lead to maintenance challenges and complexities.
>
>
> Furthermore, it's important to highlight that Redshift has its own
> customized JDBC driver[1], specifically optimized for compatibility with
> Redshift. While I cannot confirm this definitively, there is a possibility
> that the Redshift JDBC driver [1] might have differences in compatibility
> when compared to the JDBC driver used in flink-connector-jdbc. This
> suggests that if flink-connector-redshift were to rely on the JDBC
> connector, it could potentially lead to future compatibility issues.
>
> Given these considerations, it seems prudent to maintain the
> Redshift-related functionality within flink-connector-aws and keep the
> Redshift connector independent of the JDBC connector. This approach can
> help ensure that the Redshift connector remains flexible and adaptable to
> any potential changes in JDBC compatibility.
>
> I will update the FLIP[2] to remove dependencies on flink-connector-jdbc.
>
> [1]
> https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>
> Bests,
> Samrat
>
>
> On Mon, Sep 11, 2023 at 1:21 PM Martijn Visser 
> wrote:
>
>> Hi Samrat,
>>
>> I'm still having doubts about the dependency on the JDBC connector. When a
>> user specifies 'read mode', it will use the JDBC connector under the hood.
>> Why not integrate Redshift then directly in the JDBC connector itself? It
>> removes the need for a dependency on the JDBC driver, especially keeping
>> in
>> mind that this driver uses the old SourceFunction/SinkFunction interfaces
>> because it hasn't been migrated yet.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, Sep 11, 2023 at 8:54 AM Samrat Deb  wrote:
>>
>> > Hi Leonard,
>> >
>> > > Do we have to rely on the latest version of JDBC Connector here?
>> >
>> > No, there's no need for us to depend on the latest version of the JDBC
>> > Connector. Redshift has its dedicated JDBC driver [1], which includes
>> > custom modifications tailored to Redshift's specific implementation
>> needs.
>> > This driver is the most suitable choice for our purposes.
>> >
>> >
>> > > Could you collect the APIs that Redshift generally needs to use?
>> >
>> > I am actively working on it and making progress towards creating the
>> POC.
>> >
>> > Bests,
>> > Samrat
>> >
>> > [1]
>> >
>> >
>> https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
>> >
>> > On Mon, Sep 11, 2023 at 12:02 PM Samrat Deb 
>> wrote:
>> >
>> > > Hello Danny,
>> > >
>> > > I wanted to express my gratitude for your valuable feedback and
>> > insightful
>> > > suggestions.
>> > >
>> > > I will be revising the FLIP to incorporate all of your queries and
>> review
>> > > 

Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

2023-09-16 Thread Jane Chan
Hi Zhanghao,

Thanks for the explanation.

For Q1, I think the key lies in determining the boundary where the chain
should be broken. However, this boundary is ultimately determined by the
specific requirements of each user query.

The most straightforward approach is breaking the chain after the source
operator, even though it involves a tradeoff. This is because there may be
instances of `StreamExecWatermarkAssigner`, `StreamExecMiniBatchAssigner`,
or `StreamExecChangelogNormalize` occurring before the `StreamExecCalc`
node, and it would be complex and challenging to enumerate all possible
match patterns.

A more complex workaround would be to provide an entry point for users to
configure the specific operator that should serve as the breakpoint.
Meanwhile, this would further increase the complexity of this FLIP.

However, if the parallelism of each operator can be configured (in the
future), then this problem would not exist (it might be beyond the scope of
discussion for this FLIP).

I personally lean towards keeping the FLIP concise and focused by choosing
the most straightforward approach. I would also like to hear others'
opinions.

Best,
Jane

On Sat, Sep 16, 2023 at 10:21 AM Yun Tang  wrote:

> Hi Zhanghao,
>
> Certainly, I think we should keep this FLIP focused on setting the source
> parallelism via DDL properties. I just want to clarify that, in my
> experience, setting parallelism for individual operators is also valuable,
> though it is only touched on briefly in your FLIP.
>
> @ConradJam BTW, compared with a SQL hint, I think using `scan.parallelism`
> is better, to align with the current `sink.parallelism`. And once we
> introduce such an option, we can also use the dynamic table options SQL
> hint[1] to configure the source parallelism.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options
>
>
> Best
> Yun Tang
> 
> From: ConradJam 
> Sent: Friday, September 15, 2023 22:52
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> Sources
>
> + 1 Thanks for the FLIP and the discussion. I would like to ask whether to
> use SQL Hint syntax to set this parallelism?
>
> Martijn Visser  于2023年9月15日周五 20:52写道:
>
> > Hi everyone,
> >
> > Thanks for the FLIP and the discussion. I find it exciting. Thanks for
> > pushing for this.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Fri, Sep 15, 2023 at 2:25 PM Chen Zhanghao  >
> > wrote:
> >
> > > Hi Jane,
> > >
> > > Thanks for the valuable suggestions.
> > >
> > > For Q1, it's indeed an issue. Some possible ideas include introducing
> > > a fake transformation after the source that takes the global default
> > > parallelism, or simply making exec nodes take the global default
> > > parallelism, but both ways prevent potential chaining opportunities and
> > > I'm not sure if that's good to go. We'll need to give it deeper thought
> > > and polish our proposal. We're also more than glad to hear your input
> > > on it.
> > >
> > > For Q2, scan.parallelism will take higher precedence, as the more
> > > specific config should take priority.
> > >
> > > Best,
> > > Zhanghao Chen
> > > 
> > > From: Jane Chan 
> > > Sent: September 15, 2023 11:56
> > > To: dev@flink.apache.org 
> > > Cc: dewe...@outlook.com 
> > > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> > > Sources
> > >
> > > Hi, Zhanghao, Dewei,
> > >
> > > Thanks for initiating this discussion. This feature is valuable in
> > > providing more flexibility for performance tuning for SQL pipelines.
> > >
> > > Here are my two cents,
> > >
> > > 1. In the FLIP, you mentioned concerns about the parallelism of the
> > > calc node and concluded to "leave the behavior unchanged for now." This
> > > means that the calc node will use the parallelism of the source
> > > operator, regardless of whether the source parallelism is configured or
> > > not. If I understand correctly, currently, except for the sink exec
> > > node (which has the ability to configure its own parallelism), the rest
> > > of the exec nodes accept their input parallelism. From the design, I
> > > didn't see the details about coping with input and default parallelism
> > > for the rest of the exec nodes. Can you elaborate more on the details?
> > >
> > > 2. Does the configuration `table.exec.resource.default-parallelism`
> take
> > > precedence over `scan.parallelism`?
> > >
> > > Best,
> > > Jane
> > >
> > > On Fri, Sep 15, 2023 at 10:43 AM Yun Tang  wrote:
> > >
> > > > Thanks for creating this FLIP,
> > > >
> > > > Many users have demands to configure the source parallelism just as
> > > > configuring the sink parallelism via DDL. Look forward for this
> > feature.
> > > >
> > > > BTW, I think setting parallelism for each operator would also be
> > > > valuable. And this should work with the compiled plan[1] rather than
> > > > SQL DDL.
> > > >
> 

[jira] [Created] (FLINK-33095) Job jar related issue should be reported as BAD_REQUEST instead of INTERNAL_SERVER_ERROR

2023-09-16 Thread Surendra Singh Lilhore (Jira)
Surendra Singh Lilhore created FLINK-33095:
--

 Summary: Job jar related issue should be reported as BAD_REQUEST 
instead of INTERNAL_SERVER_ERROR
 Key: FLINK-33095
 URL: https://issues.apache.org/jira/browse/FLINK-33095
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.16.0
Reporter: Surendra Singh Lilhore


When submitting a job with incorrect parameters, such as an invalid entry
class, the current response is an internal server error.

To improve the user experience and consistency, it is recommended to throw a
REST exception and return a BAD_REQUEST response code in such cases.
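
A minimal sketch of the idea (the surrounding handler class and helper are
illustrative; RestHandlerException and the shaded Netty HttpResponseStatus
are the classes Flink's REST handlers already use for typed error codes):

import org.apache.flink.client.program.PackagedProgram;
import org.apache.flink.client.program.ProgramInvocationException;
import org.apache.flink.runtime.rest.handler.RestHandlerException;
import org.apache.flink.shaded.netty4.io.netty.handler.codec.http.HttpResponseStatus;

final class JarValidationSketch {
    // Translate bad user input (e.g. an invalid entry class) into a 400
    // instead of letting it surface as a 500.
    static PackagedProgram loadProgram(PackagedProgram.Builder builder)
            throws RestHandlerException {
        try {
            return builder.build();
        } catch (ProgramInvocationException e) {
            throw new RestHandlerException(
                    "Invalid job jar or entry class: " + e.getMessage(),
                    HttpResponseStatus.BAD_REQUEST,
                    e);
        }
    }
}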

--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:Re: [DISCUSS] FLIP-331: Support EndOfStreamWindows and isOutputOnEOF operator attribute to optimize task deployment

2023-09-16 Thread Wencong Liu
Hi Dong & Jinhao,

Thanks for your clarification! +1

Best regards,
Wencong

At 2023-09-15 11:26:16, "Dong Lin"  wrote:
>Hi Wencong,
>
>Thanks for your comments! Please see my reply inline.
>
>On Thu, Sep 14, 2023 at 12:30 PM Wencong Liu  wrote:
>
>> Dear Dong,
>>
>> I have thoroughly reviewed the proposal for FLIP-331 and believe it
>> would be a valuable addition to Flink. However, I do have a few questions
>> that I would like to discuss:
>>
>>
>> 1. FLIP-331 proposes EndOfStreamWindows, implemented as a TimeWindow
>> with maxTimestamp = (Long.MAX_VALUE - 1), which naturally allows
>> WindowedStream and AllWindowedStream to process all records belonging
>> to a key in a 'global' window under both the STREAMING and BATCH
>> runtime execution modes.
>>
>>
>> However, besides coGroup and keyBy().aggregate(), other operators on
>> WindowedStream and AllWindowedStream, such as join/reduce, etc., are
>> currently still implemented on top of WindowOperator.
>>
>> In fact, these operators could also be implemented without
>> WindowOperator, to avoid the additional WindowAssigner#assignWindows or
>> triggerContext#onElement invocation cost. Are there plans to support
>> these operators in the future?
>>
>
>You are right. The EndOfStreamWindows proposed in this FLIP can potentially
>benefit any DataStream API that takes a WindowAssigner as a parameter. This
>covers more operations than aggregate and co-group.
>
>And yes, we plan to take advantage of this API to optimize these operators
>in the future. This FLIP focuses on the introduction of the public APIs and
>uses aggregate/co-group as the first two examples to showcase the
>performance benefits.
>
>I have added an "Analysis of APIs affected by this FLIP" section to list the
>DataStream APIs that can benefit from this FLIP. Does this answer your
>question?
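>
>(For example, per the FLIP draft, the co-group case looks roughly like the
>following; EndOfStreamWindows is the API proposed in this FLIP and its
>final shape may change:)
>
>    // Hypothetical usage following the FLIP-331 draft.
>    source1.coGroup(source2)
>            .where(record -> record.f0)
>            .equalTo(record -> record.f0)
>            .window(EndOfStreamWindows.get())  // one window over all input
>            .apply(new MyCoGroupFunction())
>            .print();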
>
>
>>
>> 2. When using EndOfStreamWindows, upstream operators no longer support
>> checkpointing. This limit may be too strict, especially when dealing with
>> bounded data in streaming runtime execution mode, where checkpointing
>> can still be useful.
>>
>
>I am not sure we have a good way to support checkpointing while still
>achieving the performance improvements targeted by this FLIP.
>
>The issue is that if we support checkpointing, then we cannot take
>advantage of algorithms (e.g. sorting inputs using ExternalSorter) that are
>not compatible with checkpoints. These algorithms (which do not support
>checkpointing) are the main reason why batch mode currently significantly
>outperforms stream mode for aggregation/co-group, etc.
>
>In most cases where the user does not care about processing latency, it is
>generally preferable to use batch mode to perform aggregation operations
>(which should be 10X faster than the existing stream-mode performance)
>rather than doing checkpoints.
>
>Also note that we can still let operators perform failover in the same way
>as existing batch-mode execution, where the intermediate results (produced
>by one operator) can be persisted in the shuffle service and downstream
>operators can re-read that data from the shuffle service after failover.
>
>
>>
>> 3. The proposal mentions that if a transformation has isOutputOnEOF ==
>> true, the operator as well as its upstream operators will be executed in
>> 'batch mode' with checkpointing disabled. I would like to understand the
>> specific implications of this 'batch mode' and whether there are any
>> other changes associated with it?
>
>
>Good point. We should explicitly mention the changes. I have updated the
>FLIP to clarify this.
>
>More specifically, checkpointing is disabled while these operators are
>running, so that these operators can perform operations that are not
>compatible with checkpoints (e.g. sorting inputs). And operators should
>re-read the data from the upstream blocking edge or sources after failover.
>
>Would this answer your question?
>
>
>>
>> Additionally, I am curious to know whether this 'batch mode' conflicts
>> with the 'mixed mode' described in FLIP-327. While the coGroup and
>> keyBy().aggregate() operators on EndOfStreamWindows have the attribute
>> 'isInternalSorterSupported' set to true, indicating support for the
>> 'mixed mode', they also have isOutputOnEOF set to true, which suggests
>> that the upstream operators should be executed in 'batch mode'.
>> Will the 'mixed mode' be ignored when in 'batch mode'? I would appreciate
>> any clarification on this matter.
>>
>
>Good question. I think `isInternalSorterSupported` and `isOutputOnEOF` do
>not conflict with each other.
>
>It might be useful to recap the semantics of these attributes:
>- `isOutputOnEOF` describes whether an operator outputs data only after all
>of its input has been ingested by the operator.
>- `isInternalSorterSupported` describes whether an operator will use an
>internal sorter when it does not need to do checkpoints.
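>
>For concreteness, declaring these attributes might look like the sketch
>below, following the builder-style API drafted in FLIP-331 (the class and
>method names come from the FLIP draft and may still change):
>
>    // Hypothetical, per the FLIP-331 draft API.
>    OperatorAttributes attributes =
>            new OperatorAttributesBuilder()
>                    .setOutputOnEOF(true)              // output only at end of input
>                    .setInternalSorterSupported(true)  // may sort inputs internally
>                    .build();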
>
>And we can further derive that the semantics of these two attributes do not