Re: [DISCUSS] FLIP-XXX Support currentFetchEventTimeLag and processingLag metrics

2024-04-28 Thread Qingsheng Ren
Hi Jialiang,

Thanks for the FLIP! Here are some thoughts of mine.

- For currentFetchEventTimeLag:

The problem with currentFetchEventTimeLag is: FetchTime is determined in
the SplitReader, driven by the SplitFetcher thread, while EventTime is
calculated at the output of the SourceOperator, driven by the task's main
thread [1], and there's a barrier (the elementQueue) between them, so it's
hard to calculate FetchTime - EventTime accurately across two threads. I
assume the new method "recordFetched()" in SourceReaderMetricGroup can only
be invoked in the SplitReader when records are being fetched from the
external system, and this will introduce concurrency issues as the event
time is determined in a different thread.

One possible solution in my mind is that records carry their own FetchTime
all the way until they reach the SourceOutput and their event time is
extracted by the TimestampAssigner; then we can calculate an accurate
FetchTime - EventTime. This requires some changes in the SourceReaderBase
API.
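
To make the idea concrete, here is a minimal sketch (hypothetical: the
wrapper type and accessors are illustrative, not an actual SourceReaderBase
API):

public class RecordWithFetchTime<E> {
    private final E record;
    // Wall-clock time captured in the SplitReader thread when the record
    // was fetched from the external system.
    private final long fetchTimeMillis;

    public RecordWithFetchTime(E record, long fetchTimeMillis) {
        this.record = record;
        this.fetchTimeMillis = fetchTimeMillis;
    }

    public E getRecord() { return record; }

    public long getFetchTimeMillis() { return fetchTimeMillis; }
}

// At the SourceOutput, once the TimestampAssigner has extracted the event
// time, both timestamps are available in the task's main thread:
// long fetchEventTimeLag = wrapper.getFetchTimeMillis() - eventTimeMillis;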

- For currentProcessingTime:

The name is a bit confusing to me, as it is quite similar to the
"processing time" concept in stream processing [2]. I also have a concern
about this new metric: it can be derived directly from two existing metrics
(currentEmitEventTimeLag - currentFetchEventTimeLag), so is it necessary to
introduce a new one? It should be very easy to perform this subtraction in
an external metric monitoring system.
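
For reference, the subtraction works because the event time cancels out:
(EmitTime - EventTime) - (FetchTime - EventTime) = EmitTime - FetchTime,
which is exactly the time a record spends between being fetched and being
emitted.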

Best,
Qingsheng

[1]
https://github.com/apache/flink/blob/b1544e4e513d2b75b350c20dbb1c17a8232c22fd/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/source/SourceOutputWithWatermarks.java#L107
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/time/#notions-of-time-event-time-and-processing-time

On Fri, Apr 26, 2024 at 8:05 PM jialiang tan 
wrote:

> Thanks, and I'm glad to see your reply.
>
> > I could not find the precise comment in the thread you are referring to
> unfortunately. Do you have some specific point in the discussion in mind?
>
> Sorry for the misdirection. Are you worried about the time zone and NTP
> problems?
> Here are some of my personal insights; please correct me if I'm wrong.
>
> Regarding NTP, I have consulted an ops colleague at my company: clock
> synchronization is guaranteed by the sysadmins, so developers usually do
> not need to consider this problem.
> Regarding the time zone issue, I think consuming data across time zones
> shouldn't happen (e.g. data produced in Singapore and consumed by Flink
> in China); perhaps, as you said, providing this should come with
> disclaimers to the user.
>
> If we have to consider the above issues, I think it is hard for us to
> implement metrics like `currentEmitEventTimeLag` and
> `currentFetchEventTimeLag`.
> Assuming we ignore the NTP and time zone issues, then
> `currentFetchEventTimeLag` and `currentEmitEventTimeLag` are actually a
> pretty good reflection of the current consumption latency from the
> external db/mq.
>
> > However, why reflection?
>
> Thanks for raising the doubt, good question! Yes, at first it was just a
> workaround. But I found that "There is no connector (yet) available for
> Flink version 1.19" in the flink-connector-kafka documentation [1], and I
> kept thinking about it. It might be better to add support for this feature
> after upgrading the Flink version of flink-connector-kafka. WDYT?
>
> > Is this some pseudo-code or an actual implementation? This should invoke
> something like `System.nanoTime()` and not `System.currentTimeMillis()`
> because of precision/accuracy reasons [1]
>
> This is the actual implementation, and it has been running well at my
> company for months. Also, `currentEmitEventTimeLag` currently uses
> `System.currentTimeMillis()`, and in my opinion `currentFetchEventTimeLag`
> should match it.
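>
> To make this concrete, a minimal sketch of the reflection workaround
> (hypothetical: it assumes the proposed recordFetched method takes a long
> timestamp, which may differ from the actual implementation):
>
> // Look up the not-yet-exposed method on the metric group implementation,
> // so the connector still compiles against older Flink versions.
> java.lang.reflect.Method recordFetched =
>         metricGroup.getClass().getMethod("recordFetched", long.class);
> // Record the fetch time with System.currentTimeMillis(), consistent with
> // how currentEmitEventTimeLag is computed.
> recordFetched.invoke(metricGroup, System.currentTimeMillis());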
>
> Best,
> TanJiaLiang.
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/kafka/#dependencies
>
>
 wrote on Fri, Apr 26, 2024 at 15:50:
>
> > Sorry for some ill-informed questions, and thanks for adding context here
> > and to the FLIP.
> >
> > About 1:
> > > I think this has been discussed in the FLIP-33 lists thread[2].
> >
> > I could not find the precise comment in the thread you are referring to
> > unfortunately. Do you have some specific point in the discussion in mind?
> > However I understand that those metrics already exist and you are making
> > them available now :)
> >
> > About 2:
> > Good :)
> >
> > About 3:
> > Totally agree with you, I like this approach, looks very consistent.
> >
> > About 4:
> > Yeah, it is already there :)
> >
> > About 5:
> > Thank you for providing the example, now it is clearer.
> > However, why reflection? Is this only a workaround to make the current
> > Kafka connector invoke `recordFetched` with a newer version of Flink
> > without bumping the Flink version? Because, when you bump the Flink
> > version, the method will be exposed at compile time by
> > `SourceReaderMetricGroup`. Can you comment on this?

Re: [DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-04-28 Thread Ron Liu
Hi, Lorenzo

> I have a question there: how can the gateway update the refreshHandler in
the Catalog before getting it from the scheduler?

The refreshHandler in CatalogMaterializedTable is null before getting it
from the scheduler; you can look at the CatalogMaterializedTable.Builder[1]
for more details.

> You have a typo here: WorkflowScheudler -> WorkflowScheduler :)

Fixed it now, thanks very much.

> For the operations part, I still think that the FLIP would benefit from
providing a specific pattern for operations. You could either propose a
command pattern [1] or a visitor pattern (where the scheduler visits the
operation to get relevant info) [2] for those operations at your choice.

Thank you for your input, I find it very useful. I tried to understand your
thinking through code and implemented the following pseudo code using the
visitor design pattern:
1. First define WorkflowOperationVisitor, providing several overloaded
visit methods.

public interface WorkflowOperationVisitor {

    <T> T visit(CreateWorkflowOperation<T> createWorkflowOperation);

    void visit(ModifyWorkflowOperation operation);
}

2. Then add the accept method to WorkflowOperation.

@PublicEvolving
public interface WorkflowOperation {

    void accept(WorkflowOperationVisitor visitor);
}


3. In the WorkflowScheduler, call the implementation class of
WorkflowOperationVisitor to complete the corresponding operations.

I recognize the value of this design pattern purely from a code design
point of view, but from the point of view of our specific scenario:
1. For CreateWorkflowOperation the visit method needs to return a
RefreshHandler, while for ModifyWorkflowOperation (such as suspend and
resume) it doesn't. The visit signatures therefore can't be unified across
the different WorkflowOperations, so I think the visitor may not be
applicable here.

2. In addition, I think using the visitor pattern will add complexity for
the WorkflowScheduler implementer, who needs to implement one more
interface, WorkflowOperationVisitor. This interface is not for the engine
to use, so I don't see any benefit from this design at the moment.

3. Furthermore, I don't see the benefit at the moment; simpler is better
here. Similar to the design of CatalogModificationEvent[2] and
CatalogModificationListener[3], the developer only needs an instanceof
check.

To summarize, I don't think there is a need to introduce command or visitor
pattern at present.
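
For illustration, a minimal sketch of the instanceof-based handling inside
a WorkflowScheduler implementation (hypothetical: the method names and
simplified signatures are illustrative, not the exact FLIP interfaces):

public RefreshHandler handle(WorkflowOperation operation) throws WorkflowException {
    if (operation instanceof CreateWorkflowOperation) {
        // Create the refresh workflow and return its RefreshHandler.
        return createRefreshWorkflow((CreateWorkflowOperation) operation);
    } else if (operation instanceof ModifyWorkflowOperation) {
        // e.g. suspend or resume; no RefreshHandler needs to be returned.
        modifyRefreshWorkflow((ModifyWorkflowOperation) operation);
        return null;
    }
    throw new WorkflowException("Unsupported workflow operation: " + operation);
}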

> About the REST API, I will wait for your offline discussion :)

After discussing with Shengkai offline, there is no need for this REST API
to support refreshing multiple tables at the same time, so it is more
appropriate to put the materialized table identifier in the URL path.
Thanks for the suggestion.

[1]
https://github.com/apache/flink/blob/e412402ca4dfc438e28fb990dc53ea7809430aee/flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogMaterializedTable.java#L264
[2]
https://github.com/apache/flink/blob/b1544e4e513d2b75b350c20dbb1c17a8232c22fd/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/listener/CatalogModificationEvent.java#L28
[3]
https://github.com/apache/flink/blob/b1544e4e513d2b75b350c20dbb1c17a8232c22fd/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/listener/CatalogModificationListener.java#L31

Best,
Ron

Ron Liu wrote on Sun, Apr 28, 2024 at 23:53:

> Hi, Shengkai
>
> Thanks for your feedback and suggestions; they look very useful for this
> proposal. Regarding your questions I made the following optimizations:
>
> > *WorkflowScheduler*
> > 1. How to get the exception details if `modifyRefreshWorkflow` fails?
> > 2. Could you give us an example about how to configure the scheduler?
>
> 1. Added a new WorkflowException; WorkflowScheduler's related method
> signatures will throw WorkflowException when creating or modifying a
> workflow encounters an exception, so that the framework can sense and deal
> with it.
>
> 2. Added a new Configuration section, introduced a new Option, and gave an
> example of how to define the Scheduler in flink-conf.yaml.
>
> > *SQL Gateway*
> > 1. SqlGatewayService requires Session as the input, but the REST API
> doesn't need any Session information.
> > 2. Use "-" instead of "_" in the REST URI and camel case for fields in
> request/response
> > 3. Do we need scheduleTime and scheduleTimeFormat together?
>
> 1. If it is designed as a synchronous API, it may suffer from network
> jitter, thread resource exhaustion and other problems, which I had not
> considered before. The asynchronous API, although increasing the cost of
> use for the user, is friendly to the SqlGatewayService as well as to the
> Client's thread resources. In summary, as discussed offline, I also tend
> to think that all APIs of SqlGateway should be unified: all should be
> asynchronous and bound to a session. I have updated the REST API section
> in the FLIP.

[jira] [Created] (FLINK-35261) Flink CDC pipeline transform doesn't support decimal comparison

2024-04-28 Thread yux (Jira)
yux created FLINK-35261:
---

 Summary: Flink CDC pipeline transform doesn't support decimal 
comparison
 Key: FLINK-35261
 URL: https://issues.apache.org/jira/browse/FLINK-35261
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: yux


It would be convenient if we could filter by comparing a decimal column to
number literals like:

{{transform:}}

{{  - source-table: XXX}}

{{    filter: price > 50}}

where price is a Decimal-typed column. However, such an expression is
currently not supported, and a runtime exception is thrown as follows:

 

Caused by: org.apache.flink.api.common.InvalidProgramException: Expression 
cannot be compiled. This is a bug. Please file an issue.
Expression: import static 
org.apache.flink.cdc.runtime.functions.SystemFunctionUtils.*;PRICEALPHA > 50
    at 
org.apache.flink.cdc.runtime.operators.transform.TransformExpressionCompiler.lambda$compileExpression$0(TransformExpressionCompiler.java:62)
 
~[blob_p-c3b34ad5a5a3a0bc443bb738e308b20b1da04a1f-8d419e2c927baeb0eeb40fb35c0a52dc:3.2-SNAPSHOT]
    at 
org.apache.flink.shaded.guava31.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4868)
 
~[blob_p-d4bee45ff47f13d247ff78e6f3d164170cf71835-4816fa00c7fd16a7096498e5ed3caaee:3.2-SNAPSHOT]
    at 
org.apache.flink.shaded.guava31.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533)
 
~[blob_p-d4bee45ff47f13d247ff78e6f3d164170cf71835-4816fa00c7fd16a7096498e5ed3caaee:3.2-SNAPSHOT]
    at 
org.apache.flink.shaded.guava31.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2282)
 
~[blob_p-d4bee45ff47f13d247ff78e6f3d164170cf71835-4816fa00c7fd16a7096498e5ed3caaee:3.2-SNAPSHOT]
    at 
org.apache.flink.shaded.guava31.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2159)
 
~[blob_p-d4bee45ff47f13d247ff78e6f3d164170cf71835-4816fa00c7fd16a7096498e5ed3caaee:3.2-SNAPSHOT]
    at 
org.apache.flink.shaded.guava31.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2049)
 
~[blob_p-d4bee45ff47f13d247ff78e6f3d164170cf71835-4816fa00c7fd16a7096498e5ed3caaee:3.2-SNAPSHOT]
    at 
org.apache.flink.shaded.guava31.com.google.common.cache.LocalCache.get(LocalCache.java:3966)
 
~[blob_p-d4bee45ff47f13d247ff78e6f3d164170cf71835-4816fa00c7fd16a7096498e5ed3caaee:3.2-SNAPSHOT]
    at 
org.apache.flink.shaded.guava31.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
 
~[blob_p-d4bee45ff47f13d247ff78e6f3d164170cf71835-4816fa00c7fd16a7096498e5ed3caaee:3.2-SNAPSHOT]
    at 
org.apache.flink.cdc.runtime.operators.transform.TransformExpressionCompiler.compileExpression(TransformExpressionCompiler.java:46)
 
~[blob_p-c3b34ad5a5a3a0bc443bb738e308b20b1da04a1f-8d419e2c927baeb0eeb40fb35c0a52dc:3.2-SNAPSHOT]
    ... 17 more
Caused by: org.codehaus.commons.compiler.CompileException: Line 1, Column 89: 
Cannot compare types "java.math.BigDecimal" and "int"
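
For context, a minimal standalone sketch (hypothetical, not the actual CDC
codegen) of the failing comparison and a form that does compile:

{code:java}
import java.math.BigDecimal;

public class DecimalComparisonSketch {
    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("99.9");
        // boolean filtered = price > 50;  // '>' is undefined between BigDecimal and int
        boolean filtered = price.compareTo(BigDecimal.valueOf(50)) > 0;  // compiles
        System.out.println(filtered);  // true
    }
}
{code}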





[jira] [Created] (FLINK-35260) Translate "Watermark alignment" page into Chinese

2024-04-28 Thread hongxu han (Jira)
hongxu han created FLINK-35260:
--

 Summary: Translate "Watermark alignment" page into Chinese
 Key: FLINK-35260
 URL: https://issues.apache.org/jira/browse/FLINK-35260
 Project: Flink
  Issue Type: Improvement
  Components: chinese-translation, Documentation
Affects Versions: 1.19.0
Reporter: hongxu han


The "Watermark alignment" page lacks a Chinese translation.





Re: [VOTE] Release flink-connector-kafka v3.2.0, release candidate #1

2024-04-28 Thread Aleksandr Pilipenko
+1 (non-binding)

- Validated checksum
- Verified signature
- Checked that no binaries exist in the source archive
- Built source
- Verified web PR

Thanks,
Aleksandr

On Sun, 28 Apr 2024 at 11:35, Hang Ruan  wrote:

> +1 (non-binding)
>
> - Validated checksum hash
> - Verified signature
> - Verified that no binaries exist in the source archive
> - Built the source with Maven and jdk8
> - Verified web PR
> - Checked that the jar is built by jdk8
>
> Best,
> Hang
>
Ahmed Hamdy wrote on Wed, Apr 24, 2024 at 17:21:
>
> > Thanks Danny,
> > +1 (non-binding)
> >
> > - Verified Checksums and hashes
> > - Verified Signatures
> > - Reviewed web PR
> > - github tag exists
> > - Built source
> >
> >
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Tue, 23 Apr 2024 at 03:47, Muhammet Orazov
> > 
> > wrote:
> >
> > > Thanks Danny, +1 (non-binding)
> > >
> > > - Checked 512 hash
> > > - Checked gpg signature
> > > - Reviewed pr
> > > - Built the source with JDK 11 & 8
> > >
> > > Best,
> > > Muhammet
> > >
> > > On 2024-04-22 13:55, Danny Cranmer wrote:
> > > > Hi everyone,
> > > >
> > > > Please review and vote on release candidate #1 for
> > > > flink-connector-kafka
> > > > v3.2.0, as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > This release supports Flink 1.18 and 1.19.
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> dist.apache.org
> > > > [2],
> > > > which are signed with the key with fingerprint 125FD8DB [3],
> > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > * source code tag v3.2.0-rc1 [5],
> > > > * website pull request listing the new release [6].
> > > > * CI build of the tag [7].
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Danny
> > > >
> > > > [1]
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354209
> > > > [2]
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.2.0-rc1
> > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > [4]
> > > >
> https://repository.apache.org/content/repositories/orgapacheflink-1723
> > > > [5]
> > > >
> > https://github.com/apache/flink-connector-kafka/releases/tag/v3.2.0-rc1
> > > > [6] https://github.com/apache/flink-web/pull/738
> > > > [7] https://github.com/apache/flink-connector-kafka
> > >
> >
>


Re: [VOTE] Release flink-connector-aws v4.3.0, release candidate #2

2024-04-28 Thread Aleksandr Pilipenko
+1 (non-binding)

- Verified checksums
- Verified signatures
- Checked that no binaries exist in the source archive
- Reviewed Web PR
- Built source

Thanks,
Aleksandr

On Mon, 22 Apr 2024 at 09:31, Ahmed Hamdy  wrote:

> Thanks Danny,
> +1 (non-binding)
>
> - Verified Checksums
> - Verified Signatures
> - No binaries exists in source archive
> - Built source
> - Reviewed Web PR
> - Ran basic Kinesis example
>
>
> Best Regards
> Ahmed Hamdy
>
>
> On Sun, 21 Apr 2024 at 14:25, Hang Ruan  wrote:
>
> > +1 (non-binding)
> >
> > - Validated checksum hash
> > - Verified signature
> > - Verified that no binaries exist in the source archive
> > - Build the source with Maven and jdk8
> > - Verified web PR
> > - Check that the jar is built by jdk8
> >
> > Best,
> > Hang
> >
Danny Cranmer wrote on Fri, Apr 19, 2024 at 18:08:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on release candidate #2 for flink-connector-aws
> > > v4.3.0, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This version supports Flink 1.18 and 1.19.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint 125FD8DB [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v4.3.0-rc2 [5],
> > > * website pull request listing the new release [6].
> > > * CI build of the tag [7].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Release Manager
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353793
> > > [2]
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-aws-4.3.0-rc2
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1721/
> > > [5]
> > https://github.com/apache/flink-connector-aws/releases/tag/v4.3.0-rc2
> > > [6] https://github.com/apache/flink-web/pull/733
> > > [7]
> > https://github.com/apache/flink-connector-aws/actions/runs/8751694197
> > >
> >
>


Re: [DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-04-28 Thread Ron Liu
Hi, Shengkai

Thanks for your feedback and suggestions; they look very useful for this
proposal. Regarding your questions I made the following optimizations:

> *WorkflowScheduler*
> 1. How to get the exception details if `modifyRefreshWorkflow` fails?
> 2. Could you give us an example about how to configure the scheduler?

1. Added a new WorkflowException; WorkflowScheduler's related method
signatures will throw WorkflowException when creating or modifying a
workflow encounters an exception, so that the framework can sense and deal
with it.

2. Added a new Configuration section, introduced a new Option, and gave an
example of how to define the Scheduler in flink-conf.yaml.
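
For illustration, a hypothetical flink-conf.yaml snippet (the actual option
name is the one defined in the FLIP's Configuration section and may
differ):

# Select the workflow scheduler plugin used for materialized table refresh.
workflow-scheduler.type: my-scheduler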

> *SQL Gateway*
> 1. SqlGatewayService requires Session as the input, but the REST API
doesn't need any Session information.
> 2. Use "-" instead of "_" in the REST URI and camel case for fields in
request/response
> 3. Do we need scheduleTime and scheduleTimeFormat together?

1. If it is designed as a synchronous API, it may suffer from network
jitter, thread resource exhaustion and other problems, which I had not
considered before. The asynchronous API, although increasing the cost of
use for the user, is friendly to the SqlGatewayService as well as to the
Client's thread resources. In summary, as discussed offline, I also tend to
think that all APIs of SqlGateway should be unified: all should be
asynchronous and bound to a session. I have updated the REST API section in
the FLIP.

2. Thanks for the reminder; it has been updated.

3. After rethinking, I think it can indeed be simpler: there is no need to
pass in a custom time format. scheduleTime can be unified to the SQL
standard timestamp format 'yyyy-MM-dd HH:mm:ss', which is able to satisfy
the time-related needs of materialized tables.
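
For illustration, a minimal sketch of parsing that format on the gateway
side (an assumption that java.time is used; the actual implementation may
differ):

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ScheduleTimeParseSketch {
    public static void main(String[] args) {
        // Parse the scheduleTime passed in the REST request body.
        DateTimeFormatter format = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime scheduleTime = LocalDateTime.parse("2024-04-28 00:00:00", format);
        System.out.println(scheduleTime); // 2024-04-28T00:00
    }
}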

Based on your feedback, I have optimized and updated the FLIP related
section.

Best,
Ron


Shengkai Fang wrote on Sun, Apr 28, 2024 at 15:47:

> Hi, Liu.
>
> Thanks for your proposal. I have some questions about the FLIP:
>
> *WorkflowScheduler*
>
> 1. How to get the exception details if `modifyRefreshWorkflow` fails?
> 2. Could you give us an example about how to configure the scheduler?
>
> *SQL Gateway*
>
> 1. SqlGatewayService requires Session as the input, but the REST API
> doesn't need any Session information.
>
> From the perspective of a gateway developer, I tend to unify the API of the
> SQL gateway, binding all concepts to the session. On the one hand, this
> approach allows us to reduce maintenance and understanding costs, as we
> only need to maintain one set of architecture to complete basic concepts.
> On the other hand, the benefits of an asynchronous architecture are
> evident: we maintain state on the server side. If the request is a long
> connection, even in the face of network layer jitter, we can still find the
> original result through session and operation handles.
>
> Using asynchronous APIs may increase the development cost for users, but
> from a platform perspective, if a request remains in a blocking state for a
> long time, it also becomes a burden on the platform's JVM. This is because
> thread switching and maintenance require certain resources.
>
> 2. Use "-" instead of "_" in the REST URI and camel case for fields in
> request/response
>
> Please follow the Flink REST Design.
>
> 3. Do we need scheduleTime and scheduleTimeFormat together?
>
> I think we can use SQL timestamp format or ISO timestamp format. It is not
> necessary to pass time in any specific format.
>
> https://en.wikipedia.org/wiki/ISO_8601
>
> Best,
> Shengkai
>


[jira] [Created] (FLINK-35259) FlinkCDC Pipeline transform can't deal timestamp field

2024-04-28 Thread Wenkai Qi (Jira)
Wenkai Qi created FLINK-35259:
-

 Summary: FlinkCDC Pipeline transform can't deal timestamp field
 Key: FLINK-35259
 URL: https://issues.apache.org/jira/browse/FLINK-35259
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: 3.1.0
Reporter: Wenkai Qi
 Fix For: 3.1.0


When the original table contains fields of type Timestamp, they cannot be
converted properly.

When the added computed columns contain fields of type Timestamp, they
cannot be converted properly.





Re: [VOTE] Release flink-connector-kafka v3.2.0, release candidate #1

2024-04-28 Thread Hang Ruan
+1 (non-binding)

- Validated checksum hash
- Verified signature
- Verified that no binaries exist in the source archive
- Built the source with Maven and jdk8
- Verified web PR
- Checked that the jar is built by jdk8

Best,
Hang

Ahmed Hamdy wrote on Wed, Apr 24, 2024 at 17:21:

> Thanks Danny,
> +1 (non-binding)
>
> - Verified Checksums and hashes
> - Verified Signatures
> - Reviewed web PR
> - github tag exists
> - Built source
>
>
> Best Regards
> Ahmed Hamdy
>
>
> On Tue, 23 Apr 2024 at 03:47, Muhammet Orazov
> 
> wrote:
>
> > Thanks Danny, +1 (non-binding)
> >
> > - Checked 512 hash
> > - Checked gpg signature
> > - Reviewed pr
> > - Built the source with JDK 11 & 8
> >
> > Best,
> > Muhammet
> >
> > On 2024-04-22 13:55, Danny Cranmer wrote:
> > > Hi everyone,
> > >
> > > Please review and vote on release candidate #1 for
> > > flink-connector-kafka
> > > v3.2.0, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > This release supports Flink 1.18 and 1.19.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint 125FD8DB [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v3.2.0-rc1 [5],
> > > * website pull request listing the new release [6].
> > > * CI build of the tag [7].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1]
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354209
> > > [2]
> > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-connector-kafka-3.2.0-rc1
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > > https://repository.apache.org/content/repositories/orgapacheflink-1723
> > > [5]
> > >
> https://github.com/apache/flink-connector-kafka/releases/tag/v3.2.0-rc1
> > > [6] https://github.com/apache/flink-web/pull/738
> > > [7] https://github.com/apache/flink-connector-kafka
> >
>


[jira] [Created] (FLINK-35258) Broken links to Doris in Flink CDC Documentation

2024-04-28 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-35258:
-

 Summary:  Broken links to Doris in Flink CDC Documentation
 Key: FLINK-35258
 URL: https://issues.apache.org/jira/browse/FLINK-35258
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.0
Reporter: Qingsheng Ren
Assignee: Qingsheng Ren
 Fix For: cdc-3.1.0


These broken links are detected by CI: 
 
{code:java}
ERROR: 3 dead links found!
535  [✖] 
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD/
 → Status: 404
536  [✖] 
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE/
 → Status: 404
537  [✖] 
https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Types/BOOLEAN/ 
→ Status: 404 


ERROR: 3 dead links found!
1008  [✖] 
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD/
 → Status: 404
1009  [✖] 
https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE/
 → Status: 404
1010  [✖] 
https://doris.apache.org/docs/dev/sql-manual/sql-reference/Data-Types/BOOLEAN/ 
→ Status: 404{code}





Re: [DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-04-28 Thread Shengkai Fang
Hi, Liu.

Thanks for your proposal. I have some questions about the FLIP:

*WorkflowScheduler*

1. How to get the exception details if `modifyRefreshWorkflow` fails?
2. Could you give us an example about how to configure the scheduler?

*SQL Gateway*

1. SqlGatewayService requires Session as the input, but the REST API
doesn't need any Session information.

From the perspective of a gateway developer, I tend to unify the API of the
SQL gateway, binding all concepts to the session. On the one hand, this
approach allows us to reduce maintenance and understanding costs, as we
only need to maintain one set of architecture to complete basic concepts.
On the other hand, the benefits of an asynchronous architecture are
evident: we maintain state on the server side. If the request is a long
connection, even in the face of network layer jitter, we can still find the
original result through session and operation handles.

Using asynchronous APIs may increase the development cost for users, but
from a platform perspective, if a request remains in a blocking state for a
long time, it also becomes a burden on the platform's JVM. This is because
thread switching and maintenance require certain resources.

2. Use "-" instead of "_" in the REST URI and camel case for fields in
request/response

Please follow the Flink REST Design.

3. Do we need scheduleTime and scheduleTimeFormat together?

I think we can use SQL timestamp format or ISO timestamp format. It is not
necessary to pass time in any specific format.

https://en.wikipedia.org/wiki/ISO_8601
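
For example, '2024-04-28 15:47:00' (SQL format) or '2024-04-28T15:47:00Z'
(ISO 8601) would both be unambiguous; the key point is one fixed,
documented format rather than a caller-supplied pattern.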

Best,
Shengkai


[jira] [Created] (FLINK-35257) optimize the handling of schema.change.behavior as EXCEPTION

2024-04-28 Thread yanghuai (Jira)
yanghuai created FLINK-35257:


 Summary: optimize the handling of schema.change.behavior as 
EXCEPTION
 Key: FLINK-35257
 URL: https://issues.apache.org/jira/browse/FLINK-35257
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: yanghuai


The CDC job throws an exception when schema.change.behavior is set to
EXCEPTION. The data between the last checkpoint and the current DDL will
not be flushed to the sink, making it difficult for the task to recover
this part of the data.
It would be better if the sink flushed all pending data and then threw the
exception. This way, the task only needs to recover from the DDL position
on the source to ensure data consistency.
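
A minimal sketch of the proposed behavior (hypothetical: the handler and
exception are illustrative, not the actual CDC sink code):

{code:java}
// On receiving a DDL event while schema.change.behavior = EXCEPTION:
void onSchemaChange(org.apache.flink.api.connector.sink2.SinkWriter<?> sinkWriter,
        Object schemaChangeEvent) throws Exception {
    // Flush everything buffered since the last checkpoint first, so a
    // restart only needs to resume from the DDL position on the source.
    sinkWriter.flush(false);
    throw new UnsupportedOperationException(
            "schema.change.behavior=EXCEPTION, rejecting: " + schemaChangeEvent);
}
{code}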





[jira] [Created] (FLINK-35256) Pipeline transform ignores column type nullability

2024-04-28 Thread yux (Jira)
yux created FLINK-35256:
---

 Summary: Pipeline transform ignores column type nullability
 Key: FLINK-35256
 URL: https://issues.apache.org/jira/browse/FLINK-35256
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: yux
 Attachments: log.txt

Flink CDC 3.1.0 brought the transform feature, allowing column type / value
transformation prior to the data routing process. However, after the
transformation, columns marked as `NOT NULL` lose that annotation, causing
some downstream sinks to fail since they require primary keys to be NOT
NULL.

Here's the minimum reproducible example about this problem:

```yaml
source:
  type: mysql
  ...

sink:
  type: starrocks
  name: StarRocks Sink
  ...

pipeline:
  name: Sync MySQL Database to StarRocks
  parallelism: 4

transform:
  - source-table: reicigo.\.*
    projection: ID, UPPER(ID) AS UPID
```

In the MySQL source table, the primary key column `ID` is marked as `NOT
NULL`, but this information is lost downstream, causing the following
exception (see attachment).





[jira] [Created] (FLINK-35255) DataSinkWriterOperator should override snapshotState and processWatermark methods

2024-04-28 Thread yanghuai (Jira)
yanghuai created FLINK-35255:


 Summary: DataSinkWriterOperator should override 
snapshotState and processWatermark methods
 Key: FLINK-35255
 URL: https://issues.apache.org/jira/browse/FLINK-35255
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: yanghuai
 Fix For: cdc-3.1.0


DataSinkWriterOperator only overrides
org.apache.flink.streaming.api.operators.AbstractStreamOperator#initializeState,
but does not override
org.apache.flink.streaming.api.operators.AbstractStreamOperator#snapshotState
and
org.apache.flink.streaming.api.operators.AbstractStreamOperator#processWatermark.
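
A minimal sketch of the missing overrides (hypothetical: it assumes the
operator forwards to a wrapped writer operator, which may not match the
actual fix):

{code:java}
@Override
public void snapshotState(StateSnapshotContext context) throws Exception {
    super.snapshotState(context);
    // Forward the snapshot to the wrapped sink writer operator as well.
    wrappedWriterOperator.snapshotState(context);
}

@Override
public void processWatermark(Watermark mark) throws Exception {
    super.processWatermark(mark);
    // Propagate watermarks to the wrapped sink writer operator.
    wrappedWriterOperator.processWatermark(mark);
}
{code}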
 


