Re: [PR] [FLINK-33102][autoscaler] Document the autoscaler standalone and Extensibility of Autoscaler [flink-kubernetes-operator]

2023-11-10 Thread via GitHub


gyfora merged PR #703:
URL: https://github.com/apache/flink-kubernetes-operator/pull/703


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-33102) Document the standalone autoscaler

2023-11-10 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-33102.
--
Resolution: Fixed

merged to main e36819820627dedaca66d33f85687166ac829395

> Document the standalone autoscaler
> --
>
> Key: FLINK-33102
> URL: https://issues.apache.org/jira/browse/FLINK-33102
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33401) Kafka connector has broken version

2023-11-10 Thread Yuxin Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxin Tan updated FLINK-33401:
--
Attachment: screenshot-1.png

> Kafka connector has broken version
> --
>
> Key: FLINK-33401
> URL: https://issues.apache.org/jira/browse/FLINK-33401
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Reporter: Pavel Khokhlov
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>
> Trying to run Flink 1.18 with Kafka Connector
> but official documentation has a bug  
> [https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kafka/]
> {noformat}
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-kafka</artifactId>
>     <version>-1.18</version>
> </dependency>
> {noformat}
> Basically version *-1.18* doesn't exist.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33295) Separate SinkV2 and SinkV1Adapter tests

2023-11-10 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784710#comment-17784710
 ] 

Martijn Visser commented on FLINK-33295:


It's used for testing only, see 
https://github.com/apache/flink-connector-elasticsearch/pull/81 which fixed it 
for Elasticsearch. 

> Separate SinkV2 and SinkV1Adapter tests
> ---
>
> Key: FLINK-33295
> URL: https://issues.apache.org/jira/browse/FLINK-33295
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Connectors / Common
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Current SinkV2 tests are based on the sink generated by the 
> _org.apache.flink.streaming.runtime.operators.sink.TestSink_ test class. This 
> test class does not generate the SinkV2 directly, but generates a SinkV1 and 
> wraps it with a 
> _org.apache.flink.streaming.api.transformations.SinkV1Adapter._ As a result, this 
> tests the SinkV2 only insofar as it is aligned with SinkV1 and the 
> SinkV1Adapter.
> We should have tests where we create a SinkV2 directly and its functionality 
> is tested without the adapter.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-33210][core] Introduce job status changed listener for lineage [flink]

2023-11-10 Thread via GitHub


FangYongs opened a new pull request, #23695:
URL: https://github.com/apache/flink/pull/23695

   ## What is the purpose of the change
   
   This PR aims to introduce a job status changed listener for lineage.
   
   ## Brief change log
   
 - Introduce JobStatusChangedListener and JobStatusChangedListenerFactory 
for flink
 - Introduce JobStatusChangedEvent for listener
 - Create listener in execution graph and notify listener when job status 
changes
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
 - Added JobStatusChangedListenerTest to validate job status changed events
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no) no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no) no
 - The serializers: (yes / no / don't know) no
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know) no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) 
no
 - The S3 file system connector: (yes / no / don't know) no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no) no
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33487) Add the new Snowflake connector to supported list

2023-11-10 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784713#comment-17784713
 ] 

Martijn Visser commented on FLINK-33487:


[~morezaei00] My pleasure. Let me know if I can help otherwise

> Add the new Snowflake connector to supported list
> -
>
> Key: FLINK-33487
> URL: https://issues.apache.org/jira/browse/FLINK-33487
> Project: Flink
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 1.18.0, 1.17.1
>Reporter: Mohsen Rezaei
>Priority: Major
>
> Code was contributed in FLINK-32737.
> Add this new connector to the list of supported ones in the documentation 
> with a corresponding sub-page for the details of the connector:
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/overview/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33210][core] Introduce job status changed listener for lineage [flink]

2023-11-10 Thread via GitHub


flinkbot commented on PR #23695:
URL: https://github.com/apache/flink/pull/23695#issuecomment-1805283690

   
   ## CI report:
   
   * 6c765ee67ab2f2081c0d215d19991f423aae9748 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-32723) FLIP-334 : Decoupling autoscaler and kubernetes and support the Standalone Autoscaler

2023-11-10 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784714#comment-17784714
 ] 

Rui Fan commented on FLINK-32723:
-

Hi [~gyfora] [~mxm], I am resolving this Jira first; it means that FLIP-334 
(decoupling the autoscaler and supporting the autoscaler standalone) has already 
been completed in kubernetes-operator 1.7.0. Thank you very much for your professional 
suggestions and review of FLIP-334. :)

I created FLINK-33452 to follow up on a series of improvements related to the 
Autoscaler Standalone, to be done gradually in future versions.

> FLIP-334 : Decoupling autoscaler and kubernetes and support the Standalone 
> Autoscaler
> -
>
> Key: FLINK-32723
> URL: https://issues.apache.org/jira/browse/FLINK-32723
> Project: Flink
>  Issue Type: New Feature
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
> Fix For: kubernetes-operator-1.7.0
>
>
> This is an umbrella jira for decoupling autoscaler and kubernetes.
> https://cwiki.apache.org/confluence/x/x4qzDw



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-32723) FLIP-334 : Decoupling autoscaler and kubernetes and support the Standalone Autoscaler

2023-11-10 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-32723.
-
Resolution: Fixed

> FLIP-334 : Decoupling autoscaler and kubernetes and support the Standalone 
> Autoscaler
> -
>
> Key: FLINK-32723
> URL: https://issues.apache.org/jira/browse/FLINK-32723
> Project: Flink
>  Issue Type: New Feature
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Major
> Fix For: kubernetes-operator-1.7.0
>
>
> This is an umbrella jira for decoupling autoscaler and kubernetes.
> https://cwiki.apache.org/confluence/x/x4qzDw



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33251) SQL Client query execution aborts after a few seconds: ConnectTimeoutException

2023-11-10 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784715#comment-17784715
 ] 

Martijn Visser commented on FLINK-33251:


[~jcaberio] Are you also on a VPN by any chance? 

> SQL Client query execution aborts after a few seconds: ConnectTimeoutException
> --
>
> Key: FLINK-33251
> URL: https://issues.apache.org/jira/browse/FLINK-33251
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.18.0, 1.17.1
> Environment: Macbook Pro 
> Apple M1 Max
>  
> {code:java}
> $ uname -a
> Darwin asgard08 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 
> 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000 arm64
> {code}
> {code:bash}
> $ java --version
> openjdk 11.0.20.1 2023-08-24
> OpenJDK Runtime Environment Homebrew (build 11.0.20.1+0)
> OpenJDK 64-Bit Server VM Homebrew (build 11.0.20.1+0, mixed mode)
> $ mvn --version
> Apache Maven 3.9.5 (57804ffe001d7215b5e7bcb531cf83df38f93546)
> Maven home: /opt/homebrew/Cellar/maven/3.9.5/libexec
> Java version: 11.0.20.1, vendor: Homebrew, runtime: 
> /opt/homebrew/Cellar/openjdk@11/11.0.20.1/libexec/openjdk.jdk/Contents/Home
> Default locale: en_GB, platform encoding: UTF-8
> OS name: "mac os x", version: "14.0", arch: "aarch64", family: "mac"
> {code}
>Reporter: Robin Moffatt
>Priority: Major
> Attachments: log.zip
>
>
> If I run a streaming query from an unbounded connector from the SQL Client, 
> it bombs out after ~15 seconds.
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
> connection timed out: localhost/127.0.0.1:52596
> {code}
> This *doesn't* happen on 1.16.2. It *does* happen on *1.17.1* and *1.18* that 
> I have just built locally (git repo hash `9b837727b6d`). 
> The corresponding task's status in the Web UI shows as `CANCELED`. 
> ---
> h2. To reproduce
> Launch local cluster and SQL client
> {code}
> ➜  flink-1.18-SNAPSHOT ./bin/start-cluster.sh 
> Starting cluster.
> Starting standalonesession daemon on host asgard08.
> Starting taskexecutor daemon on host asgard08.
> ➜  flink-1.18-SNAPSHOT ./bin/sql-client.sh
> […]
> Flink SQL>
> {code}
> Set streaming mode and result mode
> {code:sql}
> Flink SQL> SET 'execution.runtime-mode' = 'STREAMING';
> [INFO] Execute statement succeed.
> Flink SQL> SET 'sql-client.execution.result-mode' = 'changelog';
> [INFO] Execute statement succeed.
> {code}
> Define a table to read data from CSV files in a folder
> {code:sql}
> CREATE TABLE firewall (
>   event_time STRING,
>   source_ip  STRING,
>   dest_ip    STRING,
>   source_prt INT,
>   dest_prt   INT
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'file:///tmp/firewall/',
>   'format' = 'csv',
>   'source.monitor-interval' = '1' -- unclear from the docs what the unit is 
> here
> );
> {code}
> Create a CSV file to read in
> {code:bash}
> $ mkdir /tmp/firewall
> $ cat > /tmp/firewall/data.csv <<EOF
> 2018-05-11 00:19:34,151.35.34.162,125.26.20.222,2014,68
> 2018-05-11 22:20:43,114.24.126.190,21.68.21.69,379,1619
> EOF
> {code}
> Run a streaming query 
> {code}
> SELECT * FROM firewall;
> {code}
> You will get results showing (and if you add another data file it will show 
> up) - but after ~30 seconds the query aborts and throws an error back to the 
> user at the SQL Client prompt
> {code}
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
> connection timed out: localhost/127.0.0.1:58470
> Flink SQL>
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33484) Flink Kafka Connector Offset Lag Issue with Transactional Data and Read Committed Isolation Level

2023-11-10 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784716#comment-17784716
 ] 

Martijn Visser commented on FLINK-33484:


[~lintingbin] Can you please use the connector version I've listed above and 
try again? That's a newer version with more fixes for various bugs

> Flink Kafka Connector Offset Lag Issue with Transactional Data and Read 
> Committed Isolation Level
> -
>
> Key: FLINK-33484
> URL: https://issues.apache.org/jira/browse/FLINK-33484
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.17.1
> Environment: Flink 1.17.1
> kafka 2.5.1
>Reporter: Darcy Lin
>Priority: Major
>
> We have encountered an issue with the Flink Kafka connector when consuming 
> transactional data from Kafka with the {{isolation.level}} set to 
> {{read_committed}} ({{{}setProperty("isolation.level", 
> "read_committed"){}}}). The problem is that even when all the data from a 
> topic is consumed, the offset lag is not 0, but 1. However, when using the 
> Kafka Java client to consume the same data, this issue does not occur.
> We suspect that this issue arises due to the way Flink Kafka connector 
> calculates the offset. The problem seems to be in the 
> {{KafkaRecordEmitter.java}} file, specifically in the {{emitRecord}} method. 
> When saving the offset, the method calls 
> {{{}splitState.setCurrentOffset(consumerRecord.offset() + 1);{}}}. While this 
> statement works correctly in a regular Kafka scenario, it might not be 
> accurate when the {{read_committed}} mode is used. We believe that it should 
> be {{{}splitState.setCurrentOffset(consumerRecord.offset() + 2);{}}}, as 
> transactional data in Kafka occupies an additional offset to store the 
> transaction marker.
> We request the Flink team to investigate this issue and provide us with 
> guidance on how to resolve it.
> Thank you for your attention and support.
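
For illustration, here is a minimal sketch using the plain Kafka Java client (not Flink code; the topic name and properties are placeholders) of how an offset bookkept as lastRecordOffset + 1 leaves an apparent lag of 1 when the partition ends with a transaction commit marker:

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedLagCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "lag-check");
        props.setProperty("isolation.level", "read_committed");
        props.setProperty("key.deserializer", StringDeserializer.class.getName());
        props.setProperty("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0); // hypothetical topic
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp));

            long lastRecordOffset = -1L;
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                lastRecordOffset = record.offset();
            }
            long endOffset = consumer.endOffsets(Collections.singletonList(tp)).get(tp);

            // If the partition ends with a committed transaction, the commit marker
            // occupies one extra offset, so endOffset == lastRecordOffset + 2 and the
            // value printed here is 1 even though every record has been read.
            System.out.println("apparent lag = " + (endOffset - (lastRecordOffset + 1)));
        }
    }
}
{code}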



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]

2023-11-10 Thread via GitHub


MartijnVisser commented on PR #23656:
URL: https://github.com/apache/flink/pull/23656#issuecomment-1805286458

   > trino supports this function, we need to use this function, so can we support 
this function?
   
   I don't think Trino is the primary engine we compare ourselves against. Does 
the equivalent of this function exist in other databases/tech like Postgres, 
MySQL, Hive or Spark?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33206] Verify the existence of hbase table before read/write [flink-connector-hbase]

2023-11-10 Thread via GitHub


MartijnVisser merged PR #22:
URL: https://github.com/apache/flink-connector-hbase/pull/22


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33513) Metastore delegation-token can be cached?

2023-11-10 Thread katty he (Jira)
katty he created FLINK-33513:


 Summary: Metastore delegation-token can be cached?
 Key: FLINK-33513
 URL: https://issues.apache.org/jira/browse/FLINK-33513
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: katty he


Now, getDelegationToken will be called every time the metastore is asked for. 
How about building a cache? We could cache the token the first time, and then 
just get the token from the cache afterwards.
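
A minimal sketch of the caching idea (hypothetical names; fetchToken stands in for whatever call currently obtains the metastore delegation token):

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class CachedDelegationToken {
    private final Supplier<String> fetchToken;
    private final Duration ttl;
    private String cachedToken;
    private Instant fetchedAt;

    public CachedDelegationToken(Supplier<String> fetchToken, Duration ttl) {
        this.fetchToken = fetchToken;
        this.ttl = ttl;
    }

    public synchronized String get() {
        // Fetch only on first use, or after the cached token is considered stale.
        if (cachedToken == null || Instant.now().isAfter(fetchedAt.plus(ttl))) {
            cachedToken = fetchToken.get();
            fetchedAt = Instant.now();
        }
        return cachedToken;
    }
}
{code}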



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33206) Verify the existence of hbase table before read/write

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser updated FLINK-33206:
---
Priority: Critical  (was: Blocker)

> Verify the existence of hbase table before read/write
> -
>
> Key: FLINK-33206
> URL: https://issues.apache.org/jira/browse/FLINK-33206
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.1
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Critical
>  Labels: pull-request-available, stale-blocker
> Attachments: image-2023-10-08-16-54-05-917.png
>
>
> Currently, we do not verify the existence of the HBase table before read/write, 
> and the resulting error can confuse the user.
> The `HBaseSinkFunction` throws `TableNotFoundException` when doing a flush.
> The error thrown by the `inputFormat` is not obvious enough.
> !image-2023-10-08-16-54-05-917.png!
> So I think we should verify the existence of the HBase table when the `open` 
> function is called.
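
A minimal sketch (hypothetical, not the connector's actual code) of what such a check in `open` could look like with the HBase Admin API:

{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

public class HBaseTableCheck {
    // Called once when the source/sink is opened; fails fast with a clear error
    // if the target table does not exist, instead of failing later during a flush.
    public static void checkTableExists(Connection connection, String tableName) throws Exception {
        try (Admin admin = connection.getAdmin()) {
            if (!admin.tableExists(TableName.valueOf(tableName))) {
                throw new TableNotFoundException(
                        "HBase table '" + tableName + "' does not exist");
            }
        }
    }
}
{code}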



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33206) Verify the existence of hbase table before read/write

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser reassigned FLINK-33206:
--

Assignee: tanjialiang

> Verify the existence of hbase table before read/write
> -
>
> Key: FLINK-33206
> URL: https://issues.apache.org/jira/browse/FLINK-33206
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.1
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker
> Attachments: image-2023-10-08-16-54-05-917.png
>
>
> Currently, we do not verify the existence of the HBase table before read/write, 
> and the resulting error can confuse the user.
> The `HBaseSinkFunction` throws `TableNotFoundException` when doing a flush.
> The error thrown by the `inputFormat` is not obvious enough.
> !image-2023-10-08-16-54-05-917.png!
> So I think we should verify the existence of the HBase table when the `open` 
> function is called.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33206) Verify the existence of hbase table before read/write

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-33206.
--
Fix Version/s: hbase-3.0.1
   Resolution: Fixed

Fixed in apache/flink-connector-hbase

main: 00143773ba3f647099b7f53c17133fef99ab8fed
v3.0: db4f3389a791e954bdbca0f5fbaa3a09cdc95148

> Verify the existence of hbase table before read/write
> -
>
> Key: FLINK-33206
> URL: https://issues.apache.org/jira/browse/FLINK-33206
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.1
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Critical
>  Labels: pull-request-available, stale-blocker
> Fix For: hbase-3.0.1
>
> Attachments: image-2023-10-08-16-54-05-917.png
>
>
> Currently, we do not verify the existence of the HBase table before read/write, 
> and the resulting error can confuse the user.
> The `HBaseSinkFunction` throws `TableNotFoundException` when doing a flush.
> The error thrown by the `inputFormat` is not obvious enough.
> !image-2023-10-08-16-54-05-917.png!
> So I think we should verify the existence of the HBase table when the `open` 
> function is called.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33514) FlinkScalaKryoInstantiator class not found in KryoSerializer

2023-11-10 Thread Jake.zhang (Jira)
Jake.zhang created FLINK-33514:
--

 Summary: FlinkScalaKryoInstantiator class not found in 
KryoSerializer
 Key: FLINK-33514
 URL: https://issues.apache.org/jira/browse/FLINK-33514
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.18.0
Reporter: Jake.zhang


{code:java}
16:03:13,402 INFO  
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer [] - Kryo 
serializer scala extensions are not available.
java.lang.ClassNotFoundException: 
org.apache.flink.runtime.types.FlinkScalaKryoInstantiator
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_341]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_341]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) 
~[?:1.8.0_341]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_341]
    at java.lang.Class.forName0(Native Method) ~[?:1.8.0_341]
    at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_341]
    at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.getKryoInstance(KryoSerializer.java:487)
 ~[flink-core-1.18.0.jar:1.18.0]
    at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.checkKryoInitialized(KryoSerializer.java:522)
 ~[flink-core-1.18.0.jar:1.18.0]
    at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:394)
 ~[flink-core-1.18.0.jar:1.18.0]
    at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:412)
 ~[flink-core-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:190)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:43)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:53)
 ~[flink-runtime-1.18.0.jar:1.18.0]
    at 
org.apache.flink.runtime.io.network.api.serialization.NonSpanningWrapper.readInto(NonSpanningWrapper.java:337)
 ~[flink-runtime-1.18.0.jar:1.18.0]
    at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.readNonSpanningRecord(SpillingAdaptiveSpanningRecordDeserializer.java:128)
 ~[flink-runtime-1.18.0.jar:1.18.0]
    at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.readNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:103)
 ~[flink-runtime-1.18.0.jar:1.18.0]
    at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:93)
 ~[flink-runtime-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:100)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
 ~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807) 
~[flink-streaming-java-1.18.0.jar:1.18.0]
    at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
 [flink-runtime-1.18.0.jar:1.18.0]
    at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932) 
[flink-runtime-1.18.0.jar:1.18.0]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746) 
[flink-runtime-1.18.0.jar:1.18.0]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
[flink-runtime-1.18.0.jar:1.18.0]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_341] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33164) HBase connector support ignore null value for partial update

2023-11-10 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784722#comment-17784722
 ] 

Martijn Visser commented on FLINK-33164:


Re-opened to add documentation in apache/flink-connector-base:

main: 2dcaa6a13f6eb76337c2c28a6685a8759fe890a1
v3.0: 87912baee1f75c43a6380867984c91715f201b99

> HBase connector support ignore null value for partial update
> 
>
> Key: FLINK-33164
> URL: https://issues.apache.org/jira/browse/FLINK-33164
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.0
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: hbase-3.0.1
>
>
> Sometimes, users want to write data and ignore null values to achieve a partial 
> update. So I suggest adding an option: sink.ignore-null-value.
>  
> {code:java}
> CREATE TABLE hTable (
>  rowkey STRING,
>  cf1 ROW,
>  PRIMARY KEY (rowkey) NOT ENFORCED
> ) WITH (
>  'connector' = 'hbase-2.2',
>  'table-name' = 'default:test',
>  'zookeeper.quorum' = 'localhost:2181',
>  'sink.ignore-null-value' = 'true' -- default is false, true is enabled
> );
> INSERT INTO hTable VALUES('1', ROW('10', 'hello, world'));
> INSERT INTO hTable VALUES('1', ROW('30', CAST(NULL AS STRING))); -- null 
> value to cf1.q2
> -- when sink.ignore-null-value is false
> // after first insert
> {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}} 
> // after second insert, cf1.q2 update to null
> {rowkey: "1", "cf1": {q1: "30", q2: "null"}} 
> -- when sink.ignore-null-value is true
> // after first insert 
> {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}}
> // after second insert, cf1.q2 is still the old value 
> {rowkey: "1", "cf1": {q1: "30", q2: "hello, world"}} {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33164][docs] Add document for the write option sink.ignore-null-value [flink-connector-hbase]

2023-11-10 Thread via GitHub


MartijnVisser merged PR #31:
URL: https://github.com/apache/flink-connector-hbase/pull/31


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Reopened] (FLINK-33164) HBase connector support ignore null value for partial update

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser reopened FLINK-33164:


> HBase connector support ignore null value for partial update
> 
>
> Key: FLINK-33164
> URL: https://issues.apache.org/jira/browse/FLINK-33164
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.0
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: hbase-3.0.1
>
>
> Sometimes, users want to write data and ignore null values to achieve a partial 
> update. So I suggest adding an option: sink.ignore-null-value.
>  
> {code:java}
> CREATE TABLE hTable (
>  rowkey STRING,
>  cf1 ROW,
>  PRIMARY KEY (rowkey) NOT ENFORCED
> ) WITH (
>  'connector' = 'hbase-2.2',
>  'table-name' = 'default:test',
>  'zookeeper.quorum' = 'localhost:2181',
>  'sink.ignore-null-value' = 'true' -- default is false, true is enabled
> );
> INSERT INTO hTable VALUES('1', ROW('10', 'hello, world'));
> INSERT INTO hTable VALUES('1', ROW('30', CAST(NULL AS STRING))); -- null 
> value to cf1.q2
> -- when sink.ignore-null-value is false
> // after first insert
> {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}} 
> // after second insert, cf1.q2 update to null
> {rowkey: "1", "cf1": {q1: "30", q2: "null"}} 
> -- when sink.ignore-null-value is true
> // after first insert 
> {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}}
> // after second insert, cf1.q2 is still the old value 
> {rowkey: "1", "cf1": {q1: "30", q2: "hello, world"}} {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33164) HBase connector support ignore null value for partial update

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-33164.
--
Resolution: Fixed

> HBase connector support ignore null value for partial update
> 
>
> Key: FLINK-33164
> URL: https://issues.apache.org/jira/browse/FLINK-33164
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.0
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: hbase-3.0.1
>
>
> Sometimes, users want to write data and ignore null values to achieve a partial 
> update. So I suggest adding an option: sink.ignore-null-value.
>  
> {code:java}
> CREATE TABLE hTable (
>  rowkey STRING,
>  cf1 ROW,
>  PRIMARY KEY (rowkey) NOT ENFORCED
> ) WITH (
>  'connector' = 'hbase-2.2',
>  'table-name' = 'default:test',
>  'zookeeper.quorum' = 'localhost:2181',
>  'sink.ignore-null-value' = 'true' -- default is false, true is enabled
> );
> INSERT INTO hTable VALUES('1', ROW('10', 'hello, world'));
> INSERT INTO hTable VALUES('1', ROW('30', CAST(NULL AS STRING))); -- null 
> value to cf1.q2
> -- when sink.ignore-null-value is false
> // after first insert
> {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}} 
> // after second insert, cf1.q2 update to null
> {rowkey: "1", "cf1": {q1: "30", q2: "null"}} 
> -- when sink.ignore-null-value is true
> // after first insert 
> {rowkey: "1", "cf1": {q1: "10", q2: "hello, world"}}
> // after second insert, cf1.q2 is still the old value 
> {rowkey: "1", "cf1": {q1: "30", q2: "hello, world"}} {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33059] Support transparent compression for file-connector for all file input formats [flink]

2023-11-10 Thread via GitHub


echauchot commented on PR #23443:
URL: https://github.com/apache/flink/pull/23443#issuecomment-1805294735

   > Thank you @echauchot !
   
   My pleasure ! Merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33059] Support transparent compression for file-connector for all file input formats [flink]

2023-11-10 Thread via GitHub


echauchot merged PR #23443:
URL: https://github.com/apache/flink/pull/23443


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33515) PythonDriver need to stream python process output to log instead of collecting it in memory

2023-11-10 Thread Gabor Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi reassigned FLINK-33515:
-

Assignee: Gabor Somogyi

> PythonDriver need to stream python process output to log instead of 
> collecting it in memory
> ---
>
> Key: FLINK-33515
> URL: https://issues.apache.org/jira/browse/FLINK-33515
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>
> PythonDriver currently collects the python process output in a StringBuilder 
> instead of streaming it. This can cause an OOM when the python process is 
> generating a huge amount of output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-6755][CLI] Support manual checkpoints triggering [flink]

2023-11-10 Thread via GitHub


Zakelly commented on code in PR #23679:
URL: https://github.com/apache/flink/pull/23679#discussion_r1389074605


##
docs/content.zh/docs/deployment/cli.md:
##
@@ -153,6 +153,35 @@ $ ./bin/flink savepoint \
 Triggering the savepoint disposal through the `savepoint` action does not only 
remove the data from 
 the storage but makes Flink clean up the savepoint-related metadata as well.
 
+### Creating a Checkpoint
+[Checkpoints]({{< ref "docs/ops/state/checkpoints" >}}) can also be manually 
created to save the
+current state. To understand the difference between checkpoints and savepoints, please 
refer to
+[Checkpoints vs. Savepoints]({{< ref 
"docs/ops/state/checkpoints_vs_savepoints" >}}). All that's
+needed to trigger a checkpoint manually is the JobID:
+```bash
+$ ./bin/flink checkpoint \
+  $JOB_ID
+```
+```
+Triggering checkpoint for job 99c59fead08c613763944f533bf90c0f.
+Waiting for response...
+Checkpoint(CONFIGURED) 26 for job 99c59fead08c613763944f533bf90c0f completed.
+You can resume your program from this checkpoint with the run command.
+```
+If you want to trigger a full checkpoint while the job is periodically triggering 
incremental checkpoints,
+please use the `--full` option.

Review Comment:
   Sure thing, here: https://issues.apache.org/jira/browse/FLINK-33498 . I will 
start working on this soon.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33515) PythonDriver need to stream python process output to log instead of collecting it in memory

2023-11-10 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-33515:
-

 Summary: PythonDriver need to stream python process output to log 
instead of collecting it in memory
 Key: FLINK-33515
 URL: https://issues.apache.org/jira/browse/FLINK-33515
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.19.0
Reporter: Gabor Somogyi


PythonDriver currently collects the python process output in a StringBuilder instead 
of streaming it. This can cause an OOM when the python process is generating a huge 
amount of output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-6755][CLI] Support manual checkpoints triggering [flink]

2023-11-10 Thread via GitHub


Zakelly commented on PR #23679:
URL: https://github.com/apache/flink/pull/23679#issuecomment-1805307753

   > Can you squash the latest two fixup commits into the first two commits:
   > 
   > ```
   > [[FLINK-6755] Support manual checkpoints triggering from 
CLI](https://github.com/apache/flink/pull/23679/commits/c563fd15191137aafc548b0a19fc09a28833aff5)
   > 
   > @[Zakelly](https://github.com/Zakelly/flink/commits?author=Zakelly)
   > Zakelly committed 2 days ago
   > [[FLINK-6755] Docs about manual checkpoints triggering from 
CLI](https://github.com/apache/flink/pull/23679/commits/bd404c8dad2c78dc5106fde39e3f81a689b90c85)
   > ```
   > 
   > ? LGTM otherwise :)
   
   Already done. Many thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33401) Kafka connector has broken version

2023-11-10 Thread Yuxin Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784724#comment-17784724
 ] 

Yuxin Tan commented on FLINK-33401:
---

After merging the Flink 1.19 hotfix (fa1036c73e3bcd66b57d835c7859572ca4b2250d, 
Remove Kafka documentation for SQL/Table API, since this is now externalized), 
I conducted tests on the Flink 1.19 version, and it reflected the correct 
version.

I observed that the hotfix has also been backported to the Flink release-1.18. 
Once this fix is merged into the Kafka connector repository, the 1.18 
documentation will display the accurate version.

 !screenshot-1.png! 

> Kafka connector has broken version
> --
>
> Key: FLINK-33401
> URL: https://issues.apache.org/jira/browse/FLINK-33401
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Reporter: Pavel Khokhlov
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>
> Trying to run Flink 1.18 with Kafka Connector
> but official documentation has a bug  
> [https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kafka/]
> {noformat}
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-kafka</artifactId>
>     <version>-1.18</version>
> </dependency>
> {noformat}
> Basically version *-1.18* doesn't exist.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33488] Implement restore tests for Deduplicate node [flink]

2023-11-10 Thread via GitHub


dawidwys closed pull request #23686: [FLINK-33488] Implement restore tests for 
Deduplicate node
URL: https://github.com/apache/flink/pull/23686


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-33401) Kafka connector has broken version

2023-11-10 Thread Yuxin Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784724#comment-17784724
 ] 

Yuxin Tan edited comment on FLINK-33401 at 11/10/23 8:47 AM:
-

After merging the Flink 1.19 hotfix (fa1036c73e3bcd66b57d835c7859572ca4b2250d, 
Remove Kafka documentation for SQL/Table API, since this is now externalized), 
I conducted tests on the Flink 1.19 version, and it showes the correct version.

I noticed that the hotfix has also been backported to the Flink release-1.18. 
Once this fix is merged into the Kafka connector repository, then 1.18 
documentation will display the accurate version.

 !screenshot-1.png! 


was (Author: tanyuxin):
After merging the Flink 1.19 hotfix (fa1036c73e3bcd66b57d835c7859572ca4b2250d, 
Remove Kafka documentation for SQL/Table API, since this is now externalized), 
I conducted tests on the Flink 1.19 version, and it reflected the correct 
version.

I observed that the hotfix has also been backported to the Flink release-1.18. 
Once this fix is merged into the Kafka connector repository, the 1.18 
documentation will display the accurate version.

 !screenshot-1.png! 

> Kafka connector has broken version
> --
>
> Key: FLINK-33401
> URL: https://issues.apache.org/jira/browse/FLINK-33401
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Reporter: Pavel Khokhlov
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>
> Trying to run Flink 1.18 with Kafka Connector
> but official documentation has a bug  
> [https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kafka/]
> {noformat}
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-kafka</artifactId>
>     <version>-1.18</version>
> </dependency>
> {noformat}
> Basically version *-1.18* doesn't exist.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33401) Kafka connector has broken version

2023-11-10 Thread Yuxin Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784724#comment-17784724
 ] 

Yuxin Tan edited comment on FLINK-33401 at 11/10/23 8:47 AM:
-

After merging the Flink 1.19 hotfix (fa1036c73e3bcd66b57d835c7859572ca4b2250d, 
Remove Kafka documentation for SQL/Table API, since this is now externalized), 
I conducted tests on the Flink 1.19 version, and it shows the correct version.

I noticed that the hotfix has also been backported to the Flink release-1.18. 
Once this fix is merged into the Kafka connector repository, then 1.18 
documentation will display the accurate version.

 !screenshot-1.png! 


was (Author: tanyuxin):
After merging the Flink 1.19 hotfix (fa1036c73e3bcd66b57d835c7859572ca4b2250d, 
Remove Kafka documentation for SQL/Table API, since this is now externalized), 
I conducted tests on the Flink 1.19 version, and it showes the correct version.

I noticed that the hotfix has also been backported to the Flink release-1.18. 
Once this fix is merged into the Kafka connector repository, then 1.18 
documentation will display the accurate version.

 !screenshot-1.png! 

> Kafka connector has broken version
> --
>
> Key: FLINK-33401
> URL: https://issues.apache.org/jira/browse/FLINK-33401
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Reporter: Pavel Khokhlov
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>
> Trying to run Flink 1.18 with Kafka Connector
> but official documentation has a bug  
> [https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kafka/]
> {noformat}
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-kafka</artifactId>
>     <version>-1.18</version>
> </dependency>
> {noformat}
> Basically version *-1.18* doesn't exist.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33488) Implement restore tests for Deduplicate node

2023-11-10 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz closed FLINK-33488.

Resolution: Implemented

Implemented in 88953ba622f4d3b67b59f56e15e1983e7932b926

> Implement restore tests for Deduplicate node
> 
>
> Key: FLINK-33488
> URL: https://issues.apache.org/jira/browse/FLINK-33488
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Jim Hughes
>Assignee: Jim Hughes
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]

2023-11-10 Thread via GitHub


leoyy0316 commented on PR #23656:
URL: https://github.com/apache/flink/pull/23656#issuecomment-1805324411

   > > trino supports this function, we need to use this function, so can we 
support this function?
   > 
   > I don't think Trino is the primary engine we compare ourselves against. 
Does the equivalent of this function exist in other databases/tech like 
Postgres, MySQL, Hive or Spark?
   
   I have a question: why do we always follow Hive and Spark? If we think this is a 
feature that needs to be supported, we can implement it first. I think having rich 
functions is better. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-33059) Support transparent compression for file-connector for all file input formats

2023-11-10 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-33059.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

master: 51252638fcb855a82da9983b3dfaa3b89754523e

> Support transparent compression for file-connector for all file input formats
> -
>
> Key: FLINK-33059
> URL: https://issues.apache.org/jira/browse/FLINK-33059
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / FileSystem
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Some FileInputFormats don't use FileInputFormat#createSplits (which would 
> detect that the file is non-splittable and deal with reading boundaries 
> correctly); they all create splits manually from FileSourceSplit. If input 
> files are compressed, the split length is determined by the compressed file 
> length, leading to [this|https://issues.apache.org/jira/browse/FLINK-30314] 
> bug. We should force reading the whole file split (like it is done for binary 
> input formats) on compressed files. Parallelism is still done at the file 
> level (as it is now).
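
A minimal sketch of the idea (hypothetical names, not Flink's actual split logic): if the file name carries a known compression extension, plan a single split covering the whole file instead of byte-range splits.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class CompressionAwareSplitPlanner {
    private static final Set<String> COMPRESSED_EXTENSIONS =
            new HashSet<>(Arrays.asList(".gz", ".bz2", ".xz", ".zst"));

    // Each planned split is an {offset, length} pair.
    public static List<long[]> planSplits(String path, long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        boolean compressed = COMPRESSED_EXTENSIONS.stream().anyMatch(path::endsWith);
        if (compressed) {
            // Byte offsets into the compressed bytes are meaningless for the decompressed
            // stream, so the whole file becomes a single split.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return splits;
    }
}
{code}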



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]

2023-11-10 Thread via GitHub


MartijnVisser commented on PR #23656:
URL: https://github.com/apache/flink/pull/23656#issuecomment-1805337202

   > I have a question: why do we always follow Hive and Spark? If we think this is a 
feature that needs to be supported, we can implement it first. I think having rich 
functions is better.
   
   Because we want to make sure that we have a consistent set of functions 
available, with function names that are consistent. If we pick random function 
names from different tools, that becomes more problematic. It might be that 
Calcite or any of the other tech solves the same problem but with a different 
function and/or different function signature. That's why we always try to look 
at what others are also doing. Then again, this discussion should have happened 
before a PR was opened. Either in the Jira, or on the Dev mailing list. See 
https://flink.apache.org/how-to-contribute/contribute-code/#1-create-jira-ticket-and-reach-consensus


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] Fix the checkstyle after supporting jdk21 [flink-kubernetes-operator]

2023-11-10 Thread via GitHub


gyfora merged PR #704:
URL: https://github.com/apache/flink-kubernetes-operator/pull/704


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-30371] Fix the problem of JdbcOutputFormat database connection leak [flink-connector-jdbc]

2023-11-10 Thread via GitHub


KevDi commented on PR #75:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/75#issuecomment-1805360349

   @eskabetxe @MartijnVisser I looked into this error but I was unable to 
resolve it. I replaced the mentioned import with explicit imports, but then I get 
an error telling me that some of the static imports are imported before others. 
If I use `Sort Imports` in IntelliJ, it removes the imports with the `.*` 
version. Maybe I need some support with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33513) Metastore delegation-token can be cached?

2023-11-10 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784739#comment-17784739
 ] 

Gabor Somogyi commented on FLINK-33513:
---

If that hurts, the solution is not caching but adding a token provider for the 
metastore, similar to HiveServer2DelegationTokenProvider.

> Metastore delegation-token can be cached?
> -
>
> Key: FLINK-33513
> URL: https://issues.apache.org/jira/browse/FLINK-33513
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: katty he
>Priority: Major
>
> Now, getDelegationToken will be called every time the metastore is asked for. 
> How about building a cache? We could cache the token the first time, and then 
> just get the token from the cache afterwards.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-30702] Add Elasticsearch dialect [flink-connector-jdbc]

2023-11-10 Thread via GitHub


grzegorz8 commented on PR #67:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/67#issuecomment-1805376630

   @eskabetxe Could you review the PR, please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33207] Return empty split when the hbase table is empty [flink-connector-hbase]

2023-11-10 Thread via GitHub


ferenc-csaky commented on PR #23:
URL: 
https://github.com/apache/flink-connector-hbase/pull/23#issuecomment-1805419097

   cc @MartijnVisser 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Flink 31966] Flink Kubernetes operator lacks TLS support [flink-kubernetes-operator]

2023-11-10 Thread via GitHub


gaborgsomogyi commented on PR #689:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/689#issuecomment-1805421386

   What you're suggesting sounds better. I think we need 2 additional 
parameters:
   * `...truststore`
   * `...truststore-password`
   
   I don't yet understand why a keystore is needed; if you can elaborate, we can 
get more on the same page.
   In my understanding, a truststore is used in client scenarios and a keystore is 
used in server scenarios.
   Since I'm not aware of any scenario where the operator behaves as a server, I 
don't see the need to add any keystore configs.
   If you know of such a case, please point it out so we can discuss.
   
   For the mount part, it's not yet clear why we would add any config for 
that. Until now the design has been to add
   any custom mount in the helm chart, which is still available. I would use 
that.
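
   For reference, a minimal JDK-only sketch (parameter names are illustrative) of the client-side role of a truststore, i.e. verifying the server's certificate without presenting a key of our own:

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class TruststoreExample {
    // Builds an SSLContext that only verifies the server's certificate chain against
    // the given truststore; no keystore is involved because the client presents no key.
    public static SSLContext fromTruststore(String truststorePath, char[] password) throws Exception {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (FileInputStream in = new FileInputStream(truststorePath)) {
            trustStore.load(in, password);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }
}
```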
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33516) Create dedicated PyFlink channel

2023-11-10 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-33516:
--

 Summary: Create dedicated PyFlink channel
 Key: FLINK-33516
 URL: https://issues.apache.org/jira/browse/FLINK-33516
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Martijn Visser
Assignee: Martijn Visser


See https://lists.apache.org/thread/ynb5drhqqbd84w4o4337qv47100cp67h

1. Create new Slack channel
2. Update website 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [hotfix] fix_kafka_sasl_ssl_example in the docs [flink-connector-kafka]

2023-11-10 Thread via GitHub


531651225 opened a new pull request, #67:
URL: https://github.com/apache/flink-connector-kafka/pull/67

   - Removing the unnecessary escape characters ('\') in the document:
   
   `  'properties.sasl.jaas.config' = 
'org.apache.kafka.common.security.scram.ScramLoginModule required 
username=\"username\" password=\"password\";'`
   
   
![image](https://github.com/apache/flink-connector-kafka/assets/33744252/b7cc3f6c-6e1e-476f-af5c-089061e7b6f7)
   
   
   - Using the escape characters can cause a "Value not specified for key 
'username' in JAAS config" error; the example in the document can mislead users.
   
![image](https://github.com/apache/flink-connector-kafka/assets/33744252/dc63ed71-1742-4451-890a-e60eb9b656b1)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix] fix_kafka_sasl_ssl_example in the docs [flink-connector-kafka]

2023-11-10 Thread via GitHub


boring-cyborg[bot] commented on PR #67:
URL: 
https://github.com/apache/flink-connector-kafka/pull/67#issuecomment-1805424464

   Thanks for opening this pull request! Please check out our contributing 
guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33517) Implement restore tests for Value node

2023-11-10 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-33517:
-

 Summary: Implement restore tests for Value node
 Key: FLINK-33517
 URL: https://issues.apache.org/jira/browse/FLINK-33517
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jacky Lau
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33518) Implement restore tests for WatermarkAssigner node

2023-11-10 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-33518:
-

 Summary: Implement restore tests for WatermarkAssigner node
 Key: FLINK-33518
 URL: https://issues.apache.org/jira/browse/FLINK-33518
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jacky Lau
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [hotfix][build] Bump lowest supported flink version to 1.17.0 [flink-connector-gcp-pubsub]

2023-11-10 Thread via GitHub


leonardBang opened a new pull request, #20:
URL: https://github.com/apache/flink-connector-gcp-pubsub/pull/20

   Bump the lowest supported Flink version to 1.17.0, as per the discussion thread [1]
   
   [1] https://lists.apache.org/thread/2qqd59nng50k4po1dlyoxkwjmvb2fmb4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][build] Bump lowest supported flink version to 1.17.0 [flink-connector-gcp-pubsub]

2023-11-10 Thread via GitHub


boring-cyborg[bot] commented on PR #20:
URL: 
https://github.com/apache/flink-connector-gcp-pubsub/pull/20#issuecomment-1805427501

   Thanks for opening this pull request! Please check out our contributing 
guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33484) Flink Kafka Connector Offset Lag Issue with Transactional Data and Read Committed Isolation Level

2023-11-10 Thread Darcy Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784752#comment-17784752
 ] 

Darcy Lin commented on FLINK-33484:
---

[~martijnvisser] The issue remains the same with 
flink-connector-kafka:3.0.1-1.17. I checked the implementation of 
{{{}emitRecord{}}}, and it is still the same.

> Flink Kafka Connector Offset Lag Issue with Transactional Data and Read 
> Committed Isolation Level
> -
>
> Key: FLINK-33484
> URL: https://issues.apache.org/jira/browse/FLINK-33484
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.17.1
> Environment: Flink 1.17.1
> kafka 2.5.1
>Reporter: Darcy Lin
>Priority: Major
>
> We have encountered an issue with the Flink Kafka connector when consuming 
> transactional data from Kafka with the {{isolation.level}} set to 
> {{read_committed}} ({{{}setProperty("isolation.level", 
> "read_committed"){}}}). The problem is that even when all the data from a 
> topic is consumed, the offset lag is not 0, but 1. However, when using the 
> Kafka Java client to consume the same data, this issue does not occur.
> We suspect that this issue arises due to the way Flink Kafka connector 
> calculates the offset. The problem seems to be in the 
> {{KafkaRecordEmitter.java}} file, specifically in the {{emitRecord}} method. 
> When saving the offset, the method calls 
> {{{}splitState.setCurrentOffset(consumerRecord.offset() + 1);{}}}. While this 
> statement works correctly in a regular Kafka scenario, it might not be 
> accurate when the {{read_committed}} mode is used. We believe that it should 
> be {{{}splitState.setCurrentOffset(consumerRecord.offset() + 2);{}}}, as 
> transactional data in Kafka occupies an additional offset to store the 
> transaction marker.
> We request the Flink team to investigate this issue and provide us with 
> guidance on how to resolve it.
> Thank you for your attention and support.
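
For reference, a minimal Kafka-client sketch (broker and topic names are placeholders, and this is not taken from the report or from KafkaRecordEmitter) that makes the described off-by-one visible by comparing the last record offset with the consumer position under read_committed:

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ReadCommittedLagCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "lag-check");
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp));
            long lastRecordOffset = -1L;
            ConsumerRecords<byte[], byte[]> records;
            while (!(records = consumer.poll(Duration.ofSeconds(1))).isEmpty()) {
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    lastRecordOffset = record.offset();
                }
            }
            // For a fully drained transactional partition, position() typically points past the
            // commit marker, i.e. lastRecordOffset + 2, while committing lastRecordOffset + 1
            // leaves a reported lag of 1.
            System.out.println("last record offset + 1 = " + (lastRecordOffset + 1));
            System.out.println("consumer position      = " + consumer.position(tp));
        }
    }
}
{code}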



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33207] Return empty split when the hbase table is empty [flink-connector-hbase]

2023-11-10 Thread via GitHub


MartijnVisser merged PR #23:
URL: https://github.com/apache/flink-connector-hbase/pull/23


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-32807) when i use emitUpdateWithRetract of udtagg,bug error

2023-11-10 Thread yong yang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yong yang closed FLINK-32807.
-
Resolution: Duplicate

> when i use emitUpdateWithRetract of udtagg,bug error
> 
>
> Key: FLINK-32807
> URL: https://issues.apache.org/jira/browse/FLINK-32807
> Project: Flink
>  Issue Type: Bug
>  Components: API / Scala, Table SQL / Planner
>Affects Versions: 1.17.1
> Environment: http://maven.apache.org/POM/4.0.0"; 
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
> http://maven.apache.org/xsd/maven-4.0.0.xsd";>
> 4.0.0
> org.example
> FlinkLocalDemo
> 1.0-SNAPSHOT
> jar
> FlinkLocalDemo
> http://maven.apache.org
> 
> UTF-8
> 1.17.1
> 2.12
> 2.12.8
> 
> 
> 
> 
> com.chuusai
> shapeless_${scala.binary.version}
> 2.3.10
> 
> 
> 
> joda-time
> joda-time
> 2.12.5
> 
> 
> org.apache.flink
> flink-avro
> ${flink.version}
> 
> 
> org.apache.flink
> flink-runtime-web
> ${flink.version}
> 
> 
> 
> com.alibaba.fastjson2
> fastjson2
> 2.0.33
> 
> 
> 
> com.alibaba
> fastjson
> 1.2.83
> 
> 
> 
> junit
> junit
> 3.8.1
> test
> 
> 
> 
> org.apache.flink
> flink-table-common
> ${flink.version}
> 
> 
> 
> org.apache.flink
> flink-connector-kafka
> ${flink.version}
> provided
> 
> 
> org.apache.flink
> flink-json
> ${flink.version}
> 
> 
> org.apache.flink
> flink-scala_${scala.binary.version}
> ${flink.version}
> provided
> 
> 
> org.apache.flink
> flink-streaming-scala_${scala.binary.version}
> ${flink.version}
> 
> 
> org.apache.flink
> flink-csv
> ${flink.version}
> 
> 
> 
> org.apache.flink
> flink-table-api-java-bridge
> ${flink.version}
> 
> 
> 
> 
> org.apache.flink
> flink-table-api-scala-bridge_${scala.binary.version}
> ${flink.version}
> 
> 
> 
> org.apache.flink
> flink-table-planner-loader
> ${flink.version}
> 
> 
> 
> 
> org.apache.flink
> flink-table-runtime
> ${flink.version}
> provided
> 
> 
> 
> org.apache.flink
> flink-connector-files
> ${flink.version}
> 
> 
> 
> 
> 
> 
> 
> 
> 
> org.apache.flink
> flink-clients
> ${flink.version}
> 
> 
> org.apache.flink
> flink-connector-jdbc
> 3.1.0-1.17
> provided
> 
> 
> mysql
> mysql-connector-java
> 8.0.11
> 
> 
> 
> 
> 
> 
> org.apache.maven.plugins
> maven-shade-plugin
> 2.4.3
> 
> 
> package
> 
> shade
> 
> 
> 
> 
> *:*
> 
> META-INF/*.SF
> META-INF/*.DSA
> META-INF/*.RSA
> 
> 
> 
> 
> 
> 
> 
> 
> org.scala-tools
> maven-scala-plugin
> 2.15.2
> 
> 
> 
> compile
> testCompile
> 
> 
> 
> 
> 
> net.alchim31.maven
> scala-maven-plugin
> 3.2.2
> 
> 
> scala-compile-first
> process-resources
> 
> add-source
> compile
> 
> 
> 
> 
> ${scala.version}
> 
> 
> 
> org.apache.maven.plugins
> maven-assembly-plugin
> 2.5.5
> 
> 
> 
> 
> 
> 
> 
> 
> jar-with-dependencies
> 
> 
> 
> 
> org.apache.maven.plugins
> maven-compiler-plugin
> 3.1
> 
> 11
> 11
> 
> 
> 
> 
> 
>Reporter: yong yang
>Priority: Major
> Attachments: Top2WithRetract.scala, UdtaggDemo3.scala
>
>
> Reference: 
> [https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/functions/udfs/#retraction-example]
> My code:
> [^Top2WithRetract.scala]
>  
> but it shows an error:
> {code:java}
> // code placeholder
> /Users/thomas990p/Library/Java/JavaVirtualMachines/corretto-11.0.20/Contents/Home/bin/java
>  -javaagent:/Users/thomas990p/Library/Application 
> Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/231.9161.38/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar=56941:/Users/thomas990p/Library/Application 
> Support/JetBrains/Toolbox/apps/IDEA-U/ch-0/231.9161.38/IntelliJ 
> IDEA.app/Contents/bin -Dfile.encoding=UTF-8 -classpath 
> /Users/thomas990p/IdeaProjects/FlinkLocalDemo/target/classes:/Users/thomas990p/.m2/repository/com/chuusai/shapeless_2.12/2.3.10/shapeless_2.12-2.3.10.jar:/Users/thomas990p/.m2/repository/org/scala-lang/scala-library/2.12.15/scala-library-2.12.15.jar:/Users/thomas990p/.m2/repository/joda-time/joda-time/2.12.5/joda-time-2.12.5.jar:/Users/thomas990p/.m2/repository/org/apache/flink/flink-avro/1.17.1/flink-avro-1.17.1.jar:/Users/thomas990p/.m2/repository/org/apache/avro/avro/1.11.1/avro-1.11.1.jar:/Users/thomas990p/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.12.7/jackson-core-2.12.7.jar:/Users/thomas990p/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.12.7/jackson-databind-2.12.7.jar:/Users/thomas990p/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.12.7/jackson-annotations-2.12.7.jar:/Users/thomas990p/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar:/Users/thomas990p/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/Users/thomas990p/.m2/repository/org/apache/flink/flink-runtime-web/1.17.1/flink-runtime-web-1.17.1.jar:/Users/thomas990p/.m2/repository/org/apache/flink/flink-runtime/1.17.1/flink-runtime-1.17.1.

[jira] [Closed] (FLINK-33207) Scan hbase table will throw error when table is empty

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-33207.
--
Fix Version/s: hbase-3.0.1
   Resolution: Fixed

Fixed in apache/flink-connector-hbase 

main: 4b33c32a7f40b7e4fb469facf436017f2cdd8485
v3.0: 870ff0749d617ba03ecbd48632107793112e594e

> Scan hbase table will throw error when table is empty
> -
>
> Key: FLINK-33207
> URL: https://issues.apache.org/jira/browse/FLINK-33207
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.1
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker
> Fix For: hbase-3.0.1
>
>
> When I scan an empty HBase table, it throws an error in 
> createInputSplits; we should return an empty split instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33207) Scan hbase table will throw error when table is empty

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser reassigned FLINK-33207:
--

Assignee: tanjialiang

> Scan hbase table will throw error when table is empty
> -
>
> Key: FLINK-33207
> URL: https://issues.apache.org/jira/browse/FLINK-33207
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / HBase
>Affects Versions: hbase-3.0.1
>Reporter: tanjialiang
>Assignee: tanjialiang
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker
>
> When I scan an empty HBase table, it throws an error in 
> createInputSplits; we should return an empty split instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-30314) Unable to read all records from compressed delimited file input format

2023-11-10 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved FLINK-30314.
--
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed by [FLINK-33059|https://issues.apache.org/jira/browse/FLINK-33059]

> Unable to read all records from compressed delimited file input format
> --
>
> Key: FLINK-30314
> URL: https://issues.apache.org/jira/browse/FLINK-30314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.16.0, 1.15.2, 1.17.1
>Reporter: Dmitry Yaraev
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: input.json, input.json.gz, input.json.zip
>
>
> I am reading gzipped JSON line-delimited files in the batch mode using 
> [FileSystem 
> Connector|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/].
>  For reading the files a new table is created with the following 
> configuration:
> {code:sql}
> CREATE TEMPORARY TABLE `my_database`.`my_table` (
>   `my_field1` BIGINT,
>   `my_field2` INT,
>   `my_field3` VARCHAR(2147483647)
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'path-to-input-dir',
>   'format' = 'json',
>   'json.ignore-parse-errors' = 'false',
>   'json.fail-on-missing-field' = 'true'
> ) {code}
> In the input directory I have two files: input-0.json.gz and 
> input-1.json.gz. As it comes from the filenames, the files are compressed 
> with GZIP. Each of the files contains 10 records. The issue is that only 2 
> records from each file are read (4 in total). If decompressed versions of the 
> same data files are used, all 20 records are read.
> As far as I understand, that problem may be related to the fact that split 
> length, which is used when the files are read, is in fact the length of a 
> compressed file. So files are closed before all records are read from them 
> because read position of the decompressed file stream exceeds split length.
> Probably, it makes sense to add a flag to {{{}FSDataInputStream{}}}, so we 
> could identify if the file compressed or not. The flag can be set to true in 
> {{InputStreamFSInputWrapper}} because it is used for wrapping compressed file 
> streams. With such a flag it could be possible to differentiate 
> non-splittable compressed files and only rely on the end of the stream.
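
As an illustration of the suspected cause, a small standalone sketch (the file name is a placeholder, and this is not Flink code) comparing the compressed file length, which the split length is based on, with the number of bytes the decompressed stream actually yields:

{code:java}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class CompressedSplitLengthDemo {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("/tmp/input-0.json.gz"); // placeholder path
        long compressedLength = Files.size(file);      // what the split length is based on
        long decompressedBytes = 0;
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                decompressedBytes += read;
            }
        }
        // A reader that stops once its (decompressed) read position reaches the split length
        // would stop after `compressedLength` bytes and silently drop the remaining records.
        System.out.println("split length (compressed): " + compressedLength);
        System.out.println("actual decompressed bytes: " + decompressedBytes);
    }
}
{code}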



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33514) FlinkScalaKryoInstantiator class not found in KryoSerializer

2023-11-10 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784758#comment-17784758
 ] 

Martijn Visser commented on FLINK-33514:


This sounds like you're missing a dependency. What's in your POM?

> FlinkScalaKryoInstantiator class not found in KryoSerializer
> 
>
> Key: FLINK-33514
> URL: https://issues.apache.org/jira/browse/FLINK-33514
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.18.0
>Reporter: Jake.zhang
>Priority: Minor
>
> {code:java}
> 16:03:13,402 INFO  
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer [] - Kryo 
> serializer scala extensions are not available.
> java.lang.ClassNotFoundException: 
> org.apache.flink.runtime.types.FlinkScalaKryoInstantiator
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387) 
> ~[?:1.8.0_341]
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_341]
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) 
> ~[?:1.8.0_341]
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_341]
>     at java.lang.Class.forName0(Native Method) ~[?:1.8.0_341]
>     at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_341]
>     at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.getKryoInstance(KryoSerializer.java:487)
>  ~[flink-core-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.checkKryoInitialized(KryoSerializer.java:522)
>  ~[flink-core-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:394)
>  ~[flink-core-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:412)
>  ~[flink-core-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:190)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:43)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:53)
>  ~[flink-runtime-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.NonSpanningWrapper.readInto(NonSpanningWrapper.java:337)
>  ~[flink-runtime-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.readNonSpanningRecord(SpillingAdaptiveSpanningRecordDeserializer.java:128)
>  ~[flink-runtime-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.readNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:103)
>  ~[flink-runtime-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:93)
>  ~[flink-runtime-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:100)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.io.StreamMultipleInputProcessor.processInput(StreamMultipleInputProcessor.java:85)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
>  ~[flink-streaming-java-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
>  [flink-runtime-1.18.0.jar:1.18.0]
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932) 
> [flink-runtime-1.18.0.jar:1.18.0]
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746) 
> [flink-runtime-1.18.0.jar:1.18.0]
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
> [flink-runtime-1.18.0.jar:1.18.0]
>     at java.lang.Thread.ru

Re: [PR] [hotfix][build] Bump lowest supported flink version to 1.17.0 [flink-connector-gcp-pubsub]

2023-11-10 Thread via GitHub


MartijnVisser commented on PR #20:
URL: 
https://github.com/apache/flink-connector-gcp-pubsub/pull/20#issuecomment-1805460516

   @leonardBang If we want to bump the minimum version to 1.17.0, you'll also 
need to address the dependency convergence.
   
   However, looking at the code, we actually should be OK with releasing a 
newer version of the connector that includes support for 1.18, 1.17 and we 
could also still support 1.16 since the API is still compatible. It's just that 
we need to have a minimum of 2 Flink versions to be supported, but we can add 
an additional one if we want to. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs] refer to sql_connector_download_table shortcode in zh docs [flink-connector-kafka]

2023-11-10 Thread via GitHub


MartijnVisser commented on PR #65:
URL: 
https://github.com/apache/flink-connector-kafka/pull/65#issuecomment-1805462974

   Closed, duplicate of https://github.com/apache/flink-connector-kafka/pull/64


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs] refer to sql_connector_download_table shortcode in zh docs [flink-connector-kafka]

2023-11-10 Thread via GitHub


MartijnVisser closed pull request #65: [hotfix][docs] refer to 
sql_connector_download_table shortcode in zh docs
URL: https://github.com/apache/flink-connector-kafka/pull/65


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33510) Update plugin for SBOM generation to 2.7.10

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser updated FLINK-33510:
---
Component/s: Build System

> Update plugin for SBOM generation to 2.7.10
> ---
>
> Key: FLINK-33510
> URL: https://issues.apache.org/jira/browse/FLINK-33510
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Vinod Anandan
>Priority: Major
>
> Update the CycloneDX Maven plugin for SBOM generation to 2.7.10



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33510) Update plugin for SBOM generation to 2.7.10

2023-11-10 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser updated FLINK-33510:
---
Issue Type: Technical Debt  (was: Improvement)

> Update plugin for SBOM generation to 2.7.10
> ---
>
> Key: FLINK-33510
> URL: https://issues.apache.org/jira/browse/FLINK-33510
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Vinod Anandan
>Priority: Major
>
> Update the CycloneDX Maven plugin for SBOM generation to 2.7.10



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33507) JsonToRowDataConverters can't parse zero timestamp '0000-00-00 00:00:00'

2023-11-10 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784767#comment-17784767
 ] 

Martijn Visser commented on FLINK-33507:


[~lucas_jin] There is already the option "json.ignore-parse-errors" for the 
JSON format, then it will return NULLs. So I would think this is already 
supported

> JsonToRowDataConverters can't parse zero timestamp '0000-00-00 00:00:00'
> -
>
> Key: FLINK-33507
> URL: https://issues.apache.org/jira/browse/FLINK-33507
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: jinzhuguang
>Priority: Major
>  Labels: CDC, JsonFormatter, Kafka, MySQL
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When I use Flink CDC to synchronize data from MySQL, Kafka is used to store 
> data in JSON format. But when I read data from Kafka, I found that the 
> Timestamp type data "0000-00-00 00:00:00" in MySQL could not be parsed by 
> Flink, and the error was reported as follows:
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: data.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonRowDataDeserializationSchema.convertToRowData(JsonRowDataDeserializationSchema.java:131)
>     at 
> org.apache.flink.formats.json.canal.CanalJsonDeserializationSchema.deserialize(CanalJsonDeserializationSchema.java:234)
>     ... 17 more
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: update_time.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createArrayConverter$94141d67$1(JsonToRowDataConverters.java:304)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 20 more
> Caused by: java.time.format.DateTimeParseException: Text '0000-00-00 
> 00:00:00' could not be parsed: Invalid value for MonthOfYear (valid values 1 
> - 12): 0
>     at 
> java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
>     at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1781)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertToTimestamp(JsonToRowDataConverters.java:224)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 25 more
> Caused by: java.time.DateTimeException: Invalid value for MonthOfYear (valid 
> values 1 - 12): 0
>     at java.time.temporal.ValueRange.checkValidIntValue(ValueRange.java:330)
>     at java.time.temporal.ChronoField.checkValidIntValue(ChronoField.java:722)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:550)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:123)
>     at 
> java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:492)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:123)
>     at java.time.format.Parsed.resolveDateFields(Parsed.java:351)
>     at java.time.format.Parsed.resolveFields(Parsed.java:257)
>     at java.time.format.Parsed.resolve(Parsed.java:244)
>     at 
> java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
>     at 
> java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1955)
>     at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777)
>     ... 29 more
> Usually MySQL allows

Re: [PR] [hotfix][build] Bump lowest supported flink version to 1.17.0 [flink-connector-gcp-pubsub]

2023-11-10 Thread via GitHub


leonardBang commented on PR #20:
URL: 
https://github.com/apache/flink-connector-gcp-pubsub/pull/20#issuecomment-1805552181

   > @leonardBang If we want to bump the minimum version to 1.17.0, you'll also 
need to address the dependency convergence.
   > 
   > However, looking at the code, we actually should be OK with releasing a 
newer version of the connector that includes support for 1.18, 1.17 and we 
could also still support 1.16 since the API is still compatible. It's just that 
we need to have a minimum of 2 Flink versions to be supported, but we can add 
an additional one if we want to.
   
   Let's try to address dependency convergence first


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-25857] Add committer metrics to track the status of committables [flink]

2023-11-10 Thread via GitHub


pvary commented on PR #23555:
URL: https://github.com/apache/flink/pull/23555#issuecomment-1805565611

   @mbalassi: Rebased, and got a green run


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Fix a spelling error in upgrade.md [flink-kubernetes-operator]

2023-11-10 Thread via GitHub


sdahlbac opened a new pull request, #705:
URL: https://github.com/apache/flink-kubernetes-operator/pull/705

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request adds a new feature to periodically create 
and maintain savepoints through the `FlinkDeployment` custom resource.)*
   
   Fix a spelling error in upgrade documentation.
   
   ## Brief change log
   
   *(for example:)*
 - *Periodic savepoint trigger is introduced to the custom resource*
 - *The operator checks on reconciliation whether the required time has 
passed*
 - *The JobManager's dispose savepoint API is used to clean up obsolete 
savepoints*
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changes to the `CustomResourceDescriptors`: 
(yes / no)
 - Core observer or reconciler logic that is regularly executed: (yes / no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Flink 31966] Flink Kubernetes operator lacks TLS support [flink-kubernetes-operator]

2023-11-10 Thread via GitHub


tagarr commented on PR #689:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/689#issuecomment-1805607794

   You're right, normally only a truststore would be needed; it depends on whether we 
want to cater for mTLS. If we did, we would need to include keystore info. 
   Quick question: if we wanted to cater for mTLS, would you cater for the 
truststore/keystore potentially being in different secrets, or document that they 
should be in the same secret?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33519) standalone mode could not create keytab secret

2023-11-10 Thread chaoran.su (Jira)
chaoran.su created FLINK-33519:
--

 Summary: standalone mode could not create keytab secret
 Key: FLINK-33519
 URL: https://issues.apache.org/jira/browse/FLINK-33519
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.6.1, kubernetes-operator-1.6.0, 
kubernetes-operator-1.5.0, kubernetes-operator-1.3.1, 
kubernetes-operator-1.4.0, kubernetes-operator-1.3.0
Reporter: chaoran.su


When a standalone cluster is built and is configured with security.kerberos.login.* 
options:

the flink-kubernetes module modifies the security.kerberos.login.keytab configuration 
to point to /opt/kerberos/kerberos-keytab and then creates a secret for the 
JobManager; the secret data comes from the operator pod's keytab file.

After the JobManager is created, the TaskManager creation process looks for the keytab 
file at the location given by the (already rewritten) security.kerberos.login.keytab 
configuration, and throws an exception saying the keytab file cannot be found.

The bug is that the configuration is modified once and then reused when creating the 
TaskManager. Native mode does not have this issue.
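
A tiny illustration of the reuse problem (hypothetical paths, not the actual operator code), sketched with Flink's Configuration class:

{code:java}
import org.apache.flink.configuration.Configuration;

public class KeytabPathReuseSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Value as supplied by the user (placeholder path on the operator pod).
        conf.setString("security.kerberos.login.keytab", "/path/on/operator/pod/user.keytab");

        // JobManager decoration rewrites the path in place so it matches the mounted secret.
        conf.setString("security.kerberos.login.keytab", "/opt/kerberos/kerberos-keytab");

        // TaskManager decoration later reuses the same Configuration and reads the already
        // rewritten value, so it looks for a file that was never mounted for it.
        String keytab = conf.getString("security.kerberos.login.keytab", null);
        System.out.println("TaskManager would look for keytab at: " + keytab);
    }
}
{code}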



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33251) SQL Client query execution aborts after a few seconds: ConnectTimeoutException

2023-11-10 Thread Jorick Caberio (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784905#comment-17784905
 ] 

Jorick Caberio commented on FLINK-33251:


No

> SQL Client query execution aborts after a few seconds: ConnectTimeoutException
> --
>
> Key: FLINK-33251
> URL: https://issues.apache.org/jira/browse/FLINK-33251
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.18.0, 1.17.1
> Environment: Macbook Pro 
> Apple M1 Max
>  
> {code:java}
> $ uname -a
> Darwin asgard08 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 
> 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000 arm64
> {code}
> {code:bash}
> $ java --version
> openjdk 11.0.20.1 2023-08-24
> OpenJDK Runtime Environment Homebrew (build 11.0.20.1+0)
> OpenJDK 64-Bit Server VM Homebrew (build 11.0.20.1+0, mixed mode)
> $ mvn --version
> Apache Maven 3.9.5 (57804ffe001d7215b5e7bcb531cf83df38f93546)
> Maven home: /opt/homebrew/Cellar/maven/3.9.5/libexec
> Java version: 11.0.20.1, vendor: Homebrew, runtime: 
> /opt/homebrew/Cellar/openjdk@11/11.0.20.1/libexec/openjdk.jdk/Contents/Home
> Default locale: en_GB, platform encoding: UTF-8
> OS name: "mac os x", version: "14.0", arch: "aarch64", family: "mac"
> {code}
>Reporter: Robin Moffatt
>Priority: Major
> Attachments: log.zip
>
>
> If I run a streaming query from an unbounded connector from the SQL Client, 
> it bombs out after ~15 seconds.
> {code:java}
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
> connection timed out: localhost/127.0.0.1:52596
> {code}
> This *doesn't* happen on 1.16.2. It *does* happen on *1.17.1* and *1.18* that 
> I have just built locally (git repo hash `9b837727b6d`). 
> The corresponding task's status in the Web UI shows as `CANCELED`. 
> ---
> h2. To reproduce
> Launch local cluster and SQL client
> {code}
> ➜  flink-1.18-SNAPSHOT ./bin/start-cluster.sh 
> Starting cluster.
> Starting standalonesession daemon on host asgard08.
> Starting taskexecutor daemon on host asgard08.
> ➜  flink-1.18-SNAPSHOT ./bin/sql-client.sh
> […]
> Flink SQL>
> {code}
> Set streaming mode and result mode
> {code:sql}
> Flink SQL> SET 'execution.runtime-mode' = 'STREAMING';
> [INFO] Execute statement succeed.
> Flink SQL> SET 'sql-client.execution.result-mode' = 'changelog';
> [INFO] Execute statement succeed.
> {code}
> Define a table to read data from CSV files in a folder
> {code:sql}
> CREATE TABLE firewall (
>   event_time STRING,
>   source_ip  STRING,
>   dest_ipSTRING,
>   source_prt INT,
>   dest_prt   INT
> ) WITH (
>   'connector' = 'filesystem',
>   'path' = 'file:///tmp/firewall/',
>   'format' = 'csv',
>   'source.monitor-interval' = '1' -- unclear from the docs what the unit is 
> here
> );
> {code}
> Create a CSV file to read in
> {code:bash}
> $ mkdir /tmp/firewall
> $ cat > /tmp/firewall/data.csv <<EOF
> 2018-05-11 00:19:34,151.35.34.162,125.26.20.222,2014,68
> 2018-05-11 22:20:43,114.24.126.190,21.68.21.69,379,1619
> EOF
> {code}
> Run a streaming query 
> {code}
> SELECT * FROM firewall;
> {code}
> You will get results showing (and if you add another data file it will show 
> up) - but after ~30 seconds the query aborts and throws an error back to the 
> user at the SQL Client prompt
> {code}
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: 
> connection timed out: localhost/127.0.0.1:58470
> Flink SQL>
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784915#comment-17784915
 ] 

david radley commented on FLINK-33365:
--

an update on what I have found:

 

I have switched on DEBUG to print out the rules that are being driven for my 
recreation. I see:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

Is changed in the next rule to  

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c], joinType=[LeftOuterJoin], 
lookup=[ip=ip], select=[ip, proctime, ip, *CAST(0 AS INTEGER)* AS type, CAST(ip 
AS VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

      +- Calc(select=[ip, PROCTIME() AS proctime])

         +- TableSourceScan(table=[[paimon_catalog, default, a]], fields=[ip])

 

The *CAST(0 AS INTEGER)* is in the final Optimized Execution Plan we see in the 
explain.

 

I am not an expert at this, but it seems to me that one of two things is 
happening:

1) This change to the graph is a valid optimization but it is not being 
actioned properly when executed, such that the CAST(0 AS INTEGER) is ignored.

or

2) the comments at the top of 
[CommonPhysicalLookupJoin.scala]([https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala)]
 are correct and this filter should actually be in the lookup keys. The 
comment says:

 
 * For a lookup join query:
 *
 *   SELECT T.id, T.content, D.age FROM T JOIN userTable FOR SYSTEM_TIME AS OF T.proctime AS D
 *   ON T.content = concat(D.name, '!') AND D.age = 11 AND T.id = D.id WHERE D.name LIKE 'Jack%'
 *
 * The LookupJoin physical node encapsulates the following RelNode tree:
 *
 *        Join (l.name = r.name)
 *       /    \
 *   RelNode   Calc (concat(name, "!") as name, name LIKE 'Jack%')
 *               |
 *             DimTable (lookup-keys: age=11, id=l.id) (age, id, name)
 *
 * The important member fields in LookupJoin:
 *   allLookupKeys: [$0=11, $1=l.id] ($0 and $1 is the indexes of age and id in dim table)
 *   remainingCondition: l.name=r.name
 *
 * The workflow of lookup join:
 *   1) lookup records dimension table using the lookup-keys
 *   2) project & filter on the lookup-ed records
 *   3) join left input record and lookup-ed records
 *   4) only outputs the rows which match to the remainingCondition
 
 
> Missing filter condition in execution plan containing lookup join with mysql 
> jdbc connector
> ---
>
> Key: FLINK-33365
> URL: https://issues.apache.org/jira/browse/FLINK-33365
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: 1.18.0, 1.17.1
> Environment: Flink 1.17.1 & Flink 1.18.0 with 
> flink-connector-jdbc-3.1.1-1.17.jar
>Reporter: macdoor615
>Assignee: david radley
>Priority: Critical
> Attachments: flink-connector-jdbc-3.0.0-1.16.png, 
> flink-connector-jdbc-3.1.1-1.17.png
>
>
> create table in flink with sql-client.sh
> {code:java}
> CREATE TABLE default_catalog.default_database.a (
>   ip string, 
>   proctime as proctime()
> ) 
> WITH (
>   'connector' = 'datagen'
> );{code}
> create table in mysql
> {code:java}
> create table b (
>   ip varchar(20), 
>   type int
> );  {code}
>  
> Flink 1.17.1/ 1.18.0 and *flink-connector-jdbc-3.1.1-1.17.jar*
> excute in sql-client.sh 
> {code:java}
> explain SELECT * FROM default_catalog.default_database.a left join 
> bnpmp_mysql_test.gem_tmp.b FOR SYSTEM_TIME AS OF a.proctime on b.type = 0 and 
> a.ip = b.ip; {code}
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 

[jira] [Comment Edited] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784915#comment-17784915
 ] 

david radley edited comment on FLINK-33365 at 11/10/23 12:29 PM:
-

an update on what I have found:

 

I have switched on DEBUG to print out the rules that are being driven for my 
recreation. I see:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

Is changed in the next rule to  

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c], joinType=[LeftOuterJoin], 
lookup=[ip=ip], select=[ip, proctime, ip, *CAST(0 AS INTEGER)* AS type, CAST(ip 
AS VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

      +- Calc(select=[ip, PROCTIME() AS proctime])

         +- TableSourceScan(table=[[paimon_catalog, default, a]], fields=[ip])

 

The *CAST(0 AS INTEGER)* is in the final Optimized Execution Plan we see in the 
explain.

 

I am not an expert at this, but it seems to me that one of two things is 
happening:

1) This change to the graph is a valid optimization but it is not being 
actioned properly when executed, such that the CAST(0 AS INTEGER) is ignored.

or

2) the comments at the top of 
[CommonPhysicalLookupJoin.scala]([https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala)]
 are correct and this filter should actually be in the lookup keys. The 
comment says:

 
 * For a lookup join query:
 *
 *   SELECT T.id, T.content, D.age FROM T JOIN userTable FOR SYSTEM_TIME AS OF T.proctime AS D
 *   ON T.content = concat(D.name, '!') AND D.age = 11 AND T.id = D.id WHERE D.name LIKE 'Jack%'
 *
 * The LookupJoin physical node encapsulates the following RelNode tree:
 *
 *        Join (l.name = r.name)
 *       /    \
 *   RelNode   Calc (concat(name, "!") as name, name LIKE 'Jack%')
 *               |
 *             DimTable (lookup-keys: age=11, id=l.id) (age, id, name)
 *
 * The important member fields in LookupJoin:
 *   allLookupKeys: [$0=11, $1=l.id] ($0 and $1 is the indexes of age and id in dim table)
 *   remainingCondition: l.name=r.name
 *
 * The workflow of lookup join:
 *   1) lookup records dimension table using the lookup-keys
 *   2) project & filter on the lookup-ed records
 *   3) join left input record and lookup-ed records
 *   4) only outputs the rows which match to the remainingCondition
 


was (Author: JIRAUSER300523):
an update on what I have found:

 

I have switched on DEBUG put out the rules that are being driven for my 
recreation. I see :

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

Is changed in the next rule to  

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c], joinType=[LeftOuterJoin], 
lookup=[ip=ip], select=[ip, proctime, 

[jira] [Comment Edited] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784924#comment-17784924
 ] 

david radley edited comment on FLINK-33365 at 11/10/23 12:46 PM:
-

Testing against MariaDB I see:

select  CAST(0 AS INT) as type  from c;

+--+
|type|

+--+
|    0|
|    0|

+--+

 

It looks like this change from a filter to a cast is not logically equivalent. 
It just forces the result set to have that value. 


was (Author: JIRAUSER300523):
Testing against MariaDB I see:

select  CAST(0 AS INT) as type  from c;

+--+

| type |

+--+

|    0 |

|    0 |

+--+

 

It looks like this change from a filter to a cast is not logically equivalent. 

> Missing filter condition in execution plan containing lookup join with mysql 
> jdbc connector
> ---
>
> Key: FLINK-33365
> URL: https://issues.apache.org/jira/browse/FLINK-33365
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: 1.18.0, 1.17.1
> Environment: Flink 1.17.1 & Flink 1.18.0 with 
> flink-connector-jdbc-3.1.1-1.17.jar
>Reporter: macdoor615
>Assignee: david radley
>Priority: Critical
> Attachments: flink-connector-jdbc-3.0.0-1.16.png, 
> flink-connector-jdbc-3.1.1-1.17.png
>
>
> create table in flink with sql-client.sh
> {code:java}
> CREATE TABLE default_catalog.default_database.a (
>   ip string, 
>   proctime as proctime()
> ) 
> WITH (
>   'connector' = 'datagen'
> );{code}
> create table in mysql
> {code:java}
> create table b (
>   ip varchar(20), 
>   type int
> );  {code}
>  
> Flink 1.17.1/ 1.18.0 and *flink-connector-jdbc-3.1.1-1.17.jar*
> excute in sql-client.sh 
> {code:java}
> explain SELECT * FROM default_catalog.default_database.a left join 
> bnpmp_mysql_test.gem_tmp.b FOR SYSTEM_TIME AS OF a.proctime on b.type = 0 and 
> a.ip = b.ip; {code}
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[ip=ip], select=[ip, proctime, ip, CAST(0 AS INTEGER) AS type, CAST(ip 
> AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]){code}
>  
> excute same sql in sql-client with Flink 1.17.1/ 1.18.0 and 
> *flink-connector-jdbc-3.0.0-1.16.jar*
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[type=0, ip=ip], where=[(type = 0)], select=[ip, proctime, ip, CAST(0 
> AS INTEGER) AS type, CAST(ip AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]) {code}
> with flink-connector-jdbc-3.1.1-1.17.jar,  the condition is 
> *lookup=[ip=ip]*
> with flink-connector-jdbc-3.0.0-1.16.jar ,  the condition is 
> *lookup=[type=0, ip=ip], where=[(type = 0)]*
>  
> In out real world production environment, this lead incorrect data output
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784924#comment-17784924
 ] 

david radley commented on FLINK-33365:
--

Testing against MariaDB I see:

select  CAST(0 AS INT) as type  from c;

+--+

| type |

+--+

|    0 |

|    0 |

+--+

 

It looks like this change from a filter to a cast is not logically equivalent. 

> Missing filter condition in execution plan containing lookup join with mysql 
> jdbc connector
> ---
>
> Key: FLINK-33365
> URL: https://issues.apache.org/jira/browse/FLINK-33365
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: 1.18.0, 1.17.1
> Environment: Flink 1.17.1 & Flink 1.18.0 with 
> flink-connector-jdbc-3.1.1-1.17.jar
>Reporter: macdoor615
>Assignee: david radley
>Priority: Critical
> Attachments: flink-connector-jdbc-3.0.0-1.16.png, 
> flink-connector-jdbc-3.1.1-1.17.png
>
>
> create table in flink with sql-client.sh
> {code:java}
> CREATE TABLE default_catalog.default_database.a (
>   ip string, 
>   proctime as proctime()
> ) 
> WITH (
>   'connector' = 'datagen'
> );{code}
> create table in mysql
> {code:java}
> create table b (
>   ip varchar(20), 
>   type int
> );  {code}
>  
> Flink 1.17.1/ 1.18.0 and *flink-connector-jdbc-3.1.1-1.17.jar*
> excute in sql-client.sh 
> {code:java}
> explain SELECT * FROM default_catalog.default_database.a left join 
> bnpmp_mysql_test.gem_tmp.b FOR SYSTEM_TIME AS OF a.proctime on b.type = 0 and 
> a.ip = b.ip; {code}
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[ip=ip], select=[ip, proctime, ip, CAST(0 AS INTEGER) AS type, CAST(ip 
> AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]){code}
>  
> excute same sql in sql-client with Flink 1.17.1/ 1.18.0 and 
> *flink-connector-jdbc-3.0.0-1.16.jar*
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[type=0, ip=ip], where=[(type = 0)], select=[ip, proctime, ip, CAST(0 
> AS INTEGER) AS type, CAST(ip AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]) {code}
> with flink-connector-jdbc-3.1.1-1.17.jar,  the condition is 
> *lookup=[ip=ip]*
> with flink-connector-jdbc-3.0.0-1.16.jar ,  the condition is 
> *lookup=[type=0, ip=ip], where=[(type = 0)]*
>  
> In out real world production environment, this lead incorrect data output
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784924#comment-17784924
 ] 

david radley edited comment on FLINK-33365 at 11/10/23 12:47 PM:
-

Testing against MariaDB I see:

select  CAST(0 AS INT) as type  from c;

+--+
|type|

+--+
|    0|
|    0|

+--+

 

It looks like this change from a filter to a cast is not logically equivalent. 
It just forces the result set to have that value but does not act as a filter.


was (Author: JIRAUSER300523):
Testing against MariaDB I see:

select  CAST(0 AS INT) as type  from c;

+--+
|type|

+--+
|    0|
|    0|

+--+

 

It looks like this change from a filter to a cast is not logically equivalent. 
It just forces the result set to have that value. 

> Missing filter condition in execution plan containing lookup join with mysql 
> jdbc connector
> ---
>
> Key: FLINK-33365
> URL: https://issues.apache.org/jira/browse/FLINK-33365
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: 1.18.0, 1.17.1
> Environment: Flink 1.17.1 & Flink 1.18.0 with 
> flink-connector-jdbc-3.1.1-1.17.jar
>Reporter: macdoor615
>Assignee: david radley
>Priority: Critical
> Attachments: flink-connector-jdbc-3.0.0-1.16.png, 
> flink-connector-jdbc-3.1.1-1.17.png
>
>
> create table in flink with sql-client.sh
> {code:java}
> CREATE TABLE default_catalog.default_database.a (
>   ip string, 
>   proctime as proctime()
> ) 
> WITH (
>   'connector' = 'datagen'
> );{code}
> create table in mysql
> {code:java}
> create table b (
>   ip varchar(20), 
>   type int
> );  {code}
>  
> Flink 1.17.1/ 1.18.0 and *flink-connector-jdbc-3.1.1-1.17.jar*
> excute in sql-client.sh 
> {code:java}
> explain SELECT * FROM default_catalog.default_database.a left join 
> bnpmp_mysql_test.gem_tmp.b FOR SYSTEM_TIME AS OF a.proctime on b.type = 0 and 
> a.ip = b.ip; {code}
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[ip=ip], select=[ip, proctime, ip, CAST(0 AS INTEGER) AS type, CAST(ip 
> AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]){code}
>  
> excute same sql in sql-client with Flink 1.17.1/ 1.18.0 and 
> *flink-connector-jdbc-3.0.0-1.16.jar*
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[type=0, ip=ip], where=[(type = 0)], select=[ip, proctime, ip, CAST(0 
> AS INTEGER) AS type, CAST(ip AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]) {code}
> with flink-connector-jdbc-3.1.1-1.17.jar,  the condition is 
> *lookup=[ip=ip]*
> with flink-connector-jdbc-3.0.0-1.16.jar ,  the condition is 
> *lookup=[type=0, ip=ip], where=[(type = 0)]*
>  
> In our real-world production environment, this leads to incorrect data output.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33507) JsonToRowDataConverters can't parse zero timestamp '0000-00-00 00:00:00'

2023-11-10 Thread jinzhuguang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784929#comment-17784929
 ] 

jinzhuguang commented on FLINK-33507:
-

Thank you for your reply. In my scenario, I just want to recognize zero date as 
null, but json.ignore-parse-errors will swallow all exceptions, which will 
cause unknown problems in actual production, so I hope Flink can provide a 
parameter similar to MySQL JDBC:
{code:java}
URL = "jdbc:mysql://:3306/?zeroDateTimeBehavior=convertToNull";{code}

> JsonToRowDataConverters can't parse zero timestamp  '-00-00 00:00:00'
> -
>
> Key: FLINK-33507
> URL: https://issues.apache.org/jira/browse/FLINK-33507
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: jinzhuguang
>Priority: Major
>  Labels: CDC, JsonFormatter, Kafka, MySQL
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When I use Flink CDC to synchronize data from MySQL, Kafka is used to store 
> data in JSON format. But when I read data from Kafka, I found that the 
> Timestamp type data "-00-00 00:00:00" in MySQL could not be parsed by 
> Flink, and the error was reported as follows:
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: data.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonRowDataDeserializationSchema.convertToRowData(JsonRowDataDeserializationSchema.java:131)
>     at 
> org.apache.flink.formats.json.canal.CanalJsonDeserializationSchema.deserialize(CanalJsonDeserializationSchema.java:234)
>     ... 17 more
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: update_time.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createArrayConverter$94141d67$1(JsonToRowDataConverters.java:304)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 20 more
> Caused by: java.time.format.DateTimeParseException: Text '-00-00 
> 00:00:00' could not be parsed: Invalid value for MonthOfYear (valid values 1 
> - 12): 0
>     at 
> java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
>     at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1781)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertToTimestamp(JsonToRowDataConverters.java:224)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 25 more
> Caused by: java.time.DateTimeException: Invalid value for MonthOfYear (valid 
> values 1 - 12): 0
>     at java.time.temporal.ValueRange.checkValidIntValue(ValueRange.java:330)
>     at java.time.temporal.ChronoField.checkValidIntValue(ChronoField.java:722)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:550)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:123)
>     at 
> java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:492)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:123)
>     at java.time.format.Parsed.resolveDateFields(Parsed.java:351)
>     at java.time.format.Parsed.resolveFields(Parsed.java:257)
>     at java.time.format.Parsed.resolve(Parsed.java:244)
>     at 
> java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
>     at 
> java.time

[jira] (FLINK-33507) JsonToRowDataConverters can't parse zero timestamp '0000-00-00 00:00:00'

2023-11-10 Thread jinzhuguang (Jira)


[ https://issues.apache.org/jira/browse/FLINK-33507 ]


jinzhuguang deleted comment on FLINK-33507:
-

was (Author: JIRAUSER302532):
Thank you for your reply. In my scenario, I just want to recognize zero date as 
null, but json.ignore-parse-errors will swallow all exceptions, which will 
cause unknown problems in actual production, so I hope Flink can provide a 
parameter similar to MySQL JDBC:
{code:java}
URL = "jdbc:mysql://:3306/?zeroDateTimeBehavior=convertToNull";{code}

> JsonToRowDataConverters can't parse zero timestamp  '-00-00 00:00:00'
> -
>
> Key: FLINK-33507
> URL: https://issues.apache.org/jira/browse/FLINK-33507
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: jinzhuguang
>Priority: Major
>  Labels: CDC, JsonFormatter, Kafka, MySQL
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When I use Flink CDC to synchronize data from MySQL, Kafka is used to store 
> data in JSON format. But when I read data from Kafka, I found that the 
> Timestamp type data "-00-00 00:00:00" in MySQL could not be parsed by 
> Flink, and the error was reported as follows:
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: data.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonRowDataDeserializationSchema.convertToRowData(JsonRowDataDeserializationSchema.java:131)
>     at 
> org.apache.flink.formats.json.canal.CanalJsonDeserializationSchema.deserialize(CanalJsonDeserializationSchema.java:234)
>     ... 17 more
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: update_time.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createArrayConverter$94141d67$1(JsonToRowDataConverters.java:304)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 20 more
> Caused by: java.time.format.DateTimeParseException: Text '-00-00 
> 00:00:00' could not be parsed: Invalid value for MonthOfYear (valid values 1 
> - 12): 0
>     at 
> java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
>     at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1781)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertToTimestamp(JsonToRowDataConverters.java:224)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 25 more
> Caused by: java.time.DateTimeException: Invalid value for MonthOfYear (valid 
> values 1 - 12): 0
>     at java.time.temporal.ValueRange.checkValidIntValue(ValueRange.java:330)
>     at java.time.temporal.ChronoField.checkValidIntValue(ChronoField.java:722)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:550)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:123)
>     at 
> java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:492)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:123)
>     at java.time.format.Parsed.resolveDateFields(Parsed.java:351)
>     at java.time.format.Parsed.resolveFields(Parsed.java:257)
>     at java.time.format.Parsed.resolve(Parsed.java:244)
>     at 
> java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
>     at 
> java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1955)
>

[jira] [Commented] (FLINK-33507) JsonToRowDataConverters can't parse zero timestamp '0000-00-00 00:00:00'

2023-11-10 Thread jinzhuguang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784930#comment-17784930
 ] 

jinzhuguang commented on FLINK-33507:
-

[~martijnvisser] Thank you for your reply. In my scenario, I just want to 
recognize zero date as null, but json.ignore-parse-errors will swallow all 
exceptions, which will cause unknown problems in actual production, so I hope 
Flink can provide a parameter similar to MySQL JDBC:
{code:java}
URL = "jdbc:mysql://:3306/?zeroDateTimeBehavior=convertToNull";{code}
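
For illustration only, a minimal sketch of what such a convertToNull-style behaviour could look like on the deserialization side; the class and method names are hypothetical and do not exist in Flink's JSON format:
{code:java}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: map MySQL's zero timestamp to null instead of failing,
// mimicking the JDBC option zeroDateTimeBehavior=convertToNull.
public class ZeroTimestampExample {
    private static final String ZERO_TIMESTAMP = "0000-00-00 00:00:00";
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static LocalDateTime parseOrNull(String value) {
        if (value == null || ZERO_TIMESTAMP.equals(value.trim())) {
            return null; // treat the zero date as SQL NULL
        }
        return LocalDateTime.parse(value, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("0000-00-00 00:00:00")); // null
        System.out.println(parseOrNull("2023-11-10 12:34:56")); // 2023-11-10T12:34:56
    }
}
{code}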

> JsonToRowDataConverters can't parse zero timestamp  '-00-00 00:00:00'
> -
>
> Key: FLINK-33507
> URL: https://issues.apache.org/jira/browse/FLINK-33507
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
> Environment: Flink 1.16.0
>Reporter: jinzhuguang
>Priority: Major
>  Labels: CDC, JsonFormatter, Kafka, MySQL
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When I use Flink CDC to synchronize data from MySQL, Kafka is used to store 
> data in JSON format. But when I read data from Kafka, I found that the 
> Timestamp type data "-00-00 00:00:00" in MySQL could not be parsed by 
> Flink, and the error was reported as follows:
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: data.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonRowDataDeserializationSchema.convertToRowData(JsonRowDataDeserializationSchema.java:131)
>     at 
> org.apache.flink.formats.json.canal.CanalJsonDeserializationSchema.deserialize(CanalJsonDeserializationSchema.java:234)
>     ... 17 more
> Caused by: 
> org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: 
> Fail to deserialize at field: update_time.
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createArrayConverter$94141d67$1(JsonToRowDataConverters.java:304)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 20 more
> Caused by: java.time.format.DateTimeParseException: Text '-00-00 
> 00:00:00' could not be parsed: Invalid value for MonthOfYear (valid values 1 
> - 12): 0
>     at 
> java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
>     at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1781)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertToTimestamp(JsonToRowDataConverters.java:224)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
>     at 
> org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
>     ... 25 more
> Caused by: java.time.DateTimeException: Invalid value for MonthOfYear (valid 
> values 1 - 12): 0
>     at java.time.temporal.ValueRange.checkValidIntValue(ValueRange.java:330)
>     at java.time.temporal.ChronoField.checkValidIntValue(ChronoField.java:722)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:550)
>     at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:123)
>     at 
> java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:492)
>     at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:123)
>     at java.time.format.Parsed.resolveDateFields(Parsed.java:351)
>     at java.time.format.Parsed.resolveFields(Parsed.java:257)
>     at java.time.format.Parsed.resolve(Parsed.java:244)
>     at 
> java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
> 

Re: [PR] [FLINK-33419] Port PROCTIME/ROWTIME functions to the new inference stack [flink]

2023-11-10 Thread via GitHub


dawidwys merged PR #23634:
URL: https://github.com/apache/flink/pull/23634


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-33419) Port PROCTIME/ROWTIME functions to the new inference stack

2023-11-10 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz closed FLINK-33419.

Resolution: Implemented

Implemented in 3dd98430e5cc7ee88051d0eb6bc2c71908eb997b

> Port PROCTIME/ROWTIME functions to the new inference stack
> --
>
> Key: FLINK-33419
> URL: https://issues.apache.org/jira/browse/FLINK-33419
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33520) FileNotFoundException when running GPUDriverTest

2023-11-10 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-33520:
---

 Summary: FileNotFoundException when running GPUDriverTest
 Key: FLINK-33520
 URL: https://issues.apache.org/jira/browse/FLINK-33520
 Project: Flink
  Issue Type: Technical Debt
  Components: Tests
Affects Versions: 1.18.0
Reporter: Ryan Skraba


I'd been running into a mysterious error running the 
{{flink-external-resources}} module tests:

{code}
java.io.FileNotFoundException: The gpu discovery script does not exist in path 
/opt/asf/flink/src/test/resources/testing-gpu-discovery.sh.
at 
org.apache.flink.externalresource.gpu.GPUDriver.(GPUDriver.java:98)
at 
org.apache.flink.externalresource.gpu.GPUDriverTest.testGPUDriverWithInvalidAmount(GPUDriverTest.java:64)
at
{code}

From the command line and IntelliJ, when it works, it _always_ works, and when it 
fails it _always_ fails. I finally took a moment to figure it out: if the 
{{FLINK_HOME}} environment variable is set (to a valid Flink distribution of any 
version), this test fails.

This is a very minor irritation, but it's pretty easy to fix.

The workaround is to launch the unit test in an environment where this 
environment variable is not set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33521) Implement restore tests for PythonCalc node

2023-11-10 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33521:
--

 Summary: Implement restore tests for PythonCalc node
 Key: FLINK-33521
 URL: https://issues.apache.org/jira/browse/FLINK-33521
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33521) Implement restore tests for PythonCalc node

2023-11-10 Thread Jim Hughes (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Hughes reassigned FLINK-33521:
--

Assignee: Jim Hughes

> Implement restore tests for PythonCalc node
> ---
>
> Key: FLINK-33521
> URL: https://issues.apache.org/jira/browse/FLINK-33521
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Jim Hughes
>Assignee: Jim Hughes
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784915#comment-17784915
 ] 

david radley edited comment on FLINK-33365 at 11/10/23 3:01 PM:


an update on what I have found:

 

I have switched on DEBUG to print out the rules that are being driven for my 
recreation. I see:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

This filter is removed in the next stage:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c], joinType=[LeftOuterJoin], 
lookup=[ip=ip], select=[ip, proctime, ip, *CAST(0 AS INTEGER)* AS type, CAST(ip 
AS VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

      +- Calc(select=[ip, PROCTIME() AS proctime])

         +- TableSourceScan(table=[[paimon_catalog, default, a]], fields=[ip])

 

The *CAST(0 AS INTEGER)* is in the final Optimized Execution Plan we see in the 
explain. This cast is fine as long as the filter is there.

 

I am not an expert at this, but it seems to me that one of two things is 
happening:

1) This change to the graph is a valid optimization but it is not being 
actioned properly when executed, such that the CAST(0 AS INTEGER) is ignored.

or

2) the comments at the top of 
[CommonPhysicalLookupJoin.scala]([https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala)]
 are correct and this filter should actually be in the lookup keys. The 
comments say

 
_* For a lookup join query:_
_*_
_*  SELECT T.id, T.content, D.age FROM T JOIN userTable FOR SYSTEM_TIME AS 
OF T.proctime AS D_
_* ON T.content = concat(D.name, '!') AND D.age = 11 AND T.id = D.id WHERE 
D.name LIKE 'Jack%'_
_* _
_*_
_* The LookupJoin physical node encapsulates the following RelNode tree:_
_*_
_*  Join (l.name = r.name) / \ RelNode Calc (concat(name, "!") as name, 
name LIKE 'Jack%') |_
_* DimTable (lookup-keys: age=11, id=l.id) (age, id, name) _
_*_
_* The important member fields in LookupJoin:  allLookupKeys: [$0=11, 
$1=l.id] ($0 and $1 is_
_* the indexes of age and id in dim table) remainingCondition: 
l.name=r.name _
_*_
_* The workflow of lookup join:_
_*_
_* 1) lookup records dimension table using the lookup-keys  2) project & 
filter on the lookup-ed_
_* records  3) join left input record and lookup-ed records  4) only 
outputs the rows which_
_* match to the remainingCondition _
 


was (Author: JIRAUSER300523):
an update on what I have found:

 

I have switched on DEBUG put out the rules that are being driven for my 
recreation. I see :

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

Is changed in the next rule to  

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c], joinType=[LeftOuter

[jira] [Comment Edited] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784924#comment-17784924
 ] 

david radley edited comment on FLINK-33365 at 11/10/23 3:01 PM:


Testing against MariaDB I see:

select  CAST(0 AS INT) as type  from c;

+------+
| type |
+------+
|    0 |
|    0 |
+------+

 

Just confirming the cast is not a filter.


was (Author: JIRAUSER300523):
Testing against MariaDB I see:

select  CAST(0 AS INT) as type  from c;

+------+
| type |
+------+
|    0 |
|    0 |
+------+

 

It looks like this change from a filter to a cast is not logically equivalent. 
It just forces the result set to have that value but does not act as a filter.

> Missing filter condition in execution plan containing lookup join with mysql 
> jdbc connector
> ---
>
> Key: FLINK-33365
> URL: https://issues.apache.org/jira/browse/FLINK-33365
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC
>Affects Versions: 1.18.0, 1.17.1
> Environment: Flink 1.17.1 & Flink 1.18.0 with 
> flink-connector-jdbc-3.1.1-1.17.jar
>Reporter: macdoor615
>Assignee: david radley
>Priority: Critical
> Attachments: flink-connector-jdbc-3.0.0-1.16.png, 
> flink-connector-jdbc-3.1.1-1.17.png
>
>
> create table in flink with sql-client.sh
> {code:java}
> CREATE TABLE default_catalog.default_database.a (
>   ip string, 
>   proctime as proctime()
> ) 
> WITH (
>   'connector' = 'datagen'
> );{code}
> create table in mysql
> {code:java}
> create table b (
>   ip varchar(20), 
>   type int
> );  {code}
>  
> Flink 1.17.1/ 1.18.0 and *flink-connector-jdbc-3.1.1-1.17.jar*
> excute in sql-client.sh 
> {code:java}
> explain SELECT * FROM default_catalog.default_database.a left join 
> bnpmp_mysql_test.gem_tmp.b FOR SYSTEM_TIME AS OF a.proctime on b.type = 0 and 
> a.ip = b.ip; {code}
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[ip=ip], select=[ip, proctime, ip, CAST(0 AS INTEGER) AS type, CAST(ip 
> AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]){code}
>  
> excute same sql in sql-client with Flink 1.17.1/ 1.18.0 and 
> *flink-connector-jdbc-3.0.0-1.16.jar*
> get the execution plan
> {code:java}
> ...
> == Optimized Execution Plan ==
> Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])
> +- LookupJoin(table=[bnpmp_mysql_test.gem_tmp.b], joinType=[LeftOuterJoin], 
> lookup=[type=0, ip=ip], where=[(type = 0)], select=[ip, proctime, ip, CAST(0 
> AS INTEGER) AS type, CAST(ip AS VARCHAR(2147483647)) AS ip0])
>    +- Calc(select=[ip, PROCTIME() AS proctime])
>       +- TableSourceScan(table=[[default_catalog, default_database, a]], 
> fields=[ip]) {code}
> with flink-connector-jdbc-3.1.1-1.17.jar,  the condition is 
> *lookup=[ip=ip]*
> with flink-connector-jdbc-3.0.0-1.16.jar ,  the condition is 
> *lookup=[type=0, ip=ip], where=[(type = 0)]*
>  
> In our real-world production environment, this leads to incorrect data output.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784915#comment-17784915
 ] 

david radley edited comment on FLINK-33365 at 11/10/23 3:04 PM:


an update on what I have found:

 

I have switched on DEBUG to print out the rules that are being driven for my 
recreation. I see:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

This filter is removed in the next stage:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c], joinType=[LeftOuterJoin], 
lookup=[ip=ip], select=[ip, proctime, ip, *CAST(0 AS INTEGER)* AS type, CAST(ip 
AS VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

      +- Calc(select=[ip, PROCTIME() AS proctime])

         +- TableSourceScan(table=[[paimon_catalog, default, a]], fields=[ip])

 

The *CAST(0 AS INTEGER)* is in the final Optimized Execution Plan we see in the 
explain. This cast is fine as long as the filter is there.

 

I am not an expert at this, but the comments at the top of 
[CommonPhysicalLookupJoin.scala]([https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala)]
 are correct and this filter should actually be in the lookup keys. The 
comments say

 
_* For a lookup join query:_
_*_
_*  SELECT T.id, T.content, D.age FROM T JOIN userTable FOR SYSTEM_TIME AS 
OF T.proctime AS D_
_* ON T.content = concat(D.name, '!') AND D.age = 11 AND T.id = D.id WHERE 
D.name LIKE 'Jack%'_
_* _
_*_
_* The LookupJoin physical node encapsulates the following RelNode tree:_
_*_
_*  Join (l.name = r.name) / \ RelNode Calc (concat(name, "!") as name, 
name LIKE 'Jack%') |_
_* DimTable (lookup-keys: age=11, id=l.id) (age, id, name) _
_*_
_* The important member fields in LookupJoin:  allLookupKeys: [$0=11, 
$1=l.id] ($0 and $1 is_
_* the indexes of age and id in dim table) remainingCondition: 
l.name=r.name _
_*_
_* The workflow of lookup join:_
_*_
_* 1) lookup records dimension table using the lookup-keys  2) project & 
filter on the lookup-ed_
_* records  3) join left input record and lookup-ed records  4) only 
outputs the rows which_
_* match to the remainingCondition _
 

I would have thought that a filter on a value on one of the sources could be 
pushed down as a filter to that source, rather than being in the lookup keys. 
Maybe that could be a subsequent optimization. I could be missing something.


was (Author: JIRAUSER300523):
an update on what I have found:

 

I have switched on DEBUG put out the rules that are being driven for my 
recreation. I see :

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

Is removed in the next stage.  

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c]

[jira] [Comment Edited] (FLINK-33365) Missing filter condition in execution plan containing lookup join with mysql jdbc connector

2023-11-10 Thread david radley (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784915#comment-17784915
 ] 

david radley edited comment on FLINK-33365 at 11/10/23 3:08 PM:


an update on what I have found:

 

I have switched on DEBUG to print out the rules that are being driven for my 
recreation. I see:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

This filter is removed in the next stage:

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- Calc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, ip0, type])

   +- LookupJoin(table=[mariadb_catalog.menagerie.c], joinType=[LeftOuterJoin], 
lookup=[ip=ip], select=[ip, proctime, ip, *CAST(0 AS INTEGER)* AS type, CAST(ip 
AS VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

      +- Calc(select=[ip, PROCTIME() AS proctime])

         +- TableSourceScan(table=[[paimon_catalog, default, a]], fields=[ip])

 

The *CAST(0 AS INTEGER)* is in the final Optimized Execution Plan we see in the 
explain. This cast is fine as long as the filter is there.

 

I am not an expert at this, but the comments at the top of 
[CommonPhysicalLookupJoin.scala]([https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/common/CommonPhysicalLookupJoin.scala)]
 are correct and this filter should actually be in the lookup keys. The 
comments say

 
_* For a lookup join query:_
_*_
_*  SELECT T.id, T.content, D.age FROM T JOIN userTable FOR SYSTEM_TIME AS 
OF T.proctime AS D_
_* ON T.content = concat(D.name, '!') AND D.age = 11 AND T.id = D.id WHERE 
D.name LIKE 'Jack%'_
_* _
_*_
_* The LookupJoin physical node encapsulates the following RelNode tree:_
_*_
_*  Join (l.name = r.name) / \ RelNode Calc (concat(name, "!") as name, 
name LIKE 'Jack%') |_
_* DimTable (lookup-keys: age=11, id=l.id) (age, id, name) _
_*_
_* The important member fields in LookupJoin:  allLookupKeys: [$0=11, 
$1=l.id] ($0 and $1 is_
_* the indexes of age and id in dim table) remainingCondition: 
l.name=r.name _
_*_
_* The workflow of lookup join:_
_*_
_* 1) lookup records dimension table using the lookup-keys  2) project & 
filter on the lookup-ed_
_* records  3) join left input record and lookup-ed records  4) only 
outputs the rows which_
_* match to the remainingCondition _
 

I would have thought that a filter on a value on one of the sources could be 
pushed down as a filter to that source, rather than being in the lookup keys. 
Maybe that could be a subsequent optimization. We see this in: 

FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, c, 
filter=[=(type, 0)]]], fields=[ip, type])

but this is then lost. 

 


was (Author: JIRAUSER300523):
an update on what I have found:

 

I have switched on DEBUG put out the rules that are being driven for my 
recreation. I see :

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize time_indicator cost 1 ms.

optimize result:

FlinkLogicalSink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, 
type])

+- FlinkLogicalCalc(select=[ip, PROCTIME_MATERIALIZE(proctime) AS proctime, 
ip0, type])

   +- FlinkLogicalJoin(condition=[=($0, $4)], joinType=[left])

      :- FlinkLogicalCalc(select=[ip, PROCTIME() AS proctime])

      :  +- FlinkLogicalTableSourceScan(table=[[paimon_catalog, default, a]], 
fields=[ip])

      +- FlinkLogicalSnapshot(period=[$cor0.proctime])

         +- FlinkLogicalCalc(select=[ip, CAST(0 AS INTEGER) AS type, CAST(ip AS 
VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS ip0])

            +- FlinkLogicalTableSourceScan(table=[[mariadb_catalog, menagerie, 
c, {*}filter=[=(type, 0)]]]{*}, fields=[ip, type])

 

Is removed in the next stage.  

org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram [] - 
optimize physical cost 3 ms.

optimize result:

Sink(table=[*anonymous_collect$4*], fields=[ip, proctime, ip0, type])

+- 

[jira] [Commented] (FLINK-13786) Implement type inference for other functions

2023-11-10 Thread Dawid Wysakowicz (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784990#comment-17784990
 ] 

Dawid Wysakowicz commented on FLINK-13786:
--

I'll close this ticket in favour of a more granular approach.

> Implement type inference for other functions
> 
>
> Key: FLINK-13786
> URL: https://issues.apache.org/jira/browse/FLINK-13786
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Jingsong Lee
>Assignee: Francesco Guardiani
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-13786) Implement type inference for other functions

2023-11-10 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz closed FLINK-13786.

Resolution: Won't Fix

> Implement type inference for other functions
> 
>
> Key: FLINK-13786
> URL: https://issues.apache.org/jira/browse/FLINK-13786
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Jingsong Lee
>Assignee: Francesco Guardiani
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [Flink 31966] Flink Kubernetes operator lacks TLS support [flink-kubernetes-operator]

2023-11-10 Thread via GitHub


gaborgsomogyi commented on PR #689:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/689#issuecomment-1806003173

   Makes sense, mTLS is one reason that justifies adding the remaining 3 
keystore-related configs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-33520] Avoid using FLINK_HOME to find test scripts [flink]

2023-11-10 Thread via GitHub


RyanSkraba opened a new pull request, #23696:
URL: https://github.com/apache/flink/pull/23696

   ## What is the purpose of the change
   
   This is a *very* minor change to avoid depending on the value of the 
`FLINK_HOME` environment variable when discovering the location of the gpu 
discovery test script (only used while testing).
   
   Before this change, the following error occurs on the command line and IDE 
if the `FLINK_HOME` is set:
   
   ```
   java.io.FileNotFoundException: The gpu discovery script does not exist in 
path /opt/asf/flink/src/test/resources/testing-gpu-discovery.sh.
at 
org.apache.flink.externalresource.gpu.GPUDriver.(GPUDriver.java:98)
at 
org.apache.flink.externalresource.gpu.GPUDriverTest.testGPUDriverWithInvalidAmount(GPUDriverTest.java:64)
   ```
   
   ## Brief change log
   
   During unit tests, use an absolute path to the module's `src/test/resources` 
directory instead of a path relative to `FLINK_HOME` (if it exists).
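
   For illustration, a minimal sketch of the idea (names are assumptions, not 
necessarily the code in this PR): resolve the script against the module's test 
resources rather than anything derived from `FLINK_HOME`.

   ```java
import java.io.File;
import java.nio.file.Paths;

// Hypothetical helper showing the intent: the path is anchored at the module's
// working directory, so the FLINK_HOME environment variable no longer matters.
class TestScriptPathExample {
    static File discoveryScript() {
        return Paths.get("src", "test", "resources", "testing-gpu-discovery.sh")
                .toAbsolutePath()
                .toFile();
    }

    public static void main(String[] args) {
        System.out.println(discoveryScript().getAbsolutePath());
    }
}
   ```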
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33520) FileNotFoundException when running GPUDriverTest

2023-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33520:
---
Labels: pull-request-available  (was: )

> FileNotFoundException when running GPUDriverTest
> 
>
> Key: FLINK-33520
> URL: https://issues.apache.org/jira/browse/FLINK-33520
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Tests
>Affects Versions: 1.18.0
>Reporter: Ryan Skraba
>Priority: Minor
>  Labels: pull-request-available
>
> I'd been running into a mysterious error running the 
> {{flink-external-resources}} module tests:
> {code}
> java.io.FileNotFoundException: The gpu discovery script does not exist in 
> path /opt/asf/flink/src/test/resources/testing-gpu-discovery.sh.
>   at 
> org.apache.flink.externalresource.gpu.GPUDriver.(GPUDriver.java:98)
>   at 
> org.apache.flink.externalresource.gpu.GPUDriverTest.testGPUDriverWithInvalidAmount(GPUDriverTest.java:64)
>   at
> {code}
> From the command line and IntelliJ, when it works, it _always_ 
> works, and when it fails it _always_ fails. I finally took a moment to figure 
> it out: if the {{FLINK_HOME}} environment variable is set (to a valid Flink 
> distribution of any version), this test fails.
> This is a very minor irritation, but it's pretty easy to fix.
> The workaround is to launch the unit test in an environment where this 
> environment variable is not set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [Flink 31966] Flink Kubernetes operator lacks TLS support [flink-kubernetes-operator]

2023-11-10 Thread via GitHub


gaborgsomogyi commented on PR #689:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/689#issuecomment-1806013754

   Just for the reference for others: 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-ssl/#rest-endpoints-external-connectivity
   
   ```
   If mutual authentication is enabled, the keystore and the truststore are 
used by both, the server endpoint and the REST clients as with internal 
connectivity.
   ```
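
   For illustration, the REST SSL options involved in mutual authentication could 
be set roughly like this in flink-conf.yaml (paths and passwords are placeholders; 
the linked page is the authoritative reference):

   ```yaml
# Illustrative snippet only; consult the linked documentation for details.
security.ssl.rest.enabled: true
security.ssl.rest.authentication-enabled: true   # require client certificates (mTLS)
security.ssl.rest.keystore: /path/to/rest.keystore
security.ssl.rest.keystore-password: changeit
security.ssl.rest.key-password: changeit
security.ssl.rest.truststore: /path/to/rest.truststore
security.ssl.rest.truststore-password: changeit
   ```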
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33520] Avoid using FLINK_HOME to find test scripts [flink]

2023-11-10 Thread via GitHub


flinkbot commented on PR #23696:
URL: https://github.com/apache/flink/pull/23696#issuecomment-1806015650

   
   ## CI report:
   
   * 741114833862d9def23b3ea09b886c8f5afe0b7b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33515][python] Stream python process output to log instead of collecting it in memory [flink]

2023-11-10 Thread via GitHub


gaborgsomogyi commented on PR #23697:
URL: https://github.com/apache/flink/pull/23697#issuecomment-1806023401

   cc @gyfora @mbalassi @morhidi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-33515][python] Stream python process output to log instead of collecting it in memory [flink]

2023-11-10 Thread via GitHub


gaborgsomogyi opened a new pull request, #23697:
URL: https://github.com/apache/flink/pull/23697

   ## What is the purpose of the change
   
   `PythonDriver` now collects the python process output in a `StringBuilder` 
instead of streaming it. This can cause an OOM when the python process generates 
a huge amount of output. In this PR I've changed the code not to collect the 
output but to stream it to the log line by line.
   
   ## Brief change log
   
   `PythonDriver` now reads the python process output line by line and streams 
it to the log.
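
   A minimal sketch of the streaming approach, with assumed names rather than the 
exact code in this PR: read the process output line by line and forward each line 
to the logger, so memory use stays constant.

   ```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical helper: stream a process's stdout to the log line by line
// instead of accumulating it in a StringBuilder.
class ProcessOutputStreamer {
    private static final Logger LOG = LoggerFactory.getLogger(ProcessOutputStreamer.class);

    static void streamOutput(Process process) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                LOG.info(line); // constant memory, no StringBuilder accumulation
            }
        }
    }
}
   ```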
   
   ## Verifying this change
   
   Manually on local machine.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33515) PythonDriver need to stream python process output to log instead of collecting it in memory

2023-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33515:
---
Labels: pull-request-available  (was: )

> PythonDriver need to stream python process output to log instead of 
> collecting it in memory
> ---
>
> Key: FLINK-33515
> URL: https://issues.apache.org/jira/browse/FLINK-33515
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.19.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available
>
> PythonDriver now collects the python process output in a StringBuilder instead 
> of streaming it. This can cause an OOM when the python process generates a huge 
> amount of output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33148) Update Web UI to adopt the new "endpoint" field in REST API

2023-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33148:
---
Labels: pull-request-available  (was: )

> Update Web UI to adopt the new "endpoint" field in REST API
> ---
>
> Key: FLINK-33148
> URL: https://issues.apache.org/jira/browse/FLINK-33148
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Web Frontend
>Affects Versions: 1.18.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-33148] Update Web UI to adopt the new "endpoint" field in REST API [flink]

2023-11-10 Thread via GitHub


X-czh opened a new pull request, #23698:
URL: https://github.com/apache/flink/pull/23698

   
   
   ## What is the purpose of the change
   
   Update Web UI to adopt the new "endpoint" field in REST API introduced in 
[FLINK-33147](https://issues.apache.org/jira/browse/FLINK-33147).
   
   ## Brief change log
   
   Update Web UI to adopt the new "endpoint" field in REST API in the following 
places:
   - Job vertex subtask details
   - Job vertex task manager details
   - Job exceptions
   
   ## Verifying this change
   
   Manually verified the change in a local cluster setup.
   
   
![screenshot-20231110-232106](https://github.com/apache/flink/assets/22020529/3264f31e-cfd5-4902-8667-e820077dd1be)
   
![screenshot-20231110-232603](https://github.com/apache/flink/assets/22020529/c861259b-20b2-40ed-9e1e-70a413cd)
   
![screenshot-20231110-232730](https://github.com/apache/flink/assets/22020529/1e873a87-eb83-4a77-b95e-b8c58d7b1b02)
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33515][python] Stream python process output to log instead of collecting it in memory [flink]

2023-11-10 Thread via GitHub


flinkbot commented on PR #23697:
URL: https://github.com/apache/flink/pull/23697#issuecomment-1806033945

   
   ## CI report:
   
   * ac8e2bd3884973b10c8e17110f4dae8e625576fb UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33148] Update Web UI to adopt the new "endpoint" field in REST API [flink]

2023-11-10 Thread via GitHub


flinkbot commented on PR #23698:
URL: https://github.com/apache/flink/pull/23698#issuecomment-1806045977

   
   ## CI report:
   
   * 815fdd06b59c593124bfbe8d9443949c05b4dba4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31631][FileSystems] shade guava in gs-fs filesystem [flink]

2023-11-10 Thread via GitHub


sofam commented on PR #23489:
URL: https://github.com/apache/flink/pull/23489#issuecomment-1806052931

   Any progress on this? We need it for upgrading our jobs to 1.18 :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


