[GitHub] [flink] lindong28 commented on a diff in pull request #21589: [FLINK-25509][connector-base/connector-kafka] Add RecordEvaluator to dynamically stop source based on de-serialized records

2023-02-12 Thread via GitHub


lindong28 commented on code in PR #21589:
URL: https://github.com/apache/flink/pull/21589#discussion_r1103753156


##
flink-connectors/flink-connector-kafka/src/test/java/org/apache/flink/connector/kafka/source/reader/KafkaSourceReaderTest.java:
##
@@ -479,6 +486,134 @@ public void testSupportsPausingOrResumingSplits() throws Exception {
 }
 }
 
+    @Test
+    public void testSupportsUsingRecordEvaluatorWithSplitsFinishedAtMiddle() throws Exception {
+        final Set<String> finishedSplits = new HashSet<>();
+        try (final KafkaSourceReader<Integer> reader =
+                (KafkaSourceReader<Integer>)
+                        createReader(
+                                Boundedness.BOUNDED,
+                                "groupId",
+                                new TestingReaderContext(),
+                                finishedSplits::addAll,
+                                r -> r == 7 || r == NUM_RECORDS_PER_SPLIT + 5)) {
+            KafkaPartitionSplit split1 =
+                    new KafkaPartitionSplit(new TopicPartition(TOPIC, 0), 0, Integer.MAX_VALUE);
+            KafkaPartitionSplit split2 =
+                    new KafkaPartitionSplit(new TopicPartition(TOPIC, 1), 0, Integer.MAX_VALUE);
+            reader.addSplits(Arrays.asList(split1, split2));
+            reader.notifyNoMoreSplits();
+
+            TestingReaderOutput<Integer> output = new TestingReaderOutput<>();
+            pollUntil(
+                    reader,
+                    output,
+                    () -> finishedSplits.size() == 2,
+                    "The reader cannot get the expected result before timeout.");
+            InputStatus status;
+            while (true) {
+                status = reader.pollNext(output);
+                if (status == InputStatus.END_OF_INPUT) {
+                    break;
+                }
+                if (status == InputStatus.NOTHING_AVAILABLE) {
+                    reader.isAvailable().get();
+                }
+            }
+
+            assertThat(output.getEmittedRecords().size()).isEqualTo(12);
+            List<Integer> expected =
+                    IntStream.concat(IntStream.range(0, 7), IntStream.range(10, 15))
+                            .boxed()
+                            .collect(Collectors.toList());
+            assertThat(finishedSplits)
+                    .containsExactly(
+                            new TopicPartition(TOPIC, 0).toString(),
+                            new TopicPartition(TOPIC, 1).toString());
+            assertThat(output.getEmittedRecords())
+                    .containsExactlyInAnyOrder(expected.toArray(new Integer[12]));
+            assertThat(status).isEqualTo(END_OF_INPUT);
+        }
+    }
+
+    @Test
+    public void testSupportsUsingRecordEvaluatorWithSplitsFinishedAtEnd() throws Exception {
+        final Set<String> finishedSplits = new HashSet<>();
+        try (final KafkaSourceReader<Integer> reader =
+                (KafkaSourceReader<Integer>)
+                        createReader(
+                                Boundedness.BOUNDED,
+                                "groupId",
+                                new TestingReaderContext(),
+                                finishedSplits::addAll,
+                                r -> r == 9 || r == 19)) {
+            KafkaPartitionSplit split1 =
+                    new KafkaPartitionSplit(new TopicPartition(TOPIC, 0), 0, Integer.MAX_VALUE);
+            KafkaPartitionSplit split2 =
+                    new KafkaPartitionSplit(new TopicPartition(TOPIC, 1), 0, Integer.MAX_VALUE);
+            reader.addSplits(Arrays.asList(split1, split2));
+            reader.notifyNoMoreSplits();
+
+            TestingReaderOutput<Integer> output = new TestingReaderOutput<>();
+            pollUntil(
+                    reader,
+                    output,
+                    () -> finishedSplits.size() == 2,
+                    "The reader cannot get the expected result before timeout.");
+            InputStatus status = reader.pollNext(output);
+
+            assertThat(output.getEmittedRecords().size()).isEqualTo(18);
+            List<Integer> expected =
+                    IntStream.concat(IntStream.range(0, 9), IntStream.range(10, 19))
+                            .boxed()
+                            .collect(Collectors.toList());
+            assertThat(finishedSplits)
+                    .containsExactly(
+                            new TopicPartition(TOPIC, 0).toString(),
+                            new TopicPartition(TOPIC, 1).toString());
+            assertThat(output.getEmittedRecords())
+                    .containsExactlyInAnyOrder(expected.toArray(new Integer[0]));
+            assertThat(status).isEqualTo(END_OF_INPUT);
+        }
+    }
+
+    @Test
+    public void testSupportsUsingRecordEvaluatorWithUnfinishedSplit() throws Exception {
+        final Set<String> finishedSplits = new HashSet<>();
+
+        try (final KafkaSourceReader<Integer> reader =
+                (KafkaSourceReader<Integer>)
+

[GitHub] [flink-connector-pulsar] syhily commented on pull request #25: [FLINK-30606][Connector/Pulsar] Bump the pulsar-client-all to latest 2.11.0.

2023-02-12 Thread via GitHub


syhily commented on PR #25:
URL: https://github.com/apache/flink-connector-pulsar/pull/25#issuecomment-1426990810

   This PR is built on top of #24; we will rebase it once #24 is merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-pulsar] syhily commented on pull request #25: [FLINK-30606][Connector/Pulsar] Bump the pulsar-client-all to latest 2.11.0.

2023-02-12 Thread via GitHub


syhily commented on PR #25:
URL: https://github.com/apache/flink-connector-pulsar/pull/25#issuecomment-1427018099

   ```
   2023-02-12T12:00:08.6685939Z [10.449s][error][gc] Failed to commit memory (Not enough space)
   2023-02-12T12:00:08.6686284Z [19.855s][error][gc] Forced to lower max Java heap size from 2048M(100%) to 2046M(100%)
   2023-02-12T12:00:08.6686711Z [19.855s][error][gc] Failed to allocate initial Java heap (2048M)
   2023-02-12T12:00:08.6687024Z Error: Could not create the Java Virtual Machine.
   2023-02-12T12:00:08.6688159Z Error: A fatal exception has occurred. Program will exit.
   ```
   
   @tisonkun @MartijnVisser Any ideas on this? Could the new Pulsar docker instance require more memory than the CI limit allows?





[jira] [Commented] (FLINK-30519) Add e2e tests for operator dynamic config

2023-02-12 Thread haiqingchen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687572#comment-17687572
 ] 

haiqingchen commented on FLINK-30519:
-

 Hi [~gyfora], could you assign this ticket to me? I'd like to work on it.

> Add e2e tests for operator dynamic config
> -
>
> Key: FLINK-30519
> URL: https://issues.apache.org/jira/browse/FLINK-30519
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Critical
>  Labels: starter
>
> The dynamic config feature is currently not covered by e2e tests and is 
> subject to accidental regressions, as shown in:
> https://issues.apache.org/jira/browse/FLINK-30329
> We should add an e2e test that covers this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] jadireddi commented on pull request #21594: [FLINK-30483] [Formats] [Avro] Add avro format support for TIMESTAMP_LTZ

2023-02-12 Thread via GitHub


jadireddi commented on PR #21594:
URL: https://github.com/apache/flink/pull/21594#issuecomment-1427037558

   Thank you @dawidwys @liuml07 for the review. Will make changes as suggested.





[GitHub] [flink-kubernetes-operator] gaborgsomogyi commented on pull request #530: [FLINK-30757] Upgrade busybox version to a pinned version for operator

2023-02-12 Thread via GitHub


gaborgsomogyi commented on PR #530:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/530#issuecomment-1427056226

   > Hi @gyfora @gaborgsomogyi , the pipeline checks with the 1.35.0 busybox version passed. Should we wait for https://github.com/docker-library/busybox/issues/162 to be solved, or are we good with version 1.35.0?
   
   I'm fine with 1.35.0 as long as it's stable. The main feature I was personally missing is the failure log for init containers.





[GitHub] [flink-kubernetes-operator] gaborgsomogyi commented on a diff in pull request #530: [FLINK-30757] Upgrade busybox version to a pinned version for operator

2023-02-12 Thread via GitHub


gaborgsomogyi commented on code in PR #530:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/530#discussion_r1103821473


##
e2e-tests/utils.sh:
##
@@ -175,20 +175,33 @@ function debug_and_show_logs {
 
     echo "Flink logs:"
     kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read pod;do
-        containers=(`kubectl get pods $pod -o jsonpath='{.spec.containers[*].name}'`)
-        i=0
-        for container in "${containers[@]}"; do
-          echo "Current logs for $pod:$container: "
-          kubectl logs $pod $container;
-          restart_count=$(kubectl get pod $pod -o jsonpath='{.status.containerStatuses['$i'].restartCount}')
-          if [[ ${restart_count} -gt 0 ]];then
-            echo "Previous logs for $pod: "
-            kubectl logs $pod $container --previous
-          fi
-        done
+        print_pod_container_logs "$pod" "{.spec.initContainers[*].name}" "{.status.initContainerStatuses[*].restartCount}"
+        print_pod_container_logs "$pod" "{.spec.containers[*].name}" "{.status.containerStatuses[*].restartCount}"
     done
 }
 
+function print_pod_container_logs {
+  pod=$1
+  container_path=$2
+  restart_count_path=$3
+
+  containers=(`kubectl get pods $pod -o jsonpath=$container_path`)
+  restart_counts=(`kubectl get pod $pod -o jsonpath=$restart_count_path`)
+
+  if [[ -z "$containers" ]];then
+    return 0
+  fi
+
+  for idx in "${!containers[@]}"; do
+    echo "Current logs for $pod:${containers[idx]}: "

Review Comment:
   I know it was implemented this way, so it's a "new feature", but can we add some begin/end markers for the logs?
   Maybe:
   ```
   BEGIN CURRENT LOGS for flink-example-statemachine-dddb4d664-m6n5b:flink-main-container
   foo
   END CURRENT LOGS
   ```
   or something similar. My eye could parse it more easily.
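   The marker idea suggested above could be sketched as follows; the helper name and marker wording are hypothetical illustrations, not part of the actual patch:

   ```shell
   # Hypothetical helper: wrap any log-producing command in begin/end markers
   # so each container's log block is easy to spot when scanning CI output.
   print_with_markers() {
     local label=$1
     shift
     echo "BEGIN CURRENT LOGS for $label"
     "$@"    # e.g. kubectl logs "$pod" "$container"
     echo "END CURRENT LOGS for $label"
   }

   print_with_markers "demo-pod:main" echo "foo"
   ```

   Called with `kubectl logs` in place of `echo`, this would bracket each container's log dump in the e2e output.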
   



##
e2e-tests/utils.sh:
##

Review Comment:
   I would add `echo "Printing init container logs"` and `echo "Printing main container logs"` for better readability.






[GitHub] [flink] SwimSweet commented on pull request #20779: [FLINK-28915] Flink Native k8s mode jar localtion support s3 schema.

2023-02-12 Thread via GitHub


SwimSweet commented on PR #20779:
URL: https://github.com/apache/flink/pull/20779#issuecomment-1427078636

   @wangyang0918 Please take a look, thanks.





[GitHub] [flink-connector-mongodb] Jiabao-Sun opened a new pull request, #2: [FLINK-30967][doc] Add MongoDB connector documentation

2023-02-12 Thread via GitHub


Jiabao-Sun opened a new pull request, #2:
URL: https://github.com/apache/flink-connector-mongodb/pull/2

   ## What is the purpose of the change
   Add MongoDB connector documentation





[jira] [Updated] (FLINK-30967) Add MongoDB connector documentation

2023-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-30967:
---
Labels: pull-request-available  (was: )

> Add MongoDB connector documentation
> ---
>
> Key: FLINK-30967
> URL: https://issues.apache.org/jira/browse/FLINK-30967
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / MongoDB, Documentation
>Reporter: Chesnay Schepler
>Assignee: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: mongodb-1.0.0
>
>






[GitHub] [flink] swuferhong commented on pull request #21900: [FLINK-30971][table-runtime] Modify the default value of parameter 'table.exec.local-hash-agg.adaptive.sampling-threshold'

2023-02-12 Thread via GitHub


swuferhong commented on PR #21900:
URL: https://github.com/apache/flink/pull/21900#issuecomment-1427203536

   @flinkbot run azure





[GitHub] [flink-table-store] FangYongs commented on a diff in pull request #522: [FLINK-30979] Support shuffling data by partition

2023-02-12 Thread via GitHub


FangYongs commented on code in PR #522:
URL: https://github.com/apache/flink-table-store/pull/522#discussion_r1103924487


##
flink-table-store-connector/src/main/java/org/apache/flink/table/store/connector/FlinkConnectorOptions.java:
##
@@ -69,6 +69,13 @@ public class FlinkConnectorOptions {
 
 public static final ConfigOption SINK_PARALLELISM = 
FactoryUtil.SINK_PARALLELISM;
 
+public static final ConfigOption SINK_SHUFFLE_BY_PARTITION =
+ConfigOptions.key("sink.shuffle-by-partition.enable")

Review Comment:
   Thanks @JingsongLi, I have rebased onto master and renamed the config option.






[jira] [Commented] (FLINK-31003) Flink SQL IF / CASE WHEN Function incorrect

2023-02-12 Thread weiqinpan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687663#comment-17687663
 ] 

weiqinpan commented on FLINK-31003:
---

BTW, CASE WHEN + IFNULL also has a logic problem.

> Flink SQL IF / CASE WHEN Function incorrect
> 
>
> Key: FLINK-31003
> URL: https://issues.apache.org/jira/browse/FLINK-31003
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.15.2, 1.15.3, 1.16.1
>Reporter: weiqinpan
>Priority: Major
>
> When I execute the SQL below using sql-client, I found something wrong.
>  
> {code:java}
> CREATE TEMPORARY TABLE source (
>   mktgmsg_biz_type STRING,
>   marketing_flow_id STRING,
>   mktgmsg_campaign_id STRING
> )
> WITH
> (
>   'connector' = 'filesystem',
>   'path' = 'file:///Users/xxx/Desktop/demo.json',
>   'format' = 'json'
> ); 
> -- return correct value('marketing_flow_id')
> SELECT IF(`marketing_flow_id` IS NOT NULL, `marketing_flow_id`, '') FROM source;
> -- return incorrect value('')
> SELECT IF(`marketing_flow_id` IS NULL, '', `marketing_flow_id`) FROM source;{code}
> The demo.json data is 
>  
> {code:java}
> {"mktgmsg_biz_type": "marketing_flow", "marketing_flow_id": "marketing_flow_id", "mktgmsg_campaign_id": "mktgmsg_campaign_id"} {code}
>  
>  
> BTW, using CASE WHEN with IF / IFNULL also gives wrong results.
>  
> {code:java}
> -- return wrong value(''), expect return marketing_flow_id
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN IF(`marketing_flow_id` IS NULL, `marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN IF(`mktgmsg_campaign_id` IS NULL, '', `mktgmsg_campaign_id`)
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return wrong value('')
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return correct value, the difference is [else return ' ']
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ' '
>   END AS `message_campaign_instance_id` FROM source;
> {code}
>  
>  
>  





[jira] [Commented] (FLINK-31006) job is not finished when using pipeline mode to run bounded source like kafka/pulsar

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687666#comment-17687666
 ] 

Yufan Sheng commented on FLINK-31006:
-

Hi [~jackylau], can you give me clear reproduction steps using {{flink-connector-pulsar}}?

> job is not finished when using pipeline mode to run bounded source like 
> kafka/pulsar
> 
>
> Key: FLINK-31006
> URL: https://issues.apache.org/jira/browse/FLINK-31006
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Connectors / Pulsar
>Affects Versions: 1.17.0
>Reporter: jackylau
>Assignee: jackylau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
> Attachments: image-2023-02-10-13-20-52-890.png, 
> image-2023-02-10-13-23-38-430.png, image-2023-02-10-13-24-46-929.png
>
>
> When I did failover tests (such as killing the JM/TM) while using pipeline
> mode to run a bounded source like Kafka, I found the job does not finish even
> after every partition's data has been consumed.
>  
> After digging into the code, I found this logic does not run when the JM
> recovers: the partition infos are unchanged, so noMoreNewPartitionSplits is
> not set to true, and this code path is never executed.
>  
> !image-2023-02-10-13-23-38-430.png!
>  
> !image-2023-02-10-13-24-46-929.png!





[jira] [Commented] (FLINK-29825) Improve benchmark stability

2023-02-12 Thread Yanfei Lei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687668#comment-17687668
 ] 

Yanfei Lei commented on FLINK-29825:


Thanks for taking the time to review the evaluation results. Writing a blog is a good idea 👍, and I also intend to implement Dong's algorithm completely (only the max-based algorithm under "more is better" was implemented during the evaluation) to replace the median-based algorithm.

> Improve benchmark stability
> ---
>
> Key: FLINK-29825
> URL: https://issues.apache.org/jira/browse/FLINK-29825
> Project: Flink
>  Issue Type: Improvement
>  Components: Benchmarks
>Affects Versions: 1.17.0
>Reporter: Yanfei Lei
>Assignee: Yanfei Lei
>Priority: Minor
>
> Currently, regressions are detected by a simple script which may produce false
> positives and false negatives, especially for benchmarks with small absolute
> values, where small value changes cause large percentage changes. See
> [here|https://github.com/apache/flink-benchmarks/blob/master/regression_report.py#L132-L136]
> for details.
> And since all benchmarks are executed on one physical machine, hardware issues
> might affect performance, as in "[FLINK-18614] Performance regression
> 2020.07.13".
>  
> This ticket aims to improve the precision and recall of the regression-check 
> script.
>  
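
The small-absolute-value effect described above can be illustrated with a quick sketch; the helper name and numbers are hypothetical and are not taken from the actual regression_report.py:

```shell
# Illustrative only: the same absolute amount of noise produces a much larger
# percentage change on a benchmark with a small absolute score, which is why
# a fixed percentage threshold yields false positives there.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 1000 995   # large benchmark: -0.5%, likely below any threshold
pct_change 2 1.9      # small benchmark: -5.0%, flagged though noise-sized
```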





[jira] [Updated] (FLINK-30969) Pyflink table example throws "module 'pandas' has no attribute 'Int8Dtype'"

2023-02-12 Thread Juntao Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juntao Hu updated FLINK-30969:
--
Fix Version/s: 1.15.4
   1.16.2
Affects Version/s: 1.16.1
   1.15.3

> Pyflink table example throws "module 'pandas' has no attribute 'Int8Dtype'"
> ---
>
> Key: FLINK-30969
> URL: https://issues.apache.org/jira/browse/FLINK-30969
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0, 1.15.3, 1.16.1
>Reporter: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.15.4, 1.16.2, 1.18.0
>
>
> After apache-beam is upgraded to 2.43.0 in 1.17, running `python pyflink/examples/table/basic_operations.py` will throw an error:
> {code:java}
> Traceback (most recent call last):
>   File "pyflink/examples/table/basic_operations.py", line 484, in <module>
>     basic_operations()
>   File "pyflink/examples/table/basic_operations.py", line 29, in basic_operations
>     t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>   File "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py", line 121, in create
>     return TableEnvironment(j_tenv)
>   File "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py", line 100, in __init__
>     self._open()
>   File "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py", line 1640, in _open
>     startup_loopback_server()
>   File "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py", line 1631, in startup_loopback_server
>     from pyflink.fn_execution.beam.beam_worker_pool_service import \
>   File "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/fn_execution/beam/beam_worker_pool_service.py", line 31, in <module>
>     from apache_beam.options.pipeline_options import DebugOptions
>   File "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/__init__.py", line 92, in <module>
>     from apache_beam import coders
>   File "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/__init__.py", line 17, in <module>
>     from apache_beam.coders.coders import *
>   File "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/coders.py", line 59, in <module>
>     from apache_beam.coders import coder_impl
>   File "apache_beam/coders/coder_impl.py", line 63, in init apache_beam.coders.coder_impl
>   File "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/__init__.py", line 31, in <module>
>     from apache_beam.typehints.pandas_type_compatibility import *
>   File "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/pandas_type_compatibility.py", line 81, in <module>
>     (pd.Int8Dtype(), Optional[np.int8]),
> AttributeError: module 'pandas' has no attribute 'Int8Dtype' {code}





[jira] [Commented] (FLINK-28505) Pulsar sink doesn't support topic auto creation

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687669#comment-17687669
 ] 

Yufan Sheng commented on FLINK-28505:
-

[~Tison] Can you help me close this issue? It's fixed in FLINK-28351.

> Pulsar sink doesn't support topic auto creation
> ---
>
> Key: FLINK-28505
> URL: https://issues.apache.org/jira/browse/FLINK-28505
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.15.0
>Reporter: Yufan Sheng
>Assignee: Yufan Sheng
>Priority: Major
>






[jira] [Commented] (FLINK-28351) Pulsar Sink should support dynamic generated topic from record

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687670#comment-17687670
 ] 

Yufan Sheng commented on FLINK-28351:
-

[~tison] Where is the release note? I can't see the link.

> Pulsar Sink should support dynamic generated topic from record
> --
>
> Key: FLINK-28351
> URL: https://issues.apache.org/jira/browse/FLINK-28351
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Pulsar
>Affects Versions: 1.15.0
>Reporter: Yufan Sheng
>Assignee: Yufan Sheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: pulsar-4.0.0
>
>
> Some people would like to use dynamically-generated topics from messages and 
> use the key hash range policy. This is not supported by the Pulsar sink 
> currently. We would introduce a new interface named TopicExacter and add a 
> new setTopics(TopicExacter) in PulsarSinkBuilder.





[jira] [Comment Edited] (FLINK-28505) Pulsar sink doesn't support topic auto creation

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687669#comment-17687669
 ] 

Yufan Sheng edited comment on FLINK-28505 at 2/13/23 2:25 AM:
--

[~tison] Can you help me close this issue? It's fixed in FLINK-28351.


was (Author: syhily):
[~Tison] Can you help me close this issue? It's fixed in FLINK-28351.

> Pulsar sink doesn't support topic auto creation
> ---
>
> Key: FLINK-28505
> URL: https://issues.apache.org/jira/browse/FLINK-28505
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.15.0
>Reporter: Yufan Sheng
>Assignee: Yufan Sheng
>Priority: Major
>






[jira] [Commented] (FLINK-31006) job is not finished when using pipeline mode to run bounded source like kafka/pulsar

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687671#comment-17687671
 ] 

Yufan Sheng commented on FLINK-31006:
-

"No more splits" is only used in the Pulsar connector to signal that topic-partition discovery runs only once. Stopping the pipeline is determined solely by {{StopCursor}}. After reading the whole issue you submitted, I don't think this is a bug in {{flink-connector-pulsar}}. I'm still waiting for your further reply.

> job is not finished when using pipeline mode to run bounded source like 
> kafka/pulsar
> 
>
> Key: FLINK-31006
> URL: https://issues.apache.org/jira/browse/FLINK-31006
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Connectors / Pulsar
>Affects Versions: 1.17.0
>Reporter: jackylau
>Assignee: jackylau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
> Attachments: image-2023-02-10-13-20-52-890.png, 
> image-2023-02-10-13-23-38-430.png, image-2023-02-10-13-24-46-929.png
>
>
> When I did failover tests (such as killing the JM/TM) while using pipeline
> mode to run a bounded source like Kafka, I found the job does not finish even
> after every partition's data has been consumed.
>  
> After digging into the code, I found this logic does not run when the JM
> recovers: the partition infos are unchanged, so noMoreNewPartitionSplits is
> not set to true, and this code path is never executed.
>  
> !image-2023-02-10-13-23-38-430.png!
>  
> !image-2023-02-10-13-24-46-929.png!





[jira] [Commented] (FLINK-30703) PulsarOrderedPartitionSplitReaderTest>PulsarPartitionSplitReaderTestBase.consumeMessageCreatedBeforeHandleSplitsChangesAndResetToEarliestPosition fails

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687673#comment-17687673
 ] 

Yufan Sheng commented on FLINK-30703:
-

After debugging with Pulsar developers, we still can't reproduce this bug locally. I prefer to keep this issue open for a while and close it later if the failure does not recur. WDYT [~mapohl]?

> PulsarOrderedPartitionSplitReaderTest>PulsarPartitionSplitReaderTestBase.consumeMessageCreatedBeforeHandleSplitsChangesAndResetToEarliestPosition
>  fails
> ---
>
> Key: FLINK-30703
> URL: https://issues.apache.org/jira/browse/FLINK-30703
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0, 1.15.3, pulsar-3.0.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> A 1.15 build failed due to
> {{PulsarOrderedPartitionSplitReaderTest>PulsarPartitionSplitReaderTestBase.consumeMessageCreatedBeforeHandleSplitsChangesAndResetToEarliestPosition}}:
> {code}
> Jan 11 03:08:20 [ERROR] PulsarOrderedPartitionSplitReaderTest>PulsarPartitionSplitReaderTestBase.consumeMessageCreatedBeforeHandleSplitsChangesAndResetToEarliestPosition:260->PulsarPartitionSplitReaderTestBase.fetchedMessages:169->PulsarPartitionSplitReaderTestBase.fetchedMessages:199 [We should fetch the expected size]
> Jan 11 03:08:20 Expected size: 20 but was: 25 in:
> Jan 11 03:08:20 [PulsarMessage{id=154:0:0, value=ZHiaOiOhFT, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:1:0, value=uoOmKceWTh, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:2:0, value=oeYYWzisge, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:3:0, value=yhpOOkNLER, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:4:0, value=MIbzxkfFfp, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:0:0, value=ZHiaOiOhFT, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:1:0, value=uoOmKceWTh, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:2:0, value=oeYYWzisge, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:3:0, value=yhpOOkNLER, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:4:0, value=MIbzxkfFfp, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:5:0, value=FaEpggGBTE, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:6:0, value=IlGbzPRuvi, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:7:0, value=aqgIeSbOzo, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:8:0, value=CUScdPriyM, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:9:0, value=zLRsvDxpJG, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:10:0, value=iqsGFVkXDz, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:11:0, value=hFyHYPldqN, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:12:0, value=ZYTlJcwSst, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:13:0, value=mOWkzJQQxE, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:14:0, value=CTfcVXhfUN, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:15:0, value=wAhqCGGwXO, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:16:0, value=LOoTbXaEgG, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:17:0, value=UvfzGIRURy, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:18:0, value=vlYYGbZGAH, eventTime=0},
> Jan 11 03:08:20 PulsarMessage{id=154:19:0, value=wrwPanXvql, eventTime=0}]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44691&view=logs&j=a5ef94ef-68c2-57fd-3794-dc108ed1c495&t=2c68b137-b01d-55c9-e603-3ff3f320364b&l=27466



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-30292) Better support for conversion between DataType and TypeInformation

2023-02-12 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-30292:

Fix Version/s: 1.18.0

> Better support for conversion between DataType and TypeInformation
> --
>
> Key: FLINK-30292
> URL: https://issues.apache.org/jira/browse/FLINK-30292
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.15.3
>Reporter: Yunfeng Zhou
>Priority: Major
> Fix For: 1.18.0
>
>
> In Flink 1.15, we have the following ways to convert a DataType to a 
> TypeInformation. Each of them has some disadvantages.
> * `TypeConversions.fromDataTypeToLegacyInfo`
> It might lead to precision losses in the face of some data types like 
> timestamp, and it has been deprecated.
> * `ExternalTypeInfo.of`
> It cannot be used to get detailed type information like `RowTypeInfo`, and 
> it might bring some serialization overhead.
> Given that neither of the approaches above is perfect, Flink SQL should 
> provide a better API to support DataType-TypeInformation conversions, and 
> thus better support Table-DataStream conversions.
>  





[jira] [Commented] (FLINK-31003) Flink SQL IF / CASE WHEN Funcation incorrect

2023-02-12 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687674#comment-17687674
 ] 

luoyuxia commented on FLINK-31003:
--

Hi everyone. To me, it seems to be the same issue as FLINK-30559. Is that right?

> Flink SQL IF / CASE WHEN Funcation incorrect
> 
>
> Key: FLINK-31003
> URL: https://issues.apache.org/jira/browse/FLINK-31003
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.15.2, 1.15.3, 1.16.1
>Reporter: weiqinpan
>Priority: Major
>
> When I execute the SQL below using sql-client, I find something wrong.
>  
> {code:java}
> CREATE TEMPORARY TABLE source (
>   mktgmsg_biz_type STRING,
>   marketing_flow_id STRING,
>   mktgmsg_campaign_id STRING
> )
> WITH
> (
>   'connector' = 'filesystem',
>   'path' = 'file:///Users/xxx/Desktop/demo.json',
>   'format' = 'json'
> ); 
> -- return correct value('marketing_flow_id') 
> SELECT IF(`marketing_flow_id` IS NOT NULL, `marketing_flow_id`, '') FROM 
> source;
> -- return incorrect value('')
> SELECT IF(`marketing_flow_id` IS  NULL, '', `marketing_flow_id`) FROM 
> source;{code}
> The demo.json data is 
>  
> {code:java}
> {"mktgmsg_biz_type": "marketing_flow", "marketing_flow_id": 
> "marketing_flow_id", "mktgmsg_campaign_id": "mktgmsg_campaign_id"} {code}
>  
>  
> BTW, using CASE WHEN + IF / IFNULL also gives wrong results.
>  
> {code:java}
> -- return wrong value(''), expect return marketing_flow_id
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN IF(`marketing_flow_id` 
> IS NULL, `marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IF(`mktgmsg_campaign_id` IS NULL, '', `mktgmsg_campaign_id`)
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return wrong value('')
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN 
> IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return correct value, the difference is [else return ' ']
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN 
> IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ' '
>   END AS `message_campaign_instance_id` FROM source;
> {code}
>  
>  
>  
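As a sanity check on the report above, the same queries can be run against SQLite purely to show the *expected* standard-SQL semantics. This is a hedged sketch: SQLite stands in for Flink's sql-client (the buggy environment), `IF(x IS NULL, '', x)` is rewritten as the equivalent CASE expression, and the table and values come from the report's demo.json.

```python
import sqlite3

# Reference run against SQLite; this is NOT Flink, only a demonstration of
# what the queries should return under standard SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source ("
    "  mktgmsg_biz_type TEXT,"
    "  marketing_flow_id TEXT,"
    "  mktgmsg_campaign_id TEXT)")
conn.execute(
    "INSERT INTO source VALUES "
    "('marketing_flow', 'marketing_flow_id', 'mktgmsg_campaign_id')")

# IF(marketing_flow_id IS NULL, '', marketing_flow_id), written as CASE.
row1 = conn.execute(
    "SELECT CASE WHEN marketing_flow_id IS NULL THEN '' "
    "ELSE marketing_flow_id END FROM source").fetchone()

# The CASE WHEN + IFNULL variant from the report.
row2 = conn.execute(
    "SELECT CASE WHEN mktgmsg_biz_type = 'marketing_flow' "
    "THEN IFNULL(marketing_flow_id, '') ELSE '' END FROM source").fetchone()

print(row1[0], row2[0])  # both print 'marketing_flow_id'
```

Both expressions should return the non-null column value, which is what the report says Flink fails to do for the `IS NULL` orderings.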





[jira] [Comment Edited] (FLINK-31003) Flink SQL IF / CASE WHEN Funcation incorrect

2023-02-12 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687674#comment-17687674
 ] 

luoyuxia edited comment on FLINK-31003 at 2/13/23 2:36 AM:
---

Hi everyone. To me, it seems to be the same issue as FLINK-30559, for which a 
PR is available.

Is that right?


was (Author: luoyuxia):
Hi, everyone. For me,  it seems the same issue of FLINK-30559. Is that right?

> Flink SQL IF / CASE WHEN Funcation incorrect
> 
>
> Key: FLINK-31003
> URL: https://issues.apache.org/jira/browse/FLINK-31003
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.15.2, 1.15.3, 1.16.1
>Reporter: weiqinpan
>Priority: Major
>
> When I execute the SQL below using sql-client, I find something wrong.
>  
> {code:java}
> CREATE TEMPORARY TABLE source (
>   mktgmsg_biz_type STRING,
>   marketing_flow_id STRING,
>   mktgmsg_campaign_id STRING
> )
> WITH
> (
>   'connector' = 'filesystem',
>   'path' = 'file:///Users/xxx/Desktop/demo.json',
>   'format' = 'json'
> ); 
> -- return correct value('marketing_flow_id') 
> SELECT IF(`marketing_flow_id` IS NOT NULL, `marketing_flow_id`, '') FROM 
> source;
> -- return incorrect value('')
> SELECT IF(`marketing_flow_id` IS  NULL, '', `marketing_flow_id`) FROM 
> source;{code}
> The demo.json data is 
>  
> {code:java}
> {"mktgmsg_biz_type": "marketing_flow", "marketing_flow_id": 
> "marketing_flow_id", "mktgmsg_campaign_id": "mktgmsg_campaign_id"} {code}
>  
>  
> BTW, using CASE WHEN + IF / IFNULL also gives wrong results.
>  
> {code:java}
> -- return wrong value(''), expect return marketing_flow_id
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN IF(`marketing_flow_id` 
> IS NULL, `marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IF(`mktgmsg_campaign_id` IS NULL, '', `mktgmsg_campaign_id`)
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return wrong value('')
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN 
> IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return correct value, the difference is [else return ' ']
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN 
> IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ' '
>   END AS `message_campaign_instance_id` FROM source;
> {code}
>  
>  
>  





[jira] [Commented] (FLINK-30966) Flink SQL IF FUNCTION logic error

2023-02-12 Thread luoyuxia (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687676#comment-17687676
 ] 

luoyuxia commented on FLINK-30966:
--

Seems the same as FLINK-31003? 

> Flink SQL IF FUNCTION logic error
> -
>
> Key: FLINK-30966
> URL: https://issues.apache.org/jira/browse/FLINK-30966
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.16.1
>Reporter: 谢波
>Priority: Major
>
> my data is 
> {code:java}
> //
> { "before": { "status": "sent" }, "after": { "status": "succeed" }, "op": 
> "u", "ts_ms": 1671926400225, "transaction": null } {code}
> my sql is 
>  
> {code:java}
> CREATE TABLE t
> (
> before ROW (
> status varchar (32)
> ),
> after ROW (
> status varchar (32)
> ),
> ts_ms bigint,
> op   string,
> kafka_timestamp  timestamp METADATA FROM 'timestamp',
> -- @formatter:off
> proctime AS PROCTIME()
> -- @formatter:on
> ) WITH (
> 'connector' = 'kafka',
> -- 'topic' = '',
> 'topic' = 'test',
> 'properties.bootstrap.servers' = ' ',
> 'properties.group.id' = '',
> 'format' = 'json',
> 'scan.topic-partition-discovery.interval' = '60s',
> 'scan.startup.mode' = 'earliest-offset',
> 'json.ignore-parse-errors' = 'true'
>  );
> create table p
> (
> status  STRING ,
> before_status  STRING ,
> after_status  STRING ,
> metadata_operation  STRING COMMENT '源记录操作类型',
> dt  STRING
> )WITH (
> 'connector' = 'print'
>  );
> INSERT INTO p
> SELECT
>IF(op <> 'd', after.status, before.status),
> before.status,
> after.status,
>op AS metadata_operation,
>DATE_FORMAT(kafka_timestamp, 'yyyy-MM-dd') AS dt
> FROM t;
>  {code}
>  my local env output is 
>  
>  
> {code:java}
> +I[null, sent, succeed, u, 2023-02-08] {code}
>  
>  my production env output is 
> {code:java}
> +I[sent, sent, succeed, u, 2023-02-08]  {code}
> Why? This looks like a bug.
>  
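The expected value of `IF(op <> 'd', after.status, before.status)` for the reported record can be restated in plain Python. This is only the ternary semantics the SQL should implement, not Flink code; note that neither reported output (`null` locally, `sent` in production) matches it.

```python
# The record from the report, with the nested ROW fields as dicts.
record = {"before": {"status": "sent"},
          "after": {"status": "succeed"},
          "op": "u"}

# IF(op <> 'd', after.status, before.status) as a Python conditional:
# op is 'u', so after.status should be chosen.
status = (record["after"]["status"]
          if record["op"] != "d"
          else record["before"]["status"])
print(status)  # succeed
```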





[jira] [Updated] (FLINK-30969) Pyflink table example throws "module 'pandas' has no attribute 'Int8Dtype'"

2023-02-12 Thread Juntao Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juntao Hu updated FLINK-30969:
--
Fix Version/s: (was: 1.15.4)
   (was: 1.16.2)
Affects Version/s: (was: 1.15.3)
   (was: 1.16.1)

> Pyflink table example throws "module 'pandas' has no attribute 'Int8Dtype'"
> ---
>
> Key: FLINK-30969
> URL: https://issues.apache.org/jira/browse/FLINK-30969
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0, 1.18.0
>
>
> After apache-beam was upgraded to 2.43.0 in 1.17, running `python 
> pyflink/examples/table/basic_operations.py` throws this error:
> {code:java}
> Traceback (most recent call last):
>   File "pyflink/examples/table/basic_operations.py", line 484, in 
>     basic_operations()
>   File "pyflink/examples/table/basic_operations.py", line 29, in 
> basic_operations
>     t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 121, in create
>     return TableEnvironment(j_tenv)
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 100, in __init__
>     self._open()
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 1640, in _open
>     startup_loopback_server()
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 1631, in startup_loopback_server
>     from pyflink.fn_execution.beam.beam_worker_pool_service import \
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/fn_execution/beam/beam_worker_pool_service.py",
>  line 31, in 
>     from apache_beam.options.pipeline_options import DebugOptions
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/__init__.py",
>  line 92, in 
>     from apache_beam import coders
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/__init__.py",
>  line 17, in 
>     from apache_beam.coders.coders import *
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/coders.py",
>  line 59, in 
>     from apache_beam.coders import coder_impl
>   File "apache_beam/coders/coder_impl.py", line 63, in init 
> apache_beam.coders.coder_impl
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/__init__.py",
>  line 31, in 
>     from apache_beam.typehints.pandas_type_compatibility import *
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/pandas_type_compatibility.py",
>  line 81, in 
>     (pd.Int8Dtype(), Optional[np.int8]),
> AttributeError: module 'pandas' has no attribute 'Int8Dtype' {code}





[jira] [Commented] (FLINK-30627) Refactor FileSystemTableSink to use FileSink instead of StreamingFileSink

2023-02-12 Thread Yuxin Tan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687680#comment-17687680
 ] 

Yuxin Tan commented on FLINK-30627:
---

[~martijnvisser] Sorry that I underestimated the complexity of this problem at 
the beginning. I don't have a good solution yet.

> Refactor FileSystemTableSink to use FileSink instead of StreamingFileSink
> -
>
> Key: FLINK-30627
> URL: https://issues.apache.org/jira/browse/FLINK-30627
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / FileSystem
>Reporter: Martijn Visser
>Priority: Major
>
> {{FileSystemTableSink}} currently depends on most of the capabilities from 
> {{StreamingFileSink}}, for example 
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/table/FileSystemTableSink.java#L223-L243
> This is necessary to complete FLINK-28641





[jira] [Commented] (FLINK-30486) Make the config documentation generator available to connector repos

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687681#comment-17687681
 ] 

Yufan Sheng commented on FLINK-30486:
-

I just copied the code from flink-docs and executed it in flink-connector-pulsar 
to generate all the docs. It would be great to have the related jar file 
available in Maven.

> Make the config documentation generator available to connector repos
> 
>
> Key: FLINK-30486
> URL: https://issues.apache.org/jira/browse/FLINK-30486
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Common, Connectors / Pulsar, Documentation
>Reporter: Martijn Visser
>Priority: Major
>
> Most connectors can be externalized without any issues. This becomes 
> problematic when connectors provide configuration options, like Pulsar does. 
> As discussed in 
> https://github.com/apache/flink/pull/21501#discussion_r1046979593 we need to 
> make the config documentation generator available for connector repositories. 





[jira] [Commented] (FLINK-31003) Flink SQL IF / CASE WHEN Funcation incorrect

2023-02-12 Thread Shuiqiang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687682#comment-17687682
 ] 

Shuiqiang Chen commented on FLINK-31003:


[~luoyuxia] +1

> Flink SQL IF / CASE WHEN Funcation incorrect
> 
>
> Key: FLINK-31003
> URL: https://issues.apache.org/jira/browse/FLINK-31003
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.15.2, 1.15.3, 1.16.1
>Reporter: weiqinpan
>Priority: Major
>
> When I execute the SQL below using sql-client, I find something wrong.
>  
> {code:java}
> CREATE TEMPORARY TABLE source (
>   mktgmsg_biz_type STRING,
>   marketing_flow_id STRING,
>   mktgmsg_campaign_id STRING
> )
> WITH
> (
>   'connector' = 'filesystem',
>   'path' = 'file:///Users/xxx/Desktop/demo.json',
>   'format' = 'json'
> ); 
> -- return correct value('marketing_flow_id') 
> SELECT IF(`marketing_flow_id` IS NOT NULL, `marketing_flow_id`, '') FROM 
> source;
> -- return incorrect value('')
> SELECT IF(`marketing_flow_id` IS  NULL, '', `marketing_flow_id`) FROM 
> source;{code}
> The demo.json data is 
>  
> {code:java}
> {"mktgmsg_biz_type": "marketing_flow", "marketing_flow_id": 
> "marketing_flow_id", "mktgmsg_campaign_id": "mktgmsg_campaign_id"} {code}
>  
>  
> BTW, using CASE WHEN + IF / IFNULL also gives wrong results.
>  
> {code:java}
> -- return wrong value(''), expect return marketing_flow_id
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN IF(`marketing_flow_id` 
> IS NULL, `marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IF(`mktgmsg_campaign_id` IS NULL, '', `mktgmsg_campaign_id`)
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return wrong value('')
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN 
> IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ''
>   END AS `message_campaign_instance_id` FROM source;
> -- return correct value, the difference is [else return ' ']
> select CASE
>   WHEN `mktgmsg_biz_type` = 'marketing_flow'     THEN 
> IFNULL(`marketing_flow_id`, '')
>   WHEN `mktgmsg_biz_type` = 'mktgmsg_campaign'   THEN 
> IFNULL(`mktgmsg_campaign_id`, '')
>   ELSE ' '
>   END AS `message_campaign_instance_id` FROM source;
> {code}
>  
>  
>  





[jira] [Commented] (FLINK-30966) Flink SQL IF FUNCTION logic error

2023-02-12 Thread Shuiqiang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687683#comment-17687683
 ] 

Shuiqiang Chen commented on FLINK-30966:


[~luoyuxia] The matter of 
[FLINK-31003|https://issues.apache.org/jira/browse/FLINK-31003] is return type 
inference, while there is another issue: the code block of the result term 
casting might be wrong.

> Flink SQL IF FUNCTION logic error
> -
>
> Key: FLINK-30966
> URL: https://issues.apache.org/jira/browse/FLINK-30966
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.16.1
>Reporter: 谢波
>Priority: Major
>
> my data is 
> {code:java}
> //
> { "before": { "status": "sent" }, "after": { "status": "succeed" }, "op": 
> "u", "ts_ms": 1671926400225, "transaction": null } {code}
> my sql is 
>  
> {code:java}
> CREATE TABLE t
> (
> before ROW (
> status varchar (32)
> ),
> after ROW (
> status varchar (32)
> ),
> ts_ms bigint,
> op   string,
> kafka_timestamp  timestamp METADATA FROM 'timestamp',
> -- @formatter:off
> proctime AS PROCTIME()
> -- @formatter:on
> ) WITH (
> 'connector' = 'kafka',
> -- 'topic' = '',
> 'topic' = 'test',
> 'properties.bootstrap.servers' = ' ',
> 'properties.group.id' = '',
> 'format' = 'json',
> 'scan.topic-partition-discovery.interval' = '60s',
> 'scan.startup.mode' = 'earliest-offset',
> 'json.ignore-parse-errors' = 'true'
>  );
> create table p
> (
> status  STRING ,
> before_status  STRING ,
> after_status  STRING ,
> metadata_operation  STRING COMMENT '源记录操作类型',
> dt  STRING
> )WITH (
> 'connector' = 'print'
>  );
> INSERT INTO p
> SELECT
>IF(op <> 'd', after.status, before.status),
> before.status,
> after.status,
>op AS metadata_operation,
>DATE_FORMAT(kafka_timestamp, 'yyyy-MM-dd') AS dt
> FROM t;
>  {code}
>  my local env output is 
>  
>  
> {code:java}
> +I[null, sent, succeed, u, 2023-02-08] {code}
>  
>  my production env output is 
> {code:java}
> +I[sent, sent, succeed, u, 2023-02-08]  {code}
> Why? This looks like a bug.
>  





[jira] [Comment Edited] (FLINK-30966) Flink SQL IF FUNCTION logic error

2023-02-12 Thread Shuiqiang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687683#comment-17687683
 ] 

Shuiqiang Chen edited comment on FLINK-30966 at 2/13/23 2:51 AM:
-

[~luoyuxia] The matter of 
[FLINK-31003|https://issues.apache.org/jira/browse/FLINK-31003] is return type 
inference, while there is another issue: the code block position of the result 
term casting might be wrong.


was (Author: csq):
[~luoyuxia]The matter of 
[FLINK-31003|https://issues.apache.org/jira/browse/FLINK-31003] is return type 
inferencing, while there is another issue that the code block of result term 
casting might be wrong.

> Flink SQL IF FUNCTION logic error
> -
>
> Key: FLINK-30966
> URL: https://issues.apache.org/jira/browse/FLINK-30966
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.16.1
>Reporter: 谢波
>Priority: Major
>
> my data is 
> {code:java}
> //
> { "before": { "status": "sent" }, "after": { "status": "succeed" }, "op": 
> "u", "ts_ms": 1671926400225, "transaction": null } {code}
> my sql is 
>  
> {code:java}
> CREATE TABLE t
> (
> before ROW (
> status varchar (32)
> ),
> after ROW (
> status varchar (32)
> ),
> ts_ms bigint,
> op   string,
> kafka_timestamp  timestamp METADATA FROM 'timestamp',
> -- @formatter:off
> proctime AS PROCTIME()
> -- @formatter:on
> ) WITH (
> 'connector' = 'kafka',
> -- 'topic' = '',
> 'topic' = 'test',
> 'properties.bootstrap.servers' = ' ',
> 'properties.group.id' = '',
> 'format' = 'json',
> 'scan.topic-partition-discovery.interval' = '60s',
> 'scan.startup.mode' = 'earliest-offset',
> 'json.ignore-parse-errors' = 'true'
>  );
> create table p
> (
> status  STRING ,
> before_status  STRING ,
> after_status  STRING ,
> metadata_operation  STRING COMMENT '源记录操作类型',
> dt  STRING
> )WITH (
> 'connector' = 'print'
>  );
> INSERT INTO p
> SELECT
>IF(op <> 'd', after.status, before.status),
> before.status,
> after.status,
>op AS metadata_operation,
>DATE_FORMAT(kafka_timestamp, 'yyyy-MM-dd') AS dt
> FROM t;
>  {code}
>  my local env output is 
>  
>  
> {code:java}
> +I[null, sent, succeed, u, 2023-02-08] {code}
>  
>  my production env output is 
> {code:java}
> +I[sent, sent, succeed, u, 2023-02-08]  {code}
> Why? This looks like a bug.
>  





[jira] [Commented] (FLINK-30857) Create table does not create topic with multiple partitions

2023-02-12 Thread Yao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687684#comment-17687684
 ] 

Yao Zhang commented on FLINK-30857:
---

Hi [~lzljs3620320],

As the latest hotfix removed the managed-table-related method in 
LogStoreFactoryTable, it seems that managed tables are not planned to be 
supported in the future. Should we support creating the Kafka topic explicitly 
with a specified replication factor and number of partitions?

> Create table does not create topic with multiple partitions
> ---
>
> Key: FLINK-30857
> URL: https://issues.apache.org/jira/browse/FLINK-30857
> Project: Flink
>  Issue Type: Bug
>  Components: Table Store
>Reporter: Vicky Papavasileiou
>Priority: Major
>
>  
> {code:java}
> CREATE CATALOG table_store_catalog WITH (
>     'type'='table-store',
>     'warehouse'='s3://my-bucket/table-store'
>  );
> USE CATALOG table_store_catalog;
> SET 'execution.checkpointing.interval' = '10 s';
> CREATE TABLE word_count_kafka (
>      word STRING PRIMARY KEY NOT ENFORCED,
>      cnt BIGINT
>  ) WITH (
>      'log.system' = 'kafka',
>      'kafka.bootstrap.servers' = 'broker:9092',
>      'kafka.topic' = 'word_count_log',
>  'bucket'='4'
>  );
> {code}
>  
> The created topic has only one partition
> {code:java}
> Topic: word_count_log    TopicId: udeJwBIkRsSybkf1EerphA    PartitionCount: 1 
>    ReplicationFactor: 1    Configs:
>     Topic: word_count_log    Partition: 0    Leader: 1    Replicas: 1    Isr: 
> 1{code}





[jira] [Commented] (FLINK-30274) Upgrade commons-collections 3.x to commons-collections4

2023-02-12 Thread Ran Tao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687685#comment-17687685
 ] 

Ran Tao commented on FLINK-30274:
-

[~martijnvisser] I have updated the PR, PTAL when you are free. Thanks.

> Upgrade commons-collections 3.x to commons-collections4
> ---
>
> Key: FLINK-30274
> URL: https://issues.apache.org/jira/browse/FLINK-30274
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Affects Versions: 1.16.0
>Reporter: Ran Tao
>Assignee: Ran Tao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-12-02-16-40-22-172.png
>
>
> First, Apache commons-collections 3.x is a Java 1.3-compatible version, and 
> it does not use Java 5 generics. Apache commons-collections4 4.4 is an 
> upgraded version of commons-collections and is built with Java 8.
> Apache Spark had the same issue: [https://github.com/apache/spark/pull/35257]





[jira] [Created] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Jane Chan (Jira)
Jane Chan created FLINK-31025:
-

 Summary: Release Testing: Verify FLINK-30650 Introduce EXPLAIN 
PLAN_ADVICE to provide SQL advice
 Key: FLINK-31025
 URL: https://issues.apache.org/jira/browse/FLINK-31025
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Affects Versions: 1.17.0
Reporter: Jane Chan
 Fix For: 1.17.0








[jira] [Updated] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Jane Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jane Chan updated FLINK-31025:
--
Description: 
This ticket aims for verifying FLINK-30650: Adaptive Batch Scheduler should 
also work with hybrid shuffle mode.

More details about this feature and how to use it can be found in this 
[documentation|xxx].

The verification is divided into two parts:

Part I: Verify hybrid shuffle can work with AdaptiveBatchScheduler

Write a simple Flink batch job using hybrid shuffle mode and submit it. Note 
that in flink-1.17, AdaptiveBatchScheduler is the default scheduler for batch 
jobs, so no additional configuration is needed.

Suppose your job's topology is source -> map -> sink; if your cluster has 
enough slots, you should find that source and map run at the same time.

Part II: Verify hybrid shuffle can work with Speculative Execution

Write a Flink batch job using hybrid shuffle mode that has a subtask running 
much slower than the others (e.g. it sleeps indefinitely if it runs on a 
certain host, whose name can be retrieved via 
InetAddress.getLocalHost().getHostName(), or if its (subtaskIndex + 
attemptNumber) % 2 == 0).

Modify the Flink configuration file to enable speculative execution and tune 
the configuration as you like.

Submit the job and check the web UI, logs, metrics, and produced results.

You should find in the logs that once one subtask of a producer task finishes, 
all its consumer tasks can be scheduled.
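The parity trick for making one subtask slow can be sketched as a small predicate. This is illustrative Python, not Flink API; `SLOW_HOST` and the function name are assumptions for the sketch.

```python
import socket

# Decide whether this attempt of a subtask should be artificially slowed
# down, following the recipe above: slow on a designated host, or when
# (subtaskIndex + attemptNumber) is even.
SLOW_HOST = "slow-node-1"  # hypothetical hostname

def should_run_slowly(subtask_index: int, attempt_number: int,
                      hostname: str = None) -> bool:
    host = hostname if hostname is not None else socket.gethostname()
    return host == SLOW_HOST or (subtask_index + attempt_number) % 2 == 0

# Attempt 0 of an even subtask is slow; a speculative second attempt flips
# the parity, so it runs fast and can overtake the slow original.
print(should_run_slowly(2, 0, "worker-a"))  # True
print(should_run_slowly(2, 1, "worker-a"))  # False
```

The parity flip between attempts is what lets a speculative attempt finish while the original is still sleeping.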

> Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide 
> SQL advice
> ---
>
> Key: FLINK-31025
> URL: https://issues.apache.org/jira/browse/FLINK-31025
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.17.0
>Reporter: Jane Chan
>Priority: Blocker
> Fix For: 1.17.0
>
>
> This ticket aims for verifying FLINK-30650: Adaptive Batch Scheduler should 
> also work with hybrid shuffle mode.
> More details about this feature and how to use it can be found in this 
> [documentation|xxx].
> The verification is divided into two parts:
> Part I: Verify hybrid shuffle can work with AdaptiveBatchScheduler
> Write a simple Flink batch job using hybrid shuffle mode and submit it. Note 
> that in flink-1.17, AdaptiveBatchScheduler is the default scheduler for 
> batch jobs, so no additional configuration is needed.
> Suppose your job's topology is source -> map -> sink; if your cluster has 
> enough slots, you should find that source and map run at the same time.
> Part II: Verify hybrid shuffle can work with Speculative Execution
> Write a Flink batch job using hybrid shuffle mode that has a subtask running 
> much slower than the others (e.g. it sleeps indefinitely if it runs on a 
> certain host, whose name can be retrieved via 
> InetAddress.getLocalHost().getHostName(), or if its (subtaskIndex + 
> attemptNumber) % 2 == 0).
> Modify the Flink configuration file to enable speculative execution and tune 
> the configuration as you like.
> Submit the job and check the web UI, logs, metrics, and produced results.
> You should find in the logs that once one subtask of a producer task 
> finishes, all its consumer tasks can be scheduled.





[GitHub] [flink] fredia commented on pull request #21858: [FLINK-30729][state/changelog] Refactor checkState() to WARN of StateChangelogWriter to avoid confusing IllegalException

2023-02-12 Thread via GitHub


fredia commented on PR #21858:
URL: https://github.com/apache/flink/pull/21858#issuecomment-1427268488

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-29566) Reschedule the cleanup logic if cancel job failed

2023-02-12 Thread Shipeng Xie (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687686#comment-17687686
 ] 

Shipeng Xie commented on FLINK-29566:
-

Hi [~haoxin], have you started working on this issue? If not, I can help take 
a look.

> Reschedule the cleanup logic if cancel job failed
> -
>
> Key: FLINK-29566
> URL: https://issues.apache.org/jira/browse/FLINK-29566
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Xin Hao
>Priority: Minor
>  Labels: starter
>
> Currently, when we remove the FlinkSessionJob object,
> we always remove the object even if the Flink job was not canceled 
> successfully.
>  
> This is *not semantically consistent*: the FlinkSessionJob has been removed 
> but the Flink job is still running.
>  
> One of the scenarios is deploying a FlinkDeployment with HA mode.
> When we remove the FlinkSessionJob and change the FlinkDeployment at the 
> same time, or if the TMs are restarting because of some bug such as an OOM,
> the cancelation of the Flink job will fail because the TMs are not 
> available.
>  
> We should *reschedule* the cleanup logic if the FlinkDeployment is present.
> And we can add a new ReconciliationState DELETING to indicate the 
> FlinkSessionJob's status.
>  
> The logic will be
> {code:java}
> if the FlinkDeployment is not present
>     delete the FlinkSessionJob object
> else if the JM is not available
>     reschedule
> else if the job is canceled successfully
>     delete the FlinkSessionJob object
> else
>     reschedule{code}
> When we cancel the Flink job, we need to verify that all the jobs with the 
> same name have been deleted, in case the job ID changed after the JM 
> restarted.
>  
>  
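The cleanup pseudocode above can be restated as a pure decision function. This is a minimal sketch with illustrative names (`Decision`, `cleanup_decision`), not the actual flink-kubernetes-operator code.

```python
from enum import Enum

class Decision(Enum):
    DELETE = "delete the FlinkSessionJob object"
    RESCHEDULE = "reschedule"

def cleanup_decision(deployment_present: bool,
                     jm_available: bool,
                     cancel_succeeded: bool) -> Decision:
    # Mirrors the {code} block in the issue: no deployment -> safe to delete;
    # no JM -> retry later; otherwise delete only if cancellation succeeded.
    if not deployment_present:
        return Decision.DELETE
    if not jm_available:
        return Decision.RESCHEDULE
    return Decision.DELETE if cancel_succeeded else Decision.RESCHEDULE

print(cleanup_decision(False, False, False))  # Decision.DELETE
print(cleanup_decision(True, False, False))   # Decision.RESCHEDULE
print(cleanup_decision(True, True, False))    # Decision.RESCHEDULE
```

Keeping the decision pure like this would also make the proposed DELETING reconciliation state easy to unit-test.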





[GitHub] [flink-table-store] liming30 commented on pull request #519: [FLINK-31008] Fix the bug that ContinuousFileSplitEnumerator may be out of order when allocating splits.

2023-02-12 Thread via GitHub


liming30 commented on PR #519:
URL: 
https://github.com/apache/flink-table-store/pull/519#issuecomment-1427286817

   @JingsongLi Thanks for your comments; they have been addressed. Please help 
review again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] dianfu closed pull request #21903: [FLINK-30981][Python] Fix explain_sql throws method not exist

2023-02-12 Thread via GitHub


dianfu closed pull request #21903: [FLINK-30981][Python] Fix explain_sql throws 
method not exist
URL: https://github.com/apache/flink/pull/21903





[jira] [Closed] (FLINK-30981) explain_sql throws java method not exist

2023-02-12 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-30981.
---
Fix Version/s: (was: 1.18.0)
 Assignee: Juntao Hu
   Resolution: Fixed

Fixed in:
- master via a92892fea747f81f0e8a6cd4ec4ee207c95fa625
- release-1.17 via c33ee8decd7733436bec3ed102c6422e9083c558

> explain_sql throws java method not exist
> 
>
> Key: FLINK-30981
> URL: https://issues.apache.org/jira/browse/FLINK-30981
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Juntao Hu
>Assignee: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> Executing `t_env.explain_sql("ANY VALID SQL")` throws an error:
> {code:java}
> Traceback (most recent call last):
>   File "ISSUE/FLINK-25622.py", line 42, in 
>     main()
>   File "ISSUE/FLINK-25622.py", line 34, in main
>     print(t_env.explain_sql(
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 799, in explain_sql
>     return self._j_tenv.explainSql(stmt, j_extra_details)
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/py4j/java_gateway.py",
>  line 1322, in __call__
>     return_value = get_return_value(
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/util/exceptions.py",
>  line 146, in deco
>     return f(*a, **kw)
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/py4j/protocol.py",
>  line 330, in get_return_value
>     raise Py4JError(
> py4j.protocol.Py4JError: An error occurred while calling o11.explainSql. 
> Trace:
> org.apache.flink.api.python.shaded.py4j.Py4JException: Method 
> explainSql([class java.lang.String, class 
> [Lorg.apache.flink.table.api.ExplainDetail;]) does not exist
>     at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
>     at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
>     at 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:274)
>     at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>     at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
>     at java.base/java.lang.Thread.run(Thread.java:829) {code}
> [FLINK-30668|https://issues.apache.org/jira/browse/FLINK-30668] changed 
> TableEnvironment#explainSql to an interface default method. Since neither 
> TableEnvironmentInternal nor TableEnvironmentImpl overrides it, this 
> triggers a bug in py4j, see [https://github.com/py4j/py4j/issues/506].
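The Java pattern behind the failure can be sketched as follows. All names here are hypothetical: py4j resolves methods by reflecting on the concrete class, and per the linked py4j issue a varargs method inherited only as an interface default can fail that lookup, so the usual workaround is to override the default method on the implementation class.

```java
// Hypothetical names throughout; this only mirrors the shape of the bug.
// py4j resolves methods by reflecting on the concrete class, and (per the
// linked py4j issue) a varargs method inherited only as an interface default
// can fail that lookup, so the implementation class overrides it explicitly.
public class ExplainSqlFix {
    interface TableEnv {
        default String explainSql(String stmt, String... details) {
            return "plan: " + stmt;
        }
    }

    static class TableEnvImpl implements TableEnv {
        @Override
        public String explainSql(String stmt, String... details) {
            // The override exists only to make the method visible on the
            // concrete class; it delegates straight to the interface default.
            return TableEnv.super.explainSql(stmt, details);
        }
    }
}
```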





[jira] [Updated] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Jane Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jane Chan updated FLINK-31025:
--
Description: 
This ticket aims to verify FLINK-30650: Introduce EXPLAIN PLAN_ADVICE to 
provide SQL advice.

More details about this feature and how to use it can be found in this 
[documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/explain/].

The verification is divided into two parts:

Part I: Verify that "EXPLAIN PLAN_ADVICE" works with different queries in 
streaming mode, such as a single select/insert/statement set w/ or w/o sub-plan 
reuse (configured by "table.optimizer.reuse-sub-plan-enabled").

That is, once the ExplainDetail is specified as "PLAN_ADVICE":
{code:sql}
EXPLAIN PLAN_ADVICE [SELECT ... FROM ...| INSERT INTO ...| EXECUTE STATEMENT 
SET BEGIN INSERT INTO ... END]
{code}
You should find that the output follows the format below (note that the 
title is changed to "Optimized Physical Plan With Advice"):
{code:sql}
== Abstract Syntax Tree ==
...
 
== Optimized Physical Plan With Advice ==
...
 
 
== Optimized Execution Plan ==
...
{code}
The available advice is attached at the end of "== Optimized Physical Plan With 
Advice ==" and in front of "== Optimized Execution Plan ==".

If you switch to batch mode, "EXPLAIN PLAN_ADVICE" should throw an 
UnsupportedOperationException as expected.

 

Part II: Verify the advice content

Write a group aggregate query, enable/disable the local-global two-phase 
configuration, and check the output.

You should find that once the following configurations are enabled, you get the 
"no available advice..." output.
{code:java}
table.exec.mini-batch.enabled
table.exec.mini-batch.allow-latency
table.exec.mini-batch.size
table.optimizer.agg-phase-strategy {code}
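A minimal sketch of the Part II setup, assuming a streaming TableEnvironment; the option values below are illustrative, not prescribed by the ticket:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PlanAdviceCheck {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Enable mini-batch plus two-phase aggregation; with these options set,
        // explaining the group-aggregate query with PLAN_ADVICE should yield
        // the "no available advice..." output.
        tEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.enabled", "true");
        tEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.allow-latency", "5 s");
        tEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.size", "5000");
        tEnv.getConfig().getConfiguration().setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");
    }
}
```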

  was:
This ticket aims to verify FLINK-30650: Adaptive Batch Scheduler should 
also work with hybrid shuffle mode.

More details about this feature and how to use it can be found in this 
[documentation|xxx].

The verification is divided into two parts:

Part I: Verify hybrid shuffle can work with AdaptiveBatchScheduler

Write a simple Flink batch job using hybrid shuffle mode and submit this job. 
Note that in Flink 1.17, AdaptiveBatchScheduler is the default scheduler for 
batch jobs, so you do not need any other configuration.

Suppose your job's topology is source -> map -> sink; if your cluster has 
enough slots, you should find that source and map are running at the same time.

Part II: Verify hybrid shuffle can work with Speculative Execution

Write a Flink batch job using hybrid shuffle mode which has a subtask running 
much slower than the others (e.g. one that sleeps indefinitely if it runs on a 
certain host; the hostname can be retrieved via 
InetAddress.getLocalHost().getHostName(), or if its (subtaskIndex + 
attemptNumber) % 2 == 0).

Modify the Flink configuration file to enable speculative execution and tune 
the configuration as you like.

Submit the job. Check the web UI, logs, metrics, and the produced result.

You should find in the logs that once one subtask of a producer task has 
finished, all its consumer tasks can be scheduled.


> Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide 
> SQL advice
> ---
>
> Key: FLINK-31025
> URL: https://issues.apache.org/jira/browse/FLINK-31025
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.17.0
>Reporter: Jane Chan
>Priority: Blocker
> Fix For: 1.17.0
>
>
> This ticket aims for verifying FLINK-30650: Introduce EXPLAIN PLAN_ADVICE to 
> provide SQL advice.
> More details about this feature and how to use it can be found in this 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/explain/].
> The verification is divided into two parts:
> Part I: Verify "EXPLAIN PLAN_ADVICE" can work with different queries under 
> the streaming mode, such as a single select/insert/statement set w/ or w/o 
> sub-plan reuse (configured by "table.optimizer.reuse-sub-plan-enabled").
> This indicates once specifying the ExplainDetail as "PLAN_ADVICE"
> {code:sql}
> EXPLAIN PLAN_ADVICE [SELECT ... FROM ...| INSERT INTO ...| EXECUTE STATEMENT 
> SET BEGIN INSERT INTO ... END]
> {code}
> You should find the output should be like the following format (note that the 
> title is changed to "Optimized Physical Plan with Advice")
> {code:sql}
> == Abstract Syntax Tree ==
> ...
>  
> == Optimized Physical Plan With Advice ==
> ...
>  
>  
> == Optimized Execution Plan ==
> ...
> {code}
> The available advice is attached at the end of "== Optimized Physical Plan 
> With Advice ==", and in front of  "== Optimized Execution Plan =="
> If switching to batch

[GitHub] [flink] fsk119 commented on pull request #21500: [FLINK-27995][table] Upgrade Janino version

2023-02-12 Thread via GitHub


fsk119 commented on PR #21500:
URL: https://github.com/apache/flink/pull/21500#issuecomment-1427291971

   @snuyanzin I am not familiar with janino... I think you may need to find 
others to review this.





[GitHub] [flink] dianfu closed pull request #21899: [FLINK-30969][Python] Fix pandas error when running table example

2023-02-12 Thread via GitHub


dianfu closed pull request #21899: [FLINK-30969][Python] Fix pandas error when 
running table example
URL: https://github.com/apache/flink/pull/21899





[jira] [Closed] (FLINK-30969) Pyflink table example throws "module 'pandas' has no attribute 'Int8Dtype'"

2023-02-12 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-30969.
---
Fix Version/s: (was: 1.18.0)
 Assignee: Juntao Hu
   Resolution: Fixed

Fixed in:
- master via c096c03df70648b60b665a09816635b956b201cc
- release-1.17 via 19c05ef0c864644512e5643589fe550f7e281254

> Pyflink table example throws "module 'pandas' has no attribute 'Int8Dtype'"
> ---
>
> Key: FLINK-30969
> URL: https://issues.apache.org/jira/browse/FLINK-30969
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.17.0
>Reporter: Juntao Hu
>Assignee: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.17.0
>
>
> After apache-beam was upgraded to 2.43.0 in 1.17, running `python 
> pyflink/examples/table/basic_operations.py` throws an error:
> {code:java}
> Traceback (most recent call last):
>   File "pyflink/examples/table/basic_operations.py", line 484, in 
>     basic_operations()
>   File "pyflink/examples/table/basic_operations.py", line 29, in 
> basic_operations
>     t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 121, in create
>     return TableEnvironment(j_tenv)
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 100, in __init__
>     self._open()
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 1640, in _open
>     startup_loopback_server()
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/table/table_environment.py",
>  line 1631, in startup_loopback_server
>     from pyflink.fn_execution.beam.beam_worker_pool_service import \
>   File 
> "/Users/vancior/Documents/Github/flink-back/flink-python/pyflink/fn_execution/beam/beam_worker_pool_service.py",
>  line 31, in 
>     from apache_beam.options.pipeline_options import DebugOptions
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/__init__.py",
>  line 92, in 
>     from apache_beam import coders
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/__init__.py",
>  line 17, in 
>     from apache_beam.coders.coders import *
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/coders/coders.py",
>  line 59, in 
>     from apache_beam.coders import coder_impl
>   File "apache_beam/coders/coder_impl.py", line 63, in init 
> apache_beam.coders.coder_impl
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/__init__.py",
>  line 31, in 
>     from apache_beam.typehints.pandas_type_compatibility import *
>   File 
> "/Users/vancior/miniconda3/envs/flink-python/lib/python3.8/site-packages/apache_beam/typehints/pandas_type_compatibility.py",
>  line 81, in 
>     (pd.Int8Dtype(), Optional[np.int8]),
> AttributeError: module 'pandas' has no attribute 'Int8Dtype' {code}





[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #192: [FLINK-30451] Add AlgoOperator for Swing

2023-02-12 Thread via GitHub


zhipeng93 commented on code in PR #192:
URL: https://github.com/apache/flink-ml/pull/192#discussion_r1103961097


##
flink-ml-examples/src/main/java/org/apache/flink/ml/examples/recommendation/SwingExample.java:
##
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.examples.recommendation;
+
+import org.apache.flink.ml.recommendation.swing.Swing;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.CloseableIterator;
+
+/** Simple program that creates a Swing instance and uses it to give 
recommendations for items. */

Review Comment:
   nit: ... and uses it to `generate` recommendations for items.



##
flink-ml-lib/src/main/java/org/apache/flink/ml/recommendation/swing/Swing.java:
##
@@ -0,0 +1,415 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.recommendation.swing;
+
+import org.apache.flink.api.common.state.ListState;
+import org.apache.flink.api.common.state.ListStateDescriptor;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.common.typeinfo.Types;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.iteration.operator.OperatorStateUtils;
+import org.apache.flink.ml.api.AlgoOperator;
+import org.apache.flink.ml.common.datastream.TableUtils;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.runtime.state.StateInitializationContext;
+import org.apache.flink.runtime.state.StateSnapshotContext;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.BoundedOneInput;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.table.catalog.ResolvedSchema;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * An AlgoOperator which implements the Swing algorithm.
+ *
+ * Swing is an item recall algorithm. The topology of user-item graph usually
+ * can be described as user-item-user or item-user-item, which are like
+ * 'swing'. For example, if both user u and user v have purchased the same
+ * commodity i, they will form a relationship diagram similar to a swing. If
+ * u and v have purchased commodity j in addition to i, it is supposed i and
+ * j are similar. The similar

[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #192: [FLINK-30451] Add AlgoOperator for Swing

2023-02-12 Thread via GitHub


zhipeng93 commented on code in PR #192:
URL: https://github.com/apache/flink-ml/pull/192#discussion_r1103974792


##
flink-ml-lib/src/test/java/org/apache/flink/ml/recommendation/SwingTest.java:
##
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.recommendation;
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.ml.recommendation.swing.Swing;
+import org.apache.flink.ml.util.TestUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.collections.IteratorUtils;
+import org.apache.commons.lang3.exception.ExceptionUtils;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+/** Tests {@link Swing}. */
+public class SwingTest {
+@Rule public final TemporaryFolder tempFolder = new TemporaryFolder();
+private StreamExecutionEnvironment env;
+private StreamTableEnvironment tEnv;
+private Table trainData;
+RowTypeInfo trainDataTypeInfo =
+new RowTypeInfo(
+new TypeInformation[] {
+BasicTypeInfo.LONG_TYPE_INFO, 
BasicTypeInfo.LONG_TYPE_INFO
+},
+new String[] {"user_id", "item_id"});
+private static final List trainRows =
+new ArrayList<>(
+Arrays.asList(
+Row.of(0L, 10L),
+Row.of(0L, 11L),
+Row.of(0L, 12L),
+Row.of(1L, 13L),
+Row.of(1L, 12L),
+Row.of(2L, 10L),
+Row.of(2L, 11L),
+Row.of(2L, 12L),
+Row.of(3L, 13L),
+Row.of(3L, 12L)));
+
+private static final List expectedScoreRows =
+new ArrayList<>(
+Arrays.asList(
+Row.of(10L, 
"11,0.058845768947156235;12,0.058845768947156235"),
+Row.of(11L, 
"10,0.058845768947156235;12,0.058845768947156235"),
+Row.of(
+12L,
+
"13,0.09134833828228624;10,0.058845768947156235;11,0.058845768947156235"),
+Row.of(13L, "12,0.09134833828228624")));
+
+@Before
+public void before() {
+env = TestUtils.getExecutionEnvironment();
+tEnv = StreamTableEnvironment.create(env);
+trainData = tEnv.fromDataStream(env.fromCollection(trainRows, 
trainDataTypeInfo));
+}
+
+private void compareResultAndExpected(List results) {
+results.sort((o1, o2) -> Long.compare(o1.getFieldAs(0), 
o2.getFieldAs(0)));
+
+for (int i = 0; i < results.size(); i++) {
+Row result = results.get(i);
+String itemRankScore = result.getFieldAs(1);
+Row expect = expectedScoreRows.get(i);
+assertEquals(expect.getField(0), result.getField(0));
+assertEquals(expect.getField(1), itemRankScore);
+}
+}
+
+@Test
+public void testParam() {
+Swing swing = new Swing();
+
+assertEquals("item", swing.getItemCol());
+assertEquals("user", swing.getUserCol());
+assertEquals(100, swing.getK());
+assertEquals(1000, swing.getMaxUserNumPerItem());
+assertEquals(10, swing.getMinUserBehavior());
+assertEquals(1000, swing.getMaxUserBehavior());
+assertEquals(15, swing.getAlpha1());
+assertEquals(0, swing.getAlpha2(

[GitHub] [flink-ml] zhipeng93 commented on a diff in pull request #192: [FLINK-30451] Add AlgoOperator for Swing

2023-02-12 Thread via GitHub


zhipeng93 commented on code in PR #192:
URL: https://github.com/apache/flink-ml/pull/192#discussion_r1103973578


##
flink-ml-lib/src/main/java/org/apache/flink/ml/recommendation/swing/SwingParams.java:
##
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.recommendation.swing;
+
+import org.apache.flink.ml.common.param.HasOutputCol;
+import org.apache.flink.ml.param.DoubleParam;
+import org.apache.flink.ml.param.IntParam;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.param.ParamValidators;
+import org.apache.flink.ml.param.StringParam;
+import org.apache.flink.ml.param.WithParams;
+
+/**
+ * Params for {@link Swing}.
+ *
+ * @param  The class type of this instance.
+ */
+public interface SwingParams extends WithParams, HasOutputCol {
+Param USER_COL =
+new StringParam("userCol", "Name of user column.", "user", 
ParamValidators.notNull());
+
+Param ITEM_COL =
+new StringParam("itemCol", "Name of item column.", "item", 
ParamValidators.notNull());
+
+Param MAX_USER_NUM_PER_ITEM =
+new IntParam(
+"maxUserNumPerItem",
+"The max number of users that has purchased for each item. 
If the number of users that have "
++ "purchased this item is larger than this value, 
then only maxUserNumPerItem users will "
++ "be sampled and used in the computation logic.",
+1000,
+ParamValidators.gt(0));
+
+Param K =
+new IntParam(
+"k",
+"The max number of similar items to output for each item.",
+100,
+ParamValidators.gt(0));
+
+Param MIN_USER_BEHAVIOR =
+new IntParam(
+"minUserBehavior",
+"The min number of interaction behavior between item and 
user.",
+10,
+ParamValidators.gt(0));
+
+Param MAX_USER_BEHAVIOR =
+new IntParam(
+"maxUserBehavior",
+"The max number of interaction behavior between item and 
user. "
++ "The algorithm filters out activate users.",
+1000,
+ParamValidators.gt(0));
+
+Param ALPHA1 =
+new IntParam(
+"alpha1",
+"This parameter is used to calculate weight of each user. "
++ "The higher alpha1 is, the smaller weight each 
user gets.",
+15,
+ParamValidators.gtEq(0));
+
+Param ALPHA2 =
+new IntParam(
+"alpha2",
+"This parameter is used to calculate similarity of users. "
++ "The higher alpha2 is, the less the similarity 
score is.",
+0,
+ParamValidators.gtEq(0));
+
+Param BETA =
+new DoubleParam(
+"beta",
+"This parameter is used to calculate weight of each user. "

Review Comment:
   How about the following description: 
   `Decay factor for number of users that have purchased one item. The higher 
beta is, the less purchasing behavior contributes to the similarity score.`






[jira] [Created] (FLINK-31026) KBinsDiscretizer should gives binEdges wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)
Fan Hong created FLINK-31026:


 Summary: KBinsDiscretizer should gives binEdges wrong bin edges 
when all values are same.
 Key: FLINK-31026
 URL: https://issues.apache.org/jira/browse/FLINK-31026
 Project: Flink
  Issue Type: Bug
  Components: Library / Machine Learning
Reporter: Fan Hong








[jira] [Updated] (FLINK-31026) KBinsDiscretizer should gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31026:
-
Summary: KBinsDiscretizer should gives wrong bin edges when all values are 
same.  (was: KBinsDiscretizer should gives binEdges wrong bin edges when all 
values are same.)

> KBinsDiscretizer should gives wrong bin edges when all values are same.
> ---
>
> Key: FLINK-31026
> URL: https://issues.apache.org/jira/browse/FLINK-31026
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>






[jira] [Updated] (FLINK-31026) KBinsDiscretizer should gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31026:
-
Description: 
The current implementation gives bin edges of \{Double.MIN_VALUE, Double.MAX_VALUE} 
when all values are the same.
But the correct bin edges should be \{-Double.MAX_VALUE, Double.MAX_VALUE}.

> KBinsDiscretizer should gives wrong bin edges when all values are same.
> ---
>
> Key: FLINK-31026
> URL: https://issues.apache.org/jira/browse/FLINK-31026
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>
> The current implementation gives bin edges of \{Double.MIN_VALUE, 
> Double.MAX_VALUE} when all values are the same.
> But the correct bin edges should be \{-Double.MAX_VALUE, Double.MAX_VALUE}.





[jira] [Updated] (FLINK-31026) KBinsDiscretizer should gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31026:
-
Description: 
The current implementation gives bin edges of \{Double.MIN_VALUE, Double.MAX_VALUE} 
when all values are the same.
However, this bin cannot cover negative values or 0.

  was:
The current implementation gives bin edges of \{Double.MIN_VALUE, Double.MAX_VALUE} 
when all values are the same.
But the correct bin edges should be \{-Double.MAX_VALUE, Double.MAX_VALUE}.


> KBinsDiscretizer should gives wrong bin edges when all values are same.
> ---
>
> Key: FLINK-31026
> URL: https://issues.apache.org/jira/browse/FLINK-31026
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>
> The current implementation gives bin edges of \{Double.MIN_VALUE, 
> Double.MAX_VALUE} when all values are the same.
> However, this bin cannot cover negative values or 0.
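The defect is easy to demonstrate with plain Java: Double.MIN_VALUE is the smallest positive double (about 4.9e-324), not the most negative one, so it cannot serve as a lower bin edge. The `inBin` helper below is illustrative only.

```java
// Double.MIN_VALUE is the smallest POSITIVE double (~4.9e-324), not the most
// negative one, so a {Double.MIN_VALUE, Double.MAX_VALUE} bin excludes zero
// and all negative values; {-Double.MAX_VALUE, Double.MAX_VALUE} covers them.
public class BinEdges {
    // Closed-interval membership check, for illustration only.
    static boolean inBin(double v, double lower, double upper) {
        return v >= lower && v <= upper;
    }
}
```

With {Double.MIN_VALUE, Double.MAX_VALUE} as edges, both 0.0 and -1.0 fall outside the bin, while the {-Double.MAX_VALUE, Double.MAX_VALUE} edges cover them.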





[GitHub] [flink-web] reswqa commented on pull request #607: Add Weijie Guo to the committer list

2023-02-12 Thread via GitHub


reswqa commented on PR #607:
URL: https://github.com/apache/flink-web/pull/607#issuecomment-1427334809

   Thanks @xintongsong for reminding me, I have removed the rebuild commit and 
fixed yangze's incorrect order.





[GitHub] [flink] ruanhang1993 commented on a diff in pull request #21889: [FLINK-29801][runtime] FLIP-274: Introduce JobManagerOperatorMetricGroup for the components running on a JobManager of the ope

2023-02-12 Thread via GitHub


ruanhang1993 commented on code in PR #21889:
URL: https://github.com/apache/flink/pull/21889#discussion_r1103993973


##
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/InternalOperatorCoordinatorMetricGroup.java:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.metrics.groups;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.metrics.groups.OperatorCoordinatorMetricGroup;
+
+/** Special {@link org.apache.flink.metrics.MetricGroup} representing an 
OperatorCoordinator. */
+@Internal
+public class InternalOperatorCoordinatorMetricGroup extends 
ProxyMetricGroup
+implements OperatorCoordinatorMetricGroup {
+
+public 
InternalOperatorCoordinatorMetricGroup(JobManagerOperatorMetricGroup parent) {

Review Comment:
   OK, I will apply this change to the other PR, which will introduce the 
internal operator coordinator metric group.






[jira] [Commented] (FLINK-31021) JavaCodeSplitter doesn't split static method properly

2023-02-12 Thread Xingcan Cui (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687728#comment-17687728
 ] 

Xingcan Cui commented on FLINK-31021:
-

I'm playing with 
[https://github.com/apache/flink/blob/c096c03df70648b60b665a09816635b956b201cc/flink-formats/flink-protobuf/src/main/java/org/apache/flink/formats/protobuf/deserialize/ProtoToRowConverter.java#L98]
 

The generated converter source is sometimes too large. I fixed it locally by 
removing the static keyword for now. FYI [~libenchao] 

It's fine if the code splitter doesn't support static methods. But at least we 
should inform users with a proper message instead of generating incorrect code.

 

> JavaCodeSplitter doesn't split static method properly
> -
>
> Key: FLINK-31021
> URL: https://issues.apache.org/jira/browse/FLINK-31021
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.4, 1.15.3, 1.16.1
>Reporter: Xingcan Cui
>Priority: Minor
>
> The exception while compiling the generated source
> {code:java}
> cause=org.codehaus.commons.compiler.CompileException: Line 3383, Column 90: 
> Instance method "default void 
> org.apache.flink.formats.protobuf.deserialize.GeneratedProtoToRow_655d75db1cf943838f5500013edfba82.decodeImpl(foo.bar.LogData)"
>  cannot be invoked in static context,{code}
> The original method header 
> {code:java}
> public static RowData decode(foo.bar.LogData message){{code}
> The code after split
>  
> {code:java}
> Line 3383: public static RowData decode(foo.bar.LogData message){ 
> decodeImpl(message); return decodeReturnValue$0; } 
> Line 3384:
> Line 3385: void decodeImpl(foo.bar.LogData message) {{code}
>  
>  
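For illustration, a hand-written version of what a correct split could look like (class, field, and argument types are hypothetical, not the actual generated code): the extracted body method must keep the `static` modifier of the original method, otherwise the static entry point cannot call it.

```java
public class SplitExample {
    // Field that carries the return value across the split methods.
    static String decodeReturnValue$0;

    // Static entry point, as in the original generated method.
    public static String decode(String message) {
        decodeImpl(message);
        return decodeReturnValue$0;
    }

    // The extracted method must also be static; dropping the modifier is what
    // triggers "Instance method ... cannot be invoked in static context".
    static void decodeImpl(String message) {
        decodeReturnValue$0 = "row:" + message;
    }
}
```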





[jira] [Created] (FLINK-31027) Introduce annotation for table store

2023-02-12 Thread Shammon (Jira)
Shammon created FLINK-31027:
---

 Summary: Introduce annotation for table store
 Key: FLINK-31027
 URL: https://issues.apache.org/jira/browse/FLINK-31027
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Affects Versions: table-store-0.4.0
Reporter: Shammon


Introduce annotation for table store





[jira] [Created] (FLINK-31028) Provide different compression methods per level

2023-02-12 Thread Jun Zhang (Jira)
Jun Zhang created FLINK-31028:
-

 Summary: Provide different compression methods per level
 Key: FLINK-31028
 URL: https://issues.apache.org/jira/browse/FLINK-31028
 Project: Flink
  Issue Type: New Feature
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Jun Zhang
 Fix For: table-store-0.4.0


Different compression methods are provided for different levels.

For level 0, because the amount of data in this level is not large, we do not 
want to use compression, in exchange for better write performance. For normal 
levels, we use lz4. For the last level, which is accessed less often and holds a 
large data volume, we hope to use gzip to reduce the space size.
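The proposed policy boils down to a level-to-codec mapping, sketched below (codec names and the method shape are illustrative; this is not the table store's actual configuration API):

```java
public class LevelCompression {
    // Level 0 skips compression to favor write throughput; the last level uses
    // gzip to favor space; intermediate levels use lz4 as a balanced default.
    static String compressionForLevel(int level, int maxLevel) {
        if (level == 0) {
            return "none";
        } else if (level == maxLevel) {
            return "gzip";
        } else {
            return "lz4";
        }
    }
}
```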





[jira] [Commented] (FLINK-31021) JavaCodeSplitter doesn't split static method properly

2023-02-12 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687738#comment-17687738
 ] 

Benchao Li commented on FLINK-31021:


[~xccui] Thanks for involving me.

I totally agree that the generated code should also be split because it will 
hit some JIT optimization limitations when the code grows too large. Do we have 
any issue tracking this?

I'm fine with both ways:
- change flink-protobuf to not use static methods

- change code-splitter to support static methods

> JavaCodeSplitter doesn't split static method properly
> -
>
> Key: FLINK-31021
> URL: https://issues.apache.org/jira/browse/FLINK-31021
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.4, 1.15.3, 1.16.1
>Reporter: Xingcan Cui
>Priority: Minor
>
> The exception while compiling the generated source
> {code:java}
> cause=org.codehaus.commons.compiler.CompileException: Line 3383, Column 90: 
> Instance method "default void 
> org.apache.flink.formats.protobuf.deserialize.GeneratedProtoToRow_655d75db1cf943838f5500013edfba82.decodeImpl(foo.bar.LogData)"
>  cannot be invoked in static context,{code}
> The original method header 
> {code:java}
> public static RowData decode(foo.bar.LogData message){{code}
> The code after split
>  
> {code:java}
> Line 3383: public static RowData decode(foo.bar.LogData message){ 
> decodeImpl(message); return decodeReturnValue$0; } 
> Line 3384:
> Line 3385: void decodeImpl(foo.bar.LogData message) {{code}
>  
>  





[GitHub] [flink] reswqa commented on pull request #21573: [FLINK-30524] Support full screen viewing on the Logs/Stdout/Thread D…

2023-02-12 Thread via GitHub


reswqa commented on PR #21573:
URL: https://github.com/apache/flink/pull/21573#issuecomment-1427386691

   @yangjunhan Would you mind reviewing this?





[jira] [Assigned] (FLINK-30519) Add e2e tests for operator dynamic config

2023-02-12 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-30519:
--

Assignee: haiqingchen

> Add e2e tests for operator dynamic config
> -
>
> Key: FLINK-30519
> URL: https://issues.apache.org/jira/browse/FLINK-30519
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: haiqingchen
>Priority: Critical
>  Labels: starter
>
> The dynamic config feature is currently not covered by e2e tests and is 
> subject to accidental regressions, as shown in:
> https://issues.apache.org/jira/browse/FLINK-30329
> We should add an e2e test that covers this.





[GitHub] [flink-table-store] FangYongs commented on pull request #523: [FLINK-31022] Using new Serializer for table store

2023-02-12 Thread via GitHub


FangYongs commented on PR #523:
URL: 
https://github.com/apache/flink-table-store/pull/523#issuecomment-1427399487

   LGTM +1





[jira] [Updated] (FLINK-31022) Using new Serializer for table store

2023-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31022:
---
Labels: pull-request-available  (was: )

> Using new Serializer for table store
> 
>
> Key: FLINK-31022
> URL: https://issues.apache.org/jira/browse/FLINK-31022
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>






[jira] [Updated] (FLINK-31026) KBinsDiscretizer gives wrong bin edges when all values are same.

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31026:
-
Summary: KBinsDiscretizer gives wrong bin edges when all values are same.  
(was: KBinsDiscretizer should gives wrong bin edges when all values are same.)

> KBinsDiscretizer gives wrong bin edges when all values are same.
> 
>
> Key: FLINK-31026
> URL: https://issues.apache.org/jira/browse/FLINK-31026
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>
> The current implementation gives bin edges of \{Double.MIN_VALUE, Double.MAX_VALUE} 
> when all values are the same.
> However, this bin cannot cover negative values or 0.





[jira] [Updated] (FLINK-31026) KBinsDiscretizer gives wrong bin edges when all values are same.

2023-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31026:
---
Labels: pull-request-available  (was: )

> KBinsDiscretizer gives wrong bin edges when all values are same.
> 
>
> Key: FLINK-31026
> URL: https://issues.apache.org/jira/browse/FLINK-31026
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation gives bin edges of \{Double.MIN_VALUE, Double.MAX_VALUE} 
> when all values are the same.
> However, this bin cannot cover negative values or 0.





[GitHub] [flink-ml] Fanoid opened a new pull request, #211: [FLINK-31026] Fix KBinsDiscretizer when the input values of one column are same

2023-02-12 Thread via GitHub


Fanoid opened a new pull request, #211:
URL: https://github.com/apache/flink-ml/pull/211

   ## What is the purpose of the change
   
   KBinsDiscretizer produces `{Double.MIN_VALUE, Double.MAX_VALUE}` for columns 
whose input values are all the same, which is not correct, as this range doesn't 
cover negative values or 0. A more reasonable choice is `{Double.NEGATIVE_INFINITY, 
Double.POSITIVE_INFINITY}`.
   
   ## Brief change log
   
 - Change bin edges for columns whose input values are all the same to 
`{Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY}`.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   





[jira] [Created] (FLINK-31029) KBinsDiscretizer gives wrong bin edges when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)
Fan Hong created FLINK-31029:


 Summary: KBinsDiscretizer gives wrong bin edges when input data 
contains only 2 distinct values
 Key: FLINK-31029
 URL: https://issues.apache.org/jira/browse/FLINK-31029
 Project: Flink
  Issue Type: Bug
  Components: Library / Machine Learning
Reporter: Fan Hong








[jira] [Updated] (FLINK-31029) KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31029:
-
Summary: KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when 
input data contains only 2 distinct values  (was: KBinsDiscretizer gives wrong 
bin edges when input data contains only 2 distinct values)

> KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data 
> contains only 2 distinct values
> -
>
> Key: FLINK-31029
> URL: https://issues.apache.org/jira/browse/FLINK-31029
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>






[jira] [Assigned] (FLINK-31007) The code generated by the IF function throws NullPointerException

2023-02-12 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee reassigned FLINK-31007:
---

Assignee: xzw0223

> The code generated by the IF function throws NullPointerException
> -
>
> Key: FLINK-31007
> URL: https://issues.apache.org/jira/browse/FLINK-31007
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Table SQL / Runtime
>Affects Versions: 1.15.2, 1.15.3
> Environment: {code:java}
> // code placeholder
> final StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> final StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
> final DataStream tab =
> env.fromCollection(Arrays.asList(
> new Tuple2<>(1L, "a_b_c"),
> new Tuple2<>(-1L, "a_b_c")));
> final Table tableA = tableEnv.fromDataStream(tab);
> tableEnv.executeSql("SELECT if(f0 = -1, '', split_index(f1, '_', 0)) as id 
> FROM " + tableA)
> .print(); {code}
>Reporter: tivanli
>Assignee: xzw0223
>Priority: Major
> Attachments: StreamExecCalc$19.java, image-2023-02-10-17-20-51-619.png
>
>
> Caused by: java.lang.NullPointerException
>     at StreamExecCalc$19.processElement_split1(Unknown Source)
>     at StreamExecCalc$19.processElement(Unknown Source)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>     at 
> org.apache.flink.table.runtime.operators.source.InputConversionOperator.processElement(InputConversionOperator.java:128)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>     at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:418)
>     at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:513)
>     at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103)
>     at 
> org.apache.flink.streaming.api.functions.source.FromElementsFunction.run(FromElementsFunction.java:231)
>     at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
>     at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
>     at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)





[jira] [Commented] (FLINK-31007) The code generated by the IF function throws NullPointerException

2023-02-12 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687750#comment-17687750
 ] 

lincoln lee commented on FLINK-31007:
-

This is a bug in IfCallGen: initialization and null checks should be handled 
properly for all operands.

[~xzw0223] assigned to you. Welcome to contribute!
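As a hand-written illustration of the required behavior (names hypothetical, not the actual generated code): for IF(cond, a, b), each operand's result and null flag must be assigned inside its own branch before downstream code reads them.

```java
public class IfNullSafe {
    // Null-safe evaluation of IF(cond, thenTerm, elseTerm): both the result
    // and its null flag are assigned in every branch, so no uninitialized or
    // unchecked operand is ever dereferenced.
    static String evalIf(boolean cond, String thenTerm, String elseTerm) {
        String result;
        boolean resultIsNull;
        if (cond) {
            resultIsNull = (thenTerm == null);
            result = resultIsNull ? null : thenTerm;
        } else {
            resultIsNull = (elseTerm == null);
            result = resultIsNull ? null : elseTerm;
        }
        return resultIsNull ? null : result;
    }
}
```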

> The code generated by the IF function throws NullPointerException
> -
>
> Key: FLINK-31007
> URL: https://issues.apache.org/jira/browse/FLINK-31007
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Table SQL / Runtime
>Affects Versions: 1.15.2, 1.15.3
> Environment: {code:java}
> // code placeholder
> final StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> final StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
> final DataStream tab =
> env.fromCollection(Arrays.asList(
> new Tuple2<>(1L, "a_b_c"),
> new Tuple2<>(-1L, "a_b_c")));
> final Table tableA = tableEnv.fromDataStream(tab);
> tableEnv.executeSql("SELECT if(f0 = -1, '', split_index(f1, '_', 0)) as id 
> FROM " + tableA)
> .print(); {code}
>Reporter: tivanli
>Assignee: xzw0223
>Priority: Major
> Attachments: StreamExecCalc$19.java, image-2023-02-10-17-20-51-619.png
>
>
> Caused by: java.lang.NullPointerException
>     at StreamExecCalc$19.processElement_split1(Unknown Source)
>     at StreamExecCalc$19.processElement(Unknown Source)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>     at 
> org.apache.flink.table.runtime.operators.source.InputConversionOperator.processElement(InputConversionOperator.java:128)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
>     at 
> org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
>     at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
>     at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:418)
>     at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:513)
>     at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103)
>     at 
> org.apache.flink.streaming.api.functions.source.FromElementsFunction.run(FromElementsFunction.java:231)
>     at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
>     at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
>     at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)





[jira] [Updated] (FLINK-31029) KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31029:
-
Description: When an input column contains only 2 distinct values, and their 
counts are the same, 

> KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data 
> contains only 2 distinct values
> -
>
> Key: FLINK-31029
> URL: https://issues.apache.org/jira/browse/FLINK-31029
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>
> When an input column contains only 2 distinct values, and their counts are 
> the same, 





[jira] [Updated] (FLINK-31029) KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data contains only 2 distinct values

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31029:
-
Description: 
When an input column contains only 2 distinct values and their counts are the 
same, KBinsDiscretizer transforms this column to all 0s using the `quantile` 
strategy. An example of such a column is `[0, 0, 0, 1, 1, 1]`.

When the 2 distinct values have different counts, the transformed values are 
also all 0s, which fails to distinguish them.
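The collapse can be reproduced with a generic quantile binner (a sketch under the assumption that edges are taken at evenly spaced quantiles of the sorted input using the lower element; this is not Flink ML's actual implementation):

```java
import java.util.Arrays;

public class QuantileEdges {
    // Pick numBins + 1 edges at evenly spaced quantile positions of the
    // sorted input, taking the lower element at each position.
    static double[] edges(double[] sorted, int numBins) {
        double[] e = new double[numBins + 1];
        for (int i = 0; i <= numBins; i++) {
            int idx = Math.min(i * sorted.length / numBins, sorted.length - 1);
            e[i] = sorted[idx];
        }
        return e;
    }

    public static void main(String[] args) {
        // Two distinct values with equal counts: the middle edge already lands
        // on 1, so the last bin [1, 1] is degenerate and every record falls
        // into bin 0.
        System.out.println(Arrays.toString(edges(new double[] {0, 0, 0, 1, 1, 1}, 2)));
        // prints [0.0, 1.0, 1.0]
    }
}
```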

  was:When a input column contains only 2 distinct values, and their counts are 
same, 


> KBinsDiscretizer gives wrong bin edges in 'quantile' strategy when input data 
> contains only 2 distinct values
> -
>
> Key: FLINK-31029
> URL: https://issues.apache.org/jira/browse/FLINK-31029
> Project: Flink
>  Issue Type: Bug
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>
> When an input column contains only 2 distinct values and their counts are the 
> same, KBinsDiscretizer transforms this column to all 0s using the `quantile` 
> strategy. An example of such a column is `[0, 0, 0, 1, 1, 1]`.
> When the 2 distinct values have different counts, the transformed values are 
> also all 0s, which fails to distinguish them.





[jira] [Updated] (FLINK-31030) Support more binary classification evaluation metrics.

2023-02-12 Thread Fan Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Hong updated FLINK-31030:
-
Description: The current `BinaryClassificationEvaluator` only supports 
'areaUnderROC', 'areaUnderPR', 'ks' and 'areaUnderLorenz'. We should support 
more evaluation metrics, including basic ones such as precision, recall, and 
F-measure.  (was: Current `BinaryClassificationEvaluator` only 
supports 
'areaUnderROC', 'areaUnderPR', 'ks' and 'areaUnderLorenz'. We should support 
more evaluation metrics, some of which are basic ones, e.g., precision, recall, 
F-measure, and so on.)

> Support more binary classification evaluation metrics.
> --
>
> Key: FLINK-31030
> URL: https://issues.apache.org/jira/browse/FLINK-31030
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Fan Hong
>Priority: Major
>
> The current `BinaryClassificationEvaluator` only supports 'areaUnderROC', 
> 'areaUnderPR', 'ks' and 'areaUnderLorenz'. We should support more evaluation 
> metrics, including basic ones such as precision, recall, and F-measure.





[jira] [Created] (FLINK-31030) Support more binary classification evaluation metrics.

2023-02-12 Thread Fan Hong (Jira)
Fan Hong created FLINK-31030:


 Summary: Support more binary classification evaluation metrics.
 Key: FLINK-31030
 URL: https://issues.apache.org/jira/browse/FLINK-31030
 Project: Flink
  Issue Type: Improvement
  Components: Library / Machine Learning
Reporter: Fan Hong


The current `BinaryClassificationEvaluator` only supports 
'areaUnderROC', 'areaUnderPR', 'ks' and 'areaUnderLorenz'. We should support 
more evaluation metrics, including basic ones such as precision, recall, and 
F-measure.
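For reference, the basic metrics mentioned follow directly from the confusion counts; a generic sketch (not tied to the `BinaryClassificationEvaluator` API):

```java
public class BinaryMetrics {
    // precision = TP / (TP + FP): fraction of positive predictions that are correct.
    static double precision(long tp, long fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // recall = TP / (TP + FN): fraction of actual positives that are found.
    static double recall(long tp, long fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // F-measure (F1) = harmonic mean of precision and recall.
    static double f1(long tp, long fp, long fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }
}
```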





[jira] [Commented] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687755#comment-17687755
 ] 

Weijie Guo commented on FLINK-31025:


Thanks [~qingyue] for creating this, I'd like to do this testing work.

> Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide 
> SQL advice
> ---
>
> Key: FLINK-31025
> URL: https://issues.apache.org/jira/browse/FLINK-31025
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.17.0
>Reporter: Jane Chan
>Priority: Blocker
> Fix For: 1.17.0
>
>
> This ticket aims to verify FLINK-30650: Introduce EXPLAIN PLAN_ADVICE to 
> provide SQL advice.
> More details about this feature and how to use it can be found in this 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/explain/].
> The verification is divided into two parts:
> Part I: Verify "EXPLAIN PLAN_ADVICE" can work with different queries under 
> the streaming mode, such as a single select/insert/statement set w/ or w/o 
> sub-plan reuse (configured by "table.optimizer.reuse-sub-plan-enabled").
> This indicates once specifying the ExplainDetail as "PLAN_ADVICE"
> {code:sql}
> EXPLAIN PLAN_ADVICE [SELECT ... FROM ...| INSERT INTO ...| EXECUTE STATEMENT 
> SET BEGIN INSERT INTO ... END]
> {code}
> You should find that the output follows the format below (note that the 
> title is changed to "Optimized Physical Plan With Advice")
> {code:sql}
> == Abstract Syntax Tree ==
> ...
>  
> == Optimized Physical Plan With Advice ==
> ...
>  
>  
> == Optimized Execution Plan ==
> ...
> {code}
> The available advice is attached at the end of "== Optimized Physical Plan 
> With Advice ==" and before "== Optimized Execution Plan ==".
> When switching to batch mode, "EXPLAIN PLAN_ADVICE" should throw 
> UnsupportedOperationException as expected.
>  
> Part II: Verify the advice content
> Write a group aggregate query, and enable/disable the local-global two-phase 
> configuration, and test the output.
> You should find that once the following configurations are enabled, you get 
> the "no available advice..." output.
> {code:java}
> table.exec.mini-batch.enabled
> table.exec.mini-batch.allow-latency
> table.exec.mini-batch.size
> table.optimizer.agg-phase-strategy {code}





[jira] [Assigned] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo reassigned FLINK-31025:
--

Assignee: Weijie Guo

> Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide 
> SQL advice
> ---
>
> Key: FLINK-31025
> URL: https://issues.apache.org/jira/browse/FLINK-31025
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.17.0
>Reporter: Jane Chan
>Assignee: Weijie Guo
>Priority: Blocker
> Fix For: 1.17.0
>
>
> This ticket aims to verify FLINK-30650: Introduce EXPLAIN PLAN_ADVICE to 
> provide SQL advice.
> More details about this feature and how to use it can be found in this 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/explain/].
> The verification is divided into two parts:
> Part I: Verify "EXPLAIN PLAN_ADVICE" can work with different queries under 
> the streaming mode, such as a single select/insert/statement set w/ or w/o 
> sub-plan reuse (configured by "table.optimizer.reuse-sub-plan-enabled").
> This indicates once specifying the ExplainDetail as "PLAN_ADVICE"
> {code:sql}
> EXPLAIN PLAN_ADVICE [SELECT ... FROM ...| INSERT INTO ...| EXECUTE STATEMENT 
> SET BEGIN INSERT INTO ... END]
> {code}
> You should find that the output follows the format below (note that the 
> title is changed to "Optimized Physical Plan With Advice")
> {code:sql}
> == Abstract Syntax Tree ==
> ...
>  
> == Optimized Physical Plan With Advice ==
> ...
>  
>  
> == Optimized Execution Plan ==
> ...
> {code}
> The available advice is attached at the end of "== Optimized Physical Plan 
> With Advice ==" and before "== Optimized Execution Plan ==".
> When switching to batch mode, "EXPLAIN PLAN_ADVICE" should throw 
> UnsupportedOperationException as expected.
>  
> Part II: Verify the advice content
> Write a group aggregate query, and enable/disable the local-global two-phase 
> configuration, and test the output.
> You should find that once the following configurations are enabled, you get 
> the "no available advice..." output.
> {code:java}
> table.exec.mini-batch.enabled
> table.exec.mini-batch.allow-latency
> table.exec.mini-batch.size
> table.optimizer.agg-phase-strategy {code}





[jira] [Updated] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31025:
---
Attachment: test1.png

> Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide 
> SQL advice
> ---
>
> Key: FLINK-31025
> URL: https://issues.apache.org/jira/browse/FLINK-31025
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.17.0
>Reporter: Jane Chan
>Assignee: Weijie Guo
>Priority: Blocker
> Fix For: 1.17.0
>
> Attachments: test1.png
>
>
> This ticket aims to verify FLINK-30650: Introduce EXPLAIN PLAN_ADVICE to 
> provide SQL advice.
> More details about this feature and how to use it can be found in this 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/explain/].
> The verification is divided into two parts:
> Part I: Verify that "EXPLAIN PLAN_ADVICE" works with different queries in 
> streaming mode, such as a single select/insert/statement set with or without 
> sub-plan reuse (configured by "table.optimizer.reuse-sub-plan-enabled").
> Once the ExplainDetail is specified as "PLAN_ADVICE"
> {code:sql}
> EXPLAIN PLAN_ADVICE [SELECT ... FROM ...| INSERT INTO ...| EXECUTE STATEMENT 
> SET BEGIN INSERT INTO ... END]
> {code}
> You should find that the output follows the format below (note that the 
> title changes to "Optimized Physical Plan With Advice")
> {code:sql}
> == Abstract Syntax Tree ==
> ...
>  
> == Optimized Physical Plan With Advice ==
> ...
>  
>  
> == Optimized Execution Plan ==
> ...
> {code}
> The available advice is attached at the end of "== Optimized Physical Plan 
> With Advice ==" and before "== Optimized Execution Plan ==".
> When switching to batch mode, "EXPLAIN PLAN_ADVICE" should throw 
> UnsupportedOperationException as expected.
>  
> Part II: Verify the advice content.
> Write a group aggregate query, enable/disable the local-global two-phase 
> configuration, and check the output.
> Once the following configurations are enabled, you should get the 
> "no available advice..." output.
> {code:java}
> table.exec.mini-batch.enabled
> table.exec.mini-batch.allow-latency
> table.exec.mini-batch.size
> table.optimizer.agg-phase-strategy {code}
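A consolidated sketch of the two verification parts above (the table definition and connector are illustrative choices, not taken from the ticket):

{code:sql}
-- Part I: explain a streaming group aggregate with advice enabled
CREATE TABLE MyTable (
    f1 INT,
    f2 STRING
) WITH (
    'connector' = 'datagen'
);

EXPLAIN PLAN_ADVICE SELECT MAX(f1) FROM MyTable GROUP BY f2;

-- Part II: enable the mini-batch / two-phase configurations, then
-- re-run the EXPLAIN; the output should switch to "no available advice..."
SET 'table.exec.mini-batch.enabled' = 'true';
SET 'table.exec.mini-batch.allow-latency' = '5s';
SET 'table.exec.mini-batch.size' = '200';
SET 'table.optimizer.agg-phase-strategy' = 'TWO_PHASE';

EXPLAIN PLAN_ADVICE SELECT MAX(f1) FROM MyTable GROUP BY f2;
{code}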



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31025:
---
Attachment: image-2023-02-13-14-48-53-758.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] fsk119 commented on a diff in pull request #21912: [FLINK-28658][docs] Add docs for job statements

2023-02-12 Thread via GitHub


fsk119 commented on code in PR #21912:
URL: https://github.com/apache/flink/pull/21912#discussion_r1104041538


##
docs/content/docs/dev/table/sql/show.md:
##
@@ -747,4 +748,14 @@ Show all added jars in the session classloader which are 
added by [`ADD JAR`]({{
 
 Attention Currently `SHOW JARS` only 
works in the [SQL CLI]({{< ref "docs/dev/table/sqlClient" >}}).
 
+## SHOW JOBS
+
+```sql
+SHOW JOBS
+```
+
+Show the jobs in the Flink cluster.
+
+Attention Currently `SHOW JARS` only 
works in the [SQL CLI]({{< ref "docs/dev/table/sqlClient" >}}).

Review Comment:
   SHOW JARS/SHOW JOBS both work in the SQL Gateway && SQL Client



##
docs/content/docs/dev/table/sqlClient.md:
##
@@ -812,4 +812,35 @@ Flink SQL> RESET pipeline.name;
 
 If the option `pipeline.name` is not specified, SQL Client will generate a 
default name for the submitted job, e.g. `insert-into_` for 
`INSERT INTO` statements.
 
+### Monitoring job status
+
+SQL Client supports listing the status of jobs in the cluster through `SHOW JOBS` 
statements.
+
+```sql
+Flink SQL> SHOW JOBS;
++----------------------------------+---------------+----------+-------------------------+
+|                           job id |      job name |   status |              start time |
++----------------------------------+---------------+----------+-------------------------+
+| 228d70913eab60dda85c5e7f78b5782c | kafka-to-hive |  RUNNING | 2023-02-11T05:03:51.523 |
++----------------------------------+---------------+----------+-------------------------+
+```
+
+### Terminating a job
+
+SQL client supports to stop jobs with or without savepoints through `STOP JOB` 
statements.

Review Comment:
   Use `SQL Client` rather than `SQL client` to align with the style.
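
For context, the job lifecycle statements documented in this section take roughly this shape in the SQL Client (the job id and the use of a savepoint are illustrative):

```sql
Flink SQL> SHOW JOBS;
Flink SQL> STOP JOB '228d70913eab60dda85c5e7f78b5782c' WITH SAVEPOINT;
```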



##
docs/content/docs/dev/table/sqlClient.md:
##
@@ -812,4 +812,35 @@ Flink SQL> RESET pipeline.name;
 
 If the option `pipeline.name` is not specified, SQL Client will generate a 
default name for the submitted job, e.g. `insert-into_` for 
`INSERT INTO` statements.
 
+### Monitoring job status

Review Comment:
   Please also modify the expression about the management of the job lifecycle 
here.
   
![image](https://user-images.githubusercontent.com/33114724/218389489-6b17e64b-0f52-4e53-b84b-2ab5e06ca7af.png)
   



##
docs/content/docs/dev/table/sqlClient.md:
##
@@ -812,4 +812,35 @@ Flink SQL> RESET pipeline.name;
 
 If the option `pipeline.name` is not specified, SQL Client will generate a 
default name for the submitted job, e.g. `insert-into_` for 
`INSERT INTO` statements.
 
+### Monitoring job status

Review Comment:
   I think we can merge these two parts into one section like `Managing job 
lifecycle` and link to the specified page about this.



##
docs/content/docs/dev/table/sql/jar.md:
##
@@ -94,6 +94,6 @@ REMOVE JAR '.jar'
 
 Remove the specified jar that is added by the [`ADD JAR`](#add-jar) statements.
 
-Attention REMOVE JAR statement only 
work in the [SQL CLI]({{< ref "docs/dev/table/sqlClient" >}}).
+Attention REMOVE JAR statements only 
work in [SQL CLI]({{< ref "docs/dev/table/sqlClient" >}}) or [SQL Gateway]({{< 
ref "docs/dev/table/sql-gateway/overview" >}}).

Review Comment:
   SQL Gateway still doesn't support REMOVE JAR. I think we can keep the 
original statement.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31031) Disable the output buffer of Python process to make it more convenient for interactive users

2023-02-12 Thread Dian Fu (Jira)
Dian Fu created FLINK-31031:
---

 Summary: Disable the output buffer of Python process to make it 
more convenient for interactive users 
 Key: FLINK-31031
 URL: https://issues.apache.org/jira/browse/FLINK-31031
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-31025:
---
Attachment: image-2023-02-13-14-52-41-855.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] ruanhang1993 commented on pull request #21889: [FLINK-29801][runtime] FLIP-274: Introduce metric group for OperatorCoordinator

2023-02-12 Thread via GitHub


ruanhang1993 commented on PR #21889:
URL: https://github.com/apache/flink/pull/21889#issuecomment-1427435469

   Hi, @zentol ,
   I decided not to split it into two PRs, and I have pushed the changes. Please 
review it again if you have time. Thanks~ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687761#comment-17687761
 ] 

Weijie Guo commented on FLINK-31025:


I verified this feature according to the description, and everything seems to 
meet expectations.

 

Verification steps:

I just created a simple table like:
```
CREATE TABLE MyTable(
f1 INT,
f2 STRING
) WITH (
'connector' = 'filesystem',
'path' = '/data.csv',
'format' = 'csv'
);
```
1. Execute `EXPLAIN PLAN_ADVICE SELECT max(f1) FROM MyTable GROUP BY f2;`, the 
results are shown in the figure below:
!test1.png|width=563,height=326!

We can see the advice before `Optimized Execution Plan`.

2. Set runtime-mode to batch and re-execute the explain command; the results 
are shown in the figure below:

!image-2023-02-13-14-48-53-758.png|width=573,height=99!

3. Set parameters as follows:

```
SET 'table.exec.mini-batch.enabled' = 'true';
SET 'table.exec.mini-batch.allow-latency' = '5s';
SET 'table.exec.mini-batch.size' = '200';
SET 'table.optimizer.agg-phase-strategy' = 'AUTO';
```

Re-execute the explain command; the results are shown in the figure below:

!image-2023-02-13-14-52-41-855.png|width=475,height=244!




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31031) Disable the output buffer of Python process to make it more convenient for interactive users

2023-02-12 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-31031:

Description: See 
[https://apache-flink.slack.com/archives/C03G7LJTS2G/p1676270127585559?thread_ts=1676238623.588979&cid=C03G7LJTS2G]
 for more details.

> Disable the output buffer of Python process to make it more convenient for 
> interactive users 
> -
>
> Key: FLINK-31031
> URL: https://issues.apache.org/jira/browse/FLINK-31031
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>
> See 
> [https://apache-flink.slack.com/archives/C03G7LJTS2G/p1676270127585559?thread_ts=1676238623.588979&cid=C03G7LJTS2G]
>  for more details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] liming30 commented on a diff in pull request #21503: [FLINK-30251] Move the IO with DFS during abort checkpoint to an asynchronous thread pool.

2023-02-12 Thread via GitHub


liming30 commented on code in PR #21503:
URL: https://github.com/apache/flink/pull/21503#discussion_r1104060745


##
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SubtaskCheckpointCoordinatorImpl.java:
##
@@ -177,6 +182,14 @@ class SubtaskCheckpointCoordinatorImpl implements 
SubtaskCheckpointCoordinator {
 this.checkpoints = new HashMap<>();
 this.lock = new Object();
 this.asyncOperationsThreadPool = 
checkNotNull(asyncOperationsThreadPool);
+this.asyncDisposeThreadPool =
+new ThreadPoolExecutor(
+0,
+4,
+60L,
+TimeUnit.SECONDS,
+new LinkedBlockingQueue<>(),
+new ExecutorThreadFactory("AsyncDispose"));

Review Comment:
   @pnowojski Resolved according to the comments, please review it again if you 
have time.
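
One detail worth double-checking in this constructor (not raised in the review thread above): a `ThreadPoolExecutor` only creates workers beyond `corePoolSize` when the queue rejects a task, and an unbounded `LinkedBlockingQueue` never rejects. So a `(0, 4, ...)` pool with an unbounded queue effectively runs at most one thread, and `maximumPoolSize = 4` is never reached. A self-contained sketch demonstrating this (class and method names are mine, not from the PR):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeDemo {
    // Builds a pool shaped like the one in the diff (0 core, 4 max, 60s
    // keep-alive, unbounded queue), pushes `tasks` short sleeps through it,
    // and reports the largest pool size ever reached.
    static int largestPoolSizeAfter(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, 4, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException ignored) {
                }
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        // Extra workers beyond corePoolSize are only created when the queue
        // rejects a task, so this pool never grows past one thread.
        System.out.println("largest pool size = " + largestPoolSizeAfter(8));
    }
}
```

If parallel disposal is actually intended, a bounded queue or a positive core pool size would be needed.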



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] reswqa opened a new pull request, #21913: [hotfix] Fix typo in explain.md

2023-02-12 Thread via GitHub


reswqa opened a new pull request, #21913:
URL: https://github.com/apache/flink/pull/21913

   ## What is the purpose of the change
   
   *Fix typo in explain.md.*
   
   
   ## Brief change log
   
 - *Fix typo(`OHE_PHASE` -> `ONE_PHASE`) in explain.md.*
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687766#comment-17687766
 ] 

Weijie Guo commented on FLINK-31025:


BTW. I also found a type in `explain.md`, I have created 
`https://github.com/apache/flink/pull/21913` to fix this, could you help 
reviewing this?




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-31025) Release Testing: Verify FLINK-30650 Introduce EXPLAIN PLAN_ADVICE to provide SQL advice

2023-02-12 Thread Weijie Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687766#comment-17687766
 ] 

Weijie Guo edited comment on FLINK-31025 at 2/13/23 7:04 AM:
-

BTW. I also found a typo in `explain.md`, I have created 
`[https://github.com/apache/flink/pull/21913]` to fix this, could you help 
reviewing this?


was (Author: weijie guo):
BTW. I also found a type in `explain.md`, I have created 
`https://github.com/apache/flink/pull/21913` to fix this, could you help 
reviewing this?




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-kubernetes-operator] gaborgsomogyi commented on pull request #530: [FLINK-30757] Upgrade busybox version to a pinned version for operator

2023-02-12 Thread via GitHub


gaborgsomogyi commented on PR #530:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/530#issuecomment-1427448370

   cc @gyfora @mbalassi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] JingsongLi commented on pull request #526: [FLINK-31027] Introduce annotations for table store

2023-02-12 Thread via GitHub


JingsongLi commented on PR #526:
URL: 
https://github.com/apache/flink-table-store/pull/526#issuecomment-1427449782

   Thanks @FangYongs for the contribution.
   - `VisibleForTesting` looks good to me. 
   - `Documentation` is introduced in 
https://github.com/apache/flink-table-store/pull/524
   - `Internal` I think we don't need to introduce this one. It is meaningless, 
all classes are internal by default. We can just remove `@Internal`s.
   - `Public` and `Experimental`, we don't need to introduce these two, we 
don't have public api now. Maybe we will have different ideas in the future.
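
For reference, a marker annotation like the `VisibleForTesting` discussed here is typically only a few lines; the retention policy and targets below are assumptions for illustration, not the PR's actual definition:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Marks code whose visibility was widened only so that tests can reach it. */
@Documented
@Retention(RetentionPolicy.CLASS) // assumption: kept in class files, not needed at runtime
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR})
@interface VisibleForTesting {}
```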


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-31027) Introduce annotation for table store

2023-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31027:
---
Labels: pull-request-available  (was: )

> Introduce annotation for table store
> 
>
> Key: FLINK-31027
> URL: https://issues.apache.org/jira/browse/FLINK-31027
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table Store
>Affects Versions: table-store-0.4.0
>Reporter: Shammon
>Priority: Major
>  Labels: pull-request-available
>
> Introduce annotation for table store



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30966) Flink SQL IF FUNCTION logic error

2023-02-12 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee reassigned FLINK-30966:
---

Assignee: Shuiqiang Chen

> Flink SQL IF FUNCTION logic error
> -
>
> Key: FLINK-30966
> URL: https://issues.apache.org/jira/browse/FLINK-30966
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0, 1.16.1
>Reporter: 谢波
>Assignee: Shuiqiang Chen
>Priority: Major
>
> my data is 
> {code:java}
> //
> { "before": { "status": "sent" }, "after": { "status": "succeed" }, "op": 
> "u", "ts_ms": 1671926400225, "transaction": null } {code}
> my sql is 
>  
> {code:java}
> CREATE TABLE t
> (
> before ROW (
> status varchar (32)
> ),
> after ROW (
> status varchar (32)
> ),
> ts_ms bigint,
> op   string,
> kafka_timestamp  timestamp METADATA FROM 'timestamp',
> -- @formatter:off
> proctime AS PROCTIME()
> -- @formatter:on
> ) WITH (
> 'connector' = 'kafka',
> -- 'topic' = '',
> 'topic' = 'test',
> 'properties.bootstrap.servers' = ' ',
> 'properties.group.id' = '',
> 'format' = 'json',
> 'scan.topic-partition-discovery.interval' = '60s',
> 'scan.startup.mode' = 'earliest-offset',
> 'json.ignore-parse-errors' = 'true'
>  );
> create table p
> (
> status  STRING ,
> before_status  STRING ,
> after_status  STRING ,
> metadata_operation  STRING COMMENT 'source record operation type',
> dt  STRING
> )WITH (
> 'connector' = 'print'
>  );
> INSERT INTO p
> SELECT
>IF(op <> 'd', after.status, before.status),
> before.status,
> after.status,
>op AS metadata_operation,
>DATE_FORMAT(kafka_timestamp, '-MM-dd') AS dt
> FROM t;
>  {code}
>  my local env output is 
>  
>  
> {code:java}
> +I[null, sent, succeed, u, 2023-02-08] {code}
>  
>  my production env output is 
> {code:java}
> +I[sent, sent, succeed, u, 2023-02-08]  {code}
> Why?
> This looks like a bug.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink] flinkbot commented on pull request #21913: [hotfix] Fix typo in explain.md

2023-02-12 Thread via GitHub


flinkbot commented on PR #21913:
URL: https://github.com/apache/flink/pull/21913#issuecomment-1427459585

   
   ## CI report:
   
   * 6a8d8ce72f8f9cad0907a1a057ff53b9d7cbe86e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-30966) Flink SQL IF FUNCTION logic error

2023-02-12 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687768#comment-17687768
 ] 

lincoln lee commented on FLINK-30966:
-

[~csq] assigned to you




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30559) May get wrong result for `if` expression if it's string data type

2023-02-12 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee reassigned FLINK-30559:
---

Assignee: miamiaoxyz

> May get wrong result for `if` expression if it's string data type
> -
>
> Key: FLINK-30559
> URL: https://issues.apache.org/jira/browse/FLINK-30559
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: luoyuxia
>Assignee: miamiaoxyz
>Priority: Major
>  Labels: pull-request-available
>
> Can be reproduced by the following code in 
> `org.apache.flink.table.planner.runtime.batch.sql.CalcITCase`
>  
> {code:java}
> checkResult("SELECT if(b > 10, 'ua', c) from Table3", data3) {code}
> The actual result is [co, He, He, ...].
> Seems it will only get the first two characters.
>  
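
The symptom (only the first two characters of column `c` surviving) is consistent with the result type of `if` being inferred as CHAR(2) from the literal 'ua', truncating the longer branch. A hedged workaround sketch, using the same test table as the reproducer above:

```sql
-- Sketch: casting the literal widens the inferred result type to STRING,
-- so values from column c should no longer be truncated to two characters.
SELECT if(b > 10, cast('ua' as string), c) from Table3
```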





[GitHub] [flink-table-store] FangYongs commented on pull request #526: [FLINK-31027] Introduce annotations for table store

2023-02-12 Thread via GitHub


FangYongs commented on PR #526:
URL: 
https://github.com/apache/flink-table-store/pull/526#issuecomment-1427476456

   Sounds good to me; then we should add `VisibleForTesting` to 
`flink-table-store-common` only. I'll update this PR later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-connector-pulsar] syhily opened a new pull request, #26: [FLINK-30489][Connector/Pulsar] Shade all the dependencies in flink-sql-connector-pulsar.

2023-02-12 Thread via GitHub


syhily opened a new pull request, #26:
URL: https://github.com/apache/flink-connector-pulsar/pull/26

   This is a PR after https://github.com/apache/flink-connector-pulsar/pull/25, 
but we split them for speeding up the reviews.





[jira] [Updated] (FLINK-30489) flink-sql-connector-pulsar doesn't shade all dependencies

2023-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-30489:
---
Labels: pull-request-available  (was: )

> flink-sql-connector-pulsar doesn't shade all dependencies
> -
>
> Key: FLINK-30489
> URL: https://issues.apache.org/jira/browse/FLINK-30489
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0
>Reporter: Henri Yandell
>Priority: Major
>  Labels: pull-request-available
>
> Looking at 
> [https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-pulsar/1.16.0/flink-sql-connector-pulsar-1.16.0.jar]
>  I'm seeing that some dependencies are shaded (com.fasterxml, com.yahoo etc), 
> but others are not (org.slf4j, org.bouncycastle, com.scurrilous, ...) and 
> will presumably clash with other jar files.
> Additionally, this bundling is going on in the '.jar' file rather than in a 
> more clearly indicated separate -bundle or -shaded jar file. 
> As a jar file this seems confusing and potentially bug inducing; though I 
> note I'm just a reviewer of the jar and not experienced with Flink.
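
For reference, relocating the remaining packages would typically be done with Maven Shade plugin relocation entries. A hedged sketch (the package list comes from the report above; the shade prefix is an assumption, not the connector's actual configuration):

```xml
<!-- Sketch only: relocation entries for a maven-shade-plugin configuration.
     The shadedPattern prefix used here is hypothetical. -->
<relocations>
  <relocation>
    <pattern>org.bouncycastle</pattern>
    <shadedPattern>org.apache.flink.pulsar.shaded.org.bouncycastle</shadedPattern>
  </relocation>
  <relocation>
    <pattern>com.scurrilous</pattern>
    <shadedPattern>org.apache.flink.pulsar.shaded.com.scurrilous</shadedPattern>
  </relocation>
</relocations>
```

(slf4j is often deliberately left unshaded so that user-provided logging backends bind correctly; whether to relocate it is a design decision for the connector, not an oversight by default.)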





[jira] [Commented] (FLINK-30489) flink-sql-connector-pulsar doesn't shade all dependencies

2023-02-12 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687779#comment-17687779
 ] 

Yufan Sheng commented on FLINK-30489:
-

[~tison] Can you assign this issue to me?

> flink-sql-connector-pulsar doesn't shade all dependencies
> -
>
> Key: FLINK-30489
> URL: https://issues.apache.org/jira/browse/FLINK-30489
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0
>Reporter: Henri Yandell
>Priority: Major
>  Labels: pull-request-available
>
> Looking at 
> [https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-pulsar/1.16.0/flink-sql-connector-pulsar-1.16.0.jar]
>  I'm seeing that some dependencies are shaded (com.fasterxml, com.yahoo etc), 
> but others are not (org.slf4j, org.bouncycastle, com.scurrilous, ...) and 
> will presumably clash with other jar files.
> Additionally, this bundling is going on in the '.jar' file rather than in a 
> more clearly indicated separate -bundle or -shaded jar file. 
> As a jar file this seems confusing and potentially bug inducing; though I 
> note I'm just a reviewer of the jar and not experienced with Flink.





[GitHub] [flink-table-store] JingsongLi merged pull request #523: [FLINK-31022] Using new Serializer for table store

2023-02-12 Thread via GitHub


JingsongLi merged PR #523:
URL: https://github.com/apache/flink-table-store/pull/523





[jira] [Closed] (FLINK-31022) Using new Serializer for table store

2023-02-12 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-31022.

Resolution: Fixed

master: 5b6ff938922aa067b59a0132c62e6cd675680b63

> Using new Serializer for table store
> 
>
> Key: FLINK-31022
> URL: https://issues.apache.org/jira/browse/FLINK-31022
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>



