[GitHub] [flink] flinkbot edited a comment on pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17188:
URL: https://github.com/apache/flink/pull/17188#issuecomment-914560045


   
   ## CI report:
   
   * 12809bb05f44589b9dde53d53397a06c3d6a03c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23875)
 
   * e45b87d2011fe81949b65094ebbb475423640bc1 UNKNOWN
   * 287cf49f0032a03a59234f5330e6171bad2fbf67 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23895)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zentol commented on a change in pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


zentol commented on a change in pull request #17227:
URL: https://github.com/apache/flink/pull/17227#discussion_r705915129



##
File path: flink-core/src/test/java/org/apache/flink/util/NetUtilsTest.java
##
@@ -61,25 +56,45 @@ public void testParseHostPortAddress() {
     @Test
     public void testAcceptWithoutTimeout() throws IOException {
         // Validates that acceptWithoutTimeout suppresses all SocketTimeoutExceptions
-        ServerSocket serverSocket = mock(ServerSocket.class);
-        when(serverSocket.accept())
-                .thenAnswer(
-                        new Answer<Socket>() {
-                            private int count = 0;
-
-                            @Override
-                            public Socket answer(InvocationOnMock invocationOnMock)
-                                    throws Throwable {
-                                if (count < 2) {
-                                    count++;
-                                    throw new SocketTimeoutException();
-                                }
-
-                                return new Socket();
-                            }
-                        });
-
-        assertNotNull(NetUtils.acceptWithoutTimeout(serverSocket));
+        Socket expected = new Socket();
+        ServerSocket serverSocket =
+                new ServerSocket() {
+                    private int count = 0;
+
+                    @Override
+                    public Socket accept() throws IOException {
+                        if (count < 2) {
+                            count++;
+                            throw new SocketTimeoutException();
+                        }
+
+                        return expected;
+                    }
+                };
+
+        assertEquals(expected, NetUtils.acceptWithoutTimeout(serverSocket));
+
+        // Validates timeout option precondition
+        serverSocket =
+                new ServerSocket() {
+                    @Override
+                    public Socket accept() throws IOException {
+                        return expected;
+                    }
+                };
+
+        // non-zero timeout (throw exception)
+        serverSocket.setSoTimeout(5);
+        try {
+            assertEquals(expected, NetUtils.acceptWithoutTimeout(serverSocket));
+            fail("Expected IllegalArgumentException due to timeout is set to non-zero value");
+        } catch (IllegalArgumentException e) {
+            // Pass
+        }
+
+        // zero timeout (don't throw exception)
+        serverSocket.setSoTimeout(0);
+        assertEquals(expected, NetUtils.acceptWithoutTimeout(serverSocket));

Review comment:
   As is, this check doesn't add value, because it will never fail without the 
first check in this test failing as well. As a separate test, however, it 
would be useful, because it would make it obvious what the issue is.
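
   A separate test along those lines might look like this (a sketch only: the 
test name is hypothetical, and it assumes the same JUnit 4 style as the 
surrounding test):

   ```java
   @Test
   public void testAcceptWithoutTimeoutRejectsNonZeroTimeout() throws IOException {
       // acceptWithoutTimeout requires the socket timeout to be zero.
       ServerSocket serverSocket = new ServerSocket();
       serverSocket.setSoTimeout(5);

       try {
           NetUtils.acceptWithoutTimeout(serverSocket);
           fail("Expected IllegalArgumentException for a non-zero socket timeout");
       } catch (IllegalArgumentException e) {
           // expected
       }
   }
   ```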

##
File path: flink-core/src/test/java/org/apache/flink/util/NetUtilsTest.java
##
@@ -61,25 +56,45 @@ public void testParseHostPortAddress() {
     @Test
     public void testAcceptWithoutTimeout() throws IOException {
         // Validates that acceptWithoutTimeout suppresses all SocketTimeoutExceptions
-        ServerSocket serverSocket = mock(ServerSocket.class);
-        when(serverSocket.accept())
-                .thenAnswer(
-                        new Answer<Socket>() {
-                            private int count = 0;
-
-                            @Override
-                            public Socket answer(InvocationOnMock invocationOnMock)
-                                    throws Throwable {
-                                if (count < 2) {
-                                    count++;
-                                    throw new SocketTimeoutException();
-                                }
-
-                                return new Socket();
-                            }
-                        });
-
-        assertNotNull(NetUtils.acceptWithoutTimeout(serverSocket));
+        Socket expected = new Socket();
+        ServerSocket serverSocket =
+                new ServerSocket() {
+                    private int count = 0;
+
+                    @Override
+                    public Socket accept() throws IOException {
+                        if (count < 2) {
+                            count++;
+                            throw new SocketTimeoutException();
+                        }
+
+                        return expected;
+                    }
+                };
+
+        assertEquals(expected, NetUtils.acceptWithoutTimeout(serverSocket));
+
+        // Validates timeout option precondition
+        serverSocket =
+                new ServerSocket() {
+                    @Override
+                    public Socket accept() throws IOException {
+                        return expected;
+                    }
+                };
+
+        // non-zero timeout (throw exception)

Review comment:
   

[GitHub] [flink] flinkbot edited a comment on pull request #17023: [FLINK-24043][runtime] Reuse the code of 'check savepoint preconditions'.

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17023:
URL: https://github.com/apache/flink/pull/17023#issuecomment-907189438


   
   ## CI report:
   
   * 2a25c295ad6e3fee8cb14a56c8bc1517100d938a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23052)
 
   * 4ce65172415a932038aed6eadd2e7ca1c0499c23 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23896)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20410) SQLClientSchemaRegistryITCase.testWriting failed with "Subject 'user_behavior' not found.; error code: 40401"

2021-09-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-20410:
-
Priority: Major  (was: Critical)

> SQLClientSchemaRegistryITCase.testWriting failed with "Subject 
> 'user_behavior' not found.; error code: 40401"
> -
>
> Key: FLINK-20410
> URL: https://issues.apache.org/jira/browse/FLINK-20410
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.12.0, 1.13.0, 1.14.0
>Reporter: Dian Fu
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10276=logs=91bf6583-3fb2-592f-e4d4-d79d79c3230a=3425d8ba-5f03-540a-c64b-51b8481bf7d6
> {code}
> 2020-11-28T01:14:08.6444305Z Nov 28 01:14:08 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 74.818 s  <<< ERROR!
> 2020-11-28T01:14:08.6445353Z Nov 28 01:14:08 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'user_behavior' not found.; error code: 40401
> 2020-11-28T01:14:08.6446071Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> 2020-11-28T01:14:08.6446910Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> 2020-11-28T01:14:08.6447522Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> 2020-11-28T01:14:08.6448352Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> 2020-11-28T01:14:08.6449091Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> 2020-11-28T01:14:08.6449878Z Nov 28 01:14:08  at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20410) SQLClientSchemaRegistryITCase.testWriting failed with "Subject 'user_behavior' not found.; error code: 40401"

2021-09-09 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412968#comment-17412968
 ] 

Xintong Song commented on FLINK-20410:
--

There's only one instance reported in nearly a year. Downgrading to Major.

> SQLClientSchemaRegistryITCase.testWriting failed with "Subject 
> 'user_behavior' not found.; error code: 40401"
> -
>
> Key: FLINK-20410
> URL: https://issues.apache.org/jira/browse/FLINK-20410
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.12.0, 1.13.0, 1.14.0
>Reporter: Dian Fu
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=10276=logs=91bf6583-3fb2-592f-e4d4-d79d79c3230a=3425d8ba-5f03-540a-c64b-51b8481bf7d6
> {code}
> 2020-11-28T01:14:08.6444305Z Nov 28 01:14:08 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 74.818 s  <<< ERROR!
> 2020-11-28T01:14:08.6445353Z Nov 28 01:14:08 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'user_behavior' not found.; error code: 40401
> 2020-11-28T01:14:08.6446071Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> 2020-11-28T01:14:08.6446910Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> 2020-11-28T01:14:08.6447522Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> 2020-11-28T01:14:08.6448352Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> 2020-11-28T01:14:08.6449091Z Nov 28 01:14:08  at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> 2020-11-28T01:14:08.6449878Z Nov 28 01:14:08  at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] Huanli-Meng commented on a change in pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


Huanli-Meng commented on a change in pull request #17188:
URL: https://github.com/apache/flink/pull/17188#discussion_r705164297



##
File path: docs/layouts/shortcodes/generated/pulsar_client_configuration.html
##
@@ -0,0 +1,222 @@
+<table class="configuration table table-bordered">
+    <thead>
+        <tr>
+            <th class="text-left" style="width: 20%">Key</th>
+            <th class="text-left" style="width: 15%">Default</th>
+            <th class="text-left" style="width: 10%">Type</th>
+            <th class="text-left" style="width: 55%">Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td><h5>pulsar.client.authParamMap</h5></td>
+            <td style="word-wrap: break-word;"></td>
+            <td>Map</td>
+            <td>Map which represents parameters for the authentication plugin.</td>
+        </tr>
+        <tr>
+            <td><h5>pulsar.client.authParams</h5></td>
+            <td style="word-wrap: break-word;">(none)</td>

Review comment:
   Change (none) to N/A; please check through all the docs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Huanli-Meng commented on a change in pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


Huanli-Meng commented on a change in pull request #17188:
URL: https://github.com/apache/flink/pull/17188#discussion_r704903345



##
File path: docs/content/docs/connectors/datastream/pulsar.md
##
@@ -0,0 +1,409 @@
+---
+title: Pulsar
+weight: 9
+type: docs
+---
+
+
+# Apache Pulsar Connector
+
+Flink provides an [Apache Pulsar](https://pulsar.apache.org) source for 
reading data from Pulsar topics with exactly-once guarantees.
+
+## Dependency
+
+You can use the connector with Pulsar 2.7.0 or later, but since the connector
+uses Pulsar [transactions](https://pulsar.apache.org/docs/en/txn-what/),
+the minimum recommended version is 2.8.0, where Pulsar transactions became stable.
+For details on Pulsar compatibility, refer to
+[PIP-72](https://github.com/apache/pulsar/wiki/PIP-72%3A-Introduce-Pulsar-Interface-Taxonomy%3A-Audience-and-Stability-Classification).
+
+{{< artifact flink-connector-pulsar withScalaVersion >}}
+
+If you are using the Pulsar source, ```flink-connector-base``` is also required
+as a dependency:
+
+{{< artifact flink-connector-base >}}
+
+Flink's streaming connectors are not currently part of the binary distribution.
+See how to link with them for cluster execution [here]({{< ref 
"docs/dev/datastream/project-configuration" >}}).
+
+## Pulsar Source
+
+{{< hint info >}}
+This part describes the Pulsar source based on the new
+[data source]({{< ref "docs/dev/datastream/sources.md" >}}) API.
+
+If you want to use the legacy ```SourceFunction```, or are on a Flink version
+below 1.14.0, use StreamNative's
+[pulsar-flink](https://github.com/streamnative/pulsar-flink) connector instead.
+{{< /hint >}}

Review comment:
   ```suggestion
   {{< /hint >}}.
   ```
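
   For context, here is a minimal sketch of using the new Pulsar source this 
page describes (an illustrative sketch only: the builder method names follow 
the new connector's API, while the service/admin URLs, topic, and subscription 
name are placeholders):

   ```java
   import org.apache.flink.api.common.eventtime.WatermarkStrategy;
   import org.apache.flink.api.common.serialization.SimpleStringSchema;
   import org.apache.flink.connector.pulsar.source.PulsarSource;
   import org.apache.flink.connector.pulsar.source.enumerator.cursor.StartCursor;
   import org.apache.flink.connector.pulsar.source.reader.deserializer.PulsarDeserializationSchema;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

   // Build a Pulsar source that reads String records from one topic.
   PulsarSource<String> source =
           PulsarSource.builder()
                   .setServiceUrl("pulsar://localhost:6650") // placeholder broker URL
                   .setAdminUrl("http://localhost:8080")     // placeholder admin URL
                   .setStartCursor(StartCursor.earliest())
                   .setTopics("my-topic")                    // placeholder topic
                   .setDeserializationSchema(
                           PulsarDeserializationSchema.flinkSchema(new SimpleStringSchema()))
                   .setSubscriptionName("my-subscription")   // placeholder subscription
                   .build();

   env.fromSource(source, WatermarkStrategy.noWatermarks(), "Pulsar Source");
   ```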




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] SteNicholas edited a comment on pull request #17105: [FLINK-23704][streaming] FLIP-27 sources are not generating LatencyMarkers

2021-09-09 Thread GitBox


SteNicholas edited a comment on pull request #17105:
URL: https://github.com/apache/flink/pull/17105#issuecomment-915681646


   > @SteNicholas , do you think we can get this into 1.14? Do you have time to 
finish it this week?
   
   @AHeise , of course yes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-23747) Testing Window TVF offset

2021-09-09 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412967#comment-17412967
 ] 

Xintong Song commented on FLINK-23747:
--

[~liliwei],
I'm not an expert in this area, so I cannot comment on what you've raised. 
Maybe [~qingru zhang] and [~jark] can advise on this.
You can absolutely open another issue to discuss and track this.

> Testing Window TVF offset
> -
>
> Key: FLINK-23747
> URL: https://issues.apache.org/jira/browse/FLINK-23747
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: JING ZHANG
>Assignee: liwei li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.14.0
>
>
> Window offset is an optional parameter which can be used to change the 
> alignment of windows.
> There are a few things we need to clarify about window offset:
> (1) In SQL, window offset is an optional parameter; if it is specified, it is 
> the last parameter of the window.
> For a tumble window:
> {code:java}
> TUMBLE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '15' MINUTE, INTERVAL '5' 
> MINUTE){code}
> For a hop window:
> {code:java}
> HOP(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE, INTERVAL '15' 
> MINUTE,
> INTERVAL '5' MINUTE){code}
> For a cumulate window:
> {code:java}
> CUMULATE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE, INTERVAL 
> '15' MINUTE, INTERVAL '5' MINUTE){code}
> (2) Window offset can be a positive or a negative duration.
> (3) Window offset is used to change the alignment of windows. The same record 
> may be assigned to a different window after a window offset is set, but the 
> rule timestamp >= window_start && timestamp < window_end always holds.
> As an example: for a tumble window with a window size of 10 MINUTE, which 
> window is a record with timestamp 2021-06-30 00:00:04 assigned to?
>  # offset is '-16 MINUTE': the record is assigned to window [2021-06-29 
> 23:54:00, 2021-06-30 00:04:00)
>  # offset is '-6 MINUTE': the record is assigned to window [2021-06-29 
> 23:54:00, 2021-06-30 00:04:00)
>  # offset is '-4 MINUTE': the record is assigned to window [2021-06-29 
> 23:56:00, 2021-06-30 00:06:00)
>  # offset is '0': the record is assigned to window [2021-06-30 00:00:00, 
> 2021-06-30 00:10:00)
>  # offset is '4 MINUTE': the record is assigned to window [2021-06-29 
> 23:54:00, 2021-06-30 00:04:00)
>  # offset is '6 MINUTE': the record is assigned to window [2021-06-29 
> 23:56:00, 2021-06-30 00:06:00)
>  # offset is '16 MINUTE': the record is assigned to window [2021-06-29 
> 23:56:00, 2021-06-30 00:06:00)
> (4) Note that some window offset values have the same effect on the alignment 
> of windows: in the above case, '-16 MINUTE', '-6 MINUTE' and '4 MINUTE' have 
> the same effect on a tumble window of '10 MINUTE' size.
> (5) Window offset only changes the alignment of windows; it has no effect on 
> the watermark.
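
The assignment rule behind the examples above can be made concrete with a short 
sketch; it mirrors the logic of TimeWindow#getWindowStartWithOffset in 
flink-streaming-java and is shown here for illustration only:
{code:java}
// Start of the window containing `timestamp`, for a given offset and size
// (all values in milliseconds).
static long windowStart(long timestamp, long offset, long size) {
    long remainder = (timestamp - offset) % size;
    // Keep the window start <= timestamp even when the remainder is negative.
    return remainder < 0 ? timestamp - (remainder + size) : timestamp - remainder;
}

// For timestamp 2021-06-30 00:00:04 and size = 10 minutes:
//   offset of -6 MINUTE or 4 MINUTE -> window starts at 2021-06-29 23:54:00
//   offset of -4 MINUTE or 6 MINUTE -> window starts at 2021-06-29 23:56:00
{code}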



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-24239) Event time temporal join should support values from array, map, row, etc. as join key

2021-09-09 Thread Caizhi Weng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caizhi Weng updated FLINK-24239:

Description: 
This ticket is from the [mailing 
list|https://lists.apache.org/thread.html/r90cab9c5026e527357d58db70d7e9b5875e57b942738f032bd54bfd3%40%3Cuser-zh.flink.apache.org%3E].

Currently, in an event time temporal join, when the join keys come from an 
array, map or row, an exception is thrown saying "Currently the join key in 
Temporal Table Join can not be empty". This is quite confusing for users, as 
they have already set the join keys.

Add the following test case to {{TableEnvironmentITCase}} to reproduce the 
issue.
{code:scala}
@Test
def myTest(): Unit = {
  tEnv.executeSql(
"""
  |CREATE TABLE A (
  |  a MAP<STRING, INT>,
  |  ts TIMESTAMP(3),
  |  WATERMARK FOR ts AS ts
  |) WITH (
  |  'connector' = 'values'
  |)
  |""".stripMargin)
  tEnv.executeSql(
"""
  |CREATE TABLE B (
  |  id INT,
  |  ts TIMESTAMP(3),
  |  WATERMARK FOR ts AS ts
  |) WITH (
  |  'connector' = 'values'
  |)
  |""".stripMargin)
  tEnv.executeSql("SELECT * FROM A LEFT JOIN B FOR SYSTEM_TIME AS OF A.ts AS b 
ON A.a['ID'] = id").print()
}
{code}
The exception stack is
{code:java}
org.apache.flink.table.api.ValidationException: Currently the join key in 
Temporal Table Join can not be empty.

at 
org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromGeneralTemporalTableRule.onMatch(LogicalCorrelateToJoinFromTemporalTableRule.scala:272)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.optimize(FlinkGroupProgram.scala:55)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:62)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:58)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)

[jira] [Updated] (FLINK-24213) Java deadlock in QueryableState ClientTest

2021-09-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-24213:
-
Fix Version/s: 1.13.3
   1.12.6

> Java deadlock in QueryableState ClientTest
> --
>
> Key: FLINK-24213
> URL: https://issues.apache.org/jira/browse/FLINK-24213
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Queryable State
>Affects Versions: 1.14.0, 1.12.6, 1.13.3
>Reporter: Dawid Wysakowicz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0, 1.12.6, 1.13.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23750=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d=15476
> {code}
>  Found one Java-level deadlock:
> Sep 08 11:12:50 =
> Sep 08 11:12:50 "Flink Test Client Event Loop Thread 0":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4e380309c8 (object 
> 0x86b2cd50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "main"
> Sep 08 11:12:50 "main":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4ea4004068 (object 
> 0x86b2cf50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "Flink Test Client Event Loop Thread 0"
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-24213) Java deadlock in QueryableState ClientTest

2021-09-09 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412550#comment-17412550
 ] 

Chesnay Schepler edited comment on FLINK-24213 at 9/10/21, 5:30 AM:


master:
b5ac92eb5282ecdedf2daf89c1e9a1a737499836
2e721ab3fa501ce5c4ac4f9fe032a770aa66bc9a
1.14:
a2185c0f9bef64b103a4e1ae2c33619d83d54921
2459a3c3001c2ef263536276b7d7109752f93286
1.13:
640797f39454223503c097f4762ae4f8d5d2e768
4db7f4c502ba6428bf4f3f7d52b00a8cbada29fa
1.12:
fa2ab610c1c20ff828db0fc928b6328c2f440e9d
de69c3e3e7a4c29e99acd91129937db24afaefb2


was (Author: zentol):
master:
b5ac92eb5282ecdedf2daf89c1e9a1a737499836
2e721ab3fa501ce5c4ac4f9fe032a770aa66bc9a
1.14:
a2185c0f9bef64b103a4e1ae2c33619d83d54921
2459a3c3001c2ef263536276b7d7109752f93286

> Java deadlock in QueryableState ClientTest
> --
>
> Key: FLINK-24213
> URL: https://issues.apache.org/jira/browse/FLINK-24213
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Queryable State
>Affects Versions: 1.12.5, 1.15.0
>Reporter: Dawid Wysakowicz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23750=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d=15476
> {code}
>  Found one Java-level deadlock:
> Sep 08 11:12:50 =
> Sep 08 11:12:50 "Flink Test Client Event Loop Thread 0":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4e380309c8 (object 
> 0x86b2cd50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "main"
> Sep 08 11:12:50 "main":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4ea4004068 (object 
> 0x86b2cf50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "Flink Test Client Event Loop Thread 0"
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-24213) Java deadlock in QueryableState ClientTest

2021-09-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-24213:
-
Affects Version/s: (was: 1.15.0)
   (was: 1.12.5)
   1.13.3
   1.12.6
   1.14.0

> Java deadlock in QueryableState ClientTest
> --
>
> Key: FLINK-24213
> URL: https://issues.apache.org/jira/browse/FLINK-24213
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Queryable State
>Affects Versions: 1.14.0, 1.12.6, 1.13.3
>Reporter: Dawid Wysakowicz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23750=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d=15476
> {code}
>  Found one Java-level deadlock:
> Sep 08 11:12:50 =
> Sep 08 11:12:50 "Flink Test Client Event Loop Thread 0":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4e380309c8 (object 
> 0x86b2cd50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "main"
> Sep 08 11:12:50 "main":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4ea4004068 (object 
> 0x86b2cf50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "Flink Test Client Event Loop Thread 0"
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-24213) Java deadlock in QueryableState ClientTest

2021-09-09 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-24213.

Resolution: Fixed

> Java deadlock in QueryableState ClientTest
> --
>
> Key: FLINK-24213
> URL: https://issues.apache.org/jira/browse/FLINK-24213
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Queryable State
>Affects Versions: 1.14.0, 1.12.6, 1.13.3
>Reporter: Dawid Wysakowicz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0, 1.12.6, 1.13.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23750=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d=15476
> {code}
>  Found one Java-level deadlock:
> Sep 08 11:12:50 =
> Sep 08 11:12:50 "Flink Test Client Event Loop Thread 0":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4e380309c8 (object 
> 0x86b2cd50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "main"
> Sep 08 11:12:50 "main":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4ea4004068 (object 
> 0x86b2cf50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "Flink Test Client Event Loop Thread 0"
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-24239) Event time temporal join should support values from array, map, row, etc. as join key

2021-09-09 Thread Caizhi Weng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412962#comment-17412962
 ] 

Caizhi Weng commented on FLINK-24239:
-

cc [~Leonard Xu]

> Event time temporal join should support values from array, map, row, etc. as 
> join key
> -
>
> Key: FLINK-24239
> URL: https://issues.apache.org/jira/browse/FLINK-24239
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.12.6, 1.13.3, 1.15.0, 1.14.1
>Reporter: Caizhi Weng
>Priority: Major
>
> This ticket is from the [mailing 
> list|https://lists.apache.org/thread.html/r90cab9c5026e527357d58db70d7e9b5875e57b942738f032bd54bfd3%40%3Cuser-zh.flink.apache.org%3E].
> Currently, in an event time temporal join, when the join keys come from an 
> array, map or row, an exception is thrown saying "Currently the join key in 
> Temporal Table Join can not be empty". This is quite confusing for users, as 
> they have already set the join keys.
> Add the following test case to {{TableEnvironmentITCase}} to reproduce the 
> issue.
> {code:scala}
> @Test
> def myTest(): Unit = {
>   tEnv.executeSql(
> """
>   |CREATE TABLE A (
>   |  a MAP<STRING, INT>,
>   |  ts TIMESTAMP(3),
>   |  WATERMARK FOR ts AS ts
>   |) WITH (
>   |  'connector' = 'values'
>   |)
>   |""".stripMargin)
>   tEnv.executeSql(
> """
>   |CREATE TABLE B (
>   |  id INT,
>   |  ts TIMESTAMP(3),
>   |  WATERMARK FOR ts AS ts
>   |) WITH (
>   |  'connector' = 'values'
>   |)
>   |""".stripMargin)
>   tEnv.executeSql("SELECT * FROM A LEFT JOIN B FOR SYSTEM_TIME AS OF A.ts AS 
> b ON A.a['ID'] = id").print()
> }
> {code}
> The exception stack is
> {code:java}
> org.apache.flink.table.api.ValidationException: Currently the join key in 
> Temporal Table Join can not be empty.
>   at 
> org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromGeneralTemporalTableRule.onMatch(LogicalCorrelateToJoinFromTemporalTableRule.scala:272)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.optimize(FlinkGroupProgram.scala:55)
>   at 
> 

[jira] [Created] (FLINK-24239) Event time temporal join should support values from array, map, row, etc. as join key

2021-09-09 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-24239:
---

 Summary: Event time temporal join should support values from 
array, map, row, etc. as join key
 Key: FLINK-24239
 URL: https://issues.apache.org/jira/browse/FLINK-24239
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.12.6, 1.13.3, 1.15.0, 1.14.1
Reporter: Caizhi Weng


This ticket is from the [mailing 
list|https://lists.apache.org/thread.html/r90cab9c5026e527357d58db70d7e9b5875e57b942738f032bd54bfd3%40%3Cuser-zh.flink.apache.org%3E].

Currently, in an event time temporal join, when the join keys come from an 
array, map or row, an exception is thrown saying "Currently the join key in 
Temporal Table Join can not be empty". This is quite confusing for users, as 
they have already set the join keys.

Add the following test case to {{TableEnvironmentITCase}} to reproduce the 
issue.
{code:scala}
@Test
def myTest(): Unit = {
  tEnv.executeSql(
"""
  |CREATE TABLE A (
  |  a MAP<STRING, INT>,
  |  ts TIMESTAMP(3),
  |  WATERMARK FOR ts AS ts
  |) WITH (
  |  'connector' = 'values'
  |)
  |""".stripMargin)
  tEnv.executeSql(
"""
  |CREATE TABLE B (
  |  id INT,
  |  ts TIMESTAMP(3),
  |  WATERMARK FOR ts AS ts
  |) WITH (
  |  'connector' = 'values'
  |)
  |""".stripMargin)
  tEnv.executeSql("SELECT * FROM A LEFT JOIN B FOR SYSTEM_TIME AS OF A.ts AS b 
ON A.a['ID'] = id").print()
}
{code}
The exception stack is
{code:java}
org.apache.flink.table.api.ValidationException: Currently the join key in 
Temporal Table Join can not be empty.

at 
org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromGeneralTemporalTableRule.onMatch(LogicalCorrelateToJoinFromTemporalTableRule.scala:272)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.optimize(FlinkGroupProgram.scala:55)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:62)
at 
org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:58)
at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at 

[jira] [Commented] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412958#comment-17412958
 ] 

Feifan Wang commented on FLINK-24149:
-

Hi [~yunta], [~pnowojski], I rephrased the description; please review it again 
when you have time.

> Make checkpoint self-contained and relocatable
> --
>
> Key: FLINK-24149
> URL: https://issues.apache.org/jira/browse/FLINK-24149
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Feifan Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-09-08-17-06-31-560.png, 
> image-2021-09-08-17-10-28-240.png, image-2021-09-08-17-55-46-898.png, 
> image-2021-09-08-18-01-03-176.png
>
>
> h1. Background
> We have many jobs with a large state size in our production environment. 
> According to the operational practice of these jobs and the analysis of some 
> specific problems, we believe that RocksDBStateBackend's incremental 
> checkpoint has many advantages over savepoint:
>  # Savepoint takes much longer than incremental checkpoint in jobs with 
> large state. The figure below is a job in our production environment; it 
> takes nearly 7 minutes to complete a savepoint, while a checkpoint only takes 
> a few seconds. (A checkpoint after a savepoint taking longer is a problem 
> described in -FLINK-23949-.)
>  !image-2021-09-08-17-55-46-898.png|width=723,height=161!
>  # Savepoint causes excessive CPU usage. The figure below shows the CPU 
> usage of the same job as in the above figure:
>  !image-2021-09-08-18-01-03-176.png|width=516,height=148!
>  # Savepoint may cause excessive native memory usage and eventually cause 
> the TaskManager process memory usage to exceed the limit. (We did not further 
> investigate the cause and did not try to reproduce the problem on other large 
> state jobs, but only increased the overhead memory, so this reason may not be 
> conclusive.)
> For the above reasons, we tend to use retained incremental checkpoints to 
> completely replace savepoints for jobs with a large state size.
> h1. Problems
>  * *Problem 1: retained incremental checkpoints are difficult to clean up 
> once they have been used for recovery*
> This problem is caused by the fact that jobs recovered from a retained 
> incremental checkpoint may reference files in that checkpoint's shared 
> directory in subsequent checkpoints, even if they are not in the same job 
> instance. In the worst case, retained checkpoints are referenced one by one, 
> forming a very long reference chain. This makes it difficult for users to 
> manage retained checkpoints. In fact, we have also suffered failures caused 
> by incorrect deletion of retained checkpoints.
> Although we can use the file handles in the checkpoint metadata to figure 
> out which files can be deleted, I think it is inappropriate to let users do 
> this.
>  * *Problem 2: checkpoints are not relocatable*
> Even if we can figure out all files referenced by a checkpoint, moving these 
> files will invalidate the checkpoint as well, because the metadata file 
> references absolute file paths.
> Since savepoints are already self-contained and relocatable (FLINK-5763), 
> why not just use savepoints to migrate jobs to another place? In addition to 
> the savepoint performance problem in the background description, a very 
> important reason is that the migration requirement may come from the failure 
> of the original cluster. In this case, there is no opportunity to trigger a 
> savepoint.
> h1. Proposal
>  * *A job's checkpoint directory (user-defined-checkpoint-dir/) contains all 
> of its state files (self-contained)*
>  As far as I know, in the current state of things, only the subsequent 
> checkpoints of jobs restored from a retained checkpoint violate this 
> constraint. One possible solution is to re-upload all shared files at the 
> first incremental checkpoint after the job starts, but we need to discuss 
> how to distinguish between a new job instance and a restart.
>  * *Use relative file paths in the checkpoint metadata (relocatable)*
> Change all file references in the checkpoint metadata to paths relative to 
> the _metadata file, so that user-defined-checkpoint-dir/ can be copied to 
> any other place.
> BTW, this issue is quite similar to FLINK-5763; we can read it as background 
> material.
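
To make the relocatable part of the proposal concrete, here is a minimal 
sketch of resolving a state file relative to the _metadata file with 
java.nio.file (the paths are made up for illustration; this is not the 
proposed implementation):
{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

// Directory containing the _metadata file of a retained checkpoint.
Path checkpointDir = Paths.get("/checkpoints/job-a/chk-42");
// A state file in the job's shared directory (a sibling of chk-42).
Path sharedFile = Paths.get("/checkpoints/job-a/shared/abc123");

// Store "../shared/abc123" in the metadata instead of the absolute path...
Path relative = checkpointDir.relativize(sharedFile);
// ...so the reference still resolves after copying the whole job directory.
Path relocated = Paths.get("/backup/job-a/chk-42").resolve(relative).normalize();
{code}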



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17188:
URL: https://github.com/apache/flink/pull/17188#issuecomment-914560045


   
   ## CI report:
   
   * 12809bb05f44589b9dde53d53397a06c3d6a03c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23875)
 
   * e45b87d2011fe81949b65094ebbb475423640bc1 UNKNOWN
   * 287cf49f0032a03a59234f5330e6171bad2fbf67 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17023: [FLINK-24043][runtime] Reuse the code of 'check savepoint preconditions'.

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17023:
URL: https://github.com/apache/flink/pull/17023#issuecomment-907189438


   
   ## CI report:
   
   * 2a25c295ad6e3fee8cb14a56c8bc1517100d938a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23052)
 
   * 4ce65172415a932038aed6eadd2e7ca1c0499c23 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #16994: [FLINK-23991][yarn] Specifying yarn.staging-dir fail when staging scheme is…

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #16994:
URL: https://github.com/apache/flink/pull/16994#issuecomment-906233672


   
   ## CI report:
   
   * 9e78c26b7915d68b148af6ba782e45add05123e2 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23462)
 
   * 9ad206f8b390dba9437cafe3414905946d84e933 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23894)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
h1. Background

We have many jobs with a large state size in our production environment. 
According to the operational practice of these jobs and the analysis of some 
specific problems, we believe that RocksDBStateBackend's incremental checkpoint 
has many advantages over savepoint:
 # Savepoint takes much longer than incremental checkpoint in jobs with large 
state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (A checkpoint after a savepoint taking longer is a problem described 
in -FLINK-23949-.)
 !image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoint causes excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the above figure:
 !image-2021-09-08-18-01-03-176.png|width=516,height=148!
 # Savepoint may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed the limit. (We did not further 
investigate the cause and did not try to reproduce the problem on other large 
state jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with a large state size.
h1. Problems
 * *Problem 1: retained incremental checkpoints are difficult to clean up once 
they have been used for recovery*

This problem is caused by the fact that jobs recovered from a retained 
incremental checkpoint may reference files in that checkpoint's shared 
directory in subsequent checkpoints, even if they are not in the same job 
instance. In the worst case, retained checkpoints are referenced one by one, 
forming a very long reference chain. This makes it difficult for users to 
manage retained checkpoints. In fact, we have also suffered failures caused by 
incorrect deletion of retained checkpoints.

Although we can use the file handles in the checkpoint metadata to figure out 
which files can be deleted, I think it is inappropriate to let users do this.

 * *Problem 2: checkpoints are not relocatable*

Even if we can figure out all files referenced by a checkpoint, moving these 
files will invalidate the checkpoint as well, because the metadata file 
references absolute file paths.

Since savepoints are already self-contained and relocatable (FLINK-5763), why 
not just use savepoints to migrate jobs to another place? In addition to the 
savepoint performance problem in the background description, a very important 
reason is that the migration requirement may come from the failure of the 
original cluster. In this case, there is no opportunity to trigger a savepoint.

h1. Proposal
 * *A job's checkpoint directory (user-defined-checkpoint-dir/) contains all 
of its state files (self-contained)*

As far as I know, in the current state of things, only the subsequent 
checkpoints of jobs restored from a retained checkpoint violate this 
constraint. One possible solution is to re-upload all shared files at the 
first incremental checkpoint after the job starts, but we need to discuss how 
to distinguish between a new job instance and a restart.

 * *Use relative file paths in the checkpoint metadata (relocatable)*

Change all file references in the checkpoint metadata to paths relative to the 
_metadata file, so that user-defined-checkpoint-dir/ can be copied to any 
other place.

BTW, this issue is quite similar to FLINK-5763; we can read it as background 
material.

  was:
h1. Backgroud

We have many jobs with large state size in production environment. According to 
the operation practice of these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoint takes much longer time then incremental checkpoint in jobs with 
large state. The figure below is a job in our production environment, it takes 
nearly 7 minutes to complete a savepoint, while checkpoint only takes a few 
seconds.( checkpoint after savepoint takes longer time is a problem described 
in -FLINK-23949-)
 !image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoint causes excessive cpu usage. The figure below shows the CPU usage 
of the same job in the above figure :
 !image-2021-09-08-18-01-03-176.png|width=516,height=148!
 # Savepoint may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed the limit. (We did not further 
investigate the cause and did not try to reproduce the problem on other large 
state jobs, but only increased the overhead memory. So this reason may not be 
so conclusive. )

For the above reasons, we tend to use retained incremental checkpoint to 
completely replace 

[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
h1. Background

We have many jobs with a large state size in our production environment. 
According to the operational practice of these jobs and the analysis of some 
specific problems, we believe that RocksDBStateBackend's incremental checkpoint 
has many advantages over savepoint:
 # Savepoint takes much longer than incremental checkpoint in jobs with large 
state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (A checkpoint after a savepoint taking longer is a problem described 
in -FLINK-23949-.)
 !image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoint causes excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the above figure:
 !image-2021-09-08-18-01-03-176.png|width=516,height=148!
 # Savepoint may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed the limit. (We did not further 
investigate the cause and did not try to reproduce the problem on other large 
state jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with a large state size.
h1. Problems
 * *Problem 1: retained incremental checkpoints are difficult to clean up once 
they have been used for recovery*

This problem is caused by the fact that jobs recovered from a retained 
incremental checkpoint may reference files in that checkpoint's shared 
directory in subsequent checkpoints, even if they are not in the same job 
instance. In the worst case, retained checkpoints are referenced one by one, 
forming a very long reference chain. This makes it difficult for users to 
manage retained checkpoints. In fact, we have also suffered failures caused by 
incorrect deletion of retained checkpoints.

Although we can use the file handles in the checkpoint metadata to figure out 
which files can be deleted, I think it is inappropriate to let users do this.

 * *Problem 2: checkpoints are not relocatable*

Even if we can figure out all files referenced by a checkpoint, moving these 
files will invalidate the checkpoint as well, because the metadata file 
references absolute file paths.

Since savepoints are already self-contained and relocatable (FLINK-5763), why 
not just use savepoints to migrate jobs to another place? In addition to the 
savepoint performance problem in the background description, a very important 
reason is that the migration requirement may come from the failure of the 
original cluster. In this case, there is no opportunity to trigger a savepoint.

h1. Proposal
 * *A job's checkpoint directory (user-defined-checkpoint-dir/) contains all 
of its state files (self-contained)*

As far as I know, in the current state of things, only the subsequent 
checkpoints of jobs restored from a retained checkpoint violate this 
constraint. One possible solution is to re-upload all shared files at the 
first incremental checkpoint after the job starts, but we need to discuss how 
to distinguish between a new job instance and a restart.

 * *Use relative file paths in the checkpoint metadata (relocatable)*

Change all file references in the checkpoint metadata to paths relative to the 
_metadata file, so that user-defined-checkpoint-dir/ can be copied to any 
other place.

BTW, this issue is quite similar to FLINK-5763; we can read it as background 
material.

  was:
h1. Backgroud

We have many jobs with large state size in production environment. According to 
the operation practice of these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoint takes much longer time then incremental checkpoint in jobs with 
large state. The figure below is a job in our production environment, it takes 
nearly 7 minutes to complete a savepoint, while checkpoint only takes a few 
seconds.( checkpoint after savepoint takes longer time is a problem described 
in -FLINK-23949-)
 !image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoint causes excessive cpu usage. The figure below shows the CPU usage 
of the same job in the above figure :
 !image-2021-09-08-18-01-03-176.png|width=516,height=148!
 # Savepoint may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed the limit. (We did not further 
investigate the cause and did not try to reproduce the problem on other large 
state jobs, but only increased the overhead memory. So this reason may not be 
so conclusive. )

For the above reasons, we tend to use retained incremental checkpoint to 
completely 

[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described 
in -FLINK-23949-)
 !image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
 !image-2021-09-08-18-01-03-176.png|width=516,height=148!
 # Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.
h1. Problems
 * *Problem 1 : retained incremental checkpoints are difficult to clean up once 
they have been used for recovery*

This problem is caused by the fact that jobs recovered from a retained 
incremental checkpoint may reference files in that retained checkpoint's shared 
directory in subsequent checkpoints, even though they are not in the same job 
instance. In the worst case, retained checkpoints are referenced one by one, 
forming a very long reference chain. This makes it difficult for users to 
manage retained checkpoints. In fact, we have also suffered failures caused by 
the incorrect deletion of retained checkpoints.

Although we can use the file handles in the checkpoint metadata to figure out 
which files can be deleted, I think it is inappropriate to ask users to do this.
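
(A rough sketch of that metadata analysis, for illustration only: it uses 
Flink's internal, non-public checkpoint classes, whose names and signatures 
below match Flink 1.13/1.14 but may differ in other versions.)
{code:java}
import java.io.DataInputStream;
import java.io.FileInputStream;

import org.apache.flink.runtime.checkpoint.Checkpoints;
import org.apache.flink.runtime.checkpoint.OperatorState;
import org.apache.flink.runtime.checkpoint.OperatorSubtaskState;
import org.apache.flink.runtime.checkpoint.metadata.CheckpointMetadata;
import org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandle;
import org.apache.flink.runtime.state.KeyedStateHandle;

/** Lists the shared-state files referenced by a checkpoint's _metadata file. */
public class ListReferencedSharedFiles {
    public static void main(String[] args) throws Exception {
        String metadataPath = args[0]; // hypothetical: .../chk-42/_metadata
        try (DataInputStream in = new DataInputStream(new FileInputStream(metadataPath))) {
            CheckpointMetadata metadata =
                    Checkpoints.loadCheckpointMetadata(
                            in, ListReferencedSharedFiles.class.getClassLoader(), metadataPath);
            for (OperatorState operatorState : metadata.getOperatorStates()) {
                for (OperatorSubtaskState subtask : operatorState.getStates()) {
                    for (KeyedStateHandle handle : subtask.getManagedKeyedState()) {
                        if (handle instanceof IncrementalRemoteKeyedStateHandle) {
                            // Shared RocksDB files, possibly living under a previous
                            // job instance's shared/ directory.
                            ((IncrementalRemoteKeyedStateHandle) handle)
                                    .getSharedState()
                                    .values()
                                    .forEach(System.out::println);
                        }
                    }
                }
            }
        }
    }
}
{code}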
 * *Problem 2 : checkpoint not relocatable*

Even if we can figure out all the files referenced by a checkpoint, moving 
these files will invalidate the checkpoint as well, because the metadata file 
references absolute file paths.

Since savepoints are already self-contained and relocatable (FLINK-5763), why 
not just use savepoints to migrate jobs to another place? In addition to the 
savepoint performance problems in the background description, a very important 
reason is that the migration requirement may come from a failure of the 
original cluster. In this case, there is no opportunity to trigger a 
savepoint.

h1. Proposal
 * *job's checkpoint directory (user-defined-checkpoint-dir/) contains 
all its state files (self-contained)*

As far as I know, in the current state of things, only the subsequent 
checkpoints of jobs restored from a retained checkpoint violate this 
constraint. One possible solution is to re-upload all shared files at the 
first incremental checkpoint after the job starts, but we need to discuss how 
to distinguish between a new job instance and a restart.

 * *use relative file paths in checkpoint metadata (relocatable)*

Change all file references in the checkpoint metadata to paths relative 
to the _metadata file, so that we can copy user-defined-checkpoint-dir/ to 
any other place.

 

BTW, this issue is quite similar to FLINK-5763; it can be read as background 
material for this one.

  was:
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
 !image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
!image-2021-09-08-18-01-03-176.png|width=516,height=148!
 # Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 

[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
 !image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
!image-2021-09-08-18-01-03-176.png|width=516,height=148!
 # Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.
h1. Problems
 * *Problem 1 : retained incremental checkpoints are difficult to clean up once 
they have been used for recovery*

This problem is caused by the fact that jobs recovered from a retained 
incremental checkpoint may reference files in that retained checkpoint's shared 
directory in subsequent checkpoints, even though they are not in the same job 
instance. In the worst case, retained checkpoints are referenced one by one, 
forming a very long reference chain. This makes it difficult for users to 
manage retained checkpoints. In fact, we have also suffered failures caused by 
the incorrect deletion of retained checkpoints.

Although we can use the file handles in the checkpoint metadata to figure out 
which files can be deleted, I think it is inappropriate to ask users to do this.

 * *Problem 2 : checkpoint not relocatable*

Even if we can figure out all the files referenced by a checkpoint, moving 
these files will invalidate the checkpoint as well, because the metadata file 
references absolute file paths.

Since savepoints are already self-contained and relocatable (FLINK-5763), why 
not just use savepoints to migrate jobs to another place? In addition to the 
savepoint performance problems in the background description, a very important 
reason is that the migration requirement may come from a failure of the 
original cluster. In this case, there is no opportunity to trigger a 
savepoint.

 
h1. Proposal
 * *job's checkpoint directory (user-defined-checkpoint-dir/) contains 
all its state files (self-contained)*

As far as I know, in the current state of things, only the subsequent 
checkpoints of jobs restored from a retained checkpoint violate this 
constraint. One possible solution is to re-upload all shared files at the 
first incremental checkpoint after the job starts, but we need to discuss how 
to distinguish between a new job instance and a restart.

 * *use relative file paths in checkpoint metadata (relocatable)*

Change all file references in the checkpoint metadata to paths relative 
to the _metadata file, so that we can copy user-defined-checkpoint-dir/ to 
any other place.

 

BTW, this issue is quite similar to FLINK-5763; it can be read as background 
material for this one.

  was:
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
!image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
 # Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 



[jira] [Commented] (FLINK-23747) Testing Window TVF offset

2021-09-09 Thread liwei li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412953#comment-17412953
 ] 

liwei li commented on FLINK-23747:
--

yes, thx. How about opening an issue to track whether to uniformly limit the 
value of ABS(offset) in all windows?
 

> Testing Window TVF offset
> -
>
> Key: FLINK-23747
> URL: https://issues.apache.org/jira/browse/FLINK-23747
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: JING ZHANG
>Assignee: liwei li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.14.0
>
>
> Window offset is an optional parameter which could be used to change the 
> alignment of windows.
> There are something we need clarify about window offset:
> (1) In SQL, window offset is an optional parameter, if it is specified, it is 
> the last parameter of the window.
> for Tumble window
> {code:java}
> TUMBLE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '15' MINUTE, INTERVAL '5' 
> MINUTE){code}
> for Hop Window
> {code:java}
> HOP(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE, INTERVAL '15' 
> MINUTE,
> INTERVAL '5' MINUTE){code}
> for Cumulate Window
> {code:java}
> CUMULATE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE, INTERVAL 
> '15' MINUTE, INTERVAL '5' MINUTE){code}
> (2) Window offset could be positive duration and negative duration.
> (3) Window offset is used to change the alignment of windows. The same record 
> may be assigned to a different window after a window offset is set. But one 
> rule always applies: timestamp >= window_start && timestamp < window_end.
> As a demo: for a tumble window whose size is 10 MINUTE, which window is a 
> record with timestamp 2021-06-30 00:00:04 assigned to?
>  # offset is '-16 MINUTE',  the record assigns to window [2021-06-29 
> 23:54:00, 2021-06-30 00:04:00)
>  # offset is '-6 MINUTE', the record assigns to window [2021-06-29 23:54:00, 
> 2021-06-30 00:04:00)
>  # offset is '-4 MINUTE', the record assigns to window [2021-06-29 23:56:00, 
> 2021-06-30 00:06:00)
>  # offset is '0', the record assigns to window [2021-06-30 00:00:00, 
> 2021-06-30 00:10:00)
>  # offset is '4 MINUTE', the record assigns to window [2021-06-29 23:54:00, 
> 2021-06-30 00:04:00)
>  # offset is '6 MINUTE, the record assigns to window [2021-06-29 23:56:00, 
> 2021-06-30 00:06:00)
>  # offset is '16 MINUTE', the record assigns to window [2021-06-29 23:56:00, 
> 2021-06-30 00:06:00)
> (4) We can see that some window offset parameters have the same effect on the 
> alignment of windows; in the above case, '-16 MINUTE'/'-6 MINUTE'/'4 MINUTE' 
> have the same effect on a tumble window with '10 MINUTE' size.
> (5) Window offset is only used to change the alignment of windows; it has no 
> effect on watermarks.
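
For reference, a minimal sketch of the offset parameter in a complete query, 
assuming a hypothetical table `Bid` with a `bidtime` time attribute and a 
`price` column (both names are made up for illustration; only the TVF syntax 
comes from the description above):
{code:sql}
-- 10-minute tumbling windows shifted by 4 minutes: boundaries fall on
-- ..., 23:54, 00:04, 00:14, ... instead of ..., 23:50, 00:00, 00:10, ...
SELECT window_start, window_end, SUM(price) AS total_price
FROM TABLE(
    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTE, INTERVAL '4' MINUTE))
GROUP BY window_start, window_end;
{code}
With this offset, the record at 2021-06-30 00:00:04 from the demo lands in 
[2021-06-29 23:54:00, 2021-06-30 00:04:00), matching case 5 above.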



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17188:
URL: https://github.com/apache/flink/pull/17188#issuecomment-914560045


   
   ## CI report:
   
   * 12809bb05f44589b9dde53d53397a06c3d6a03c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23875)
 
   * e45b87d2011fe81949b65094ebbb475423640bc1 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #16994: [FLINK-23991][yarn] Specifying yarn.staging-dir fail when staging scheme is…

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #16994:
URL: https://github.com/apache/flink/pull/16994#issuecomment-906233672


   
   ## CI report:
   
   * 9e78c26b7915d68b148af6ba782e45add05123e2 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23462)
 
   * 9ad206f8b390dba9437cafe3414905946d84e933 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] syhily commented on a change in pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


syhily commented on a change in pull request #17188:
URL: https://github.com/apache/flink/pull/17188#discussion_r705860316



##
File path: docs/layouts/shortcodes/generated/pulsar_consumer_configuration.html
##
@@ -0,0 +1,180 @@
+
+
+
+Key
+Default
+Type
+Description
+
+
+
+
+pulsar.consumer.ackReceiptEnabled
+false
+Boolean
+Acknowledgment will return a receipt, but this does not mean that the message 
will not be re-sent after the receipt is received.
+
+
+pulsar.consumer.ackTimeoutMillis
+0
+Long
+Set the timeout (in ms) for unacknowledged messages, truncated 
to the nearest millisecond. The timeout needs to be greater than 1 second. By default, the acknowledge timeout is disabled, which means that messages 
delivered to a consumer will not be re-delivered unless the consumer 
crashes. When the ack timeout is enabled, a message that is not acknowledged 
within the specified timeout will be re-delivered to the consumer (possibly 
to a different consumer, in the case of a shared subscription).
+
+
+pulsar.consumer.acknowledgementsGroupTimeMicros
+10
+Long
+Group consumer acknowledgments for a specified time (in μs). 
By default, a consumer uses a 100μs 
grouping time to send out acknowledgments to a broker. Setting the group time to 
0 sends out acknowledgments immediately. 
A longer ack group time is more efficient, at the expense of a slight increase 
in message re-deliveries after a failure.
+
+
+
pulsar.consumer.autoAckOldestChunkedMessageOnQueueFull
+false
+Boolean
+Buffering a large number of outstanding uncompleted chunked 
messages can create memory pressure; this can be guarded against with the 
pulsar.consumer.maxPendingChunkedMessage 
threshold. Once the consumer reaches this threshold, it drops the outstanding 
unchunked messages by silently acking them if pulsar.consumer.autoAckOldestChunkedMessageOnQueueFull
 is true; otherwise it marks them for redelivery.
+
+
+pulsar.consumer.autoUpdatePartitions
+true
+Boolean
+If autoUpdatePartitions 
is enabled, a consumer subscribes to partition increases automatically. Note: this is only for partitioned consumers.

Review comment:
   This config option should be removed in the Flink connector.

##
File path: docs/content.zh/docs/connectors/datastream/pulsar.md
##
@@ -0,0 +1,356 @@
+---
+title: Pulsar
+weight: 9
+type: docs
+---
+
+
+# Apache Pulsar Connector
+
+Flink currently only provides an [Apache Pulsar](https://pulsar.apache.org) source, which you can use to read data from 
Pulsar with exactly-once processing guarantees.
+
+## Adding Dependencies
+
+The connector supports Pulsar 2.7.0 and later. However, since the connector uses Pulsar's 
[transaction mechanism](https://pulsar.apache.org/docs/en/txn-what/), we recommend Pulsar 2.8.0
+or later when using the connector to read data.
+
+If you want to learn more about Pulsar's API compatibility design, you can read 
[PIP-72](https://github.com/apache/pulsar/wiki/PIP-72%3A-Introduce-Pulsar-Interface-Taxonomy%3A-Audience-and-Stability-Classification).
+
+{{< artifact flink-connector-pulsar withScalaVersion >}}
+
+When using this connector, remember to also add `flink-connector-base` to your dependencies:
+
+{{< artifact flink-connector-base >}}
+
+Flink's streaming connectors are not bundled into the distribution; read [this document]({{< ref 
"docs/dev/datastream/project-configuration" >}}) to learn how to add connectors to a cluster for execution.
+
+## Pulsar Source
+
+{{< hint info >}}
+The Pulsar source is built on Flink's latest [unified batch and streaming source API]({{< ref "docs/dev/datastream/sources.md" >}}).
+
+If you want to use the older `SourceFunction`-based Pulsar source, or your Flink version is below 1.14, you can use the 
[pulsar-flink](https://github.com/streamnative/pulsar-flink) connector maintained separately by StreamNative.
+{{< /hint >}}
+
+### Usage Example
+
+The Pulsar source provides a builder class to construct source instances. The example below uses the builder to create a source that consumes from the earliest available data in the topic 
"persistent://public/default/my-topic".
+It uses the **Exclusive** subscription type with the subscription name `my-subscription`, and decodes the binary message bodies into strings as UTF-8.
+
+```java
+PulsarSource<String> pulsarSource = PulsarSource.builder()
+    .setServiceUrl(serviceUrl)
+    .setAdminUrl(adminUrl)
+    .setStartCursor(StartCursor.earliest())
+    .setTopics("my-topic")
+    .setDeserializationSchema(PulsarDeserializationSchema.flinkSchema(new 
SimpleStringSchema()))
+    .setSubscriptionName("my-subscription")
+    .setSubscriptionType(SubscriptionType.Exclusive)
+    .build();
+
+env.fromSource(pulsarSource, WatermarkStrategy.noWatermarks(), "Pulsar Source");
+```
+
+If you construct a Pulsar source with the builder class, you must provide the following properties:
+
+- the Pulsar service URL, via the `setServiceUrl(String)` method
+- the Pulsar HTTP admin URL, via the `setAdminUrl(String)` method
+- the Pulsar subscription name, via the `setSubscriptionName(String)` method
+- the topics, or the partitions under a topic, to consume; see [Specifying Topics or Topic 
Partitions](#指定消费的-topic-或者-topic-分区)
+- a deserializer to decode Pulsar messages; see [Deserializer](#反序列化器)
+
+### Specifying Topics or Topic Partitions
+
+The Pulsar source provides two ways to subscribe to topics or topic partitions.
+
+- A topic list, consuming messages from all partitions of those topics, for example:
+  ```java
+  PulsarSource.builder().setTopics("some-topic1", "some-topic2")
+
+  // consume from partitions 0 and 1 of topic "topic-a"
+  

[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
 # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
 # Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 

  was:
# 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:

 ## Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
!https://km.sankuai.com/api/file/cdn/1200816615/1166124322?contentType=1=false=false|width=322!

 ## Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
!https://km.sankuai.com/api/file/cdn/1200816615/1166239767?contentType=1=false=false|width=344!

 ## Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 # 
h1. Problems

 ## Problem 1 : retained incremental checkpoints are difficult to clean up once 
they have been used for recovery

This problem is caused by the fact that jobs recovered from a retained 
incremental checkpoint may reference files in that retained checkpoint's shared 
directory in subsequent checkpoints, even though they are not in the same job 
instance. In the worst case, retained checkpoints are referenced one by one, 
forming a very long reference chain. This makes it difficult for users to 
manage retained checkpoints. In fact, we have also suffered failures caused by 
the incorrect deletion of retained checkpoints.

Although we can use the file handles in the checkpoint metadata to figure out 
which files can be deleted, I think it is inappropriate to ask users to do this.

 ## Problem 2 : checkpoint not relocatable

Even if we can figure out all the files referenced by a checkpoint, moving 
these files will invalidate the checkpoint as well, because the metadata file 
references absolute file paths.
Since savepoints are already self-contained and relocatable (FLINK-5763), why 
not just use savepoints to migrate jobs to another place? In addition to the 
savepoint performance problems in the background description, a very important 
reason is that the migration requirement may come from a failure of the 
original cluster. In this case, there is no opportunity to trigger a 
savepoint.

 # 
h1. Proposal

 ## job's checkpoint directory (user-defined-checkpoint-dir/) contains 
all its state files (self-contained)

As far as I know, in the current state of things, only the subsequent 
checkpoints of jobs restored from a retained checkpoint violate this 
constraint. One possible solution is to re-upload all shared files at the 
first incremental checkpoint after the job starts, but we need to discuss how 
to distinguish between a new job instance and a restart.

 ## use relative file paths in checkpoint metadata (relocatable)

Change all file references in the checkpoint metadata to paths relative 
to the _metadata file, so that we can copy user-defined-checkpoint-dir/ to 
any other place.


> Make checkpoint self-contained and relocatable
> 

[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
!image-2021-09-08-17-55-46-898.png|width=723,height=161!
 # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
 # Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 

  was:
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:
 # Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
 # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
 # Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 


> Make checkpoint self-contained and relocatable
> --
>
> Key: FLINK-24149
> URL: https://issues.apache.org/jira/browse/FLINK-24149
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Feifan Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-09-08-17-06-31-560.png, 
> image-2021-09-08-17-10-28-240.png, image-2021-09-08-17-55-46-898.png, 
> image-2021-09-08-18-01-03-176.png
>
>
> h1. Background
> We have many jobs with large state in our production environment. Based 
> on our operational experience with these jobs and the analysis of some specific 
> problems, we believe that RocksDBStateBackend's incremental checkpoint has 
> many advantages over savepoint:
>  # Savepoints take much longer than incremental checkpoints in jobs with 
> large state. The figure below is a job in our production environment; it 
> takes nearly 7 minutes to complete a savepoint, while a checkpoint only takes a 
> few seconds. (Checkpoints taking longer after a savepoint is a problem 
> described in -FLINK-23949-)
> !image-2021-09-08-17-55-46-898.png|width=723,height=161!
>  # Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
> of the same job as in the figure above:
>  # Savepoints may cause excessive native memory usage and eventually cause the 
> TaskManager process memory usage to exceed its limit. (We did not 
> investigate the cause further and did not try to reproduce the problem on other 
> large-state jobs, but only increased the overhead memory, so this reason may 
> not be conclusive.)
> For the above reasons, we tend to use retained incremental checkpoints to 
> completely replace savepoints for jobs with large state.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
# 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:

 ## Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)
!https://km.sankuai.com/api/file/cdn/1200816615/1166124322?contentType=1=false=false|width=322!

 ## Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
!https://km.sankuai.com/api/file/cdn/1200816615/1166239767?contentType=1=false=false|width=344!

 ## Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 # 
h1. Problems

 ## Problem 1 : retained incremental checkpoints are difficult to clean up once 
they have been used for recovery

This problem is caused by the fact that jobs recovered from a retained 
incremental checkpoint may reference files in that retained checkpoint's shared 
directory in subsequent checkpoints, even though they are not in the same job 
instance. In the worst case, retained checkpoints are referenced one by one, 
forming a very long reference chain. This makes it difficult for users to 
manage retained checkpoints. In fact, we have also suffered failures caused by 
the incorrect deletion of retained checkpoints.

Although we can use the file handles in the checkpoint metadata to figure out 
which files can be deleted, I think it is inappropriate to ask users to do this.

 ## Problem 2 : checkpoint not relocatable

Even if we can figure out all the files referenced by a checkpoint, moving 
these files will invalidate the checkpoint as well, because the metadata file 
references absolute file paths.
Since savepoints are already self-contained and relocatable (FLINK-5763), why 
not just use savepoints to migrate jobs to another place? In addition to the 
savepoint performance problems in the background description, a very important 
reason is that the migration requirement may come from a failure of the 
original cluster. In this case, there is no opportunity to trigger a 
savepoint.

 # 
h1. Proposal

 ## job's checkpoint directory (user-defined-checkpoint-dir/) contains 
all its state files (self-contained)

As far as I know, in the current state of things, only the subsequent 
checkpoints of jobs restored from a retained checkpoint violate this 
constraint. One possible solution is to re-upload all shared files at the 
first incremental checkpoint after the job starts, but we need to discuss how 
to distinguish between a new job instance and a restart.

 ## use relative file paths in checkpoint metadata (relocatable)

Change all file references in the checkpoint metadata to paths relative 
to the _metadata file, so that we can copy user-defined-checkpoint-dir/ to 
any other place.

  was:
# 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:

 ## Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)

 ## Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:

 ## Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 # 
h1. Problems

 ## 
h2. Problem 1 : retained incremental 

[jira] [Updated] (FLINK-24149) Make checkpoint self-contained and relocatable

2021-09-09 Thread Feifan Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feifan Wang updated FLINK-24149:

Description: 
# 
h1. Background

We have many jobs with large state in our production environment. Based on 
our operational experience with these jobs and the analysis of some specific 
problems, we believe that RocksDBStateBackend's incremental checkpoint has many 
advantages over savepoint:

 ## Savepoints take much longer than incremental checkpoints in jobs with 
large state. The figure below is a job in our production environment; it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint only takes a few 
seconds. (Checkpoints taking longer after a savepoint is a problem described in 
-FLINK-23949-)

 ## Savepoints cause excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:

 ## Savepoints may cause excessive native memory usage and eventually cause the 
TaskManager process memory usage to exceed its limit. (We did not investigate 
the cause further and did not try to reproduce the problem on other large-state 
jobs, but only increased the overhead memory, so this reason may not be 
conclusive.)

For the above reasons, we tend to use retained incremental checkpoints to 
completely replace savepoints for jobs with large state.

 # 
h1. Problems

 ## 
h2. Problem 1 : retained incremental checkpoints are difficult to clean up once 
they have been used for recovery

This problem is caused by the fact that jobs recovered from a retained 
incremental checkpoint may reference files in that retained checkpoint's shared 
directory in subsequent checkpoints, even though they are not in the same job 
instance. In the worst case, retained checkpoints are referenced one by one, 
forming a very long reference chain. This makes it difficult for users to 
manage retained checkpoints. In fact, we have also suffered failures caused by 
the incorrect deletion of retained checkpoints.

Although we can use the file handles in the checkpoint metadata to figure out 
which files can be deleted, I think it is inappropriate to ask users to do this.

 ## 
h2. Problem 2 : checkpoint not relocatable

Even if we can figure out all the files referenced by a checkpoint, moving 
these files will invalidate the checkpoint as well, because the metadata file 
references absolute file paths.
Since savepoints are already self-contained and relocatable (FLINK-5763), why 
not just use savepoints to migrate jobs to another place? In addition to the 
savepoint performance problems in the background description, a very important 
reason is that the migration requirement may come from a failure of the 
original cluster. In this case, there is no opportunity to trigger a 
savepoint.

 # 
h1. Proposal

 ## job's checkpoint directory (user-defined-checkpoint-dir/) contains 
all its state files (self-contained)

As far as I know, in the current state of things, only the subsequent 
checkpoints of jobs restored from a retained checkpoint violate this 
constraint. One possible solution is to re-upload all shared files at the 
first incremental checkpoint after the job starts, but we need to discuss how 
to distinguish between a new job instance and a restart.

 ## use relative file paths in checkpoint metadata (relocatable)

Change all file references in the checkpoint metadata to paths relative 
to the _metadata file, so that we can copy user-defined-checkpoint-dir/ to 
any other place.

  was:
h3. 1. Background

FLINK-5763 proposed making savepoints relocatable; checkpoints have similar 
requirements. For example, when migrating jobs to other HDFS clusters, although 
this can be achieved through a savepoint, we prefer to use persistent 
checkpoints, especially since RocksDBStateBackend incremental checkpoints have 
better snapshot and restore performance than savepoints.

 

FLINK-8531 standardized the directory layout:
{code:java}
/user-defined-checkpoint-dir
|
+ 1b080b6e710aabbef8993ab18c6de98b (job's ID)
|
+ --shared/
+ --taskowned/
+ --chk-1/
+ --chk-2/
+ --chk-3/
...
{code}
 * State backend will create a subdirectory with the job's ID that will contain 
the actual checkpoints, such as: 
user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/
 * Each checkpoint individually will store all its files in a subdirectory that 
includes the checkpoint number, such as: 
user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/chk-3/
 * Files shared between checkpoints will be stored in the shared/ directory in 
the same parent directory as the separate checkpoint directory, such as: 
user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/shared/
 * Similar to shared files, files owned strictly by tasks will be stored in the 
taskowned/ directory in the same parent directory as the separate checkpoint 
directory, such as: 

[GitHub] [flink] zuston commented on pull request #16994: [FLINK-23991][yarn] Specifying yarn.staging-dir fail when staging scheme is…

2021-09-09 Thread GitBox


zuston commented on pull request #16994:
URL: https://github.com/apache/flink/pull/16994#issuecomment-916611717


   Added test cases. Thanks for the reply, @gaborgsomogyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-23316) There's no test for custom PartitionCommitPolicy

2021-09-09 Thread Rui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned FLINK-23316:
--

Assignee: jackwangcs

> There's no test for custom PartitionCommitPolicy
> 
>
> Key: FLINK-23316
> URL: https://issues.apache.org/jira/browse/FLINK-23316
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: Rui Li
>Assignee: jackwangcs
>Priority: Critical
>
> We support custom PartitionCommitPolicy but currently there's no test 
> coverage for that use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-23316) There's no test for custom PartitionCommitPolicy

2021-09-09 Thread Rui Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412942#comment-17412942
 ] 

Rui Li commented on FLINK-23316:


[~jackwangcs] Thanks for volunteering to help. Assigning to you.

> There's no test for custom PartitionCommitPolicy
> 
>
> Key: FLINK-23316
> URL: https://issues.apache.org/jira/browse/FLINK-23316
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: Rui Li
>Priority: Critical
>
> We support custom PartitionCommitPolicy but currently there's no test 
> coverage for that use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #17228: [FLINK-24236] Migrate tests to factory approach

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17228:
URL: https://github.com/apache/flink/pull/17228#issuecomment-916400792


   
   ## CI report:
   
   * 6688227d5c3120d7dc893b22892435f1c14539c0 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23881)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17227:
URL: https://github.com/apache/flink/pull/17227#issuecomment-916400703


   
   ## CI report:
   
   * 4d1caa7cda89a0f6df2010925d4543507cf53dbf Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23880)
 
   * 4c250aeecd45453d6028d0bf25174d5812893da6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23882)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-23969) Test Pulsar source end 2 end

2021-09-09 Thread Yufan Sheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412938#comment-17412938
 ] 

Yufan Sheng commented on FLINK-23969:
-

[~Jiangang] Any progress on this e2e test?

> Test Pulsar source end 2 end
> 
>
> Key: FLINK-23969
> URL: https://issues.apache.org/jira/browse/FLINK-23969
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Pulsar
>Reporter: Arvid Heise
>Assignee: Liu
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.14.0
>
>
> Write a test application using Pulsar Source and execute it in a distributed 
> fashion. Check fault tolerance by crashing and restarting a TM.
> Ideally, we test different subscription modes and sticky keys in particular.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] godfreyhe commented on pull request #17089: [FLINK-22601][table-planner] PushWatermarkIntoScan should produce digest created by Expression.

2021-09-09 Thread GitBox


godfreyhe commented on pull request #17089:
URL: https://github.com/apache/flink/pull/17089#issuecomment-916602109


   This PR can be closed; the work was done in #17118.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-22601) PushWatermarkIntoScan should produce digest created by Expression

2021-09-09 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he closed FLINK-22601.
--
Resolution: Fixed

Resolved in FLINK-22603

> PushWatermarkIntoScan should produce digest created by Expression
> -
>
> Key: FLINK-22601
> URL: https://issues.apache.org/jira/browse/FLINK-22601
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: Jingsong Lee
>Assignee: xuyangzhong
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.15.0
>
>
> In PushWatermarkIntoTableSourceScanRuleBase, the digest is created by RexNode 
> instead of Expression; RexNode relies on field indexes while Expression relies 
> on field names.
> We should adjust it to use names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22601) PushWatermarkIntoScan should produce digest created by Expression

2021-09-09 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-22601:
---
Fix Version/s: (was: 1.14.0)
   1.15.0

> PushWatermarkIntoScan should produce digest created by Expression
> -
>
> Key: FLINK-22601
> URL: https://issues.apache.org/jira/browse/FLINK-22601
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: Jingsong Lee
>Assignee: xuyangzhong
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.15.0
>
>
> In PushWatermarkIntoTableSourceScanRuleBase, the digest is created by RexNode 
> instead of Expression; RexNode relies on field indexes while Expression relies 
> on field names.
> We should adjust it to use names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-22601) PushWatermarkIntoScan should produce digest created by Expression

2021-09-09 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he reassigned FLINK-22601:
--

Assignee: xuyangzhong

> PushWatermarkIntoScan should produce digest created by Expression
> -
>
> Key: FLINK-22601
> URL: https://issues.apache.org/jira/browse/FLINK-22601
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: Jingsong Lee
>Assignee: xuyangzhong
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.14.0
>
>
> In PushWatermarkIntoTableSourceScanRuleBase, the digest is created by RexNode 
> instead of Expression; RexNode relies on field indexes while Expression relies 
> on field names.
> We should adjust it to use names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22601) PushWatermarkIntoScan should produce digest created by Expression

2021-09-09 Thread xuyangzhong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412935#comment-17412935
 ] 

xuyangzhong commented on FLINK-22601:
-

link FLINK-22603

> PushWatermarkIntoScan should produce digest created by Expression
> -
>
> Key: FLINK-22601
> URL: https://issues.apache.org/jira/browse/FLINK-22601
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: Jingsong Lee
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.14.0
>
>
> In PushWatermarkIntoTableSourceScanRuleBase, the digest is created by RexNode 
> instead of Expression; RexNode relies on field indexes while Expression relies 
> on field names.
> We should adjust it to use names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-22603) The digest can be produced by SourceAbilitySpec

2021-09-09 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he closed FLINK-22603.
--
Resolution: Done

Fixed in 1.15.0: d93682ccd9de5c83b8daed615cac96c288a44871

> The digest can be produced by SourceAbilitySpec
> ---
>
> Key: FLINK-22603
> URL: https://issues.apache.org/jira/browse/FLINK-22603
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: Jingsong Lee
>Assignee: xuyangzhong
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
> Fix For: 1.15.0
>
>
> We should not separate SourceAbilitySpec and digests, which may lead to 
> consistency risks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] godfreyhe closed pull request #17118: [FLINK-22603][table-planner]The digest can be produced by SourceAbilitySpec.

2021-09-09 Thread GitBox


godfreyhe closed pull request #17118:
URL: https://github.com/apache/flink/pull/17118


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17230: [FLINK-23054][table-blink] Correct upsert optimization by upsert keys

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17230:
URL: https://github.com/apache/flink/pull/17230#issuecomment-916582233


   
   ## CI report:
   
   * 414d13798addfaecc667880c39b381effb5be830 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23889)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-23944) PulsarSourceITCase.testTaskManagerFailure is instable

2021-09-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-23944.

Resolution: Fixed

Test case activated in
- master (1.15): 412c45f219d775a021dce124a9a9154f3df42d84
- release-1.14: 4e68c4559619676995984565315411af8598af2e

> PulsarSourceITCase.testTaskManagerFailure is instable
> -
>
> Key: FLINK-23944
> URL: https://issues.apache.org/jira/browse/FLINK-23944
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.14.0
>Reporter: Dian Fu
>Assignee: Qingsheng Ren
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0, 1.15.0
>
>
> [https://dev.azure.com/dianfu/Flink/_build/results?buildId=430=logs=f3dc9b18-b77a-55c1-591e-264c46fe44d1=2d3cd81e-1c37-5c31-0ee4-f5d5cdb9324d]
> It's from my personal Azure pipeline; however, I'm pretty sure that I have 
> not touched any code related to this. 
> {code:java}
> Aug 24 10:44:13 [ERROR] testTaskManagerFailure{TestEnvironment, 
> ExternalContext, ClusterControllable}[1] Time elapsed: 258.397 s <<< FAILURE! 
> Aug 24 10:44:13 java.lang.AssertionError: Aug 24 10:44:13 Aug 24 10:44:13 
> Expected: Records consumed by Flink should be identical to test data and 
> preserve the order in split Aug 24 10:44:13 but: Mismatched record at 
> position 7: Expected '0W6SzacX7MNL4xLL3BZ8C3ljho4iCydbvxIl' but was 
> 'wVi5JaJpNvgkDEOBRC775qHgw0LyRW2HBxwLmfONeEmr' Aug 24 10:44:13 at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) Aug 24 10:44:13 
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8) Aug 24 
> 10:44:13 at 
> org.apache.flink.connectors.test.common.testsuites.SourceTestSuiteBase.testTaskManagerFailure(SourceTestSuiteBase.java:271)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-24238) Page title missing

2021-09-09 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-24238.
---
Resolution: Duplicate

> Page title missing
> --
>
> Key: FLINK-24238
> URL: https://issues.apache.org/jira/browse/FLINK-24238
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.13.2
>Reporter: Jun Qin
>Priority: Major
>
> the page title is missing on this Flink doc: 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/concepts/versioned_tables/].
>   
>  
> [This 
> one|https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/concepts/dynamic_tables/]
>  is a good example.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-23848) PulsarSourceITCase is failed on Azure

2021-09-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-23848.

Resolution: Fixed

release-1.14: 4e732ad4509d719a2f0c5a81cd75ed002df25c2e

> PulsarSourceITCase is failed on Azure
> -
>
> Key: FLINK-23848
> URL: https://issues.apache.org/jira/browse/FLINK-23848
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.14.0
>Reporter: Jark Wu
>Assignee: Yufan Sheng
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=22412=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461
> {code}
> 2021-08-17T20:11:53.7228789Z Aug 17 20:11:53 [INFO] Running 
> org.apache.flink.connector.pulsar.source.PulsarSourceITCase
> 2021-08-17T20:17:38.2429467Z Aug 17 20:17:38 [ERROR] Tests run: 8, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 344.515 s <<< FAILURE! - in 
> org.apache.flink.connector.pulsar.source.PulsarSourceITCase
> 2021-08-17T20:17:38.2430693Z Aug 17 20:17:38 [ERROR] 
> testMultipleSplits{TestEnvironment, ExternalContext}[2]  Time elapsed: 66.766 
> s  <<< ERROR!
> 2021-08-17T20:17:38.2431387Z Aug 17 20:17:38 java.lang.RuntimeException: 
> Failed to fetch next result
> 2021-08-17T20:17:38.2432035Z Aug 17 20:17:38  at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:109)
> 2021-08-17T20:17:38.2433345Z Aug 17 20:17:38  at 
> org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
> 2021-08-17T20:17:38.2434175Z Aug 17 20:17:38  at 
> org.apache.flink.connectors.test.common.utils.TestDataMatchers$MultipleSplitDataMatcher.matchesSafely(TestDataMatchers.java:151)
> 2021-08-17T20:17:38.2435028Z Aug 17 20:17:38  at 
> org.apache.flink.connectors.test.common.utils.TestDataMatchers$MultipleSplitDataMatcher.matchesSafely(TestDataMatchers.java:133)
> 2021-08-17T20:17:38.2438387Z Aug 17 20:17:38  at 
> org.hamcrest.TypeSafeDiagnosingMatcher.matches(TypeSafeDiagnosingMatcher.java:55)
> 2021-08-17T20:17:38.2439100Z Aug 17 20:17:38  at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:12)
> 2021-08-17T20:17:38.2439708Z Aug 17 20:17:38  at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
> 2021-08-17T20:17:38.2440299Z Aug 17 20:17:38  at 
> org.apache.flink.connectors.test.common.testsuites.SourceTestSuiteBase.testMultipleSplits(SourceTestSuiteBase.java:156)
> 2021-08-17T20:17:38.2441007Z Aug 17 20:17:38  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2021-08-17T20:17:38.2441526Z Aug 17 20:17:38  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2021-08-17T20:17:38.2442068Z Aug 17 20:17:38  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-08-17T20:17:38.2442759Z Aug 17 20:17:38  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2021-08-17T20:17:38.2443247Z Aug 17 20:17:38  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
> 2021-08-17T20:17:38.2443812Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2021-08-17T20:17:38.241Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2021-08-17T20:17:38.2445101Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2021-08-17T20:17:38.2445688Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2021-08-17T20:17:38.2446328Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
> 2021-08-17T20:17:38.2447303Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2021-08-17T20:17:38.2448336Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2021-08-17T20:17:38.2448999Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2021-08-17T20:17:38.2449689Z Aug 17 20:17:38  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2021-08-17T20:17:38.2450363Z Aug 17 20:17:38  at 
> 

[GitHub] [flink] xintongsong closed pull request #17199: [FLINK-23848][connector/pulsar] Fix the consumer not found error in test. [1.14]

2021-09-09 Thread GitBox


xintongsong closed pull request #17199:
URL: https://github.com/apache/flink/pull/17199


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong commented on pull request #17199: [FLINK-23848][connector/pulsar] Fix the consumer not found error in test. [1.14]

2021-09-09 Thread GitBox


xintongsong commented on pull request #17199:
URL: https://github.com/apache/flink/pull/17199#issuecomment-916592298


   4e732ad4509d719a2f0c5a81cd75ed002df25c2e


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] godfreyhe commented on a change in pull request #16192: [FLINK-22954][table-planner-blink] Rewrite Join on constant TableFunctionScan to Correlate

2021-09-09 Thread GitBox


godfreyhe commented on a change in pull request #16192:
URL: https://github.com/apache/flink/pull/16192#discussion_r705860658



##
File path: 
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/rules/physical/stream/StreamPhysicalConstantTableFunctionScanRule.scala
##
@@ -26,7 +26,7 @@ import com.google.common.collect.ImmutableList
 import org.apache.calcite.plan.RelOptRule._
 import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
 import org.apache.calcite.rel.core.JoinRelType
-import org.apache.calcite.rex.{RexLiteral, RexUtil}
+import org.apache.calcite.rex.{RexCall, RexLiteral, RexUtil}

Review comment:
   Remove the unused import `RexCall`.

##
File path: 
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/rules/physical/stream/StreamPhysicalConstantTableFunctionScanRule.scala
##
@@ -51,7 +51,7 @@ class StreamPhysicalConstantTableFunctionScanRule
 
   override def matches(call: RelOptRuleCall): Boolean = {
 val scan: FlinkLogicalTableFunctionScan = call.rel(0)
-RexUtil.isConstant(scan.getCall) && scan.getInputs.isEmpty
+!RexUtil.containsInputRef(scan.getCall) && scan.getInputs.isEmpty

Review comment:
   Please update the class Javadoc.

##
File path: 
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/JoinTableFunctionScanToCorrelateRule.scala
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.rules.logical
+
+import org.apache.calcite.plan.{RelOptRule, RelOptRuleCall}
+import org.apache.calcite.plan.RelOptRule.{any, none, operand}
+import org.apache.calcite.rel.RelNode
+import org.apache.calcite.rel.logical.{LogicalJoin, LogicalTableFunctionScan}
+
+/**
+ * Rule that rewrites Join on TableFunctionScan to Correlate.
+ */
+class JoinTableFunctionScanToCorrelateRule extends RelOptRule(

Review comment:
   Please port this class to Java and avoid introducing a new Scala class.
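
   For reference, a minimal Java sketch of such a port, assuming the same match
pattern as the Scala original; the class layout is illustrative and the rewrite
in `onMatch` is elided because it is not shown in this thread:

```java
package org.apache.flink.table.planner.plan.rules.logical;

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.logical.LogicalJoin;
import org.apache.calcite.rel.logical.LogicalTableFunctionScan;

/** Rule that rewrites a Join on a constant TableFunctionScan to a Correlate. */
public class JoinTableFunctionScanToCorrelateRule extends RelOptRule {

    public static final JoinTableFunctionScanToCorrelateRule INSTANCE =
            new JoinTableFunctionScanToCorrelateRule();

    private JoinTableFunctionScanToCorrelateRule() {
        // Match a LogicalJoin whose right input is a LogicalTableFunctionScan.
        super(
                operand(
                        LogicalJoin.class,
                        operand(RelNode.class, any()),
                        operand(LogicalTableFunctionScan.class, none())),
                "JoinTableFunctionScanToCorrelateRule");
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        // Build a LogicalCorrelate from the matched join and scan here;
        // the rewrite body is elided since it is not shown in this thread.
    }
}
```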




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong closed pull request #17201: [FLINK-23944][connector/pulsar] Enable PulsarSourceITCase.testTaskManagerFailure

2021-09-09 Thread GitBox


xintongsong closed pull request #17201:
URL: https://github.com/apache/flink/pull/17201


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-23747) Testing Window TVF offset

2021-09-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song closed FLINK-23747.

Resolution: Done

> Testing Window TVF offset
> -
>
> Key: FLINK-23747
> URL: https://issues.apache.org/jira/browse/FLINK-23747
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: JING ZHANG
>Assignee: liwei li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.14.0
>
>
> Window offset is an optional parameter that can be used to change the 
> alignment of windows.
> There are a few things we need to clarify about window offset:
> (1) In SQL, window offset is an optional parameter; if it is specified, it is 
> the last parameter of the window.
> for Tumble window
> {code:java}
> TUMBLE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '15' MINUTE, INTERVAL '5' 
> MINUTE){code}
> for Hop Window
> {code:java}
> HOP(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE, INTERVAL '15' 
> MINUTE,
> INTERVAL '5' MINUTE){code}
> for Cumulate Window
> {code:java}
> CUMULATE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '1' MINUTE, INTERVAL 
> '15' MINUTE, INTERVAL '5' MINUTE){code}
> (2) Window offset can be a positive or a negative duration.
> (3) Window offset is used to change the alignment of windows. The same record 
> may be assigned to a different window after a window offset is set, but the 
> rule timestamp >= window_start && timestamp < window_end always applies.
> As a demo: for a tumble window with a window size of 10 MINUTE, which window 
> would a record with timestamp 2021-06-30 00:00:04 be assigned to?
>  # offset is '-16 MINUTE',  the record assigns to window [2021-06-29 
> 23:54:00, 2021-06-30 00:04:00)
>  # offset is '-6 MINUTE', the record assigns to window [2021-06-29 23:54:00, 
> 2021-06-30 00:04:00)
>  # offset is '-4 MINUTE', the record assigns to window [2021-06-29 23:56:00, 
> 2021-06-30 00:06:00)
>  # offset is '0', the record assigns to window [2021-06-30 00:00:00, 
> 2021-06-30 00:10:00)
>  # offset is '4 MINUTE', the record assigns to window [2021-06-29 23:54:00, 
> 2021-06-30 00:04:00)
>  # offset is '6 MINUTE', the record assigns to window [2021-06-29 23:56:00, 
> 2021-06-30 00:06:00)
>  # offset is '16 MINUTE', the record assigns to window [2021-06-29 23:56:00, 
> 2021-06-30 00:06:00)
> (4) Note that some window offset parameters have the same effect on 
> the alignment of windows; in the above case, '-16 MINUTE', '-6 MINUTE' and '4 
> MINUTE' have the same effect on a tumble window with a '10 MINUTE' size.
> (5) Window offset is only used to change the alignment of windows; it has no 
> effect on watermarks.
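
To make the alignment arithmetic above concrete, here is a minimal, self-contained 
Java sketch that reproduces the window assignments in the demo. It is illustrative 
only, not Flink's implementation (Flink's TimeWindow class ships a similar 
getWindowStartWithOffset helper):

{code:java}
import java.time.Duration;
import java.time.Instant;

public class WindowOffsetDemo {

    // Start of the tumbling window that contains `ts`, for a given size and
    // offset (all in milliseconds). Math.floorMod keeps the result correct
    // for negative offsets and negative intermediate values.
    static long windowStart(long ts, long size, long offset) {
        return ts - Math.floorMod(ts - offset, size);
    }

    public static void main(String[] args) {
        long size = Duration.ofMinutes(10).toMillis();
        long ts = Instant.parse("2021-06-30T00:00:04Z").toEpochMilli();
        for (long minutes : new long[] {-16, -6, -4, 0, 4, 6, 16}) {
            long offset = Duration.ofMinutes(minutes).toMillis();
            long start = windowStart(ts, size, offset);
            // e.g. offset -16 -> window [2021-06-29T23:54:00Z, 2021-06-30T00:04:00Z)
            System.out.printf("offset %3d MINUTE -> window [%s, %s)%n",
                    minutes, Instant.ofEpochMilli(start), Instant.ofEpochMilli(start + size));
        }
    }
}
{code}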



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] lzshlzsh commented on pull request #17198: [FLINK-24212][kerbernets] fix the problem that kerberos krb5.conf file is mounted as empty directory, not a expected file

2021-09-09 Thread GitBox


lzshlzsh commented on pull request #17198:
URL: https://github.com/apache/flink/pull/17198#issuecomment-916586530


   > Thanks for the contribution @lzshlzsh . Your fix looks good. Have you 
verified it in your local env? Also, it would be good to add a test for it.
   
   Yes, I have verified it in my company's production environment. 
   Thanks for your advice; I will add a test for it.
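
   For context, a minimal sketch of the underlying fix idea using the fabric8
client that Flink's Kubernetes decorators build on: mounting a single file from
a ConfigMap requires `subPath`, otherwise the mount point becomes an empty
directory. Names here are illustrative, not the PR's actual code:

```java
import io.fabric8.kubernetes.api.model.VolumeMount;
import io.fabric8.kubernetes.api.model.VolumeMountBuilder;

public class Krb5MountSketch {

    public static VolumeMount krb5ConfMount() {
        // Without withSubPath, Kubernetes mounts the whole volume at
        // /etc/krb5.conf, which then shows up as an (empty) directory
        // instead of the expected file.
        return new VolumeMountBuilder()
                .withName("krb5-conf-volume")
                .withMountPath("/etc/krb5.conf")
                .withSubPath("krb5.conf")
                .build();
    }
}
```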


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Huanli-Meng commented on a change in pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


Huanli-Meng commented on a change in pull request #17188:
URL: https://github.com/apache/flink/pull/17188#discussion_r705830866



##
File path: docs/layouts/shortcodes/generated/pulsar_consumer_configuration.html
##
@@ -0,0 +1,180 @@
+
+
+
+Key
+Default
+Type
+Description
+
+
+
+
+pulsar.consumer.ackReceiptEnabled
+false
+Boolean
+Ack will return receipt but does not mean that the message 
will not be resent after get receipt.
+
+
+pulsar.consumer.ackTimeoutMillis
+0
+Long
+Set the timeout (in ms) for unacknowledged messages, truncated 
to the nearest millisecond. The timeout needs to be greater than 1 second.By default, the acknowledge timeout is disabled and that means that messages 
delivered to a consumer will not be re-delivered unless the consumer 
crashes.When enabling ack timeout, if a message is not acknowledged 
within the specified timeout it will be re-delivered to the consumer (possibly 
to a different consumer in case of a shared subscription).

Review comment:
   ```suggestion
   The timeout (in ms) for unacknowledged messages, truncated 
to the nearest millisecond. The timeout needs to be greater than 1 second. By default, the acknowledgement timeout is disabled, which means that messages 
delivered to a consumer will not be re-delivered unless the consumer 
crashes. When the acknowledgement timeout is enabled, if a message is not 
acknowledged within the specified timeout, it will be re-delivered to the 
consumer (possibly to a different consumer in case of a shared 
subscription).
   ```

##
File path: docs/layouts/shortcodes/generated/pulsar_consumer_configuration.html
##
@@ -0,0 +1,180 @@
+
+
+
+Key
+Default
+Type
+Description
+
+
+
+
+pulsar.consumer.ackReceiptEnabled
+false
+Boolean
+Ack will return receipt but does not mean that the message 
will not be resent after get receipt.

Review comment:
   ```suggestion
   Acknowledgement will return a receipt but this does not mean 
that the message will not be resent after getting the receipt.
   ```

##
File path: docs/layouts/shortcodes/generated/pulsar_consumer_configuration.html
##
@@ -0,0 +1,180 @@
+
+
+
+Key
+Default
+Type
+Description
+
+
+
+
+pulsar.consumer.ackReceiptEnabled
+false
+Boolean
+Ack will return receipt but does not mean that the message 
will not be resent after get receipt.
+
+
+pulsar.consumer.ackTimeoutMillis
+0
+Long
+Set the timeout (in ms) for unacknowledged messages, truncated 
to the nearest millisecond. The timeout needs to be greater than 1 second.By default, the acknowledge timeout is disabled and that means that messages 
delivered to a consumer will not be re-delivered unless the consumer 
crashes.When enabling ack timeout, if a message is not acknowledged 
within the specified timeout it will be re-delivered to the consumer (possibly 
to a different consumer in case of a shared subscription).
+
+
+pulsar.consumer.acknowledgementsGroupTimeMicros
+10
+Long
+Group a consumer acknowledgment for a specified time (in μs). 
By default, a consumer uses 100μs 
grouping time to send out acknowledgments to a broker. Setting a group time of 
0 sends out acknowledgments immediately. 
A longer ack group time is more efficient at the expense of a slight increase 
in message re-deliveries after a failure.
+
+
+
pulsar.consumer.autoAckOldestChunkedMessageOnQueueFull
+false
+Boolean
+Buffering large number of outstanding uncompleted chunked 
messages can create memory pressure and it can be guarded by providing this 
pulsar.consumer.maxPendingChunkedMessage 
threshold. Once consumer reaches this threshold, it drops the outstanding 
unchunked-messages by silently acking if pulsar.consumer.autoAckOldestChunkedMessageOnQueueFull
 is true else it marks them for redelivery.
+
+
+pulsar.consumer.autoUpdatePartitions
+true
+Boolean
+If autoUpdatePartitions 
is enabled, a consumer subscribes to partition increase automatically.Note: this is only for partitioned consumers.
+
+
+
pulsar.consumer.autoUpdatePartitionsIntervalSeconds
+60
+Integer
+Set the interval (in seconds) of updating partitions. This 
only works if autoUpdatePartitions is enabled.
+
+
+
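
For readers of this configuration table, a hypothetical sketch of supplying one
of these keys through the connector's builder. The builder method and constant
names (setConfig, PulsarSourceOptions.PULSAR_ACK_TIMEOUT_MILLIS) are assumptions
about the API under review, not confirmed by this thread:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.pulsar.source.PulsarSource;
import org.apache.flink.connector.pulsar.source.PulsarSourceOptions;
import org.apache.flink.connector.pulsar.source.enumerator.cursor.StartCursor;
import org.apache.flink.connector.pulsar.source.reader.deserializer.PulsarDeserializationSchema;

public class PulsarSourceConfigSketch {

    public static PulsarSource<String> build() {
        return PulsarSource.builder()
                .setServiceUrl("pulsar://localhost:6650")
                .setAdminUrl("http://localhost:8080")
                .setTopics("my-topic")
                .setSubscriptionName("my-subscription")
                .setStartCursor(StartCursor.earliest())
                .setDeserializationSchema(
                        PulsarDeserializationSchema.flinkSchema(new SimpleStringSchema()))
                // Corresponds to the pulsar.consumer.ackTimeoutMillis key in the
                // table above; must be 0 (disabled) or greater than 1000 ms.
                .setConfig(PulsarSourceOptions.PULSAR_ACK_TIMEOUT_MILLIS, 2000L)
                .build();
    }
}
```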

[jira] [Closed] (FLINK-23345) Migrate to the next version of Python `requests` when released

2021-09-09 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-23345.
---
Fix Version/s: 1.11.5
   1.10.4
   Resolution: Fixed

Fixed in:
- master via 0cecdcae4e20edf6a513d5be5ebf3ad7d59b9f4f
- release-1.14 via 3621b00358fecacbc63e4919525343560adf5ebe
- release-1.13 via 53f79eaa7a6d65e92ae4258d334d35d573e43c41
- release-1.12 via 07a3f6633fc88b96856dc94be664e0dca6bb991d
- release-1.11 via aae0f918f3188abaf348513511be49b5b286a01c
- release-1.10 via 3d95391ca025266debd4149c33c005901a90b66e

> Migrate to the next version of Python `requests` when released
> --
>
> Key: FLINK-23345
> URL: https://issues.apache.org/jira/browse/FLINK-23345
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.14.0
>Reporter: Jarek Potiuk
>Assignee: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.4, 1.14.0, 1.11.5, 1.12.6, 1.13.3
>
>
> Hello Maintainers, 
> I am a PMC member of Apache Airflow, and I wanted to give you a bit of a 
> heads-up about a rather important migration to the upcoming version of the 
> `requests` library in your Python release. 
> Since you are using the `requests` library in your project (at least indirectly 
> via apache-beam), you are affected.
> As discussed at length in https://issues.apache.org/jira/browse/LEGAL-572, we 
> found out that the `chardet` library used by the `requests` library was a 
> mandatory dependency of requests, and since it has an LGPL licence, we should not 
> release any Apache software with it. 
> Since then (and since in Airflow we rely on requests heavily) we have been 
> working with the requests maintainers and the "charset-normalizer" maintainer to 
> make it possible to replace `chardet` with the MIT-licensed `charset-normalizer` 
> instead, so that the requests library can be used in Python releases by Apache 
> projects.
> This was a bumpy road, but finally the PR by [~ash] has been merged: 
> [https://github.com/psf/requests/pull/5797], and we hope a new version of the 
> requests library will be released soon. 
> This is just a heads-up. I will let you know when it is released, but I have 
> a kind request as well - I might ask the maintainers to release a release 
> candidate of requests, and maybe you could help to test it before it is 
> released; that would be some reassurance for the maintainers of requests, who 
> are very concerned about the stability of their releases.
> Let me know if you need any more information and whether you would like to 
> help in testing the candidate when it is out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #17230: [FLINK-23054][table-blink] Correct upsert optimization by upsert keys

2021-09-09 Thread GitBox


flinkbot commented on pull request #17230:
URL: https://github.com/apache/flink/pull/17230#issuecomment-916582233


   
   ## CI report:
   
   * 414d13798addfaecc667880c39b381effb5be830 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19142) Investigate slot hijacking from preceding pipelined regions after failover

2021-09-09 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-19142:

Labels: pull-request-available  (was: auto-unassigned 
pull-request-available)

> Investigate slot hijacking from preceding pipelined regions after failover
> --
>
> Key: FLINK-19142
> URL: https://issues.apache.org/jira/browse/FLINK-19142
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Andrey Zagrebin
>Assignee: Zhu Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> The ticket originates from [this PR 
> discussion|https://github.com/apache/flink/pull/13181#discussion_r481087221].
> The previous AllocationIDs are used by 
> PreviousAllocationSlotSelectionStrategy to schedule subtasks into the slot 
> where they were previously executed before a failover. If the previous slot 
> (AllocationID) is not available, we do not want subtasks to take previous 
> slots (AllocationIDs) of other subtasks.
> The MergingSharedSlotProfileRetriever gets all previous AllocationIDs of the 
> bulk from SlotSharingExecutionSlotAllocator but only from the current bulk. 
> The previous AllocationIDs of other bulks stay unknown. Therefore, the 
> current bulk can potentially hijack the previous slots from the preceding 
> bulks. On the other hand the previous AllocationIDs of other tasks should be 
> taken if the other tasks are not going to run at the same time, e.g. not 
> enough resources after failover or other bulks are done.
> One way to do it may be to give to MergingSharedSlotProfileRetriever all 
> previous AllocationIDs of bulks which are going to run at the same time.
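
For illustration, a minimal sketch of the idea under discussion, with simplified
types (the real slot selection strategy interfaces are more involved): when
selecting a slot for one bulk, the previous AllocationIDs of all concurrently
running bulks are treated as reserved, so a bulk cannot hijack another bulk's
previous slot.

{code:java}
import java.util.Collection;
import java.util.Optional;
import java.util.Set;

import org.apache.flink.runtime.clusterframework.types.AllocationID;
import org.apache.flink.runtime.jobmaster.SlotInfo;

final class PreviousAllocationAwareSelection {

    /**
     * Prefer the subtask's own previous slot; otherwise take any free slot
     * that is not some other bulk's previous slot.
     */
    static Optional<SlotInfo> selectSlot(
            Collection<SlotInfo> freeSlots,
            AllocationID myPreviousAllocation,
            Set<AllocationID> previousAllocationsOfOtherBulks) {
        Optional<SlotInfo> previous =
                freeSlots.stream()
                        .filter(s -> s.getAllocationId().equals(myPreviousAllocation))
                        .findFirst();
        if (previous.isPresent()) {
            return previous;
        }
        return freeSlots.stream()
                .filter(s -> !previousAllocationsOfOtherBulks.contains(s.getAllocationId()))
                .findFirst();
    }
}
{code}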



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19142) Investigate slot hijacking from preceding pipelined regions after failover

2021-09-09 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-19142:

Fix Version/s: (was: 1.14.0)
   1.15.0

> Investigate slot hijacking from preceding pipelined regions after failover
> --
>
> Key: FLINK-19142
> URL: https://issues.apache.org/jira/browse/FLINK-19142
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Andrey Zagrebin
>Assignee: Zhu Zhu
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.15.0
>
>
> The ticket originates from [this PR 
> discussion|https://github.com/apache/flink/pull/13181#discussion_r481087221].
> The previous AllocationIDs are used by 
> PreviousAllocationSlotSelectionStrategy to schedule subtasks into the slot 
> where they were previously executed before a failover. If the previous slot 
> (AllocationID) is not available, we do not want subtasks to take previous 
> slots (AllocationIDs) of other subtasks.
> The MergingSharedSlotProfileRetriever gets all previous AllocationIDs of the 
> bulk from SlotSharingExecutionSlotAllocator but only from the current bulk. 
> The previous AllocationIDs of other bulks stay unknown. Therefore, the 
> current bulk can potentially hijack the previous slots from the preceding 
> bulks. On the other hand the previous AllocationIDs of other tasks should be 
> taken if the other tasks are not going to run at the same time, e.g. not 
> enough resources after failover or other bulks are done.
> One way to do it may be to give to MergingSharedSlotProfileRetriever all 
> previous AllocationIDs of bulks which are going to run at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] Myasuka commented on pull request #17151: [FLINK-23180] Do not initialize checkpoint base locations when checkp…

2021-09-09 Thread GitBox


Myasuka commented on pull request #17151:
URL: https://github.com/apache/flink/pull/17151#issuecomment-916581485


   > I think in the end there is no harm doing so, but at the very least the 
javadoc must be updated. I would even suggest renaming initializeBaseLocations 
to initializeBaseLocationsForCheckpoints.
   
   I think @dawidwys's suggestion is right; we should change the javadoc 
description. Since this class is not annotated as "Public" and might not be 
extended in user classes, it's fine to rename this method (the renaming commit 
could be a separate one).
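
   For illustration, a sketch of the suggested rename with an updated Javadoc;
the surrounding interface is assumed, and the wording is illustrative rather
than the final change:

```java
/**
 * Initializes the necessary prerequisites for checkpoint storage locations,
 * e.g. creating the base checkpoint directories. This is only called when
 * checkpointing is actually enabled, not unconditionally on job start.
 */
void initializeBaseLocationsForCheckpoints() throws IOException;
```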


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-23047) CassandraConnectorITCase.testCassandraBatchTupleFormat fails on azure

2021-09-09 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412914#comment-17412914
 ] 

Xintong Song commented on FLINK-23047:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23877=logs=ba53eb01-1462-56a3-8e98-0dd97fbcaab5=bfbc6239-57a0-5db0-63f3-41551b4f7d51=14431

> CassandraConnectorITCase.testCassandraBatchTupleFormat fails on azure
> -
>
> Key: FLINK-23047
> URL: https://issues.apache.org/jira/browse/FLINK-23047
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.0, 1.12.4, 1.13.2
>Reporter: Xintong Song
>Priority: Major
>  Labels: test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19176=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=03dca39c-73e8-5aaf-601d-328ae5c35f20=13995
> {code}
> [ERROR] Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 157.28 s <<< FAILURE! - in 
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase
> [ERROR] 
> testCassandraBatchTupleFormat(org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase)
>   Time elapsed: 12.052 s  <<< ERROR!
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
> tried for query failed (tried: /127.0.0.1:9042 
> (com.datastax.driver.core.exceptions.OperationTimedOutException: [/127.0.0.1] 
> Timed out waiting for server response))
>   at 
> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>   at 
> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37)
>   at 
> com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
>   at 
> com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
>   at 
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
>   at 
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
>   at 
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.createTable(CassandraConnectorITCase.java:234)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> 

[jira] [Reopened] (FLINK-22869) SQLClientSchemaRegistryITCase timeouts on azure

2021-09-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reopened FLINK-22869:
--

> SQLClientSchemaRegistryITCase timeouts on azure
> ---
>
> Key: FLINK-22869
> URL: https://issues.apache.org/jira/browse/FLINK-22869
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.12.4, 1.13.2
>Reporter: Xintong Song
>Priority: Critical
>  Labels: stale-critical, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18652=logs=68a897ab-3047-5660-245a-cce8f83859f6=16ca2cca-2f63-5cce-12d2-d519b930a729=27324
> {code}
> Jun 03 23:51:30 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 227.425 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jun 03 23:51:30 [ERROR] 
> testReading(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 194.931 s  <<< ERROR!
> Jun 03 23:51:30 org.junit.runners.model.TestTimedOutException: test timed out 
> after 12 milliseconds
> Jun 03 23:51:30   at java.lang.Object.wait(Native Method)
> Jun 03 23:51:30   at java.lang.Thread.join(Thread.java:1252)
> Jun 03 23:51:30   at java.lang.Thread.join(Thread.java:1326)
> Jun 03 23:51:30   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.close(KafkaAdminClient.java:541)
> Jun 03 23:51:30   at 
> org.apache.kafka.clients.admin.Admin.close(Admin.java:96)
> Jun 03 23:51:30   at 
> org.apache.kafka.clients.admin.Admin.close(Admin.java:79)
> Jun 03 23:51:30   at 
> org.apache.flink.tests.util.kafka.KafkaContainerClient.createTopic(KafkaContainerClient.java:71)
> Jun 03 23:51:30   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testReading(SQLClientSchemaRegistryITCase.java:102)
> Jun 03 23:51:30   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 03 23:51:30   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jun 03 23:51:30   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jun 03 23:51:30   at java.lang.reflect.Method.invoke(Method.java:498)
> Jun 03 23:51:30   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jun 03 23:51:30   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> Jun 03 23:51:30   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jun 03 23:51:30   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22869) SQLClientSchemaRegistryITCase timeouts on azure

2021-09-09 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412913#comment-17412913
 ] 

Xintong Song commented on FLINK-22869:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23877=logs=08866332-78f7-59e4-4f7e-49a56faa3179=7f606211-1454-543c-70ab-c7a028a1ce8c=27463

> SQLClientSchemaRegistryITCase timeouts on azure
> ---
>
> Key: FLINK-22869
> URL: https://issues.apache.org/jira/browse/FLINK-22869
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.12.4, 1.13.2
>Reporter: Xintong Song
>Priority: Critical
>  Labels: stale-critical, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18652=logs=68a897ab-3047-5660-245a-cce8f83859f6=16ca2cca-2f63-5cce-12d2-d519b930a729=27324
> {code}
> Jun 03 23:51:30 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 227.425 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jun 03 23:51:30 [ERROR] 
> testReading(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 194.931 s  <<< ERROR!
> Jun 03 23:51:30 org.junit.runners.model.TestTimedOutException: test timed out 
> after 12 milliseconds
> Jun 03 23:51:30   at java.lang.Object.wait(Native Method)
> Jun 03 23:51:30   at java.lang.Thread.join(Thread.java:1252)
> Jun 03 23:51:30   at java.lang.Thread.join(Thread.java:1326)
> Jun 03 23:51:30   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.close(KafkaAdminClient.java:541)
> Jun 03 23:51:30   at 
> org.apache.kafka.clients.admin.Admin.close(Admin.java:96)
> Jun 03 23:51:30   at 
> org.apache.kafka.clients.admin.Admin.close(Admin.java:79)
> Jun 03 23:51:30   at 
> org.apache.flink.tests.util.kafka.KafkaContainerClient.createTopic(KafkaContainerClient.java:71)
> Jun 03 23:51:30   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testReading(SQLClientSchemaRegistryITCase.java:102)
> Jun 03 23:51:30   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 03 23:51:30   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jun 03 23:51:30   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jun 03 23:51:30   at java.lang.reflect.Method.invoke(Method.java:498)
> Jun 03 23:51:30   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jun 03 23:51:30   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> Jun 03 23:51:30   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> Jun 03 23:51:30   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jun 03 23:51:30   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (FLINK-24213) Java deadlock in QueryableState ClientTest

2021-09-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reopened FLINK-24213:
--

Instance on 1.12:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23877=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=03dca39c-73e8-5aaf-601d-328ae5c35f20=15504

> Java deadlock in QueryableState ClientTest
> --
>
> Key: FLINK-24213
> URL: https://issues.apache.org/jira/browse/FLINK-24213
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Queryable State
>Affects Versions: 1.12.5, 1.15.0
>Reporter: Dawid Wysakowicz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23750=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d=15476
> {code}
>  Found one Java-level deadlock:
> Sep 08 11:12:50 =
> Sep 08 11:12:50 "Flink Test Client Event Loop Thread 0":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4e380309c8 (object 
> 0x86b2cd50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "main"
> Sep 08 11:12:50 "main":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4ea4004068 (object 
> 0x86b2cf50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "Flink Test Client Event Loop Thread 0"
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-24213) Java deadlock in QueryableState ClientTest

2021-09-09 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-24213:
-
Affects Version/s: 1.12.5

> Java deadlock in QueryableState ClientTest
> --
>
> Key: FLINK-24213
> URL: https://issues.apache.org/jira/browse/FLINK-24213
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Queryable State
>Affects Versions: 1.12.5, 1.15.0
>Reporter: Dawid Wysakowicz
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23750=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d=15476
> {code}
>  Found one Java-level deadlock:
> Sep 08 11:12:50 =
> Sep 08 11:12:50 "Flink Test Client Event Loop Thread 0":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4e380309c8 (object 
> 0x86b2cd50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "main"
> Sep 08 11:12:50 "main":
> Sep 08 11:12:50   waiting to lock monitor 0x7f4ea4004068 (object 
> 0x86b2cf50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "Flink Test Client Event Loop Thread 0"
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-24001) Support evaluating individual window table-valued function in runtime

2021-09-09 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412907#comment-17412907
 ] 

Jingsong Lee commented on FLINK-24001:
--

Never think about operator state; it is not for big data...

> Support evaluating individual window table-valued function in runtime
> -
>
> Key: FLINK-24001
> URL: https://issues.apache.org/jira/browse/FLINK-24001
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Affects Versions: 1.15.0
>Reporter: JING ZHANG
>Priority: Major
>
> Currently, a window table-valued function has to be used with another window 
> operation, such as a window aggregate, window top-N, or window join.
> In this ticket, we aim to support evaluating an individual window table-valued 
> function at runtime, which means introducing an operator to handle this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #17230: [FLINK-23054][table-blink] Correct upsert optimization by upsert keys

2021-09-09 Thread GitBox


flinkbot commented on pull request #17230:
URL: https://github.com/apache/flink/pull/17230#issuecomment-916571829


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 414d13798addfaecc667880c39b381effb5be830 (Fri Sep 10 
02:00:41 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] dianfu closed pull request #17219: [FLINK-23345][python] Limits the version requests to 2.26.0 or above

2021-09-09 Thread GitBox


dianfu closed pull request #17219:
URL: https://github.com/apache/flink/pull/17219


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Reopened] (FLINK-23054) Correct upsert optimization by upsert keys

2021-09-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reopened FLINK-23054:
--

Re-open for 1.13

> Correct upsert optimization by upsert keys
> --
>
> Key: FLINK-23054
> URL: https://issues.apache.org/jira/browse/FLINK-23054
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> After FLINK-22901.
> We can use upsert keys to fix upsert join, upsert rank, and upsert sink.
>  * For join and rank: if input has no upsert keys, do not use upsert 
> optimization.
>  * For upsert sink: if input has unique keys but no upsert keys, we need to 
> add a materialize operator to produce upsert records.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-23391) KafkaSourceReaderTest.testKafkaSourceMetrics fails on azure

2021-09-09 Thread Huang Xingbo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412904#comment-17412904
 ] 

Huang Xingbo commented on FLINK-23391:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23856=logs=c5f0071e-1851-543e-9a45-9ac140befc32=15a22db7-8faa-5b34-3920-d33c9f0ca23c

> KafkaSourceReaderTest.testKafkaSourceMetrics fails on azure
> ---
>
> Key: FLINK-23391
> URL: https://issues.apache.org/jira/browse/FLINK-23391
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.14.0, 1.13.1
>Reporter: Xintong Song
>Priority: Major
>  Labels: test-stability
> Fix For: 1.13.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=20456=logs=c5612577-f1f7-5977-6ff6-7432788526f7=53f6305f-55e6-561c-8f1e-3a1dde2c77df=6783
> {code}
> Jul 14 23:00:26 [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 99.93 s <<< FAILURE! - in 
> org.apache.flink.connector.kafka.source.reader.KafkaSourceReaderTest
> Jul 14 23:00:26 [ERROR] 
> testKafkaSourceMetrics(org.apache.flink.connector.kafka.source.reader.KafkaSourceReaderTest)
>   Time elapsed: 60.225 s  <<< ERROR!
> Jul 14 23:00:26 java.util.concurrent.TimeoutException: Offsets are not 
> committed successfully. Dangling offsets: 
> {15213={KafkaSourceReaderTest-0=OffsetAndMetadata{offset=10, 
> leaderEpoch=null, metadata=''}}}
> Jul 14 23:00:26   at 
> org.apache.flink.core.testutils.CommonTestUtils.waitUtil(CommonTestUtils.java:210)
> Jul 14 23:00:26   at 
> org.apache.flink.connector.kafka.source.reader.KafkaSourceReaderTest.testKafkaSourceMetrics(KafkaSourceReaderTest.java:275)
> Jul 14 23:00:26   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 14 23:00:26   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 14 23:00:26   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 14 23:00:26   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 14 23:00:26   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> Jul 14 23:00:26   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 14 23:00:26   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> Jul 14 23:00:26   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 14 23:00:26   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Jul 14 23:00:26   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
> Jul 14 23:00:26   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 14 23:00:26   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> Jul 14 23:00:26   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> Jul 14 23:00:26   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> Jul 14 23:00:26   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> Jul 14 23:00:26   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> Jul 14 23:00:26   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> Jul 14 23:00:26   at org.junit.runners.Suite.runChild(Suite.java:128)
> Jul 14 23:00:26   at org.junit.runners.Suite.runChild(Suite.java:27)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> Jul 14 23:00:26   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> Jul 14 23:00:26   

[GitHub] [flink] JingsongLi opened a new pull request #17230: [FLINK-23054][table-blink] Correct upsert optimization by upsert keys

2021-09-09 Thread GitBox


JingsongLi opened a new pull request #17230:
URL: https://github.com/apache/flink/pull/17230


   Cherry-pick #16239
   
   ## What is the purpose of the change
   
   After FLINK-22901.
   
   We can use upsert keys to fix upsert join, upsert rank, and upsert sink.
   
   - For join and rank: if input has no upsert keys, do not use upsert 
optimization.
 - For upsert sink: if input has unique keys but no upsert keys, we need to add 
a materialize operator to produce upsert records.
   
   ## Brief change log
   
   - Join should use upsert keys instead of unique keys to do upsert
   - Rank should use upsert keys instead of unique keys to do upsert
   - TemporalJoin should use upsert keys instead of unique keys to do upsert
   - Introduce materialize operator to produce upsert records before sink.
   
   ## Verifying this change
   
   - UpsertTest
   - UpsertITCase
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no) no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no) no
 - The serializers: (yes / no / don't know) no
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know) no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know) no
 - The S3 file system connector: (yes / no / don't know) no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no) no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-22901) Introduce getChangeLogUpsertKeys in FlinkRelMetadataQuery

2021-09-09 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412902#comment-17412902
 ] 

Jingsong Lee edited comment on FLINK-22901 at 9/10/21, 1:35 AM:


release-1.13: 3b4f2301c41e9d67af03bdae5b459575b5224ff8


was (Author: lzljs3620320):
release-1.13: 22901

> Introduce getChangeLogUpsertKeys in FlinkRelMetadataQuery
> -
>
> Key: FLINK-22901
> URL: https://issues.apache.org/jira/browse/FLINK-22901
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0, 1.13.3
>
>
> To fix FLINK-20374, we need to resolve streaming computation disorder. We 
> need to introduce change log upsert keys; these are not the same as unique keys.
>  
> {code:java}
> /**
>  * Determines the set of change log upsert minimal keys for this expression. 
> A key is
>  * represented as an {@link org.apache.calcite.util.ImmutableBitSet}, where 
> each bit position
>  * represents a 0-based output column ordinal.
>  *
>  * Different from the unique keys: In distributed streaming computing, one 
> record may be
>  * divided into RowKind.UPDATE_BEFORE and RowKind.UPDATE_AFTER. If a key 
> changing join is
>  * connected downstream, the two records will be divided into different 
> tasks, resulting in
>  * disorder. In this case, the downstream cannot rely on the order of the 
> original key. So in
>  * this case, it has unique keys in the traditional sense, but it doesn't 
> have change log upsert
>  * keys.
>  *
>  * @return set of keys, or null if this information cannot be determined 
> (whereas empty set
>  * indicates definitely no keys at all)
>  */
> public Set getChangeLogUpsertKeys(RelNode rel);
> {code}
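
For illustration, a minimal sketch of how a planner rule might consult this
metadata, assuming the getUpsertKeys method that PR #17207 introduces on
FlinkRelMetadataQuery; the helper class is illustrative:

{code:java}
import java.util.Set;

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.util.ImmutableBitSet;
import org.apache.flink.table.planner.plan.metadata.FlinkRelMetadataQuery;

final class UpsertKeyUtil {

    /** Returns true if the input can safely be consumed in upsert mode. */
    static boolean hasUpsertKeys(RelNode input) {
        FlinkRelMetadataQuery fmq =
                FlinkRelMetadataQuery.reuseOrCreate(input.getCluster().getMetadataQuery());
        Set<ImmutableBitSet> upsertKeys = fmq.getUpsertKeys(input);
        // null = unknown, empty set = definitely no keys (see the Javadoc above).
        return upsertKeys != null && !upsertKeys.isEmpty();
    }
}
{code}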



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-22901) Introduce getChangeLogUpsertKeys in FlinkRelMetadataQuery

2021-09-09 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-22901.

Resolution: Fixed

release-1.13: 22901

> Introduce getChangeLogUpsertKeys in FlinkRelMetadataQuery
> -
>
> Key: FLINK-22901
> URL: https://issues.apache.org/jira/browse/FLINK-22901
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0, 1.13.3
>
>
> To fix FLINK-20374, we need to resolve streaming computation disorder. We 
> need to introduce change log upsert keys; these are not the same as unique keys.
>  
> {code:java}
> /**
>  * Determines the set of change log upsert minimal keys for this expression. 
> A key is
>  * represented as an {@link org.apache.calcite.util.ImmutableBitSet}, where 
> each bit position
>  * represents a 0-based output column ordinal.
>  *
>  * Different from the unique keys: In distributed streaming computing, one 
> record may be
>  * divided into RowKind.UPDATE_BEFORE and RowKind.UPDATE_AFTER. If a key 
> changing join is
>  * connected downstream, the two records will be divided into different 
> tasks, resulting in
>  * disorder. In this case, the downstream cannot rely on the order of the 
> original key. So in
>  * this case, it has unique keys in the traditional sense, but it doesn't 
> have change log upsert
>  * keys.
>  *
>  * @return set of keys, or null if this information cannot be determined 
> (whereas empty set
>  * indicates definitely no keys at all)
>  */
> public Set getChangeLogUpsertKeys(RelNode rel);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] JingsongLi merged pull request #17207: [FLINK-22901][table-planner-blink] Introduce getUpsertKeys in FlinkRelMetadataQuery

2021-09-09 Thread GitBox


JingsongLi merged pull request #17207:
URL: https://github.com/apache/flink/pull/17207


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-21647) 'Run kubernetes session test (default input)' failed on Azure

2021-09-09 Thread Kevin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412895#comment-17412895
 ] 

Kevin commented on FLINK-21647:
---

Hello, I am currently facing this issue with Flink 1.13.2 on Azure Kubernetes 
Service 1.20.7 (AKS) when starting a Flink session (./bin/kubernetes-session.sh 
-Dkubernetes.cluster-id=my-first-flink-cluster). What was the outcome? How can 
I solve this issue on my cluster? Thanks

[~jark], [~trohrmann], [~wangyang0918]

> 'Run kubernetes session test (default input)' failed on Azure
> -
>
> Key: FLINK-21647
> URL: https://issues.apache.org/jira/browse/FLINK-21647
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.0
>Reporter: Jark Wu
>Priority: Blocker
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14236=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529=2247



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21687) e2e "Run kubernetes session test (default input)" timed out due to k8s issue

2021-09-09 Thread Kevin Laubis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412893#comment-17412893
 ] 

Kevin Laubis commented on FLINK-21687:
--

Hello, I am currently facing this issue with Flink 1.13.2 on Azure Kubernetes 
Service 1.20.7 (AKS). What was the outcome? How can I solve this issue on my 
cluster? Thanks

> e2e "Run kubernetes session test (default input)" timed out due to k8s issue
> 
>
> Key: FLINK-21687
> URL: https://issues.apache.org/jira/browse/FLINK-21687
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Matthias
>Priority: Major
>  Labels: test-stability
>
> [This 
> build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14195=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529=2267]
>  has a timeout in "Run kubernetes session test (default input)" due to 
> something that looks like a deployment issue:
> {code}
> 2021-03-05T12:35:39.6076147Z Mar 05 12:35:39 Flink logs:
> 2021-03-05T12:35:39.8530781Z Mar 05 12:35:39 sed: couldn't open temporary 
> file /opt/flink/conf/sedn4tFnv: Read-only file system
> 2021-03-05T12:35:39.8532224Z Mar 05 12:35:39 sed: couldn't open temporary 
> file /opt/flink/conf/sedzaTQBz: Read-only file system
> 2021-03-05T12:35:39.8533317Z Mar 05 12:35:39 /docker-entrypoint.sh: line 73: 
> /opt/flink/conf/flink-conf.yaml: Read-only file system
> 2021-03-05T12:35:39.8534405Z Mar 05 12:35:39 sed: couldn't open temporary 
> file /opt/flink/conf/sedzrubvy: Read-only file system
> 2021-03-05T12:35:39.8535483Z Mar 05 12:35:39 /docker-entrypoint.sh: line 88: 
> /opt/flink/conf/flink-conf.yaml.tmp: Read-only file system
> 2021-03-05T12:35:39.8536615Z Mar 05 12:35:39 Error: Could not find or load 
> main class 
> org.apache.flink.kubernetes.entrypoint.KubernetesSessionClusterEntrypoint
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #17229: [WIP][FLINK-23381][state] Back-pressure on reaching state change to upload limit

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17229:
URL: https://github.com/apache/flink/pull/17229#issuecomment-916519510


   
   ## CI report:
   
   * 30c754d27fca90d410964a7549f53252e7b2a507 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23883)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #17229: [WIP][FLINK-23381][state] Back-pressure on reaching state change to upload limit

2021-09-09 Thread GitBox


flinkbot commented on pull request #17229:
URL: https://github.com/apache/flink/pull/17229#issuecomment-916519510


   
   ## CI report:
   
   * 30c754d27fca90d410964a7549f53252e7b2a507 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-23381) Provide backpressure (currently job fails if a limit is hit)

2021-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-23381:
---
Labels: pull-request-available  (was: )

> Provide backpressure (currently job fails if a limit is hit)
> 
>
> Key: FLINK-23381
> URL: https://issues.apache.org/jira/browse/FLINK-23381
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> With the current approach, the job will fail if dstl.dfs.upload.max-in-flight 
> (bytes) is reached.
>  
> Unsetting the limit roughly matches the current behaviour of other backends: 
> the async phase doesn't apply backpressure
> (state.backend.rocksdb.checkpoint.transfer.thread.num only sets the upload 
> thread pool size, which uses an unbounded queue).
>  
> Note that blocking the caller in DfsWriter.persistInternal() will also block 
> regular stream processing (because of pre-emptive writes). This may or may 
> not be the desired behaviour.
>  
> Blocking the sync phase of a snapshot can also have some issues (e.g. not being 
> able to abort the checkpoint), which should be considered.
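
To make the proposed behaviour concrete, here is a minimal, hypothetical sketch 
of back-pressure on an in-flight byte budget. It is not the code from the WIP 
PR (the class name and structure are illustrative): instead of failing once 
dstl.dfs.upload.max-in-flight is exceeded, the writer blocks until enough 
uploads complete.

{code:java}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the proposed back-pressure, not the DSTL code.
final class InFlightLimiter {
    private final long maxInFlightBytes; // analogous to dstl.dfs.upload.max-in-flight
    private long inFlightBytes;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();

    InFlightLimiter(long maxInFlightBytes) {
        this.maxInFlightBytes = maxInFlightBytes;
    }

    /** Called before scheduling an upload; blocks the caller instead of failing the job. */
    void acquire(long bytes) throws InterruptedException {
        lock.lock();
        try {
            // Assumes a single state change never exceeds the whole budget.
            while (inFlightBytes + bytes > maxInFlightBytes) {
                drained.await();
            }
            inFlightBytes += bytes;
        } finally {
            lock.unlock();
        }
    }

    /** Called when an upload completes or fails. */
    void release(long bytes) {
        lock.lock();
        try {
            inFlightBytes -= bytes;
            drained.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
{code}

As the description notes, if acquire() sits on the persist path, back-pressure 
here also stalls regular stream processing.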



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #17229: [WIP][FLINK-23381][state] Back-pressure on reaching state change to upload limit

2021-09-09 Thread GitBox


flinkbot commented on pull request #17229:
URL: https://github.com/apache/flink/pull/17229#issuecomment-916508664


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 30c754d27fca90d410964a7549f53252e7b2a507 (Thu Sep 09 
23:28:32 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-23381) Provide backpressure (currently job fails if a limit is hit)

2021-09-09 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan reassigned FLINK-23381:
-

Assignee: Roman Khachatryan

> Provide backpressure (currently job fails if a limit is hit)
> 
>
> Key: FLINK-23381
> URL: https://issues.apache.org/jira/browse/FLINK-23381
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / State Backends
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.15.0
>
>
> With the current approach, the job will fail if dstl.dfs.upload.max-in-flight 
> (bytes) is reached.
>  
> Unsetting the limit roughly matches the current behaviour of other backends: 
> the async phase doesn't apply backpressure
> (state.backend.rocksdb.checkpoint.transfer.thread.num only sets the upload 
> thread pool size, which uses an unbounded queue).
>  
> Note that blocking the caller in DfsWriter.persistInternal() will also block 
> regular stream processing (because of pre-emptive writes). This may or may 
> not be the desired behaviour.
>  
> Blocking the sync phase of a snapshot can also have some issues (e.g. not being 
> able to abort the checkpoint), which should be considered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17227:
URL: https://github.com/apache/flink/pull/17227#issuecomment-916400703


   
   ## CI report:
   
   * 4d1caa7cda89a0f6df2010925d4543507cf53dbf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23880)
 
   * 4c250aeecd45453d6028d0bf25174d5812893da6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23882)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17188: [FLINK-23864][docs] Add pulsar connector document

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17188:
URL: https://github.com/apache/flink/pull/17188#issuecomment-914560045


   
   ## CI report:
   
   * 12809bb05f44589b9dde53d53397a06c3d6a03c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23875)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-23016) Job client must be a Coordination Request Gateway when submitting a job on the web UI

2021-09-09 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-23016:
---
  Labels: auto-deprioritized-critical auto-deprioritized-major  (was: 
auto-deprioritized-critical stale-major)
Priority: Minor  (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any 
updates so it is being deprioritized. If this ticket is actually Major, please 
raise the priority and ask a committer to assign you the issue or revive the 
public discussion.


> Job client must be a Coordination Request Gateway when submitting a job on the web UI 
> --
>
> Key: FLINK-23016
> URL: https://issues.apache.org/jira/browse/FLINK-23016
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Affects Versions: 1.13.1
> Environment: flink: 1.13.1
> flink-cdc: com.alibaba.ververica:flink-connector-postgres-cdc:1.4.0
> jdk:1.8
>Reporter: wen qi
>Priority: Minor
>  Labels: auto-deprioritized-critical, auto-deprioritized-major
> Attachments: WechatIMG10.png, WechatIMG11.png, WechatIMG8.png
>
>
> I used Flink CDC to collect data and the Table API to transform the data and 
> write it to another table.
> Everything works when I run the code in the IDE or submit the job jar via the 
> CLI, but not via the web UI.
> When I use StreamTableEnvironment.from('table-path').execute(), it fails! 
> Please check my attachments; it seems to be a bug in the web UI. 
>  
>  
>  
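
For context, a minimal sketch of the call pattern described above (the table 
name and setup are illustrative). Table#execute() fetches results back to the 
job client, which would explain why the "Coordination Request Gateway" check 
is hit only for jobs submitted through the web UI:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class Flink23016Repro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // "my_source" is a hypothetical table registered via the CDC connector.
        // execute() collects rows to the client; reportedly fine from the IDE
        // and CLI, but failing when the jar is submitted via the web UI.
        TableResult result = tEnv.from("my_source").execute();
        result.print();
    }
}
{code}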



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17227:
URL: https://github.com/apache/flink/pull/17227#issuecomment-916400703


   
   ## CI report:
   
   * 4d1caa7cda89a0f6df2010925d4543507cf53dbf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23880)
 
   * 4c250aeecd45453d6028d0bf25174d5812893da6 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17226: [FLINK-24222] Migrate ReporterSetupTest to ContextClassLoaderExtension

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17226:
URL: https://github.com/apache/flink/pull/17226#issuecomment-916338880


   
   ## CI report:
   
   * 9a33d7d1b7dd46d165c3645e5a3766e843fb25f0 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23874)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #16988: [FLINK-23458][docs] Added the network buffer documentation along wit…

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #16988:
URL: https://github.com/apache/flink/pull/16988#issuecomment-905730008


   
   ## CI report:
   
   * 56620990c0b4f82b487fe8028e9379cc59bc9f08 UNKNOWN
   * c3bb93196259844291162a0fd4c10e82c3446e55 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23869)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-24238) Page title missing

2021-09-09 Thread Jun Qin (Jira)
Jun Qin created FLINK-24238:
---

 Summary: Page title missing
 Key: FLINK-24238
 URL: https://issues.apache.org/jira/browse/FLINK-24238
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.13.2
Reporter: Jun Qin


The page title is missing on this Flink doc: 
[https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/concepts/versioned_tables/].

[This 
one|https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/concepts/dynamic_tables/]
 is a good example of a page with a title.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] scudellari commented on a change in pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


scudellari commented on a change in pull request #17227:
URL: https://github.com/apache/flink/pull/17227#discussion_r705698738



##
File path: flink-core/src/main/java/org/apache/flink/util/NetUtils.java
##
@@ -116,6 +119,39 @@ private static URL validateHostPortString(String hostPort) 
{
 }
 }
 
+/**
+ * Calls {@link ServerSocket#accept()} on the provided server socket, 
suppressing any thrown
+ * {@link SocketTimeoutException}s. This is a workaround for the 
underlying JDK-8237858 bug in
+ * JDK 11 that can cause errant SocketTimeoutExceptions to be thrown at 
unexpected times.
+ *
+ * This method expects the provided ServerSocket has no timeout set 
(SO_TIMEOUT of 0),
+ * indicating an infinite timeout. It will suppress all 
SocketTimeoutExceptions, even if a
+ * ServerSocket with a non-zero timeout is passed in.
+ *
+ * @param serverSocket a ServerSocket with {@link SocketOptions#SO_TIMEOUT 
SO_TIMEOUT} set to 0;
+ * if SO_TIMEOUT is greater than 0, then this method will suppress 
SocketTimeoutException;

Review comment:
   Sounds good




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zentol commented on a change in pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


zentol commented on a change in pull request #17227:
URL: https://github.com/apache/flink/pull/17227#discussion_r705695170



##
File path: flink-core/src/main/java/org/apache/flink/util/NetUtils.java
##
@@ -116,6 +119,39 @@ private static URL validateHostPortString(String hostPort) 
{
 }
 }
 
+/**
+ * Calls {@link ServerSocket#accept()} on the provided server socket, 
suppressing any thrown
+ * {@link SocketTimeoutException}s. This is a workaround for the 
underlying JDK-8237858 bug in
+ * JDK 11 that can cause errant SocketTimeoutExceptions to be thrown at 
unexpected times.
+ *
+ * This method expects the provided ServerSocket has no timeout set 
(SO_TIMEOUT of 0),
+ * indicating an infinite timeout. It will suppress all 
SocketTimeoutExceptions, even if a
+ * ServerSocket with a non-zero timeout is passed in.
+ *
+ * @param serverSocket a ServerSocket with {@link SocketOptions#SO_TIMEOUT 
SO_TIMEOUT} set to 0;
+ * if SO_TIMEOUT is greater than 0, then this method will suppress 
SocketTimeoutException;

Review comment:
   > The fact that it throws IOExceptions gave me pause.
   
   That's a fair point, but it's probably not an issue. I'd wager that 
this only fails if, say, the socket is already closed. If getting an option 
throws an error, the socket is probably unusable anyway 🤷 
   
   Performance doesn't matter in this case; sockets aren't created on 
performance-critical paths.
   
   So let's do something like
   ```
   Preconditions.checkArgument(serverSocket.getSoTimeout() == 0, "");
   ```
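
   For illustration, combining that precondition with the retry loop, the 
guarded helper could look roughly like this. This is a sketch only: it uses a 
plain `IllegalArgumentException` where the actual patch would use Flink's 
`Preconditions`, and the message text is made up.

   ```java
   import java.io.IOException;
   import java.net.ServerSocket;
   import java.net.Socket;
   import java.net.SocketTimeoutException;

   final class NetUtilsSketch {
       static Socket acceptWithoutTimeout(ServerSocket serverSocket) throws IOException {
           if (serverSocket.getSoTimeout() != 0) {
               // Reject sockets with a real timeout so a legitimate
               // SocketTimeoutException is never masked.
               throw new IllegalArgumentException("serverSocket SO_TIMEOUT option must be 0");
           }
           while (true) {
               try {
                   return serverSocket.accept();
               } catch (SocketTimeoutException exception) {
                   // With SO_TIMEOUT == 0 this can only be the spurious
                   // JDK-8237858 timeout; retry.
               }
           }
       }
   }
   ```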




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zentol commented on a change in pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


zentol commented on a change in pull request #17227:
URL: https://github.com/apache/flink/pull/17227#discussion_r705690463



##
File path: flink-core/src/test/java/org/apache/flink/util/NetUtilsTest.java
##
@@ -49,6 +58,30 @@ public void testParseHostPortAddress() {
 assertEquals(socketAddress, 
NetUtils.parseHostPortAddress("foo.com:8080"));
 }
 
+@Test
+public void testAcceptWithoutTimeout() throws IOException {
+// Validates that acceptWithoutTimeout suppresses all 
SocketTimeoutExceptions
+ServerSocket serverSocket = mock(ServerSocket.class);

Review comment:
   We had a fair amount of maintenance overhead due to mockito in the past; 
instead we aim to write code in such a way that mocking isn't necessary.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17227:
URL: https://github.com/apache/flink/pull/17227#issuecomment-916400703


   
   ## CI report:
   
   * 4d1caa7cda89a0f6df2010925d4543507cf53dbf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23880)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #17228: [FLINK-24236] Migrate tests to factory approach

2021-09-09 Thread GitBox


flinkbot edited a comment on pull request #17228:
URL: https://github.com/apache/flink/pull/17228#issuecomment-916400792


   
   ## CI report:
   
   * 6688227d5c3120d7dc893b22892435f1c14539c0 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=23881)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] scudellari commented on a change in pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


scudellari commented on a change in pull request #17227:
URL: https://github.com/apache/flink/pull/17227#discussion_r705677409



##
File path: flink-core/src/test/java/org/apache/flink/util/NetUtilsTest.java
##
@@ -49,6 +58,30 @@ public void testParseHostPortAddress() {
 assertEquals(socketAddress, 
NetUtils.parseHostPortAddress("foo.com:8080"));
 }
 
+@Test
+public void testAcceptWithoutTimeout() throws IOException {
+// Validates that acceptWithoutTimeout suppresses all 
SocketTimeoutExceptions
+ServerSocket serverSocket = mock(ServerSocket.class);

Review comment:
   Good point, will do. Out of curiosity, what is the motivation for 
avoiding mockito? (Apologies if this is well documented already)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] scudellari commented on a change in pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


scudellari commented on a change in pull request #17227:
URL: https://github.com/apache/flink/pull/17227#discussion_r705676769



##
File path: flink-core/src/main/java/org/apache/flink/util/NetUtils.java
##
@@ -116,6 +119,39 @@ private static URL validateHostPortString(String hostPort) 
{
 }
 }
 
+/**
+ * Calls {@link ServerSocket#accept()} on the provided server socket, 
suppressing any thrown
+ * {@link SocketTimeoutException}s. This is a workaround for the 
underlying JDK-8237858 bug in
+ * JDK 11 that can cause errant SocketTimeoutExceptions to be thrown at 
unexpected times.
+ *
+ * This method expects the provided ServerSocket has no timeout set 
(SO_TIMEOUT of 0),
+ * indicating an infinite timeout. It will suppress all 
SocketTimeoutExceptions, even if a
+ * ServerSocket with a non-zero timeout is passed in.
+ *
+ * @param serverSocket a ServerSocket with {@link SocketOptions#SO_TIMEOUT 
SO_TIMEOUT} set to 0;
+ * if SO_TIMEOUT is greater than 0, then this method will suppress 
SocketTimeoutException;

Review comment:
   We could. I started down this path, but I was not sure what the 
performance implications of calling `getSoTimeout` would be, and the fact that it 
throws IOExceptions gave me pause. Given that there is only a single case where 
a socket sets a timeout (in a test, no less), I felt it was safer to 
avoid the call altogether.
   
   Granted, this landscape could change over time, and unexpectedly suppressing 
timeouts would certainly be a confusing bug. I named the method in a way 
that I hoped would minimize the chances of that.
   
   I'm happy to go either way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #17228: [FLINK-24236] Migrate tests to factory approach

2021-09-09 Thread GitBox


flinkbot commented on pull request #17228:
URL: https://github.com/apache/flink/pull/17228#issuecomment-916400792


   
   ## CI report:
   
   * 6688227d5c3120d7dc893b22892435f1c14539c0 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


flinkbot commented on pull request #17227:
URL: https://github.com/apache/flink/pull/17227#issuecomment-916400703


   
   ## CI report:
   
   * 4d1caa7cda89a0f6df2010925d4543507cf53dbf UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zentol commented on a change in pull request #17227: [FLINK-24156][runtime/network] Catch erroneous SocketTimeoutExceptions and retry

2021-09-09 Thread GitBox


zentol commented on a change in pull request #17227:
URL: https://github.com/apache/flink/pull/17227#discussion_r705654583



##
File path: flink-core/src/main/java/org/apache/flink/util/NetUtils.java
##
@@ -116,6 +119,39 @@ private static URL validateHostPortString(String hostPort) 
{
 }
 }
 
+/**
+ * Calls {@link ServerSocket#accept()} on the provided server socket, 
suppressing any thrown
+ * {@link SocketTimeoutException}s. This is a workaround for the 
underlying JDK-8237858 bug in
+ * JDK 11 that can cause errant SocketTimeoutExceptions to be thrown at 
unexpected times.
+ *
+ * This method expects the provided ServerSocket has no timeout set 
(SO_TIMEOUT of 0),
+ * indicating an infinite timeout. It will suppress all 
SocketTimeoutExceptions, even if a
+ * ServerSocket with a non-zero timeout is passed in.
+ *
+ * @param serverSocket a ServerSocket with {@link SocketOptions#SO_TIMEOUT 
SO_TIMEOUT} set to 0;
+ * if SO_TIMEOUT is greater than 0, then this method will suppress 
SocketTimeoutException;
+ * must not be null
+ * @return the new Socket
+ * @exception IOException if an I/O error occurs when waiting for a 
connection.
+ * @exception SecurityException if a security manager exists and its 
{@code checkAccept} method

Review comment:
   We shouldn't copy code from the JDK verbatim. Let's just add a reference 
to `ServerSocket#accept`.

##
File path: flink-core/src/main/java/org/apache/flink/util/NetUtils.java
##
@@ -116,6 +119,39 @@ private static URL validateHostPortString(String hostPort) 
{
 }
 }
 
+/**
+ * Calls {@link ServerSocket#accept()} on the provided server socket, 
suppressing any thrown
+ * {@link SocketTimeoutException}s. This is a workaround for the 
underlying JDK-8237858 bug in
+ * JDK 11 that can cause errant SocketTimeoutExceptions to be thrown at 
unexpected times.
+ *
+ * This method expects the provided ServerSocket has no timeout set 
(SO_TIMEOUT of 0),
+ * indicating an infinite timeout. It will suppress all 
SocketTimeoutExceptions, even if a
+ * ServerSocket with a non-zero timeout is passed in.
+ *
+ * @param serverSocket a ServerSocket with {@link SocketOptions#SO_TIMEOUT 
SO_TIMEOUT} set to 0;
+ * if SO_TIMEOUT is greater than 0, then this method will suppress 
SocketTimeoutException;

Review comment:
   Could we not reject such sockets via `ServerSocket#getSoTimeout`?

##
File path: flink-core/src/test/java/org/apache/flink/util/NetUtilsTest.java
##
@@ -49,6 +58,30 @@ public void testParseHostPortAddress() {
 assertEquals(socketAddress, 
NetUtils.parseHostPortAddress("foo.com:8080"));
 }
 
+@Test
+public void testAcceptWithoutTimeout() throws IOException {
+// Validates that acceptWithoutTimeout suppresses all 
SocketTimeoutExceptions
+ServerSocket serverSocket = mock(ServerSocket.class);

Review comment:
   Instead of relying on mockito, which we very much want to avoid, we 
can create a `ServerSocket` subclass and override `accept()` 
accordingly.
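
   A minimal sketch of that style of hand-rolled test double (class and test 
names are illustrative, and it assumes the helper keeps the 
`NetUtils.acceptWithoutTimeout` signature from this PR):

   ```java
   import static org.junit.Assert.assertSame;

   import java.io.IOException;
   import java.net.ServerSocket;
   import java.net.Socket;
   import java.net.SocketTimeoutException;
   import org.apache.flink.util.NetUtils;
   import org.junit.Test;

   public class AcceptWithoutTimeoutSketchTest {
       @Test
       public void testRetriesPastSpuriousTimeout() throws IOException {
           final Socket expected = new Socket();
           // The first accept() simulates the spurious JDK-8237858 timeout;
           // the second call succeeds.
           final ServerSocket serverSocket =
                   new ServerSocket() {
                       private boolean timedOutOnce = false;

                       @Override
                       public Socket accept() throws IOException {
                           if (!timedOutOnce) {
                               timedOutOnce = true;
                               throw new SocketTimeoutException();
                           }
                           return expected;
                       }
                   };

           assertSame(expected, NetUtils.acceptWithoutTimeout(serverSocket));
       }
   }
   ```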




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #17228: [FLINK-24236] Migrate tests to factory approach

2021-09-09 Thread GitBox


flinkbot commented on pull request #17228:
URL: https://github.com/apache/flink/pull/17228#issuecomment-916392543


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 6688227d5c3120d7dc893b22892435f1c14539c0 (Thu Sep 09 
19:55:50 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



