[GitHub] [flink] klion26 commented on a change in pull request #13505: [FLINK-19420][docs-zh] Translate "Program Packaging" page into Chinese

2020-10-13 Thread GitBox


klion26 commented on a change in pull request #13505:
URL: https://github.com/apache/flink/pull/13505#discussion_r504417756



##
File path: docs/dev/packaging.zh.md
##
@@ -24,36 +24,20 @@ under the License.
 -->
 
 
-As described earlier, Flink programs can be executed on
-clusters by using a `remote environment`. Alternatively, programs can be packaged into JAR Files
-(Java Archives) for execution. Packaging the program is a prerequisite to executing them through the
-[command line interface]({{ site.baseurl }}/ops/cli.html).
-
-### Packaging Programs
-
-To support execution from a packaged JAR file via the command line or web interface, a program must
-use the environment obtained by `StreamExecutionEnvironment.getExecutionEnvironment()`. This environment
-will act as the cluster's environment when the JAR is submitted to the command line or web
-interface. If the Flink program is invoked differently than through these interfaces, the
-environment will act like a local environment.
-
-To package the program, simply export all involved classes as a JAR file. The JAR file's manifest
-must point to the class that contains the program's *entry point* (the class with the public
-`main` method). The simplest way to do this is by putting the *main-class* entry into the
-manifest (such as `main-class: org.apache.flinkexample.MyProgram`). The *main-class* attribute is
-the same one that is used by the Java Virtual Machine to find the main method when executing a JAR
-files through the command `java -jar pathToTheJarFile`. Most IDEs offer to include that attribute
-automatically when exporting JAR files.
+正如之前所描述的,Flink 程序可以使用 `remote environment` 在集群上执行。或者,程序可以被打包成 JAR 文件(Java Archives)执行。如果使用[命令行]({% link ops/cli.zh.md %})的方式执行程序,将程序打包是必需的。
+
+### 打包程序

Review comment:
   After a heading is translated, the `` tag needs to be added; you can refer to the 
[wiki](https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications).
 The anchor can follow the URL of the English version.

##
File path: docs/dev/packaging.zh.md
##
@@ -24,36 +24,20 @@ under the License.
 -->
 
 
-As described earlier, Flink programs can be executed on
-clusters by using a `remote environment`. Alternatively, programs can be packaged into JAR Files
-(Java Archives) for execution. Packaging the program is a prerequisite to executing them through the
-[command line interface]({{ site.baseurl }}/ops/cli.html).
-
-### Packaging Programs
-
-To support execution from a packaged JAR file via the command line or web interface, a program must
-use the environment obtained by `StreamExecutionEnvironment.getExecutionEnvironment()`. This environment
-will act as the cluster's environment when the JAR is submitted to the command line or web
-interface. If the Flink program is invoked differently than through these interfaces, the
-environment will act like a local environment.
-
-To package the program, simply export all involved classes as a JAR file. The JAR file's manifest
-must point to the class that contains the program's *entry point* (the class with the public
-`main` method). The simplest way to do this is by putting the *main-class* entry into the
-manifest (such as `main-class: org.apache.flinkexample.MyProgram`). The *main-class* attribute is
-the same one that is used by the Java Virtual Machine to find the main method when executing a JAR
-files through the command `java -jar pathToTheJarFile`. Most IDEs offer to include that attribute
-automatically when exporting JAR files.
+正如之前所描述的,Flink 程序可以使用 `remote environment` 在集群上执行。或者,程序可以被打包成 JAR 文件(Java Archives)执行。如果使用[命令行]({% link ops/cli.zh.md %})的方式执行程序,将程序打包是必需的。
+
+### 打包程序
+
+为了能够通过命令行或 web 界面执行打包的 JAR 文件,程序必须使用通过 `StreamExecutionEnvironment.getExecutionEnvironment()` 获取的 environment。当 JAR 被提交到命令行或 web 界面后,该 environment 会扮演集群环境的角色。如果调用 Flink 程序的方式与上述接口不同,该 environment 会扮演本地环境的角色。
+
+打包程序只要简单地将所有相关的类导出为 JAR 文件,JAR 文件的 manifest 必须指向包含程序*入口点*(拥有公共 `main` 方法)的类。实现的最简单的方法是将 *main-class* 写入 manifest 中(比如 `main-class: org.apache.flinkexample.MyProgram`)。*main-class* 属性与 Java 虚拟机通过指令 `java -jar pathToTheJarFile` 执行 JAR 文件时寻找 main 方法的类是相同的。大多数 IDE 提供了在导出 JAR 文件时自动包含该属性的功能。
 
 ### Summary

Review comment:
   Could you translate this heading as well?
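
   For reference, the entry point this section describes boils down to a class 
like the following minimal sketch (the package and class name reuse the doc's 
`org.apache.flinkexample.MyProgram` example; the job body is made up):

    package org.apache.flinkexample;

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MyProgram {
        public static void main(String[] args) throws Exception {
            // Returns the cluster environment when the JAR is submitted via
            // the CLI or web interface, and a local environment otherwise.
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(1, 2, 3).print();

            env.execute("MyProgram");
        }
    }

    // The JAR manifest then names this class as the entry point, e.g.:
    //   Main-Class: org.apache.flinkexample.MyProgram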

[jira] [Updated] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-19630:
-
Fix Version/s: 1.12.0 (was: 1.11.3)

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Fix For: 1.12.0
>
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version: 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storage type: ORC*
> *SQL or DataStream: SQL API*
> *Kafka connector: a custom Kafka connector based on the Legacy API
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive connector: totally follows the Flink-Hive-connector (we only made some
> encapsulation upon it)*
> *Using StreamingFileCommitter: YES*
>
> *Description:*
> Try to execute the following SQL:
> """
> insert into hive_table (select * from kafka_table)
> """
> The Hive table DDL looks like:
> """
> CREATE TABLE `hive_table`(
>  // some fields
> PARTITIONED BY (
>  `dt` string,
>  `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>  'orc.compress'='SNAPPY',
>  'type'='HIVE',
>  'sink.partition-commit.trigger'='process-time',
>  'sink.partition-commit.delay' = '1 h',
>  'sink.partition-commit.policy.kind' = 'metastore,success-file',
> )
> """
> When this job starts to process a snapshot, here comes the weird exception:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source
> Thread], but the streamTaskThread, which represents the whole first stage, is
> found instead.
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
> The legacy Source Thread
>
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
> The StreamTask Thread
>
> According to the thread dump info and the exception message, I searched and
> read the relevant source code and then *DID A TEST*.
>
> Since the Kafka connector is custom, I tried to make the KafkaSource a
> separate stage by changing the TaskChainStrategy to Never. The task topology
> is as follows:
> !image-2020-10-14-11-48-51-310.png|width=753,height=208!
>
> *Fortunately, it did work! No exception is thrown and the checkpoint can be
> snapshot successfully!*
>
> So, from my perspective, there must be something wrong when the
> HiveWritingTask and the LegacySourceTask are chained together. The legacy
> source task is a separate thread, which may be the cause of the exception
> mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213625#comment-17213625
 ] 

Jingsong Lee commented on FLINK-19630:
--

Hi [~neighborhood], can you use Hive 1.2.2 dependencies? Just change the Hive 
version on the Flink side; the Hive server can still use 2.0.x.

> Sink data in [ORC] format into Hive By using Legacy Table API caused
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630





[GitHub] [flink] flinkbot commented on pull request #13626: [FLINK-19594][flink-runtime-web]modify Subtasks starting index from 0

2020-10-13 Thread GitBox


flinkbot commented on pull request #13626:
URL: https://github.com/apache/flink/pull/13626#issuecomment-708171532


   
   ## CI report:
   
   * 1f6974a2251daef826687b981fd5a5fb428fe66c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13625: [FLINK-19623][table-planner-blink] Introduce ExecEdge to describe information on input edges for ExecNode

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13625:
URL: https://github.com/apache/flink/pull/13625#issuecomment-708152298


   
   ## CI report:
   
   * 2599ad69ed0b2117dff5fd47eb5b51208a2b9efb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7567)
 
   * 1d3dfd2b49f980db8ac9189ad4bf0d310e19468c UNKNOWN
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #13580: [hotfix][flink-runtime-web] modify checkpoint time format

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13580:
URL: https://github.com/apache/flink/pull/13580#issuecomment-706475658


   
   ## CI report:
   
   * 64e06319bfd0a593c645f9449ce5605c9a00de2c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7359)
 
   * 1448ef2dd316064c0a3c14e8bad22678f9242314 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7568)
 
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #13611: [FLINK-19480][python] Support RichFunction in Python DataStream API

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13611:
URL: https://github.com/apache/flink/pull/13611#issuecomment-707634048


   
   ## CI report:
   
   * 97bf0d8b9d1541d8ecb077fa977e2fe3c7a15d2d UNKNOWN
   * bbf13d4f68c7c6450d264d862eaf9db45ff80e6d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7558)
 
   * d0e3faf04d6f9e24387d07b8175ff33d205426e6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7569)
 
   
   
   







[GitHub] [flink] HuangXingBo commented on pull request #13504: [FLINK-19404][python] Support Pandas Stream Over Window Aggregation

2020-10-13 Thread GitBox


HuangXingBo commented on pull request #13504:
URL: https://github.com/apache/flink/pull/13504#issuecomment-708170328


   @dianfu Thanks a lot for the review. I have addressed the comments in the 
latest commit.







[GitHub] [flink] flinkbot commented on pull request #13626: [FLINK-19594][flink-runtime-web]modify Subtasks starting index from 0

2020-10-13 Thread GitBox


flinkbot commented on pull request #13626:
URL: https://github.com/apache/flink/pull/13626#issuecomment-708168756


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 1f6974a2251daef826687b981fd5a5fb428fe66c (Wed Oct 14 
05:37:38 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-19594).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Updated] (FLINK-19594) SubTasks start index don't unified and may confuse users

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19594:
---
Labels: pull-request-available  (was: )

> SubTasks start index don't unified and may confuse users
> 
>
> Key: FLINK-19594
> URL: https://issues.apache.org/jira/browse/FLINK-19594
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Web Frontend
>Reporter: zlzhang0122
>Priority: Major
>  Labels: pull-request-available
> Attachments: BackPresures.png, Checkpoints.png, SubTasks.png
>
>
> In the Flink web UI, the subtask index starts from 0 in the SubTasks tab but
> from 1 in the BackPressure tab; at the same time, the subtask index starts
> from 1 in the Checkpoints page. I think this may confuse users. Is there some
> design purpose behind this?





[GitHub] [flink] zlzhang0122 opened a new pull request #13626: [FLINK-19594][flink-runtime-web]modify Subtasks starting index from 0

2020-10-13 Thread GitBox


zlzhang0122 opened a new pull request #13626:
URL: https://github.com/apache/flink/pull/13626


   ## What is the purpose of the change
   This pull request modifies the web UI subtask starting index to 0, making it 
consistent with the REST API.
   
   ## Brief change log
   * Modify the web UI subtask starting index; remove the `+ 1` offset.
   
   ## Verifying this change
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Documentation
   This pull request doesn't introduce a new feature.
   







[GitHub] [flink] flinkbot edited a comment on pull request #13611: [FLINK-19480][python] Support RichFunction in Python DataStream API

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13611:
URL: https://github.com/apache/flink/pull/13611#issuecomment-707634048


   
   ## CI report:
   
   * 97bf0d8b9d1541d8ecb077fa977e2fe3c7a15d2d UNKNOWN
   * bbf13d4f68c7c6450d264d862eaf9db45ff80e6d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7558)
 
   * d0e3faf04d6f9e24387d07b8175ff33d205426e6 UNKNOWN
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #13580: [hotfix][flink-runtime-web] modify checkpoint time format

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13580:
URL: https://github.com/apache/flink/pull/13580#issuecomment-706475658


   
   ## CI report:
   
   * 64e06319bfd0a593c645f9449ce5605c9a00de2c Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7359)
 
   * 1448ef2dd316064c0a3c14e8bad22678f9242314 UNKNOWN
   
   
   







[GitHub] [flink] dianfu commented on a change in pull request #13504: [FLINK-19404][python] Support Pandas Stream Over Window Aggregation

2020-10-13 Thread GitBox


dianfu commented on a change in pull request #13504:
URL: https://github.com/apache/flink/pull/13504#discussion_r504406366



##
File path: flink-python/src/main/java/org/apache/flink/table/runtime/operators/python/aggregate/arrow/stream/AbstractStreamArrowPythonBoundedRowsOperator.java
##
@@ -158,88 +161,46 @@ void registerProcessingCleanupTimer(long currentTime) throws Exception {
 }

 void triggerWindowProcess(List<RowData> inputs, int i, int index) throws Exception {
-    int startIndex;
-    int startPos = 0;
     if (windowData.isEmpty()) {
         if (i >= lowerBoundary) {
             for (int j = (int) (i - lowerBoundary); j <= i; j++) {
-                RowData rowData = inputs.get(j);
-                windowData.add(rowData);
-                arrowSerializer.write(getFunctionInput(rowData));
+                windowData.add(inputs.get(j));
             }
             currentBatchCount += lowerBoundary;
         } else {
+            for (int j = 0; j <= i; j++) {
+                RowData rowData = inputs.get(j);
+                windowData.add(rowData);
+                currentBatchCount++;
+            }
             Long previousTimestamp;
-            List<RowData> previousData = null;
-            int length = 0;
-            startIndex = index - 1;
+            List<RowData> previousData;
+            int length;
             long remainingDataCount = lowerBoundary - i;
             ListIterator<Long> iter = sortedTimestamps.listIterator(index);
             while (remainingDataCount > 0 && iter.hasPrevious()) {
                 previousTimestamp = iter.previous();
                 previousData = inputState.get(previousTimestamp);
                 length = previousData.size();
-                if (remainingDataCount <= length) {
-                    startPos = (int) (length - remainingDataCount);
-                    remainingDataCount = 0;
-                } else {
-                    remainingDataCount -= length;
-                    startIndex--;
-                }
-            }
-            if (previousData != null) {
-                for (int j = startPos; j < length; j++) {
-                    RowData rowData = previousData.get(j);
-                    windowData.add(rowData);
-                    arrowSerializer.write(getFunctionInput(rowData));
+                ListIterator<RowData> previousDataIter = previousData.listIterator(length);
+                while (previousDataIter.hasPrevious() && remainingDataCount > 0) {
+                    windowData.addFirst(previousDataIter.previous());
+                    remainingDataCount--;
                     currentBatchCount++;
                 }
-                // clear outdated data.

Review comment:
   The logic for clearing outdated state is missing in the latest PR.
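
   For context, the `+` lines above replace index arithmetic with a backwards 
`ListIterator` that prepends the most recent rows to the window. A standalone 
sketch of that pattern, using made-up data rather than Flink's actual types:

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Deque;
    import java.util.List;
    import java.util.ListIterator;

    public class BackwardFillSketch {
        public static void main(String[] args) {
            Deque<String> windowData = new ArrayDeque<>(Arrays.asList("e", "f"));
            List<String> previous = Arrays.asList("a", "b", "c", "d");
            long remaining = 3; // how many earlier rows the window still needs

            // Walk backwards from the end of `previous`, prepending elements
            // so their original order is preserved in the deque.
            ListIterator<String> it = previous.listIterator(previous.size());
            while (it.hasPrevious() && remaining > 0) {
                windowData.addFirst(it.previous());
                remaining--;
            }

            System.out.println(windowData); // prints [b, c, d, e, f]
        }
    }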









[jira] [Commented] (FLINK-19441) Performance regression on 24.09.2020

2020-10-13 Thread Yingjie Cao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213611#comment-17213611
 ] 

Yingjie Cao commented on FLINK-19441:
-

Really sorry for the late response. I will take a look and update soon if I 
have any findings.

> Performance regression on 24.09.2020
> 
>
> Key: FLINK-19441
> URL: https://issues.apache.org/jira/browse/FLINK-19441
> Project: Flink
>  Issue Type: Bug
>Reporter: Arvid Heise
>Assignee: Stephan Ewen
>Priority: Blocker
>  Labels: pull-request-available
>
> A couple of benchmarks are showing a small performance regression on 
> 24.09.2020:
> http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=tupleKeyBy&env=2 (?)





[GitHub] [flink] flinkbot edited a comment on pull request #13625: [FLINK-19623][table-planner-blink] Introduce ExecEdge to describe information on input edges for ExecNode

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13625:
URL: https://github.com/apache/flink/pull/13625#issuecomment-708152298


   
   ## CI report:
   
   * 2599ad69ed0b2117dff5fd47eb5b51208a2b9efb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7567)
 
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #13611: [FLINK-19480][python] Support RichFunction in Python DataStream API

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13611:
URL: https://github.com/apache/flink/pull/13611#issuecomment-707634048


   
   ## CI report:
   
   * 97bf0d8b9d1541d8ecb077fa977e2fe3c7a15d2d UNKNOWN
   * bbf13d4f68c7c6450d264d862eaf9db45ff80e6d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7558)
 
   
   
   







[GitHub] [flink] gm7y8 edited a comment on pull request #13458: FLINK-18851 [runtime] Add checkpoint type to checkpoint history entries in Web UI

2020-10-13 Thread GitBox


gm7y8 edited a comment on pull request #13458:
URL: https://github.com/apache/flink/pull/13458#issuecomment-708157763


   @AHeise sorry for the delay in responding. It should be a simple fix in the 
UI layer with some text changes. I would like the opinions of @XComp 
@vthinkxie on whether the above change is OK.







[GitHub] [flink] gm7y8 commented on pull request #13458: FLINK-18851 [runtime] Add checkpoint type to checkpoint history entries in Web UI

2020-10-13 Thread GitBox


gm7y8 commented on pull request #13458:
URL: https://github.com/apache/flink/pull/13458#issuecomment-708157763


   @AHeise sorry for the delay in responding. It should be a simple fix in the 
UI layer. I would like the opinions of @XComp @vthinkxie on whether the above 
change is OK.







[GitHub] [flink] flinkbot commented on pull request #13625: [FLINK-19623][table-planner-blink] Introduce ExecEdge to describe information on input edges for ExecNode

2020-10-13 Thread GitBox


flinkbot commented on pull request #13625:
URL: https://github.com/apache/flink/pull/13625#issuecomment-708152298


   
   ## CI report:
   
   * 2599ad69ed0b2117dff5fd47eb5b51208a2b9efb UNKNOWN
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #13616: [FLINK-18570][hbase][WIP] SQLClientHBaseITCase.testHBase fails on azure

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13616:
URL: https://github.com/apache/flink/pull/13616#issuecomment-707722760


   
   ## CI report:
   
   * f5f2b5eb85c94a1ecdbcce1901ef342c27cf42fc Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7531)
 
   * be3a69f7f525c963887a9cd0e8e293b9fe5062c9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7566)
 
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #13605: [FLINK-19599][table] Introduce BulkFormatFactory to integrate new FileSource to table

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13605:
URL: https://github.com/apache/flink/pull/13605#issuecomment-707521684


   
   ## CI report:
   
   * a9df8a1384eb6306656b4fd952edd4be5d7a857d UNKNOWN
   * 8beabe2e1b04afe449610c44f4c376909b463e10 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7493)
 
   * 76ff23e27070dc54e6af85dd91c2742457621aa9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7565)
 
   
   
   







[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213601#comment-17213601
 ] 

Lsw_aka_laplace commented on FLINK-19630:
-

[~lirui] 

Thanks for your suggestion. It's hard to change the format or Hive version. 
Still, I found a way around this problem: since we maintain our own table 
source and sink, I can make the Kafka source a separate stage via 
configuration to avoid the problem mentioned above, which is a bit of a hack 
but does work.

Looking forward to a formal solution~

> Sink data in [ORC] format into Hive By using Legacy Table API caused
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630





[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Rui Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213596#comment-17213596
 ] 

Rui Li commented on FLINK-19630:


[~lzljs3620320] Yes, this is the same issue as FLINK-13998.
[~neighborhood] It's a known issue with writing to a 2.0.x ORC table. As a 
workaround, is it possible to use a different file format, like Parquet? Or a 
different Hive version, like 2.1.x?

> Sink data in [ORC] format into Hive By using Legacy Table API caused
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630





[GitHub] [flink] flinkbot edited a comment on pull request #13616: [FLINK-18570][hbase][WIP] SQLClientHBaseITCase.testHBase fails on azure

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13616:
URL: https://github.com/apache/flink/pull/13616#issuecomment-707722760


   
   ## CI report:
   
   * f5f2b5eb85c94a1ecdbcce1901ef342c27cf42fc Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7531)
 
   * be3a69f7f525c963887a9cd0e8e293b9fe5062c9 UNKNOWN
   
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #13617: [FLINK-19531] Implement the sink writer operator

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13617:
URL: https://github.com/apache/flink/pull/13617#issuecomment-707791817


   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 524a25e7d549af7ad9a5e92f39377cbde30059fb Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7564)
 
   
   
   







[GitHub] [flink] flinkbot commented on pull request #13625: [FLINK-19623][table-planner-blink] Introduce ExecEdge to describe information on input edges for ExecNode

2020-10-13 Thread GitBox


flinkbot commented on pull request #13625:
URL: https://github.com/apache/flink/pull/13625#issuecomment-708147417


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 2599ad69ed0b2117dff5fd47eb5b51208a2b9efb (Wed Oct 14 
04:22:39 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-19623).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Updated] (FLINK-19623) Introduce ExecEdge to describe information on input edges for ExecNode

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19623:
---
Labels: pull-request-available  (was: )

> Introduce ExecEdge to describe information on input edges for ExecNode
> --
>
> Key: FLINK-19623
> URL: https://issues.apache.org/jira/browse/FLINK-19623
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Caizhi Weng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The deadlock breakup algorithm and the multi-input operator creation
> algorithm need information about the input edges of an exec node, for
> example, what the priority of each input is and how the input records will
> trigger the output records.
> We're going to introduce a new class {{ExecEdge}} to describe this.





[GitHub] [flink] flinkbot edited a comment on pull request #13504: [FLINK-19404][python] Support Pandas Stream Over Window Aggregation

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13504:
URL: https://github.com/apache/flink/pull/13504#issuecomment-700556506


   
   ## CI report:
   
   * 5e9643b76e1c28968baa3cf41492d55bc8ed9ad0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7482)
 
   * 5414040e7b46b1d8f0e6efd4cb4a03e24bb9a788 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7563)
 
   
   
   







[GitHub] [flink] TsReaper opened a new pull request #13625: [FLINK-19623][table-planner-blink] Introduce ExecEdge to describe information on input edges for ExecNode

2020-10-13 Thread GitBox


TsReaper opened a new pull request #13625:
URL: https://github.com/apache/flink/pull/13625


   ## What is the purpose of the change
   
   The deadlock breakup algorithm and the multi-input operator creation 
algorithm need information about the input edges of an exec node, for example, 
what the priority of each input is and how the input records will trigger the 
output records.
   
   Although `BatchExecNode` currently has a `getDamBehavior` method, it only 
describes the behavior of the node and is not very useful for the new deadlock 
breakup algorithm. So we're going to introduce a new class `ExecEdge` to 
describe this and a new method `getInputEdges` for `ExecNode`.
   
   The current implementation of `getInputEdges` for `ExecNode`s is a 
collection and rewrite of existing code (mostly from `getDamBehavior`) without 
any test coverage. `getInputEdges` will be called when we introduce the new 
deadlock breakup and multi-input operator creation algorithms and will be 
tested along with them.
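
   To make the idea concrete, here is a hypothetical sketch of what such an 
edge descriptor could look like (the class shape, field and enum names are 
illustrative only, not necessarily those of the actual `ExecEdge` in this PR):

    /** Hypothetical sketch of an input-edge descriptor for an exec node. */
    public final class ExecEdge {
        /** Illustrative: how records on this input trigger output records. */
        public enum DamBehavior { PIPELINED, END_INPUT, BLOCKING }

        // Lower values are read first; a deadlock breakup algorithm can use
        // this to decide which input of a multi-input node to consume eagerly.
        private final int priority;

        private final DamBehavior damBehavior;

        public ExecEdge(int priority, DamBehavior damBehavior) {
            this.priority = priority;
            this.damBehavior = damBehavior;
        }

        public int getPriority() { return priority; }

        public DamBehavior getDamBehavior() { return damBehavior; }
    }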
   
   ## Brief change log
   
- Introduce `ExecEdge`
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   







[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213589#comment-17213589
 ] 

Jingsong Lee commented on FLINK-19630:
--

Should be the same reason as FLINK-13998.

> Sink data in [ORC] format into Hive By using Legacy Table API caused
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630





[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213585#comment-17213585
 ] 

Jingsong Lee commented on FLINK-19630:
--

Thanks [~neighborhood] for reporting, nice catch!

It seems that the ORC writer requires a single thread, but the record-writing 
thread and the snapshot-state thread (which calls the ORC writer's close) are 
two different threads.
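
For illustration, the kind of single-owner-thread check that this exception 
message suggests looks roughly like the following (a hypothetical sketch, not 
the actual Hive/ORC source):

    /**
     * Hypothetical sketch of a single-thread ownership guard; the real
     * Hive/ORC memory manager may differ.
     */
    final class SingleThreadGuard {
        private final Thread ownerThread = Thread.currentThread();

        // Fails when invoked from a thread other than the one that created
        // the guard, e.g. when the record-writing path and the close() call
        // during snapshotting run on different threads.
        void checkOwner() {
            if (ownerThread != Thread.currentThread()) {
                throw new IllegalStateException(
                    "Owner thread expected " + ownerThread
                        + ", got " + Thread.currentThread());
            }
        }
    }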

> Sink data in [ORC] format into Hive By using Legacy Table API caused
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630





[jira] [Updated] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-19630:
-
Fix Version/s: 1.11.3

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Fix For: 1.11.3
>
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version: 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storage type: ORC*
> *SQL or DataStream: SQL API*
> *Kafka connector: a custom Kafka connector based on the legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive connector: follows the Flink Hive connector exactly (we only added some 
> encapsulation on top of it)*
> *Using StreamingFileCommitter: YES*
>
> *Description:*
> Try to execute the following SQL:
> """
> insert into hive_table (select * from kafka_table)
> """
> The Hive table DDL looks like:
> """
> CREATE TABLE `hive_table` (
>   -- some fields
> )
> PARTITIONED BY (
>   `dt` string,
>   `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>   'orc.compress' = 'SNAPPY',
>   'type' = 'HIVE',
>   'sink.partition-commit.trigger' = 'process-time',
>   'sink.partition-commit.delay' = '1 h',
>   'sink.partition-commit.policy.kind' = 'metastore,success-file'
> )
> """
> When this job starts to take a snapshot, the following weird exception occurs:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source 
> Thread], but the StreamTask thread, which runs the whole first stage, was 
> found instead.
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
> (The legacy Source Thread)
>
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
> (The StreamTask Thread)
>
> According to the thread dump and the exception message, I searched and read 
> the relevant source code and then *{color:#ffab00}DID A TEST{color}*
>
> {color:#172b4d}Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by changing the chaining strategy to NEVER. The 
> task topology is as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>
> {color:#505f79}*Fortunately, it worked! No exception is thrown and 
> checkpoints complete successfully!*{color}
>
> So, from my perspective, there must be something wrong when the 
> HiveWritingTask and the LegacySourceTask are chained together. The legacy 
> source task runs in a separate thread, which may be the cause of the 
> exception mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213583#comment-17213583
 ] 

Jingsong Lee commented on FLINK-19630:
--

CC: [~lirui]

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version: 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storage type: ORC*
> *SQL or DataStream: SQL API*
> *Kafka connector: a custom Kafka connector based on the legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive connector: follows the Flink Hive connector exactly (we only added some 
> encapsulation on top of it)*
> *Using StreamingFileCommitter: YES*
>
> *Description:*
> Try to execute the following SQL:
> """
> insert into hive_table (select * from kafka_table)
> """
> The Hive table DDL looks like:
> """
> CREATE TABLE `hive_table` (
>   -- some fields
> )
> PARTITIONED BY (
>   `dt` string,
>   `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>   'orc.compress' = 'SNAPPY',
>   'type' = 'HIVE',
>   'sink.partition-commit.trigger' = 'process-time',
>   'sink.partition-commit.delay' = '1 h',
>   'sink.partition-commit.policy.kind' = 'metastore,success-file'
> )
> """
> When this job starts to take a snapshot, the following weird exception occurs:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source 
> Thread], but the StreamTask thread, which runs the whole first stage, was 
> found instead.
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
> (The legacy Source Thread)
>
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
> (The StreamTask Thread)
>
> According to the thread dump and the exception message, I searched and read 
> the relevant source code and then *{color:#ffab00}DID A TEST{color}*
>
> {color:#172b4d}Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by changing the chaining strategy to NEVER. The 
> task topology is as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>
> {color:#505f79}*Fortunately, it worked! No exception is thrown and 
> checkpoints complete successfully!*{color}
>
> So, from my perspective, there must be something wrong when the 
> HiveWritingTask and the LegacySourceTask are chained together. The legacy 
> source task runs in a separate thread, which may be the cause of the 
> exception mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19531) Implement the writer operator

2020-10-13 Thread Guowei Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guowei Ma updated FLINK-19531:
--
Summary: Implement the writer operator  (was: Implement the 
`WriterOperator`)

> Implement the writer operator
> -
>
> Key: FLINK-19531
> URL: https://issues.apache.org/jira/browse/FLINK-19531
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Reporter: Guowei Ma
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19632) Introduce a new ResultPartitionType for Approximate Local Recovery

2020-10-13 Thread Yuan Mei (Jira)
Yuan Mei created FLINK-19632:


 Summary: Introduce a new ResultPartitionType for Approximate Local 
Recovery
 Key: FLINK-19632
 URL: https://issues.apache.org/jira/browse/FLINK-19632
 Project: Flink
  Issue Type: Sub-task
Reporter: Yuan Mei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213579#comment-17213579
 ] 

Lsw_aka_laplace commented on FLINK-19630:
-

[~jark]

[~lzljs3620320]

Would you mind taking a look?

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version: 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storage type: ORC*
> *SQL or DataStream: SQL API*
> *Kafka connector: a custom Kafka connector based on the legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive connector: follows the Flink Hive connector exactly (we only added some 
> encapsulation on top of it)*
> *Using StreamingFileCommitter: YES*
>
> *Description:*
> Try to execute the following SQL:
> """
> insert into hive_table (select * from kafka_table)
> """
> The Hive table DDL looks like:
> """
> CREATE TABLE `hive_table` (
>   -- some fields
> )
> PARTITIONED BY (
>   `dt` string,
>   `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>   'orc.compress' = 'SNAPPY',
>   'type' = 'HIVE',
>   'sink.partition-commit.trigger' = 'process-time',
>   'sink.partition-commit.delay' = '1 h',
>   'sink.partition-commit.policy.kind' = 'metastore,success-file'
> )
> """
> When this job starts to take a snapshot, the following weird exception occurs:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source 
> Thread], but the StreamTask thread, which runs the whole first stage, was 
> found instead.
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
> (The legacy Source Thread)
>
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
> (The StreamTask Thread)
>
> According to the thread dump and the exception message, I searched and read 
> the relevant source code and then *{color:#ffab00}DID A TEST{color}*
>
> {color:#172b4d}Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by changing the chaining strategy to NEVER. The 
> task topology is as follows:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
>
> {color:#505f79}*Fortunately, it worked! No exception is thrown and 
> checkpoints complete successfully!*{color}
>
> So, from my perspective, there must be something wrong when the 
> HiveWritingTask and the LegacySourceTask are chained together. The legacy 
> source task runs in a separate thread, which may be the cause of the 
> exception mentioned above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lsw_aka_laplace updated FLINK-19630:

Description: 
*ENV:*

*Flink version: 1.11.2*

*Hive exec version: 2.0.1*

*Hive file storage type: ORC*

*SQL or DataStream: SQL API*

*Kafka connector: a custom Kafka connector based on the legacy API 
(TableSource/`org.apache.flink.types.Row`)*

*Hive connector: follows the Flink Hive connector exactly (we only added some 
encapsulation on top of it)*

*Using StreamingFileCommitter: YES*

*Description:*

Try to execute the following SQL:

"""

insert into hive_table (select * from kafka_table)

"""

The Hive table DDL looks like:

"""

CREATE TABLE `hive_table` (
  -- some fields
)
PARTITIONED BY (
  `dt` string,
  `hour` string)
STORED AS orc
TBLPROPERTIES (
  'orc.compress' = 'SNAPPY',
  'type' = 'HIVE',
  'sink.partition-commit.trigger' = 'process-time',
  'sink.partition-commit.delay' = '1 h',
  'sink.partition-commit.policy.kind' = 'metastore,success-file'
)

"""

When this job starts to take a snapshot, the following weird exception occurs:

!image-2020-10-14-11-36-48-086.png|width=882,height=395!

As we can see from the message, the owner thread should be the [Legacy Source 
Thread], but the StreamTask thread, which runs the whole first stage, was 
found instead.

So I checked the thread dump at once.

!image-2020-10-14-11-41-53-379.png|width=801,height=244!

(The legacy Source Thread)

!image-2020-10-14-11-42-57-353.png|width=846,height=226!

(The StreamTask Thread)

According to the thread dump and the exception message, I searched and read 
the relevant source code and then *{color:#ffab00}DID A TEST{color}*

{color:#172b4d}Since the Kafka connector is custom, I tried to make the 
KafkaSource a separate stage by changing the chaining strategy to NEVER. The 
task topology is as follows:{color}

{color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}

{color:#505f79}*Fortunately, it worked! No exception is thrown and 
checkpoints complete successfully!*{color}

So, from my perspective, there must be something wrong when the 
HiveWritingTask and the LegacySourceTask are chained together. The legacy 
source task runs in a separate thread, which may be the cause of the 
exception mentioned above.

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version: 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storage type: ORC*
> *SQL or DataStream: SQL API*
> *Kafka connector: a custom Kafka connector based on the legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive connector: follows the Flink Hive connector exactly (we only added some 
> encapsulation on top of it)*
> *Using StreamingFileCommitter: YES*
>
> *Description:*
> Try to execute the following SQL:
> """
> insert into hive_table (select * from kafka_table)
> """
> The Hive table DDL looks like:
> """
> CREATE TABLE `hive_table` (
>   -- some fields
> )
> PARTITIONED BY (
>   `dt` string,
>   `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>   'orc.compress' = 'SNAPPY',
>   'type' = 'HIVE',
>   'sink.partition-commit.trigger' = 'process-time',
>   'sink.partition-commit.delay' = '1 h',
>   'sink.partition-commit.policy.kind' = 'metastore,success-file'
> )
> """
> When this job starts to take a snapshot, the following weird exception occurs:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy Source 
> Thread], but the StreamTask thread, which runs the whole first stage, was 
> found instead.
> So I checked the thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
> (The legacy Source Thread)
>
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
> (The StreamTask Thread)
>
> According to the thread dump and the exception message, I searched 
> and read the relevant source code and then 

[GitHub] [flink] KarmaGYZ commented on a change in pull request #13592: [FLINK-19324][yarn] Map requested and allocated containers with priority on YARN

2020-10-13 Thread GitBox


KarmaGYZ commented on a change in pull request #13592:
URL: https://github.com/apache/flink/pull/13592#discussion_r504365654



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/TaskExecutorProcessSpecContainerResourcePriorityAdapter.java
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.yarn;
+
+import org.apache.flink.runtime.clusterframework.TaskExecutorProcessSpec;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+
+/**
+ * Utility class for converting between Flink {@link TaskExecutorProcessSpec} 
and Yarn {@link Resource} and {@link Priority}.
+ */
+public class TaskExecutorProcessSpecContainerResourcePriorityAdapter {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(TaskExecutorProcessSpecContainerResourcePriorityAdapter.class);
+
+   private final Map<TaskExecutorProcessSpec, Resource> 
taskExecutorProcessSpecToResource;
+   private final Map<TaskExecutorProcessSpec, Priority> 
taskExecutorProcessSpecToPriority;

Review comment:
   We could merge these two maps; they should always have the same key set 
by design. For example, something like the sketch below:
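   A rough sketch of that refactoring (illustrative only; `ResourcePriority` is 
a hypothetical name, and the imports already present in this file are assumed):
   
   ```java
   // One map instead of two: the resource and the priority for a given spec
   // can no longer diverge, because they live in a single value object.
   private static final class ResourcePriority {
       private final Resource resource;
       private final Priority priority;
   
       private ResourcePriority(Resource resource, Priority priority) {
           this.resource = resource;
           this.priority = priority;
       }
   }
   
   private final Map<TaskExecutorProcessSpec, ResourcePriority>
       taskExecutorProcessSpecToResourcePriority = new HashMap<>();
   ```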

##
File path: 
flink-yarn/src/test/java/org/apache/flink/yarn/TaskExecutorProcessSpecContainerResourcePriorityAdapterTest.java
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.yarn;
+
+import org.apache.flink.api.common.resources.CPUResource;
+import org.apache.flink.configuration.MemorySize;
+import org.apache.flink.runtime.clusterframework.TaskExecutorProcessSpec;
+import org.apache.flink.util.TestLogger;
+
+import org.apache.hadoop.yarn.api.records.Priority;
+import org.apache.hadoop.yarn.api.records.Resource;
+import org.junit.Test;
+
+import java.util.Collections;
+
+import static org.hamcrest.Matchers.is;
+import static org.hamcrest.Matchers.not;
+import static org.junit.Assert.assertThat;
+
+/**
+ * Tests for {@link TaskExecutorProcessSpecContainerResourcePriorityAdapter}.
+ */
+public class TaskExecutorProcessSpecContainerResourcePriorityAdapterTest 
extends TestLogger {
+
+   private static final Resource MAX_CONTAINER_RESOURCE = 
Resource.newInstance(102400, 100);

Review comment:
   I think we could add some tests for external resources. To be specific:
   - Check whether we can construct 
`TaskExecutorProcessSpecContainerResourcePriorityAdapter` if the given external 
resource is not supported by the Yarn cluster.
   - Under Hadoop 3.0+ or 2.10+, guarded by 
`assumeTrue(HadoopUtils.isMinHadoopVersion(2, 10))`, set the external resource 
to `MAX_CONTAINER_RESOURCE` and add 
`testGetTaskExecutorProcessSpecWithExternalResource` (see the skeleton below).
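   A possible skeleton for the second case (hypothetical; imports are elided 
and the adapter calls and assertions are sketched in comments only):
   
   ```java
   @Test
   public void testGetTaskExecutorProcessSpecWithExternalResource() {
       // External resources are only supported by YARN on Hadoop 2.10+ / 3.0+.
       assumeTrue(HadoopUtils.isMinHadoopVersion(2, 10));
   
       // Sketch: build a TaskExecutorProcessSpec that requests an external
       // resource (e.g. GPUs) set to MAX_CONTAINER_RESOURCE, construct the
       // adapter with it, and assert that the Resource returned for the spec
       // carries the external resource value.
   }
   ```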

##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManagerDriver.java
##
@@ -260,36 +257,47 @@ public void releaseResource(YarnWorkerNode workerNode) {
//  Internal
// 

 
-   private void onContainersOfResourceAllocated(Resource resource, 
List<Container> containers) {
-   final List<TaskExecutorProcessSpec> 
pendingTaskExecutorProcessSpecs =
-   
taskExecutorProcessSpecContainerResourceAdapter.getTaskExecutorProcessSpec(resource,
 matchingStrategy).stream()
- 

[GitHub] [flink] flinkbot edited a comment on pull request #13624: [FLINK-19368][hive] TableEnvHiveConnectorITCase fails with Hive-3.x

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13624:
URL: https://github.com/apache/flink/pull/13624#issuecomment-708132432


   
   ## CI report:
   
   * 58df8007e120a6b17f11579dd59f91b6d10f5ecf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7560)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13623: [FLINK-19606][table-runtime] Implement streaming window join operator

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13623:
URL: https://github.com/apache/flink/pull/13623#issuecomment-708132144


   
   ## CI report:
   
   * 15cc49ee7ecb3060ec52ab5545f4b6fae58c25de Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7559)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13617: [FLINK-19531] Implement the sink writer operator

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13617:
URL: https://github.com/apache/flink/pull/13617#issuecomment-707791817


   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 524a25e7d549af7ad9a5e92f39377cbde30059fb UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13605: [FLINK-19599][table] Introduce BulkFormatFactory to integrate new FileSource to table

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13605:
URL: https://github.com/apache/flink/pull/13605#issuecomment-707521684


   
   ## CI report:
   
   * a9df8a1384eb6306656b4fd952edd4be5d7a857d UNKNOWN
   * 8beabe2e1b04afe449610c44f4c376909b463e10 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7493)
 
   * 76ff23e27070dc54e6af85dd91c2742457621aa9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19125) Avoid memory fragmentation when running flink docker image

2020-10-13 Thread Yu Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-19125:
--
Description: 
This ticket tracks the problem of memory fragmentation when launching the default 
Flink docker image.

In FLINK-18712, a user reported that if they repeatedly submit jobs with the 
RocksDB state backend on a k8s session cluster, the task manager's memory usage 
grows continuously after each job finishes, until it is OOM killed.
I reproduced this problem with the official Flink docker image no matter how 
RocksDB is used (with or without managed memory).

I dug into the problem and found that it is due to memory fragmentation caused 
by {{glibc}}, which does not return memory to the kernel gracefully (please refer 
to the [glibc bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and the 
[glibc 
manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc]).

I found that limiting MALLOC_ARENA_MAX to 2 mitigates this problem (please 
refer to 
[choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max]
 for more details).

And if we choose to allocate memory with jemalloc by rebuilding the docker 
image, the problem goes away.

{code}
RUN apt-get update && apt-get -y install libjemalloc-dev

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
{code}

Jemalloc intends to [emphasize fragmentation 
avoidance|https://github.com/jemalloc/jemalloc/wiki/Background#intended-use] 
and we might consider refactoring our Dockerfile to use jemalloc to avoid 
memory fragmentation.

  was:
This ticket tracks the problem of memory fragmentation when launching default 
Flink docker image.

In FLINK-18712, user reported if he submits job with rocksDB state backend on a 
k8s session cluster again and again once it finished, the memory usage of task 
manager grows continuously until OOM killed. 
 I reproduce this problem with official Flink docker image no matter how we use 
rocksDB (whether to enable managed memory).

I dig into the problem and found this is due to the memory fragmentation caused 
by {{glibc}}, which would not return memory to kernel gracefully (please refer 
to [glibc bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and 
[glibc 
manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc])

I found if limiting MALLOC_ARENA_MAX to 2 could mitigate this problem (please 
refer to 
[choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max]
 for more details).

And if we choose to use jemalloc to allocate memory via rebuilding another 
docker image, the problem would be gone. 

{code:java}
apt-get -y install libjemalloc-dev

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
{code}

Jemalloc intends to [emphasize fragmentation 
avoidance|https://github.com/jemalloc/jemalloc /wiki/Background#intended-use] 
and we might consider to re-factor our Dockerfile to base on jemalloc to avoid 
memory fragmentation.


> Avoid memory fragmentation when running flink docker image
> --
>
> Key: FLINK-19125
> URL: https://issues.apache.org/jira/browse/FLINK-19125
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, Runtime / State Backends
>Affects Versions: 1.12.0, 1.11.1
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
> Fix For: 1.12.0, 1.11.3
>
>
> This ticket tracks the problem of memory fragmentation when launching the 
> default Flink docker image.
> In FLINK-18712, a user reported that if they repeatedly submit jobs with the 
> RocksDB state backend on a k8s session cluster, the task manager's memory 
> usage grows continuously after each job finishes, until it is OOM killed.
> I reproduced this problem with the official Flink docker image no matter how 
> RocksDB is used (with or without managed memory).
> I dug into the problem and found that it is due to memory fragmentation 
> caused by {{glibc}}, which does not return memory to the kernel gracefully 
> (please refer to the [glibc 
> bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and the [glibc 
> manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc]).
> I found that limiting MALLOC_ARENA_MAX to 2 mitigates this problem (please 
> refer to 
> [choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max]
>  for more details).
> And if we choose to allocate memory with jemalloc by rebuilding the docker 
> image, the problem goes away.
> {code}
> RUN apt-get update && apt-get -y install libjemalloc-dev
> ENV 

[GitHub] [flink] flinkbot edited a comment on pull request #13504: [FLINK-19404][python] Support Pandas Stream Over Window Aggregation

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13504:
URL: https://github.com/apache/flink/pull/13504#issuecomment-700556506


   
   ## CI report:
   
   * 5e9643b76e1c28968baa3cf41492d55bc8ed9ad0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7482)
 
   * 5414040e7b46b1d8f0e6efd4cb4a03e24bb9a788 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lsw_aka_laplace updated FLINK-19630:

Attachment: image-2020-10-14-11-48-51-310.png

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] curcur commented on pull request #13614: [FLINK-19547][runtime] Clean up partial record when reconnecting for Approximate Local Recovery

2020-10-13 Thread GitBox


curcur commented on pull request #13614:
URL: https://github.com/apache/flink/pull/13614#issuecomment-708138894


   Local azure results:
   
   https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=92=results
   
   Green light except `JMXReporterFactoryTest`, which fails due to FLINK-19539.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lsw_aka_laplace updated FLINK-19630:

Attachment: image-2020-10-14-11-42-57-353.png

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lsw_aka_laplace updated FLINK-19630:

Attachment: image-2020-10-14-11-41-53-379.png

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-18999) Temporary generic table doesn't work with HiveCatalog

2020-10-13 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-18999.

Resolution: Fixed

master:

24f88069f596673c6f58b136d85d1b00ddf840f2

e6d7f97e63a727566125489e527f1dabb225d91e

> Temporary generic table doesn't work with HiveCatalog
> -
>
> Key: FLINK-18999
> URL: https://issues.apache.org/jira/browse/FLINK-18999
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Suppose the current catalog is a {{HiveCatalog}}. If a user creates a temporary 
> generic table, the table cannot be accessed in SQL queries and will hit an 
> exception like:
> {noformat}
> Caused by: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: DB.TBL 
> table not found
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55064)
>  ~[hive-exec-2.3.4.jar:2.3.4]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55032)
>  ~[hive-exec-2.3.4.jar:2.3.4]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result.read(ThriftHiveMetastore.java:54963)
>  ~[hive-exec-2.3.4.jar:2.3.4]
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) 
> ~[hive-exec-2.3.4.jar:2.3.4]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
>  ~[hive-exec-2.3.4.jar:2.3.4]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
>  ~[hive-exec-2.3.4.jar:2.3.4]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1344)
>  ~[hive-exec-2.3.4.jar:2.3.4]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_181]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_181]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_181]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
>  ~[hive-exec-2.3.4.jar:2.3.4]
> at com.sun.proxy.$Proxy28.getTable(Unknown Source) ~[?:?]
> at 
> org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper.getTable(HiveMetastoreClientWrapper.java:112)
>  ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.connectors.hive.HiveTableSource.initAllPartitions(HiveTableSource.java:415)
>  ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.connectors.hive.HiveTableSource.getDataStream(HiveTableSource.java:171)
>  ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.table.planner.plan.nodes.physical.PhysicalLegacyTableSourceScan.getSourceTransformation(PhysicalLegacyTableSourceScan.scala:82)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlanInternal(StreamExecLegacyTableSourceScan.scala:98)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlanInternal(StreamExecLegacyTableSourceScan.scala:63)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlan(StreamExecLegacyTableSourceScan.scala:63)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToTransformation(StreamExecLegacySink.scala:158)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:82)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at 
> 

[GitHub] [flink] JingsongLi merged pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…

2020-10-13 Thread GitBox


JingsongLi merged pull request #13216:
URL: https://github.com/apache/flink/pull/13216


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lsw_aka_laplace updated FLINK-19630:

Attachment: image-2020-10-14-11-36-48-086.png

> Sink data in [ORC] format into Hive By using Legacy Table API  caused 
> unexpected Exception
> --
>
> Key: FLINK-19630
> URL: https://issues.apache.org/jira/browse/FLINK-19630
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Table SQL / Ecosystem
>Affects Versions: 1.11.2
>Reporter: Lsw_aka_laplace
>Priority: Critical
> Attachments: image-2020-10-14-11-36-48-086.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19629) English words are spelled incorrectly and an example is not provided

2020-10-13 Thread shizhengchao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213569#comment-17213569
 ] 

shizhengchao commented on FLINK-19629:
--

"Avro union(something, null)" is spelled incorrectly; it should be "Avro 
unions(something, null)", not "union".
And I think an example of nullable types should be provided. So the complete 
documentation should read like this:

In addition to the types listed above, Flink supports reading/writing nullable 
types, e.g. "behavior STRING NULL". Flink maps nullable types to Avro 
unions(something, null), where something is the Avro type converted from the 
Flink type.

> English words are spelled incorrectly and an example is not provided
> 
>
> Key: FLINK-19629
> URL: https://issues.apache.org/jira/browse/FLINK-19629
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.2
>Reporter: shizhengchao
>Priority: Critical
> Fix For: 1.12.0
>
>
> In the docs Connectors/Table & SQL Connectors/Formats/Avro:
>  In addition to the types listed above, Flink supports reading/writing 
> nullable types. Flink maps nullable types to Avro union(something, null), 
> where something is the Avro type converted from Flink type.
> Avro has no "union" type; it should be "unions":
>  Avro unions(something, null)
> By the way, an example of reading/writing nullable types should be 
> provided, such as this:
> {code:java}
> CREATE TABLE user_behavior (
>   behavior STRING NULL
> ) WITH (
>  'connector' = 'kafka',
>  'topic' = 'user_behavior',
>  'properties.bootstrap.servers' = 'localhost:9092',
>  'properties.group.id' = 'testGroup',
>  'format' = 'avro'
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] guoweiM commented on pull request #13617: [FLINK-19531] Implement the sink writer operator

2020-10-13 Thread GitBox


guoweiM commented on pull request #13617:
URL: https://github.com/apache/flink/pull/13617#issuecomment-708134965


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19629) English words are spelled incorrectly and an example is not provided

2020-10-13 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213565#comment-17213565
 ] 

Jark Wu commented on FLINK-19629:
-

What's wrong with the example? Do you mean changing {{behavior STRING}} to 
{{behavior STRING NULL}}? However, datatypes are nullable by default and we 
don't support {{NULL}} AFAIK.

> English words are spelled incorrectly and an example is not provided
> 
>
> Key: FLINK-19629
> URL: https://issues.apache.org/jira/browse/FLINK-19629
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.2
>Reporter: shizhengchao
>Priority: Critical
> Fix For: 1.12.0
>
>
> In the docs Connectors/Table & SQL Connectors/Formats/Avro:
>  In addition to the types listed above, Flink supports reading/writing 
> nullable types. Flink maps nullable types to Avro union(something, null), 
> where something is the Avro type converted from Flink type.
> Avro has no "union" type; it should be "unions":
>  Avro unions(something, null)
> By the way, an example of reading/writing nullable types should be 
> provided, such as this:
> {code:java}
> CREATE TABLE user_behavior (
>   behavior STRING NULL
> ) WITH (
>  'connector' = 'kafka',
>  'topic' = 'user_behavior',
>  'properties.bootstrap.servers' = 'localhost:9092',
>  'properties.group.id' = 'testGroup',
>  'format' = 'avro'
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #13624: [FLINK-19368][hive] TableEnvHiveConnectorITCase fails with Hive-3.x

2020-10-13 Thread GitBox


flinkbot commented on pull request #13624:
URL: https://github.com/apache/flink/pull/13624#issuecomment-708132432


   
   ## CI report:
   
   * 58df8007e120a6b17f11579dd59f91b6d10f5ecf UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19631) Comments of DecodingFormatFactory is not clear

2020-10-13 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-19631:
-
Issue Type: Bug  (was: Test)

> Comments of DecodingFormatFactory is not clear
> --
>
> Key: FLINK-19631
> URL: https://issues.apache.org/jira/browse/FLINK-19631
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: 1.11.3
>
>
> e.g. from \{@code key.format.ignore-errors} to \{@code format.ignore-errors}
> Should be "from \{@code format.ignore-errors} to \{@code ignore-errors}"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19631) Comments of DecodingFormatFactory is not clear

2020-10-13 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-19631:
-
Labels: starter  (was: )

> Comments of DecodingFormatFactory is not clear
> --
>
> Key: FLINK-19631
> URL: https://issues.apache.org/jira/browse/FLINK-19631
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: starter
> Fix For: 1.11.3
>
>
> e.g. from \{@code key.format.ignore-errors} to \{@code format.ignore-errors}
> Should be "from \{@code format.ignore-errors} to \{@code ignore-errors}"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] HuangXingBo commented on pull request #13504: [FLINK-19404][python] Support Pandas Stream Over Window Aggregation

2020-10-13 Thread GitBox


HuangXingBo commented on pull request #13504:
URL: https://github.com/apache/flink/pull/13504#issuecomment-708131977


   @dianfu Thanks a lot for the review. I have addressed the comments in the 
latest commit.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13623: [FLINK-19606][table-runtime] Implement streaming window join operator

2020-10-13 Thread GitBox


flinkbot commented on pull request #13623:
URL: https://github.com/apache/flink/pull/13623#issuecomment-708132144


   
   ## CI report:
   
   * 15cc49ee7ecb3060ec52ab5545f4b6fae58c25de UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-19631) Comments of DecodingFormatFactory is not clear

2020-10-13 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-19631:


 Summary: Comments of DecodingFormatFactory is not clear
 Key: FLINK-19631
 URL: https://issues.apache.org/jira/browse/FLINK-19631
 Project: Flink
  Issue Type: Test
  Components: Table SQL / API
Reporter: Jingsong Lee
 Fix For: 1.11.3


e.g. from \{@code key.format.ignore-errors} to \{@code format.ignore-errors}

Should be "from \{@code format.ignore-errors} to \{@code ignore-errors}"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13611: [FLINK-19480][python] Support RichFunction in Python DataStream API

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13611:
URL: https://github.com/apache/flink/pull/13611#issuecomment-707634048


   
   ## CI report:
   
   * 97bf0d8b9d1541d8ecb077fa977e2fe3c7a15d2d UNKNOWN
   * 51d4364176c11ed1b766a00bdbab9597f7ca4af2 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7533)
 
   * bbf13d4f68c7c6450d264d862eaf9db45ff80e6d Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7558)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19629) English words are spelled incorrectly and an example is not provided

2020-10-13 Thread shizhengchao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213561#comment-17213561
 ] 

shizhengchao commented on FLINK-19629:
--

[~jark], I can complete this work. Could you assign it to me?

 

> English words are spelled incorrectly and an example is not provided
> 
>
> Key: FLINK-19629
> URL: https://issues.apache.org/jira/browse/FLINK-19629
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.2
>Reporter: shizhengchao
>Priority: Critical
> Fix For: 1.12.0
>
>
> In the docs Connectors/Table & SQL Connectors/Formats/Avro:
>  In addition to the types listed above, Flink supports reading/writing 
> nullable types. Flink maps nullable types to Avro union(something, null), 
> where something is the Avro type converted from Flink type.
> Avro has no "union" type; it should be "unions":
>  Avro unions(something, null)
> By the way, an example of reading/writing nullable types should be 
> provided, such as this:
> {code:java}
> CREATE TABLE user_behavior (
>   behavior STRING NULL
> ) WITH (
>  'connector' = 'kafka',
>  'topic' = 'user_behavior',
>  'properties.bootstrap.servers' = 'localhost:9092',
>  'properties.group.id' = 'testGroup',
>  'format' = 'avro'
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-19584) HBaseSinkFunction no needs to create thread to flush when bufferFlushMaxMutations = 1

2020-10-13 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-19584.
---
Resolution: Fixed

Fixed in master: b62c13b9d494110b806a1cf88233f075df16c810

> HBaseSinkFunction no needs to create thread to flush when 
> bufferFlushMaxMutations = 1
> -
>
> Key: FLINK-19584
> URL: https://issues.apache.org/jira/browse/FLINK-19584
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Affects Versions: 1.11.0
>Reporter: hailong wang
>Assignee: hailong wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> HBaseSinkFunction does not need to create a flush thread when 
> bufferFlushMaxMutations equals one. By doing this, it avoids the overhead 
> of that thread.
> This may be the same as 
> [FLINK-15389|https://issues.apache.org/jira/browse/FLINK-15389].
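
A sketch of the described optimization (hypothetical field and method names, 
not the actual HBaseSinkFunction code):

{code:java}
// Only schedule the periodic flusher when mutations are actually buffered;
// with bufferFlushMaxMutations == 1 every mutation is flushed right away,
// so a background flush thread would be pure overhead.
if (bufferFlushMaxMutations != 1 && bufferFlushIntervalMillis > 0) {
    this.executor = Executors.newScheduledThreadPool(1);
    this.scheduledFuture = this.executor.scheduleWithFixedDelay(
        this::flush, bufferFlushIntervalMillis, bufferFlushIntervalMillis,
        TimeUnit.MILLISECONDS);
}
{code}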



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-19441) Performance regression on 24.09.2020

2020-10-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213559#comment-17213559
 ] 

Yuan Mei edited comment on FLINK-19441 at 10/14/20, 3:21 AM:
-

I have forwarded this to [~kevin.cyj] and pinged him in person.


was (Author: ym):
forward this to [~kevin.cyj] and pinged him in person.

> Performance regression on 24.09.2020
> 
>
> Key: FLINK-19441
> URL: https://issues.apache.org/jira/browse/FLINK-19441
> Project: Flink
>  Issue Type: Bug
>Reporter: Arvid Heise
>Assignee: Stephan Ewen
>Priority: Blocker
>  Labels: pull-request-available
>
> A couple of benchmarks are showing a small performance regression on 
> 24.09.2020:
> http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2
> http://codespeed.dak8s.net:8000/timeline/?ben=tupleKeyBy=2 (?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong commented on pull request #13613: [FLINK-19584][Connector-hbase] Not starting flush thread when bufferFlushMaxMutations = 1

2020-10-13 Thread GitBox


wuchong commented on pull request #13613:
URL: https://github.com/apache/flink/pull/13613#issuecomment-708131146


   Build passed, merging...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-19630) Sink data in [ORC] format into Hive By using Legacy Table API caused unexpected Exception

2020-10-13 Thread Lsw_aka_laplace (Jira)
Lsw_aka_laplace created FLINK-19630:
---

 Summary: Sink data in [ORC] format into Hive By using Legacy Table 
API  caused unexpected Exception
 Key: FLINK-19630
 URL: https://issues.apache.org/jira/browse/FLINK-19630
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive, Table SQL / Ecosystem
Affects Versions: 1.11.2
Reporter: Lsw_aka_laplace






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong merged pull request #13613: [FLINK-19584][Connector-hbase] Not starting flush thread when bufferFlushMaxMutations = 1

2020-10-13 Thread GitBox


wuchong merged pull request #13613:
URL: https://github.com/apache/flink/pull/13613


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19441) Performance regression on 24.09.2020

2020-10-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213559#comment-17213559
 ] 

Yuan Mei commented on FLINK-19441:
--

forward this to [~kevin.cyj] and pinged him in person.

> Performance regression on 24.09.2020
> 
>
> Key: FLINK-19441
> URL: https://issues.apache.org/jira/browse/FLINK-19441
> Project: Flink
>  Issue Type: Bug
>Reporter: Arvid Heise
>Assignee: Stephan Ewen
>Priority: Blocker
>  Labels: pull-request-available
>
> A couple of benchmarks are showing a small performance regression on 
> 24.09.2020:
> http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2
> http://codespeed.dak8s.net:8000/timeline/?ben=tupleKeyBy=2 (?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19629) English words are spelled incorrectly and an example is not provided

2020-10-13 Thread shizhengchao (Jira)
shizhengchao created FLINK-19629:


 Summary: English words are spelled incorrectly and an example is 
not provided
 Key: FLINK-19629
 URL: https://issues.apache.org/jira/browse/FLINK-19629
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.11.2
Reporter: shizhengchao
 Fix For: 1.12.0


In the docs Connectors/Table & SQL Connectors/Formats/Avro:
 In addition to the types listed above, Flink supports reading/writing nullable 
types. Flink maps nullable types to Avro union(something, null), where 
something is the Avro type converted from Flink type.

Avro has no "union" type; it should be "unions":
 Avro unions(something, null)

By the way, an example of reading/writing nullable types should be provided, 
such as this:
{code:java}
CREATE TABLE user_behavior (
  behavior STRING NULL
) WITH (
 'connector' = 'kafka',
 'topic' = 'user_behavior',
 'properties.bootstrap.servers' = 'localhost:9092',
 'properties.group.id' = 'testGroup',
 'format' = 'avro'
)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19622) Flinksql version 1.11 is for the NullPointerException of the Map type value value in the avro format

2020-10-13 Thread Benchao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benchao Li updated FLINK-19622:
---
Affects Version/s: (was: 1.11.1)
   1.11.2

> Flinksql version 1.11 is for the NullPointerException of the Map type value 
> value in the avro format
> 
>
> Key: FLINK-19622
> URL: https://issues.apache.org/jira/browse/FLINK-19622
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.11.2
>Reporter: 宋洪亮
>Priority: Critical
>
> Hello, when I use Flink SQL 1.11 to parse Avro-format data from Kafka, I use 
> the Map type. My definition for this type is MAP, but during parsing I hit a 
> NullPointerException because the value in the map is null.
> The source code is attached below (AvroRowDataDeserializationSchema)
> {code:java}
> private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
>    final DeserializationRuntimeConverter keyConverter = createConverter(
>       DataTypes.STRING().getLogicalType());
>    final DeserializationRuntimeConverter valueConverter = createConverter(
>       extractValueTypeToAvroMap(type));
>    return avroObject -> {
>       final Map<?, ?> map = (Map<?, ?>) avroObject;
>       Map<Object, Object> result = new HashMap<>();
>       for (Map.Entry<?, ?> entry : map.entrySet()) {
>          Object key = keyConverter.convert(entry.getKey());
>          Object value = valueConverter.convert(entry.getValue());
>          result.put(key, value);
>       }
>       return new GenericMapData(result);
>    };
> }
> {code}
>  
>  
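
A minimal sketch of one possible null-safe variant of the value conversion 
(illustrative only, not the project's actual fix; it reuses the names from the 
snippet above and assumes a null map value should simply stay null):
{code:java}
// Hypothetical wrapper: guard the value converter so a null Avro map value
// is passed through as null instead of being converted, which is where the
// NullPointerException described above occurs.
final DeserializationRuntimeConverter nullSafeValueConverter =
    avroObject -> avroObject == null ? null : valueConverter.convert(avroObject);

// In the loop, the guarded converter would then be used instead:
// Object value = nullSafeValueConverter.convert(entry.getValue());
{code}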



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-statefun] tzulitai commented on pull request #163: [FLINK-19620] Merge ExactlyOnceE2E and RemoteModuleE2E

2020-10-13 Thread GitBox


tzulitai commented on pull request #163:
URL: https://github.com/apache/flink-statefun/pull/163#issuecomment-708128215


   cc @sjwiesman @igalshilman 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19368) TableEnvHiveConnectorITCase fails with Hive-3.x

2020-10-13 Thread Rui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated FLINK-19368:
---
Priority: Blocker  (was: Major)

> TableEnvHiveConnectorITCase fails with Hive-3.x
> ---
>
> Key: FLINK-19368
> URL: https://issues.apache.org/jira/browse/FLINK-19368
> Project: Flink
>  Issue Type: Test
>  Components: Connectors / Hive, Tests
>Reporter: Rui Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Failed test cases are {{testBatchTransactionalTable}} and 
> {{testStreamTransactionalTable}}. Both fail to create the ACID table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19622) Flinksql version 1.11 is for the NullPointerException of the Map type value value in the avro format

2020-10-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-19622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

宋洪亮 updated FLINK-19622:

Description: 
Hello, when I use Flink SQL 1.11 to parse Avro-format data from Kafka, I use 
the Map type. My definition for this type is MAP, but during parsing I hit a 
NullPointerException because the value in the map is null.

The source code is attached below (AvroRowDataDeserializationSchema)
{code:java}
private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
   final DeserializationRuntimeConverter keyConverter = createConverter(
      DataTypes.STRING().getLogicalType());
   final DeserializationRuntimeConverter valueConverter = createConverter(
      extractValueTypeToAvroMap(type));
   return avroObject -> {
      final Map<?, ?> map = (Map<?, ?>) avroObject;
      Map<Object, Object> result = new HashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
         Object key = keyConverter.convert(entry.getKey());
         Object value = valueConverter.convert(entry.getValue());
         result.put(key, value);
      }
      return new GenericMapData(result);
   };
}
{code}
 

 

  was:
Hello, when I use Flink SQL 1.11 to parse Avro-format data from Kafka, I use 
the Map type. My definition for this type is MAP, but during parsing I hit a 
null pointer exception because the value in the map is null.

The source code is attached below (AvroRowDataDeserializationSchema)
{code:java}
private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
   final DeserializationRuntimeConverter keyConverter = createConverter(
      DataTypes.STRING().getLogicalType());
   final DeserializationRuntimeConverter valueConverter = createConverter(
      extractValueTypeToAvroMap(type));
   return avroObject -> {
      final Map<?, ?> map = (Map<?, ?>) avroObject;
      Map<Object, Object> result = new HashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
         Object key = keyConverter.convert(entry.getKey());
         Object value = valueConverter.convert(entry.getValue());
         result.put(key, value);
      }
      return new GenericMapData(result);
   };
}
{code}
 

 

Summary: Flinksql version 1.11 is for the NullPointerException of the 
Map type value value in the avro format  (was: Flinksql version 1.11 is for the 
null pointer exception of the Map type value value in the avro format)

> Flinksql version 1.11 is for the NullPointerException of the Map type value 
> value in the avro format
> 
>
> Key: FLINK-19622
> URL: https://issues.apache.org/jira/browse/FLINK-19622
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.11.1
>Reporter: 宋洪亮
>Priority: Critical
>
> Hello, when I use Flink SQL 1.11 to parse Avro-format data from Kafka, I use 
> the Map type. My definition for this type is MAP, but during parsing I hit a 
> NullPointerException because the value in the map is null.
> The source code is attached below (AvroRowDataDeserializationSchema)
> {code:java}
> private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
>    final DeserializationRuntimeConverter keyConverter = createConverter(
>       DataTypes.STRING().getLogicalType());
>    final DeserializationRuntimeConverter valueConverter = createConverter(
>       extractValueTypeToAvroMap(type));
>    return avroObject -> {
>       final Map<?, ?> map = (Map<?, ?>) avroObject;
>       Map<Object, Object> result = new HashMap<>();
>       for (Map.Entry<?, ?> entry : map.entrySet()) {
>          Object key = keyConverter.convert(entry.getKey());
>          Object value = valueConverter.convert(entry.getValue());
>          result.put(key, value);
>       }
>       return new GenericMapData(result);
>    };
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19620) Merge StateFun's ExactlyOnceE2E and RemoteModuleE2E

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19620:
---
Labels: pull-request-available  (was: )

> Merge StateFun's ExactlyOnceE2E and RemoteModuleE2E
> ---
>
> Key: FLINK-19620
> URL: https://issues.apache.org/jira/browse/FLINK-19620
> Project: Flink
>  Issue Type: Test
>  Components: Stateful Functions, Tests
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
>  Labels: pull-request-available
>
> Currently, we have:
> - {{ExactlyOnceE2E}}, which verifies end-to-end exactly-once in the presence 
> of TM failures, but uses embedded functions
> - {{RemoteModuleE2E}}, which runs functions remotely and verifies that the 
> communication between StateFun and the functions is correct and that messages 
> are routed correctly.
> A recent issue (https://github.com/apache/flink-statefun/pull/159) suggested 
> that we should add an E2E with remote functions + TM failures.
> With this in mind, it is worth considering merging these 2 E2Es into one, as 
> together they should cover the same functionality (remote functions are built 
> on top of embedded functions), and having them merged also saves test time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19622) Flinksql version 1.11 is for the null pointer exception of the Map type value value in the avro format

2020-10-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-19622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

宋洪亮 updated FLINK-19622:

Description: 
Hello, when I use Flink SQL 1.11 to parse Avro-format data from Kafka, I use 
the Map type. My definition for this type is MAP, but during parsing I hit a 
null pointer exception because the value in the map is null.

The source code is attached below (AvroRowDataDeserializationSchema)
{code:java}
private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
   final DeserializationRuntimeConverter keyConverter = createConverter(
      DataTypes.STRING().getLogicalType());
   final DeserializationRuntimeConverter valueConverter = createConverter(
      extractValueTypeToAvroMap(type));
   return avroObject -> {
      final Map<?, ?> map = (Map<?, ?>) avroObject;
      Map<Object, Object> result = new HashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
         Object key = keyConverter.convert(entry.getKey());
         Object value = valueConverter.convert(entry.getValue());
         result.put(key, value);
      }
      return new GenericMapData(result);
   };
}
{code}
 

 

  was:
Hello, when I parse Avro-format data from Kafka with Flink SQL 1.11, I use the 
Map type. My definition for this type is MAP, but during parsing I hit a null 
pointer exception because the value in the map is null.

 

 

Environment: (was: The source code is attached below (AvroRowDataDeserializationSchema)
{code:java}
private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
   final DeserializationRuntimeConverter keyConverter = createConverter(
      DataTypes.STRING().getLogicalType());
   final DeserializationRuntimeConverter valueConverter = createConverter(
      extractValueTypeToAvroMap(type));
   return avroObject -> {
      final Map<?, ?> map = (Map<?, ?>) avroObject;
      Map<Object, Object> result = new HashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
         Object key = keyConverter.convert(entry.getKey());
         Object value = valueConverter.convert(entry.getValue());
         result.put(key, value);
      }
      return new GenericMapData(result);
   };
}
{code})
Summary: Flinksql version 1.11 is for the null pointer exception of the 
Map type value value in the avro format  (was: flinksql 
1.11版本针对avro格式中Map类型value值的空指针异常)

> Flinksql version 1.11 is for the null pointer exception of the Map type value 
> value in the avro format
> --
>
> Key: FLINK-19622
> URL: https://issues.apache.org/jira/browse/FLINK-19622
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.11.1
>Reporter: 宋洪亮
>Priority: Critical
>
> Hello, when I use Flink SQL 1.11 to parse Avro-format data from Kafka, I use 
> the Map type. My definition for this type is MAP, but during parsing I hit a 
> null pointer exception because the value in the map is null.
> The source code is attached below (AvroRowDataDeserializationSchema)
> {code:java}
> private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
>    final DeserializationRuntimeConverter keyConverter = createConverter(
>       DataTypes.STRING().getLogicalType());
>    final DeserializationRuntimeConverter valueConverter = createConverter(
>       extractValueTypeToAvroMap(type));
>    return avroObject -> {
>       final Map<?, ?> map = (Map<?, ?>) avroObject;
>       Map<Object, Object> result = new HashMap<>();
>       for (Map.Entry<?, ?> entry : map.entrySet()) {
>          Object key = keyConverter.convert(entry.getKey());
>          Object value = valueConverter.convert(entry.getValue());
>          result.put(key, value);
>       }
>       return new GenericMapData(result);
>    };
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-statefun] tzulitai opened a new pull request #163: [FLINK-19620] Merge ExactlyOnceE2E and RemoteModuleE2E

2020-10-13 Thread GitBox


tzulitai opened a new pull request #163:
URL: https://github.com/apache/flink-statefun/pull/163


   The original `ExactlyOnceE2E` verified end-to-end exactly-once with TM 
failures, while the `RemoteModuleE2E` verified a deployment with remote 
functions.
   
   As mentioned in #159 by @sjwiesman, we seem to be lacking an exactly-once 
verifying E2E with remote functions.
   
   It is actually worthwhile to merge these 2 E2Es into one, since 1) remote 
functions are built on top of embedded functions, so merging them does not hurt 
test coverage, and 2) E2E tests take a significant amount of test time, so less 
is better.
   
   ### Change log
   
   - bbe5d84: Extends the original `RemoteModuleE2E` to have TM failures, 
`read_committed` settings for the Kafka consumers that verify outputs (see the 
sketch after this change log), and exactly-once delivering Kafka egresses.
   - 1c022ad: Removes the `ExactlyOnceE2E`
   - 8ca8993: Renames `RemoteModuleE2E` to `ExactlyOnceWithRemoteFnE2E` to 
reflect the extra coverage in the test.
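
For readers unfamiliar with `read_committed`, a minimal sketch of a 
verification consumer (hypothetical address, group id, and topic; only the 
isolation level matters here). With `read_committed`, records from aborted 
Kafka transactions are never returned, which is what lets the test assert 
exactly-once output across TM failures:
{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedVerifierSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "e2e-verifier");             // hypothetical
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Only read committed transactional records; aborted writes are filtered out.
        props.setProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("egress-output")); // hypothetical topic
            consumer.poll(Duration.ofSeconds(5))
                    .forEach(record -> System.out.println(record.value()));
        }
    }
}
{code}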



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13624: [FLINK-19368][hive] TableEnvHiveConnectorITCase fails with Hive-3.x

2020-10-13 Thread GitBox


flinkbot commented on pull request #13624:
URL: https://github.com/apache/flink/pull/13624#issuecomment-708127677


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 58df8007e120a6b17f11579dd59f91b6d10f5ecf (Wed Oct 14 
03:09:27 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-19368).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-19628) Introduce multi-input operator for streaming

2020-10-13 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19628:
---

 Summary: Introduce multi-input operator for streaming
 Key: FLINK-19628
 URL: https://issues.apache.org/jira/browse/FLINK-19628
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: Caizhi Weng
 Fix For: 1.12.0


After the planner is ready for multi-input, we should introduce a multi-input 
operator for streaming.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19368) TableEnvHiveConnectorITCase fails with Hive-3.x

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19368:
---
Labels: pull-request-available  (was: )

> TableEnvHiveConnectorITCase fails with Hive-3.x
> ---
>
> Key: FLINK-19368
> URL: https://issues.apache.org/jira/browse/FLINK-19368
> Project: Flink
>  Issue Type: Test
>  Components: Connectors / Hive, Tests
>Reporter: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Failed test cases are {{testBatchTransactionalTable}} and 
> {{testStreamTransactionalTable}}. Both fail to create the ACID table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19627) Introduce multi-input operator for batch

2020-10-13 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19627:
---

 Summary: Introduce multi-input operator for batch
 Key: FLINK-19627
 URL: https://issues.apache.org/jira/browse/FLINK-19627
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: Caizhi Weng
 Fix For: 1.12.0


After the planner is ready for multi-input, we should introduce a multi-input 
operator for batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] lirui-apache opened a new pull request #13624: [FLINK-19368][hive] TableEnvHiveConnectorITCase fails with Hive-3.x

2020-10-13 Thread GitBox


lirui-apache opened a new pull request #13624:
URL: https://github.com/apache/flink/pull/13624


   
   
   ## What is the purpose of the change
   
   Fix `TableEnvHiveConnectorITCase` for Hive-3.x.
   
   
   ## Brief change log
   
 - Sets the ACID transaction manager for 3.x -- it's required for the 
transactional table tests (see the sketch below).
 - Configures the standalone HMS to prepare ACID tables for the test.
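
   A minimal sketch of the kind of Hive-3.x setting this refers to 
(illustrative only, not necessarily the exact configuration used in the PR): 
Hive requires an ACID-capable transaction manager before transactional tables 
can be created.
{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class AcidHiveConfSketch {
    public static HiveConf acidEnabledConf() {
        HiveConf conf = new HiveConf();
        // DbTxnManager is Hive's ACID-capable transaction manager; without it
        // (plus concurrency support) creating ACID tables fails.
        conf.set("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
        conf.set("hive.support.concurrency", "true");
        return conf;
    }
}
{code}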
   
   
   ## Verifying this change
   
   Manually verified tests for all hive versions.
   
   ## Does this pull request potentially affect one of the following parts:
   
   NA
   
   ## Documentation
   
   NA
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-19626) Introduce multi-input operator construction algorithm

2020-10-13 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19626:
---

 Summary: Introduce multi-input operator construction algorithm
 Key: FLINK-19626
 URL: https://issues.apache.org/jira/browse/FLINK-19626
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Caizhi Weng
 Fix For: 1.12.0


We should introduce an algorithm to organize exec nodes into multi-input exec 
nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19625) Introduce multi-input exec node

2020-10-13 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19625:
---

 Summary: Introduce multi-input exec node
 Key: FLINK-19625
 URL: https://issues.apache.org/jira/browse/FLINK-19625
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Caizhi Weng
 Fix For: 1.12.0


For multi-input to work in the Blink planner, we should first introduce a 
multi-input exec node in the planner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #13577: [FLINK-16579][table] Upgrade Calcite version to 1.26 for Flink SQL

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13577:
URL: https://github.com/apache/flink/pull/13577#issuecomment-706470705


   
   ## CI report:
   
   * 90f8521b60440ef0782c1ce4356d260fe56a1bc0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7537)
 
   * 4c7d37a3b02ae9221b3f9a798ba9fd0523ec4bb9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7557)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #13299: [FLINK-19072][table-planner] Import Temporal Table join rule for Stream

2020-10-13 Thread GitBox


flinkbot edited a comment on pull request #13299:
URL: https://github.com/apache/flink/pull/13299#issuecomment-684842866


   
   ## CI report:
   
   * a0719cf3cfc2219361b74540399126cc84a8e1cd Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7553)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=7542)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-19622) flinksql 1.11版本针对avro格式中Map类型value值的空指针异常

2020-10-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-19622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

宋洪亮 updated FLINK-19622:

  Component/s: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Version/s: 1.11.1

> flinksql 1.11版本针对avro格式中Map类型value值的空指针异常
> -
>
> Key: FLINK-19622
> URL: https://issues.apache.org/jira/browse/FLINK-19622
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.11.1
> Environment: The source code is attached below (AvroRowDataDeserializationSchema)
> {code:java}
> private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
>    final DeserializationRuntimeConverter keyConverter = createConverter(
>       DataTypes.STRING().getLogicalType());
>    final DeserializationRuntimeConverter valueConverter = createConverter(
>       extractValueTypeToAvroMap(type));
>    return avroObject -> {
>       final Map<?, ?> map = (Map<?, ?>) avroObject;
>       Map<Object, Object> result = new HashMap<>();
>       for (Map.Entry<?, ?> entry : map.entrySet()) {
>          Object key = keyConverter.convert(entry.getKey());
>          Object value = valueConverter.convert(entry.getValue());
>          result.put(key, value);
>       }
>       return new GenericMapData(result);
>    };
> }
> {code}
>Reporter: 宋洪亮
>Priority: Critical
>
> Hello, when I parse Avro-format data from Kafka with Flink SQL 1.11, I use 
> the Map type. My definition for this type is MAP, but during parsing I hit a 
> null pointer exception because the value in the map is null.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19624) Update deadlock break-up algorithm to cover more cases

2020-10-13 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19624:
---

 Summary: Update deadlock break-up algorithm to cover more cases
 Key: FLINK-19624
 URL: https://issues.apache.org/jira/browse/FLINK-19624
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Caizhi Weng
 Fix For: 1.12.0
 Attachments: pasted image 0.png

The current deadlock breakup algorithm fails to cover the following case:

We're going to introduce a new deadlock breakup algorithm to cover this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19624) Update deadlock break-up algorithm to cover more cases

2020-10-13 Thread Caizhi Weng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caizhi Weng updated FLINK-19624:

Description: 
The current deadlock breakup algorithm fails to cover the following case:

 !pasted image 0.png! 

We're going to introduce a new deadlock breakup algorithm to cover this.

  was:
The current deadlock breakup algorithm fails to cover the following case:

We're going to introduce a new deadlock breakup algorithm to cover this.


> Update deadlock break-up algorithm to cover more cases
> --
>
> Key: FLINK-19624
> URL: https://issues.apache.org/jira/browse/FLINK-19624
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Caizhi Weng
>Priority: Major
> Fix For: 1.12.0
>
> Attachments: pasted image 0.png
>
>
> The current deadlock breakup algorithm fails to cover the following case:
>  !pasted image 0.png! 
> We're going to introduce a new deadlock breakup algorithm to cover this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19622) flinksql 1.11版本针对avro格式中Map类型value值的空指针异常

2020-10-13 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213552#comment-17213552
 ] 

Benchao Li commented on FLINK-19622:


[~奔跑的小飞袁] We should use English in Jira issues; could you please translate 
your description into English?

BTW, do you want to provide a PR to fix this?

> flinksql 1.11版本针对avro格式中Map类型value值的空指针异常
> -
>
> Key: FLINK-19622
> URL: https://issues.apache.org/jira/browse/FLINK-19622
> Project: Flink
>  Issue Type: Bug
> Environment: The source code is attached below (AvroRowDataDeserializationSchema)
> {code:java}
> private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
>    final DeserializationRuntimeConverter keyConverter = createConverter(
>       DataTypes.STRING().getLogicalType());
>    final DeserializationRuntimeConverter valueConverter = createConverter(
>       extractValueTypeToAvroMap(type));
>    return avroObject -> {
>       final Map<?, ?> map = (Map<?, ?>) avroObject;
>       Map<Object, Object> result = new HashMap<>();
>       for (Map.Entry<?, ?> entry : map.entrySet()) {
>          Object key = keyConverter.convert(entry.getKey());
>          Object value = valueConverter.convert(entry.getValue());
>          result.put(key, value);
>       }
>       return new GenericMapData(result);
>    };
> }
> {code}
>Reporter: 宋洪亮
>Priority: Critical
>
> Hello, when I parse Avro-format data from Kafka with Flink SQL 1.11, I use 
> the Map type. My definition for this type is MAP, but during parsing I hit a 
> null pointer exception because the value in the map is null.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19606) Implement streaming window join operator

2020-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19606:
---
Labels: pull-request-available  (was: )

> Implement streaming window join operator
> 
>
> Key: FLINK-19606
> URL: https://issues.apache.org/jira/browse/FLINK-19606
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: Jark Wu
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Implement streaming window join operator in blink runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #13623: [FLINK-19606][table-runtime] Implement streaming window join operator

2020-10-13 Thread GitBox


flinkbot commented on pull request #13623:
URL: https://github.com/apache/flink/pull/13623#issuecomment-708125660


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 15cc49ee7ecb3060ec52ab5545f4b6fae58c25de (Wed Oct 14 
03:02:41 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-19618) Broken link in docs

2020-10-13 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-19618:
---

Assignee: hailong wang

> Broken link in docs
> ---
>
> Key: FLINK-19618
> URL: https://issues.apache.org/jira/browse/FLINK-19618
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.12.0
>Reporter: hailong wang
>Assignee: hailong wang
>Priority: Major
> Fix For: 1.12.0
>
>
> I ran the `check_links` shell script and found the following broken links:
> {code:java}
> ERROR `/api/java/' not found.
> ERROR `/dev/python/table-api-users-guide/udfs.html' not found.
> ERROR `/dev/python/user-guide/table/dependency_management.html' not found.
> ERROR `/api/java/org/apache/flink/types/RowKind.html' not found.
> {code}
> 1. `ERROR `/api/java/' not found` seems reachable remotely 
> ([{{ExecutionEnvironment}}|https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/], 
> linked from 
> https://ci.apache.org/projects/flink/flink-docs-master/concepts/flink-architecture.html#flink-application-execution).
>  (PS: it is only broken locally, and I have already built the javadoc 
> locally.)
> 2. For `ERROR `/dev/python/table-api-users-guide/udfs.html' not found.`, I 
> did not find any documents that use this.
> 3. It is really broken. We should use 
> {code:java}
> dev/python/table-apis-users-guide/dependency_management.html {code}
> not 
> {code:java}
> dev/python/user-guide/table/dependency_management.html {code}
> in
> {code:java}
> dev/python/user-guide/table/python_table_api_connectors.md{code}
> 4. It is the same as the first case.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19623) Introduce ExecEdge to describe information on input edges for ExecNode

2020-10-13 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19623:
---

 Summary: Introduce ExecEdge to describe information on input edges 
for ExecNode
 Key: FLINK-19623
 URL: https://issues.apache.org/jira/browse/FLINK-19623
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Caizhi Weng
 Fix For: 1.12.0


The deadlock breakup algorithm and the multi-input operator creation algorithm 
need information about the input edges of an exec node, for example the 
priority of each input and how the input records will trigger the output 
records.

We're going to introduce a new class {{ExecEdge}} to describe this.
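
A minimal sketch of the shape such a class could take (hypothetical field names 
and enum, not the actual planner implementation):
{code:java}
// Hypothetical per-input-edge metadata for an exec node.
public final class ExecEdge {
    /** How records arriving on this input trigger output records. */
    public enum DamBehavior { PIPELINED, END_INPUT, BLOCKING }

    private final int priority;            // lower value = consume this input first
    private final DamBehavior damBehavior;

    public ExecEdge(int priority, DamBehavior damBehavior) {
        this.priority = priority;
        this.damBehavior = damBehavior;
    }

    public int getPriority() { return priority; }
    public DamBehavior getDamBehavior() { return damBehavior; }
}
{code}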



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] danny0405 opened a new pull request #13623: [FLINK-19606][table-runtime] Implement streaming window join operator

2020-10-13 Thread GitBox


danny0405 opened a new pull request #13623:
URL: https://github.com/apache/flink/pull/13623


   
   ## What is the purpose of the change
   
   This patch introduces `WindowJoinOperator` and `WindowJoinOperatorSimple`.
   
   The `WindowJoinOperator` accepts two streams and windows each of them; when
   the watermark passes the end of a window and triggers it, the operator joins
   the datasets of the two stream windows and emits the results. Note that the
   operator itself generates the window attributes.
   
   The `WindowJoinOperatorSimple` is a simpler version of `WindowJoinOperator`:
   it assumes that all inputs already have window attributes (e.g. window_start
   and window_end), so there is no need to window them again. A sketch of the
   per-window join step follows below.
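   
   A minimal sketch of that per-window join step (hypothetical names and
   simplified generic types, not the actual operator code):
{code:java}
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;
import org.apache.flink.util.Collector;

// Hypothetical, simplified per-window join step: once the watermark passes a
// window's end, join the records buffered for that window on both inputs.
public class WindowJoinSketch<L, R, OUT> {
    private final BiPredicate<L, R> joinCondition;    // hypothetical join condition
    private final BiFunction<L, R, OUT> joinFunction; // hypothetical row combiner

    public WindowJoinSketch(BiPredicate<L, R> cond, BiFunction<L, R, OUT> fn) {
        this.joinCondition = cond;
        this.joinFunction = fn;
    }

    public void onWindowFired(List<L> leftRecords, List<R> rightRecords, Collector<OUT> out) {
        for (L left : leftRecords) {
            for (R right : rightRecords) {
                if (joinCondition.test(left, right)) {
                    // The real operator also attaches the window attributes
                    // (window_start / window_end) to the emitted row.
                    out.collect(joinFunction.apply(left, right));
                }
            }
        }
    }
}
{code}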
   
   
   ## Brief change log
   
 - Add two operators, `WindowJoinOperator` and `WindowJoinOperatorSimple`
 - Refactor the table window process functions so that to reuse
 - Add test cases
   
   ## Verifying this change
   
   Added UTs.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? yes
 - If yes, how is the feature documented? not documented
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-19618) Broken link in docs

2020-10-13 Thread hailong wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213549#comment-17213549
 ] 

hailong wang commented on FLINK-19618:
--

[~dian.fu] Yes, thank you for assigning it to me~

> Broken link in docs
> ---
>
> Key: FLINK-19618
> URL: https://issues.apache.org/jira/browse/FLINK-19618
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.12.0
>Reporter: hailong wang
>Priority: Major
> Fix For: 1.12.0
>
>
> I ran the `check_links` shell script and found the following broken links:
> {code:java}
> ERROR `/api/java/' not found.
> ERROR `/dev/python/table-api-users-guide/udfs.html' not found.
> ERROR `/dev/python/user-guide/table/dependency_management.html' not found.
> ERROR `/api/java/org/apache/flink/types/RowKind.html' not found.
> {code}
> 1. `ERROR `/api/java/' not found` seems reachable remotely 
> ([{{ExecutionEnvironment}}|https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/], 
> linked from 
> https://ci.apache.org/projects/flink/flink-docs-master/concepts/flink-architecture.html#flink-application-execution).
>  (PS: it is only broken locally, and I have already built the javadoc 
> locally.)
> 2. For `ERROR `/dev/python/table-api-users-guide/udfs.html' not found.`, I 
> did not find any documents that use this.
> 3. It is really broken. We should use 
> {code:java}
> dev/python/table-apis-users-guide/dependency_management.html {code}
> not 
> {code:java}
> dev/python/user-guide/table/dependency_management.html {code}
> in
> {code:java}
> dev/python/user-guide/table/python_table_api_connectors.md{code}
> 4. It is the same as the first case.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19622) flinksql 1.11版本针对avro格式中Map类型value值的空指针异常

2020-10-13 Thread Jira
宋洪亮 created FLINK-19622:
---

 Summary: flinksql 1.11版本针对avro格式中Map类型value值的空指针异常
 Key: FLINK-19622
 URL: https://issues.apache.org/jira/browse/FLINK-19622
 Project: Flink
  Issue Type: Bug
 Environment: The source code is attached below (AvroRowDataDeserializationSchema)
{code:java}
private static DeserializationRuntimeConverter createMapConverter(LogicalType type) {
   final DeserializationRuntimeConverter keyConverter = createConverter(
      DataTypes.STRING().getLogicalType());
   final DeserializationRuntimeConverter valueConverter = createConverter(
      extractValueTypeToAvroMap(type));
   return avroObject -> {
      final Map<?, ?> map = (Map<?, ?>) avroObject;
      Map<Object, Object> result = new HashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
         Object key = keyConverter.convert(entry.getKey());
         Object value = valueConverter.convert(entry.getValue());
         result.put(key, value);
      }
      return new GenericMapData(result);
   };
}
{code}
Reporter: 宋洪亮


Hello, when I parse Avro-format data from Kafka with Flink SQL 1.11, I use the 
Map type. My definition for this type is MAP, but during parsing I hit a null 
pointer exception because the value in the map is null.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19621) Introduce Multi-input operator in Blink planner

2020-10-13 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-19621:
---

 Summary: Introduce Multi-input operator in Blink planner
 Key: FLINK-19621
 URL: https://issues.apache.org/jira/browse/FLINK-19621
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner, Table SQL / Runtime
Reporter: Caizhi Weng
 Fix For: 1.12.0


As the runtime now supports multi-input tasks and source chaining, we're going 
to introduce a multi-input operator in the Blink planner to remove unnecessary 
shuffles and thus improve performance for both batch and streaming.

Design doc: 
https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19618) Broken link in docs

2020-10-13 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213547#comment-17213547
 ] 

Dian Fu commented on FLINK-19618:
-

[~hailong wang]  Good catch! Would you like to submit a PR?

> Broken link in docs
> ---
>
> Key: FLINK-19618
> URL: https://issues.apache.org/jira/browse/FLINK-19618
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.12.0
>Reporter: hailong wang
>Priority: Major
> Fix For: 1.12.0
>
>
> I ran the `check_links` shell script and found the following broken links:
> {code:java}
> ERROR `/api/java/' not found.
> ERROR `/dev/python/table-api-users-guide/udfs.html' not found.
> ERROR `/dev/python/user-guide/table/dependency_management.html' not found.
> ERROR `/api/java/org/apache/flink/types/RowKind.html' not found.
> {code}
> 1. `ERROR `/api/java/' not found` seems reachable remotely 
> ([{{ExecutionEnvironment}}|https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/], 
> linked from 
> https://ci.apache.org/projects/flink/flink-docs-master/concepts/flink-architecture.html#flink-application-execution).
>  (PS: it is only broken locally, and I have already built the javadoc 
> locally.)
> 2. For `ERROR `/dev/python/table-api-users-guide/udfs.html' not found.`, I 
> did not find any documents that use this.
> 3. It is really broken. We should use 
> {code:java}
> dev/python/table-apis-users-guide/dependency_management.html {code}
> not 
> {code:java}
> dev/python/user-guide/table/dependency_management.html {code}
> in
> {code:java}
> dev/python/user-guide/table/python_table_api_connectors.md{code}
> 4. It is the same as the first case.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-19619) Test failed in Azure For EmulatedPubSubSourceTest

2020-10-13 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-19619:

Affects Version/s: 1.11.0

> Test failed in Azure For EmulatedPubSubSourceTest
> -
>
> Key: FLINK-19619
> URL: https://issues.apache.org/jira/browse/FLINK-19619
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.11.0, 1.12.0
>Reporter: hailong wang
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.12.0
>
>
>  
> The link is 
> [https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/7545/logs/133|https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/7545/logs/133]
> {code:java}
>  [ERROR] Tests run: 3, Failures: 1, Errors: 2, Skipped: 0, Time elapsed: 
> 1.705 s <<< FAILURE! - in 
> org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest
> 2020-10-13T18:12:53.5967780Z [ERROR] 
> org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest  Time 
> elapsed: 1.703 s  <<< FAILURE!
> 2020-10-13T18:12:53.5973768Z java.lang.AssertionError: We expect 1 port to be 
> mapped expected:<1> but was:<0>
> 2020-10-13T18:12:53.5979530Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-10-13T18:12:53.5980372Z  at 
> org.junit.Assert.failNotEquals(Assert.java:834)
> 2020-10-13T18:12:53.5980722Z  at 
> org.junit.Assert.assertEquals(Assert.java:645)
> 2020-10-13T18:12:53.5981575Z  at 
> org.apache.flink.streaming.connectors.gcp.pubsub.emulator.GCloudEmulatorManager.launchDocker(GCloudEmulatorManager.java:141)
> 2020-10-13T18:12:53.5982596Z  at 
> org.apache.flink.streaming.connectors.gcp.pubsub.emulator.GCloudUnitTestBase.launchGCloudEmulator(GCloudUnitTestBase.java:45)
> 2020-10-13T18:12:53.5983234Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-10-13T18:12:53.5983626Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-10-13T18:12:53.5984410Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-10-13T18:12:53.5985246Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-10-13T18:12:53.5985825Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-10-13T18:12:53.5986306Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-10-13T18:12:53.5986988Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-10-13T18:12:53.5987740Z  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> 2020-10-13T18:12:53.5988167Z  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 2020-10-13T18:12:53.5988550Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2020-10-13T18:12:53.5988954Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2020-10-13T18:12:53.5989404Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2020-10-13T18:12:53.5989888Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2020-10-13T18:12:53.5990332Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2020-10-13T18:12:53.5990819Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2020-10-13T18:12:53.5991302Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2020-10-13T18:12:53.5991752Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2020-10-13T18:12:53.5992161Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2020-10-13T18:12:53.5992420Z 
> 2020-10-13T18:12:53.5992746Z [ERROR] 
> org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest  Time 
> elapsed: 1.704 s  <<< ERROR!
> 2020-10-13T18:12:53.5993127Z java.lang.NullPointerException
> 2020-10-13T18:12:53.5993502Z  at 
> org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest.tearDown(EmulatedPubSubSinkTest.java:62)
> 2020-10-13T18:12:53.5993944Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-10-13T18:12:53.5994307Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-10-13T18:12:53.5994757Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-10-13T18:12:53.5995151Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-10-13T18:12:53.5995532Z  at 
> 
