[jira] [Created] (FLINK-18726) Support INSERT INTO specific columns

2020-07-26 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-18726:
---

 Summary: Support INSERT INTO specific columns
 Key: FLINK-18726
 URL: https://issues.apache.org/jira/browse/FLINK-18726
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: Caizhi Weng


Currently, Flink only supports inserting into a table without specifying columns, 
but most database systems support inserting into specific columns via

{code:sql}
INSERT INTO table_name(column1, column2, ...) ...
{code}

Columns that are not specified will be filled with their default values, or 
{{NULL}} if no default value was given when the table was created.

As Flink currently does not support default values when creating tables, we can 
fill the unspecified columns with {{NULL}} and throw an exception if an 
unspecified column has a {{NOT NULL}} constraint.
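A minimal Java sketch of the proposed filling logic (the helper and all names here are hypothetical illustrations, not actual planner code): unspecified columns become null, and a missing NOT NULL column raises an error.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartialInsert {
    // Builds a full row from a partial column list: unspecified columns stay
    // null, unless they carry a NOT NULL constraint, in which case we throw.
    static Object[] fillRow(List<String> schemaCols, Set<String> notNull,
                            Map<String, Object> given) {
        Object[] row = new Object[schemaCols.size()];
        for (int i = 0; i < schemaCols.size(); i++) {
            String col = schemaCols.get(i);
            if (given.containsKey(col)) {
                row[i] = given.get(col);
            } else if (notNull.contains(col)) {
                // No default values exist yet, so NOT NULL columns must be given.
                throw new IllegalArgumentException(
                        "Column '" + col + "' is NOT NULL and has no default value");
            }
            // otherwise the slot stays null
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> given = new HashMap<>();
        given.put("column1", 42);
        Object[] row = fillRow(Arrays.asList("column1", "column2", "column3"),
                Set.of("column1"), given);
        System.out.println(Arrays.toString(row)); // [42, null, null]
    }
}
```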



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] klion26 commented on a change in pull request #12665: [FLINK-17886][docs-zh] Update Chinese documentation for new Watermark…

2020-07-26 Thread GitBox


klion26 commented on a change in pull request #12665:
URL: https://github.com/apache/flink/pull/12665#discussion_r460654064



##
File path: docs/dev/event_timestamp_extractors.zh.md
##
@@ -25,83 +25,51 @@ under the License.
 * toc
 {:toc}
 
-As described in [timestamps and watermark handling]({{ site.baseurl 
}}/dev/event_timestamps_watermarks.html),
-Flink provides abstractions that allow the programmer to assign their own 
timestamps and emit their own watermarks. More specifically,
-one can do so by implementing one of the `AssignerWithPeriodicWatermarks` and 
`AssignerWithPunctuatedWatermarks` interfaces, depending
-on the use case. In a nutshell, the first will emit watermarks periodically, 
while the second does so based on some property of
-the incoming records, e.g. whenever a special element is encountered in the 
stream.
+如[生成 Watermark]({%link dev/event_timestamps_watermarks.zh.md %}) 小节中所述,Flink 
提供的抽象方法可以允许用户自己去定义时间戳分配方式和 watermark 生成的方式。你可以通过实现 `WatermarkGenerator` 
接口来实现上述功能。
 
-In order to further ease the programming effort for such tasks, Flink comes 
with some pre-implemented timestamp assigners.
-This section provides a list of them. Apart from their out-of-the-box 
functionality, their implementation can serve as an example
-for custom implementations.
+为了进一步简化此类任务的编程工作,Flink 框架预设了一些时间戳分配器。本节后续内容有举例。除了开箱即用的已有实现外,其还可以作为自定义实现的示例以供参考。
 
-### **Assigners with ascending timestamps**
+
 
-The simplest special case for *periodic* watermark generation is the case 
where timestamps seen by a given source task
-occur in ascending order. In that case, the current timestamp can always act 
as a watermark, because no earlier timestamps will
-arrive.
+## 单调递增时间戳分配器
 
-Note that it is only necessary that timestamps are ascending *per parallel 
data source task*. For example, if
-in a specific setup one Kafka partition is read by one parallel data source 
instance, then it is only necessary that
-timestamps are ascending within each Kafka partition. Flink's watermark 
merging mechanism will generate correct
-watermarks whenever parallel streams are shuffled, unioned, connected, or 
merged.
+*周期性* watermark 生成方式的一个最简单特例就是你给定的数据源中数据的时间戳升序出现。在这种情况下,当前时间戳就可以充当 
watermark,因为后续到达数据的时间戳不会比当前的小。
+
+注意:在 Flink 应用程序中,如果是并行数据源,则只要求并行数据源中的每个*单分区数据源任务*时间戳递增。例如,设置每一个并行数据源实例都只读取一个 
Kafka 分区,则时间戳只需在每个 Kafka 分区内递增即可。Flink 的 watermark 
合并机制会在并行数据流进行分发(shuffle)、联合(union)、连接(connect)或合并(merge)时生成正确的 watermark。
 
 
 
 {% highlight java %}
-DataStream stream = ...
-
-DataStream withTimestampsAndWatermarks =
-stream.assignTimestampsAndWatermarks(new 
AscendingTimestampExtractor() {
-
-@Override
-public long extractAscendingTimestamp(MyEvent element) {
-return element.getCreationTime();
-}
-});
+WatermarkStrategies

Review comment:
   The code in the English version appears to be 
   ```
   WatermarkStrategy.forMonotonousTimestamps();
   ```

##
File path: docs/dev/event_timestamps_watermarks.zh.md
##
@@ -175,200 +140,250 @@ withTimestampsAndWatermarks
 
 
 
+使用 `WatermarkStrategy` 去获取流并生成带有时间戳的元素和 watermark 的新流时,如果原始流已经具有时间戳或 
watermark,则新指定的时间戳分配器将覆盖原有的时间戳和 watermark。
 
- **With Periodic Watermarks**
+
 
-`AssignerWithPeriodicWatermarks` assigns timestamps and generates watermarks 
periodically (possibly depending
-on the stream elements, or purely based on processing time).
+## 处理空闲数据源
 
-The interval (every *n* milliseconds) in which the watermark will be generated 
is defined via
-`ExecutionConfig.setAutoWatermarkInterval(...)`. The assigner's 
`getCurrentWatermark()` method will be
-called each time, and a new watermark will be emitted if the returned 
watermark is non-null and larger than the previous
-watermark.
+如果数据源中的某一个分区/分片在一段时间内未发送事件数据,则意味着 `WatermarkGenerator` 也不会获得任何新数据去生成 
watermark。我们称这类数据源为*空闲输入*或*空闲源*。在这种情况下,当某些其他分区仍然发送事件数据的时候就会出现问题。由于下游算子 
watermark 的计算方式是取所有不同的上游并行数据源 watermark 的最小值,则其 watermark 将不会发生变化。
 
-Here we show two simple examples of timestamp assigners that use periodic 
watermark generation. Note that Flink ships with a 
`BoundedOutOfOrdernessTimestampExtractor` similar to the 
`BoundedOutOfOrdernessGenerator` shown below, which you can read about 
[here]({{ site.baseurl 
}}/dev/event_timestamp_extractors.html#assigners-allowing-a-fixed-amount-of-lateness).
+为了解决这个问题,你可以使用 `WatermarkStrategy` 来检测空闲输入并将其标记为空闲状态。`WatermarkStrategies` 
为此提供了一个工具接口:

Review comment:
   It looks like FLINK-18011 (https://github.com/apache/flink/pull/12412) removed the 
`WatermarkStrategies`-related content, so this part needs to be updated again 
(please also update the other related places in this document). Sorry for not 
spotting this problem earlier.
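As background for the idle-source section quoted in the diff above: the stall it describes follows from the watermark merge rule alone (a downstream watermark is the minimum over all upstream watermarks). A plain-Java sketch of that rule, independent of any Flink API:

```java
import java.util.Arrays;

public class WatermarkMerge {
    // A downstream task's watermark is the minimum over all upstream channel
    // watermarks; one idle channel stuck at Long.MIN_VALUE stalls it.
    static long merge(long[] upstreamWatermarks) {
        return Arrays.stream(upstreamWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] active = {100L, 250L};
        long[] withIdle = {100L, 250L, Long.MIN_VALUE}; // third partition idle
        System.out.println(merge(active));   // advances to 100
        System.out.println(merge(withIdle)); // stuck at Long.MIN_VALUE
    }
}
```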





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-16404) Avoid caching buffers for blocked input channels before barrier alignment

2020-07-26 Thread Zhijiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165452#comment-17165452
 ] 

Zhijiang commented on FLINK-16404:
--

Yes, it is our intention not to remove this metric from the web UI immediately, 
as mentioned in the above release note.
I created a separate ticket 
[FLINK-17572|https://issues.apache.org/jira/browse/FLINK-17572] earlier to 
handle this issue, and a PR has already been submitted by an external 
contributor. But [~chesnay] had some concerns about breaking the compatibility 
of the REST API when reviewing the PR.

> Avoid caching buffers for blocked input channels before barrier alignment
> -
>
> Key: FLINK-16404
> URL: https://issues.apache.org/jira/browse/FLINK-16404
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Zhijiang
>Assignee: Yingjie Cao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One motivation of this issue is to reduce the in-flight data under back 
> pressure in order to speed up checkpointing. The current default number of 
> exclusive buffers per channel is 2. If we reduce it to 0 and add some 
> floating buffers as compensation, it might cause a deadlock, because all the 
> floating buffers might be requested by some blocked input channels and 
> never recycled until barrier alignment.
> In order to solve the above deadlock concern, we can make some logic changes 
> on both the sender and receiver sides.
>  * Sender side: It should revoke previously received credit after sending the 
> checkpoint barrier; that means it would not send any following buffers until 
> receiving new credit.
>  * Receiver side: The respective channel releases the requested floating 
> buffers when a barrier is received from the network. After barrier alignment, 
> it would request floating buffers for the channels with a positive backlog and 
> notify the sender side of the available credit. Then the sender can continue 
> transporting buffers.
> Based on the above changes, we can also remove the `BufferStorage` component 
> completely, because the receiver would never read buffers for blocked 
> channels. Another possible benefit is that the floating buffers might be put 
> to better use before barrier alignment.
> The only side effect is a somewhat cold start after barrier alignment: the 
> sender side has to wait for credit feedback before transporting data just 
> after alignment, which would impact latency and network throughput. But 
> considering that the checkpoint interval is generally not too short, this 
> side effect can be ignored in practice. We can further verify it via the 
> existing micro-benchmarks.
> Even after this ticket is done, we still cannot set exclusive buffers to zero 
> ATM; there exists another deadlock issue which will be solved separately in 
> another ticket.
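The sender/receiver changes described above can be modeled as a toy credit-based sender (all names here are hypothetical illustrations, not Flink's actual network-stack classes): after the barrier, credit is revoked and data is held back until the receiver grants new credit post-alignment.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class CreditedSender {
    private int credit;
    private final Queue<String> pending = new ArrayDeque<>(); // waiting for credit
    private final Queue<String> wire = new ArrayDeque<>();    // actually transmitted

    CreditedSender(int initialCredit) {
        this.credit = initialCredit;
    }

    // Data buffers consume one credit each; without credit they are held back.
    void send(String buffer) {
        if (credit > 0) {
            credit--;
            wire.add(buffer);
        } else {
            pending.add(buffer);
        }
    }

    // After the barrier, previously received credit is revoked, so no
    // following buffers go out until the receiver grants new credit.
    void sendBarrier() {
        wire.add("BARRIER");
        credit = 0;
    }

    // Receiver notifies available credit after barrier alignment.
    void grantCredit(int n) {
        credit += n;
        while (credit > 0 && !pending.isEmpty()) {
            credit--;
            wire.add(pending.poll());
        }
    }

    Queue<String> transmitted() {
        return wire;
    }

    public static void main(String[] args) {
        CreditedSender sender = new CreditedSender(2);
        sender.send("b1");
        sender.sendBarrier();
        sender.send("b2"); // held back: credit was revoked by the barrier
        System.out.println(sender.transmitted()); // [b1, BARRIER]
        sender.grantCredit(1);
        System.out.println(sender.transmitted()); // [b1, BARRIER, b2]
    }
}
```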





[jira] [Commented] (FLINK-17767) Tumble and Hop window support offset

2020-07-26 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165451#comment-17165451
 ] 

Jark Wu commented on FLINK-17767:
-

CALCITE-4000 is for the TUMBLE TVF, not the TUMBLE window group function. 
AFAIK, the offset type in the TUMBLE window group function is a time type. For 
example: {{group by tumble(rowtime, interval '2' hour, time '00:12:00')}}. 
cc [~danny0405]

> Tumble and Hop window support offset
> 
>
> Key: FLINK-17767
> URL: https://issues.apache.org/jira/browse/FLINK-17767
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.10.0
>Reporter: hailong wang
>Assignee: hailong wang
>Priority: Major
> Fix For: 1.12.0
>
>
> TUMBLE window and HOP window with alignment is not supported yet. We can 
> support by 
> (, , )
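The offset semantics can be illustrated with the usual tumbling-window start arithmetic (mirroring, under my reading, the formula in Flink's `TimeWindow.getWindowStartWithOffset`); a small sketch:

```java
public class TumbleOffset {
    // Start of the tumbling window of the given size containing ts, shifted
    // by offset: boundaries fall at offset, offset + size, offset + 2*size, ...
    static long windowStart(long ts, long offset, long size) {
        return ts - (ts - offset + size) % size;
    }

    public static void main(String[] args) {
        long hour = 3600_000L;
        long offset = 12 * 60_000L; // 00:12:00, as in the example above
        // An event at 01:00:00 falls into the 2-hour window starting at 00:12:00.
        System.out.println(windowStart(hour, offset, 2 * hour)); // 720000
    }
}
```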





[jira] [Commented] (FLINK-18713) Allow default ms unit for table.exec.mini-batch.allow-latency etc.

2020-07-26 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165448#comment-17165448
 ] 

Jark Wu commented on FLINK-18713:
-

Yes, I think we should update them. All of them should be defined as 
{{ConfigOption}} now. {{ConfigOption}} did not support 
{{Duration}} when we introduced the above options. 

> Allow default ms unit for table.exec.mini-batch.allow-latency etc.
> --
>
> Key: FLINK-18713
> URL: https://issues.apache.org/jira/browse/FLINK-18713
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: hailong wang
>Priority: Major
> Fix For: 1.12.0
>
>
> We use `scala.concurrent.duration.Duration.create` to parse timeStr in 
> `TableConfigUtils#getMillisecondFromConfigDuration` for the following 
> properties:
> {code:java}
> table.exec.async-lookup.timeout
> table.exec.source.idle-timeout
> table.exec.mini-batch.allow-latency
> table.exec.emit.early-fire.delay
> table.exec.emit.late-fire.delay{code}
> And it must have a unit.
> I think we can replace it with `TimeUtils.parseDuration(timeStr)` to parse 
> timeStr, just like `DescriptorProperties#getOptionalDuration`, to have a 
> default ms unit and be consistent.
>  
>  
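The proposed behavior (a bare number defaults to a ms unit, while an explicit unit is still honored) can be sketched in plain Java; this is an illustration, not Flink's actual `TimeUtils` code, and only a few units are shown:

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientDuration {
    private static final Pattern P =
            Pattern.compile("\\s*(\\d+)\\s*(ms|s|min|h)?\\s*");

    // A bare number defaults to milliseconds; an explicit unit is honored.
    static Duration parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid duration: " + s);
        }
        long v = Long.parseLong(m.group(1));
        String unit = m.group(2) == null ? "ms" : m.group(2);
        switch (unit) {
            case "ms":  return Duration.ofMillis(v);
            case "s":   return Duration.ofSeconds(v);
            case "min": return Duration.ofMinutes(v);
            default:    return Duration.ofHours(v);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("500"));   // PT0.5S (defaulted to ms)
        System.out.println(parse("2 min")); // PT2M
    }
}
```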





[GitHub] [flink] flinkbot edited a comment on pull request #12917: [WIP] [FLINK-18355][tests] Simplify tests of SlotPoolImpl

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12917:
URL: https://github.com/apache/flink/pull/12917#issuecomment-659940433


   
   ## CI report:
   
   * 664fa0ad0056da67af2a8580cf6ebfc82dad9ecc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4913)
 
   * 1c5870c57b64436014900d67be2005395e007a52 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-18122) Kubernetes test fails with "error: timed out waiting for the condition on jobs/flink-job-cluster"

2020-07-26 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165436#comment-17165436
 ] 

Dian Fu commented on FLINK-18122:
-

Thanks [~fly_in_gis], you are right. I have created a separate JIRA: 
https://issues.apache.org/jira/browse/FLINK-18725

> Kubernetes test fails with "error: timed out waiting for the condition on 
> jobs/flink-job-cluster"
> -
>
> Key: FLINK-18122
> URL: https://issues.apache.org/jira/browse/FLINK-18122
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Tests
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2697=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> {code}
> 2020-06-04T09:25:40.7205843Z service/flink-job-cluster created
> 2020-06-04T09:25:40.9661515Z job.batch/flink-job-cluster created
> 2020-06-04T09:25:41.2189123Z deployment.apps/flink-task-manager created
> 2020-06-04T10:32:32.6402983Z error: timed out waiting for the condition on 
> jobs/flink-job-cluster
> 2020-06-04T10:32:33.8057757Z error: unable to upgrade connection: container 
> not found ("flink-task-manager")
> 2020-06-04T10:32:33.8111302Z sort: cannot read: 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*':
>  No such file or directory
> 2020-06-04T10:32:33.8124455Z FAIL WordCount: Output hash mismatch.  Got 
> d41d8cd98f00b204e9800998ecf8427e, expected e682ec6622b5e83f2eb614617d5ab2cf.
> 2020-06-04T10:32:33.8125379Z head hexdump of actual:
> 2020-06-04T10:32:33.8136133Z head: cannot open 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*'
>  for reading: No such file or directory
> 2020-06-04T10:32:33.8344715Z Debugging failed Kubernetes test:
> 2020-06-04T10:32:33.8345469Z Currently existing Kubernetes resources
> 2020-06-04T10:32:36.4977853Z I0604 10:32:36.497383   13191 request.go:621] 
> Throttling request took 1.198606989s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:46.6975735Z I0604 10:32:46.697234   13191 request.go:621] 
> Throttling request took 4.398107353s, request: 
> GET:https://10.1.0.4:8443/apis/authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:57.4978637Z I0604 10:32:57.497209   13191 request.go:621] 
> Throttling request took 1.198449167s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:07.4980104Z I0604 10:33:07.497320   13191 request.go:621] 
> Throttling request took 4.198274438s, request: 
> GET:https://10.1.0.4:8443/apis/apiextensions.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:18.4976060Z I0604 10:33:18.497258   13191 request.go:621] 
> Throttling request took 1.19871495s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:28.4979129Z I0604 10:33:28.497276   13191 request.go:621] 
> Throttling request took 4.198369672s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:30.9182069Z NAME READY   
> STATUS  RESTARTS   AGE
> 2020-06-04T10:33:30.9184099Z pod/flink-job-cluster-dtb67  0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9184869Z pod/flink-task-manager-74ccc9bd9-psqwm   0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9185226Z 
> 2020-06-04T10:33:30.9185926Z NAMETYPE
> CLUSTER-IP  EXTERNAL-IP   PORT(S) 
>   AGE
> 2020-06-04T10:33:30.9186832Z service/flink-job-cluster   NodePort
> 10.111.92.199   
> 6123:32501/TCP,6124:31360/TCP,6125:30025/TCP,8081:30081/TCP   67m
> 2020-06-04T10:33:30.9187545Z service/kubernetes  ClusterIP   
> 10.96.0.1   443/TCP 
>   68m
> 2020-06-04T10:33:30.9187976Z 
> 2020-06-04T10:33:30.9188472Z NAME READY   
> UP-TO-DATE   AVAILABLE   AGE
> 2020-06-04T10:33:30.9189179Z deployment.apps/flink-task-manager   0/1 1   
>  0   67m
> 2020-06-04T10:33:30.9189508Z 
> 2020-06-04T10:33:30.9189815Z NAME   
> DESIRED   CURRENT   READY   AGE
> 2020-06-04T10:33:30.9190418Z replicaset.apps/flink-task-manager-74ccc9bd9   1 
> 1 0   67m
> 2020-06-04T10:33:30.9190662Z 
> 2020-06-04T10:33:30.9190891Z NAME  COMPLETIONS   
> DURATION   AGE
> 2020-06-04T10:33:30.9191423Z job.batch/flink-job-cluster   0/1   

[jira] [Updated] (FLINK-18725) "Run Kubernetes test" failed with "30025: provided port is already allocated"

2020-07-26 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-18725:

Issue Type: Test  (was: Bug)

> "Run Kubernetes test" failed with "30025: provided port is already allocated"
> -
>
> Key: FLINK-18725
> URL: https://issues.apache.org/jira/browse/FLINK-18725
> Project: Flink
>  Issue Type: Test
>  Components: Deployment / Kubernetes, Tests
>Affects Versions: 1.11.0, 1.11.1
>Reporter: Dian Fu
>Priority: Major
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4901=logs=08866332-78f7-59e4-4f7e-49a56faa3179=3e8647c1-5a28-5917-dd93-bf78594ea994
> {code}
> The Service "flink-job-cluster" is invalid: spec.ports[2].nodePort: Invalid 
> value: 30025: provided port is already allocated
> {code}





[jira] [Created] (FLINK-18725) "Run Kubernetes test" failed with "30025: provided port is already allocated"

2020-07-26 Thread Dian Fu (Jira)
Dian Fu created FLINK-18725:
---

 Summary: "Run Kubernetes test" failed with "30025: provided port 
is already allocated"
 Key: FLINK-18725
 URL: https://issues.apache.org/jira/browse/FLINK-18725
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes, Tests
Affects Versions: 1.11.1, 1.11.0
Reporter: Dian Fu


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4901=logs=08866332-78f7-59e4-4f7e-49a56faa3179=3e8647c1-5a28-5917-dd93-bf78594ea994

{code}
The Service "flink-job-cluster" is invalid: spec.ports[2].nodePort: Invalid 
value: 30025: provided port is already allocated
{code}





[GitHub] [flink] flinkbot edited a comment on pull request #12917: [WIP] [FLINK-18355][tests] Simplify tests of SlotPoolImpl

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12917:
URL: https://github.com/apache/flink/pull/12917#issuecomment-659940433


   
   ## CI report:
   
   * 664fa0ad0056da67af2a8580cf6ebfc82dad9ecc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4913)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #12917: [WIP] [FLINK-18355][tests] Simplify tests of SlotPoolImpl

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12917:
URL: https://github.com/apache/flink/pull/12917#issuecomment-659940433


   
   ## CI report:
   
   * 664fa0ad0056da67af2a8580cf6ebfc82dad9ecc UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12958:
URL: https://github.com/apache/flink/pull/12958#issuecomment-662378507


   
   ## CI report:
   
   * deb09ce58822dda5b7388c2928b8a119879d9749 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4912)
 
   * 35ddeb1f517c78a9237ab5b5814dd504487acc00 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4914)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue

2020-07-26 Thread Farnight (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165422#comment-17165422
 ] 

Farnight commented on FLINK-18712:
--

Thanks a lot [~yunta]! We are trying a simple job (with all business logic 
removed) and testing it to reproduce this. We will share more details after 
the testing.

> Flink RocksDB statebackend memory leak issue 
> -
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0
>Reporter: Farnight
>Priority: Critical
>
> When using RocksDB as our statebackend, we found that it leads to a memory 
> leak when restarting a job (manually or in a recovery case).
>  
> How to reproduce:
>  # Increase the RocksDB block cache size (e.g. 1G); this makes it easier to 
> monitor and reproduce.
>  # Start a job using the RocksDB statebackend.
>  # When the RocksDB block cache reaches its maximum size, restart the job, 
> and monitor the memory usage (k8s pod working set) of the TM.
>  # Repeat steps 2-3 a few more times; the memory usage will keep rising.
>  
> Any solution or suggestion for this? Thanks!





[GitHub] [flink] flinkbot edited a comment on pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12958:
URL: https://github.com/apache/flink/pull/12958#issuecomment-662378507


   
   ## CI report:
   
   * deb09ce58822dda5b7388c2928b8a119879d9749 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4912)
 
   * 35ddeb1f517c78a9237ab5b5814dd504487acc00 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-18496) Anchors are not generated based on ZH characters

2020-07-26 Thread Zhilong Hong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165414#comment-17165414
 ] 

Zhilong Hong commented on FLINK-18496:
--

Now I'm removing the document files one by one to find the exact one (or ones) 
with mistakes, but this approach is too slow. Is there anyone experienced in 
debugging the documents or Jekyll? How can I track down the mistake in the 
Flink docs in a better way?

> Anchors are not generated based on ZH characters
> 
>
> Key: FLINK-18496
> URL: https://issues.apache.org/jira/browse/FLINK-18496
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Zhu Zhu
>Assignee: Zhilong Hong
>Priority: Major
>  Labels: starter
>
> In ZH version pages of flink-web, the anchors are not generated based on ZH 
> characters. The anchor name would be like 'section-1', 'section-2' if there 
> is no EN characters. An example can be the links in the navigator of 
> https://flink.apache.org/zh/contributing/contribute-code.html
> This makes it impossible to ref an anchor from the content because the anchor 
> name might change unexpectedly if a new section is added.
> Note that it is a problem for flink-web only. The docs generated from the 
> flink repo can properly generate ZH anchors.





[jira] [Commented] (FLINK-18723) performance test

2020-07-26 Thread junbiao chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165413#comment-17165413
 ] 

junbiao chen commented on FLINK-18723:
--

Thanks, I'll close it immediately

> performance test
> 
>
> Key: FLINK-18723
> URL: https://issues.apache.org/jira/browse/FLINK-18723
> Project: Flink
>  Issue Type: Test
>Reporter: junbiao chen
>Priority: Major
>
> what is the best performance benchmark for flink





[jira] [Issue Comment Deleted] (FLINK-18723) performance test

2020-07-26 Thread junbiao chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

junbiao chen updated FLINK-18723:
-
Comment: was deleted

(was: Thanks,I close it immediately)

> performance test
> 
>
> Key: FLINK-18723
> URL: https://issues.apache.org/jira/browse/FLINK-18723
> Project: Flink
>  Issue Type: Test
>Reporter: junbiao chen
>Priority: Major
>
> what is the best performance benchmark for flink





[jira] [Closed] (FLINK-18723) performance test

2020-07-26 Thread junbiao chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

junbiao chen closed FLINK-18723.

Resolution: Not A Problem

> performance test
> 
>
> Key: FLINK-18723
> URL: https://issues.apache.org/jira/browse/FLINK-18723
> Project: Flink
>  Issue Type: Test
>Reporter: junbiao chen
>Priority: Major
>
> what is the best performance benchmark for flink





[jira] [Comment Edited] (FLINK-18723) performance test

2020-07-26 Thread Kenneth William Krugler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165409#comment-17165409
 ] 

Kenneth William Krugler edited comment on FLINK-18723 at 7/27/20, 3:38 AM:
---

Hi [~dahaishuantuoba] - this is a question best asked on the [Flink mailing 
list|https://flink.apache.org/community.html] or on [Stack 
Overflow|https://stackoverflow.com/questions/tagged/apache-flink]. It would 
be great if you could ask on one of those two venues and close this issue, 
thanks!


was (Author: kkrugler):
Hi [~dahaishuantuoba] - this is a question best asked on the [Flink mailing 
list|https://flink.apache.org/community.html] or on [Stack 
Overflow|https://stackoverflow.com/questions/tagged/apache-flink].

> performance test
> 
>
> Key: FLINK-18723
> URL: https://issues.apache.org/jira/browse/FLINK-18723
> Project: Flink
>  Issue Type: Test
>Reporter: junbiao chen
>Priority: Major
>
> what is the best performance benchmark for flink





[jira] [Commented] (FLINK-18122) Kubernetes test fails with "error: timed out waiting for the condition on jobs/flink-job-cluster"

2020-07-26 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165399#comment-17165399
 ] 

Yang Wang commented on FLINK-18122:
---

[~dian.fu] I think your instance is caused by a port conflict. The specified 
node port "30025" is already used by another process, so it failed to create 
the "flink-job-cluster" service. 

 
{code:java}
The Service "flink-job-cluster" is invalid: spec.ports[2].nodePort: Invalid 
value: 30025: provided port is already allocated

{code}

> Kubernetes test fails with "error: timed out waiting for the condition on 
> jobs/flink-job-cluster"
> -
>
> Key: FLINK-18122
> URL: https://issues.apache.org/jira/browse/FLINK-18122
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Tests
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2697=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> {code}
> 2020-06-04T09:25:40.7205843Z service/flink-job-cluster created
> 2020-06-04T09:25:40.9661515Z job.batch/flink-job-cluster created
> 2020-06-04T09:25:41.2189123Z deployment.apps/flink-task-manager created
> 2020-06-04T10:32:32.6402983Z error: timed out waiting for the condition on 
> jobs/flink-job-cluster
> 2020-06-04T10:32:33.8057757Z error: unable to upgrade connection: container 
> not found ("flink-task-manager")
> 2020-06-04T10:32:33.8111302Z sort: cannot read: 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*':
>  No such file or directory
> 2020-06-04T10:32:33.8124455Z FAIL WordCount: Output hash mismatch.  Got 
> d41d8cd98f00b204e9800998ecf8427e, expected e682ec6622b5e83f2eb614617d5ab2cf.
> 2020-06-04T10:32:33.8125379Z head hexdump of actual:
> 2020-06-04T10:32:33.8136133Z head: cannot open 
> '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-56335570120/out/kubernetes_wc_out*'
>  for reading: No such file or directory
> 2020-06-04T10:32:33.8344715Z Debugging failed Kubernetes test:
> 2020-06-04T10:32:33.8345469Z Currently existing Kubernetes resources
> 2020-06-04T10:32:36.4977853Z I0604 10:32:36.497383   13191 request.go:621] 
> Throttling request took 1.198606989s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:46.6975735Z I0604 10:32:46.697234   13191 request.go:621] 
> Throttling request took 4.398107353s, request: 
> GET:https://10.1.0.4:8443/apis/authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:32:57.4978637Z I0604 10:32:57.497209   13191 request.go:621] 
> Throttling request took 1.198449167s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:07.4980104Z I0604 10:33:07.497320   13191 request.go:621] 
> Throttling request took 4.198274438s, request: 
> GET:https://10.1.0.4:8443/apis/apiextensions.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:18.4976060Z I0604 10:33:18.497258   13191 request.go:621] 
> Throttling request took 1.19871495s, request: 
> GET:https://10.1.0.4:8443/apis/apps/v1?timeout=32s
> 2020-06-04T10:33:28.4979129Z I0604 10:33:28.497276   13191 request.go:621] 
> Throttling request took 4.198369672s, request: 
> GET:https://10.1.0.4:8443/apis/rbac.authorization.k8s.io/v1?timeout=32s
> 2020-06-04T10:33:30.9182069Z NAME READY   
> STATUS  RESTARTS   AGE
> 2020-06-04T10:33:30.9184099Z pod/flink-job-cluster-dtb67  0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9184869Z pod/flink-task-manager-74ccc9bd9-psqwm   0/1 
> ErrImageNeverPull   0  67m
> 2020-06-04T10:33:30.9185226Z 
> 2020-06-04T10:33:30.9185926Z NAMETYPE
> CLUSTER-IP  EXTERNAL-IP   PORT(S) 
>   AGE
> 2020-06-04T10:33:30.9186832Z service/flink-job-cluster   NodePort
> 10.111.92.199   
> 6123:32501/TCP,6124:31360/TCP,6125:30025/TCP,8081:30081/TCP   67m
> 2020-06-04T10:33:30.9187545Z service/kubernetes  ClusterIP   
> 10.96.0.1   443/TCP 
>   68m
> 2020-06-04T10:33:30.9187976Z 
> 2020-06-04T10:33:30.9188472Z NAME READY   
> UP-TO-DATE   AVAILABLE   AGE
> 2020-06-04T10:33:30.9189179Z deployment.apps/flink-task-manager   0/1 1   
>  0   67m
> 2020-06-04T10:33:30.9189508Z 
> 2020-06-04T10:33:30.9189815Z NAME   
> DESIRED   CURRENT   READY   AGE
> 2020-06-04T10:33:30.9190418Z replicaset.apps/flink-task-manager-74ccc9bd9   1 
> 1  

[jira] [Commented] (FLINK-18723) performance test

2020-07-26 Thread Kenneth William Krugler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165409#comment-17165409
 ] 

Kenneth William Krugler commented on FLINK-18723:
-

Hi [~dahaishuantuoba] - this is a question best asked via the [Flink mailing 
list|https://flink.apache.org/community.html], or on [Stack 
Overflow|https://stackoverflow.com/questions/tagged/apache-flink].

> performance test
> 
>
> Key: FLINK-18723
> URL: https://issues.apache.org/jira/browse/FLINK-18723
> Project: Flink
>  Issue Type: Test
>Reporter: junbiao chen
>Priority: Major
>
> what is the best performance benchmark for flink



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18717) reuse MiniCluster in table integration test class ?

2020-07-26 Thread godfrey he (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165405#comment-17165405
 ] 

godfrey he commented on FLINK-18717:


cc [~kkl0u]

> reuse MiniCluster in table integration test class ? 
> 
>
> Key: FLINK-18717
> URL: https://issues.apache.org/jira/browse/FLINK-18717
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Legacy Planner, Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: godfrey he
>Priority: Major
>
> Before 1.11, the {{MiniCluster}} could be reused within each integration test 
> class (see TestStreamEnvironment#setAsContext). 
> In 1.11, after we corrected the execution behavior of TableEnvironment, 
> StreamTableEnvironment and BatchTableEnvironment (see 
> [FLINK-16363|https://issues.apache.org/jira/browse/FLINK-16363], 
> [FLINK-17126|https://issues.apache.org/jira/browse/FLINK-17126]), a MiniCluster 
> is created for each test case even within the same test class (see 
> {{org.apache.flink.client.deployment.executors.LocalExecutor}}). It would be 
> better if we could reuse the {{MiniCluster}} as before. One approach is to 
> provide a new kind of MiniCluster factory (such as a SessionMiniClusterFactory) 
> instead of using {{PerJobMiniClusterFactory}}. WDYT?
>   





[jira] [Commented] (FLINK-18718) Support for using yarn-cluster CLI parameters on application mode

2020-07-26 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165404#comment-17165404
 ] 

Yang Wang commented on FLINK-18718:
---

[~flolas] Since the community has decided to deprecate the CLI options in the 
future (see the discussion here [1]), they are not supported in Yarn application 
mode, Kubernetes session mode, or Kubernetes application mode now. 

 

[1]. 
https://issues.apache.org/jira/browse/FLINK-15179?focusedCommentId=16995321=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16995321

> Support for using yarn-cluster CLI parameters on application mode
> -
>
> Key: FLINK-18718
> URL: https://issues.apache.org/jira/browse/FLINK-18718
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client
>Affects Versions: 1.11.0, 1.11.1
> Environment: *Flink 1.11*
> *CDH6.2.1*
>Reporter: Felipe Lolas
>Priority: Minor
>
> Hi!
> Just a minor improvement here:
> Now with the new Application Mode, we need to use generic cli configurations 
> for setting up YARN deployments.
> *Example*
> {code:java}
>  -Djobmanager.memory.process.size=1024m 
>  -Dtaskmanager.memory.process.size=1024m 
>  -Dyarn.application.queue=root.test
>  -Dyarn.application.name=flink-hbase 
>  -Dyarn.application.type=flink {code}
> It would be nice to be able to use the CLI options available for yarn-cluster 
> directly. This works in per-job and session mode *but not in application mode.*
>  
> Thanks
>  





[jira] [Created] (FLINK-18724) Integration with DataStream and DataSet API report error

2020-07-26 Thread liang ding (Jira)
liang ding created FLINK-18724:
--

 Summary: Integration with DataStream and DataSet API report error 
 Key: FLINK-18724
 URL: https://issues.apache.org/jira/browse/FLINK-18724
 Project: Flink
  Issue Type: Bug
  Components: API / Core, Connectors / Kafka, Table SQL / API
Affects Versions: 1.11.1
Reporter: liang ding


I want to create a table from a DataStream (Kafka). There are two reasons I need 
to use DataStream:
 # I need to decode messages into columns with a custom format; in SQL mode I 
don't know how to do it.
 # I have implemented both DeserializationSchema and FlatMapFunction. When using 
a DataStream I can do many things before it becomes a suitable table, and that 
is my preferred approach in other applications too.
 So I do it like this:

{code:java}
StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings tSet= 
EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
StreamTableEnvironment tEnv=StreamTableEnvironment.create(env,tSet);
DataStream stream = env
.addSource(new FlinkKafkaConsumer<>("test-log", new 
SimpleStringSchema(), properties))
.flatMap(new LogParser());
//stream.printToErr();
tEnv.fromDataStream(stream).select("userId,city").execute().print();
tEnv.execute("test-sql");
//env.execute("test");
{code}
Then I got this message:
{noformat}
 [Kafka Fetcher for Source: Custom Source -> Flat Map ->* -> select: 
(userId,city) -> to: Row (3/3)] INFO 
org.apache.kafka.clients.FetchSessionHandler - [Consumer 
clientId=consumer-flink-3-5, groupId=flink-3] Node 0 sent an invalid full fetch 
response with extra=(test-log-0, response=(
 [Kafka Fetcher for Source: Custom Source -> Flat Map ->* -> select: 
(userId,city) -> to: Row (3/3)] INFO 
org.apache.kafka.clients.FetchSessionHandler - [Consumer 
clientId=consumer-flink-3-5, groupId=flink-3] Node 0 sent an invalid full fetch 
response with extra=(test-log-1, response=({noformat}
It seems like both StreamExecutionEnvironment and StreamTableEnvironment start 
the fetcher, and neither one succeeds.

And there is no integration guide, which confuses me: should I call env.execute, 
or tableEnv.execute, or both (it seems not)? And what is the Blink planner way?





[jira] [Created] (FLINK-18723) performance test

2020-07-26 Thread junbiao chen (Jira)
junbiao chen created FLINK-18723:


 Summary: performance test
 Key: FLINK-18723
 URL: https://issues.apache.org/jira/browse/FLINK-18723
 Project: Flink
  Issue Type: Test
Reporter: junbiao chen


What is the best performance benchmark for Flink?





[jira] [Commented] (FLINK-10674) Fix handling of retractions after clean up

2020-07-26 Thread Xingxing Di (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165401#comment-17165401
 ] 

Xingxing Di commented on FLINK-10674:
-

Hi Timo, I saw this issue was fixed in 1.7.0, but today we still got the 
exception under Flink 1.7.2. Should we reopen this issue?
 !screenshot-1.png! 

> Fix handling of retractions after clean up
> --
>
> Key: FLINK-10674
> URL: https://issues.apache.org/jira/browse/FLINK-10674
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.6.1
> Environment: Flink 1.6.0
>Reporter: ambition
>Assignee: Timo Walther
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.5.6, 1.6.3, 1.7.0
>
> Attachments: image-2018-10-25-14-46-03-373.png, screenshot-1.png
>
>
> Our online Flink job ran for about a week; the job contains this SQL:
> {code:java}
> select  `time`,  
> lower(trim(os_type)) as os_type, 
> count(distinct feed_id) as feed_total_view  
> from  my_table 
> group by `time`, lower(trim(os_type)){code}
>  
>   Then an NPE occurred: 
>  
> {code:java}
> java.lang.NullPointerException
> at scala.Predef$.Long2long(Predef.scala:363)
> at 
> org.apache.flink.table.functions.aggfunctions.DistinctAccumulator.remove(DistinctAccumulator.scala:109)
> at NonWindowedAggregationHelper$894.retract(Unknown Source)
> at 
> org.apache.flink.table.runtime.aggregate.GroupAggProcessFunction.processElement(GroupAggProcessFunction.scala:124)
> at 
> org.apache.flink.table.runtime.aggregate.GroupAggProcessFunction.processElement(GroupAggProcessFunction.scala:39)
> at 
> org.apache.flink.streaming.api.operators.LegacyKeyedProcessOperator.processElement(LegacyKeyedProcessOperator.java:88)
> at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  
>  
> View DistinctAccumulator.remove
>  !image-2018-10-25-14-46-03-373.png!
>  
> This NPE is likely caused by currentCnt being null, so we simply handle it 
> like this:
> {code:java}
> def remove(params: Row): Boolean = {
>   if(!distinctValueMap.contains(params)){
> true
>   }else{
> val currentCnt = distinctValueMap.get(params)
> // 
> if (currentCnt == null || currentCnt == 1) {
>   distinctValueMap.remove(params)
>   true
> } else {
>   var value = currentCnt - 1L
>   if(value < 0){
> value = 1
>   }
>   distinctValueMap.put(params, value)
>   false
> }
>   }
> }{code}
>  
> Update:
> Because state clean up happens in processing time, it might be
>  the case that retractions are arriving after the state has
>  been cleaned up. Before these changes, a new accumulator was
>  created and invalid retraction messages were emitted. This
>  change drops retraction messages for which no accumulator
>  exists. 
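The drop-late-retractions behavior described in the update above can be sketched 
as a plain-Java reference counter. This is only an illustrative stand-in, not 
Flink's actual DistinctAccumulator; all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of null-safe distinct reference counting: a retraction for a
// key whose state was already cleaned up is dropped instead of throwing an NPE.
class DistinctCounter {
    private final Map<String, Long> counts = new HashMap<>();

    void add(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Returns true when the key is (or already was) fully retracted.
    boolean remove(String key) {
        Long current = counts.get(key);
        if (current == null) {
            return true; // late retraction after state clean-up: just drop it
        }
        if (current <= 1L) {
            counts.remove(key);
            return true;
        }
        counts.put(key, current - 1L);
        return false;
    }

    public static void main(String[] args) {
        DistinctCounter c = new DistinctCounter();
        c.add("k");
        c.add("k");
        System.out.println(c.remove("k"));    // count 2 -> 1, not yet gone
        System.out.println(c.remove("k"));    // last reference removed
        System.out.println(c.remove("gone")); // late retraction, dropped
    }
}
```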





[jira] [Updated] (FLINK-10674) Fix handling of retractions after clean up

2020-07-26 Thread Xingxing Di (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingxing Di updated FLINK-10674:

Attachment: screenshot-1.png

> Fix handling of retractions after clean up
> --
>
> Key: FLINK-10674
> URL: https://issues.apache.org/jira/browse/FLINK-10674
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.6.1
> Environment: Flink 1.6.0
>Reporter: ambition
>Assignee: Timo Walther
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.5.6, 1.6.3, 1.7.0
>
> Attachments: image-2018-10-25-14-46-03-373.png, screenshot-1.png
>
>
> Our online Flink job ran for about a week; the job contains this SQL:
> {code:java}
> select  `time`,  
> lower(trim(os_type)) as os_type, 
> count(distinct feed_id) as feed_total_view  
> from  my_table 
> group by `time`, lower(trim(os_type)){code}
>  
>   Then an NPE occurred: 
>  
> {code:java}
> java.lang.NullPointerException
> at scala.Predef$.Long2long(Predef.scala:363)
> at 
> org.apache.flink.table.functions.aggfunctions.DistinctAccumulator.remove(DistinctAccumulator.scala:109)
> at NonWindowedAggregationHelper$894.retract(Unknown Source)
> at 
> org.apache.flink.table.runtime.aggregate.GroupAggProcessFunction.processElement(GroupAggProcessFunction.scala:124)
> at 
> org.apache.flink.table.runtime.aggregate.GroupAggProcessFunction.processElement(GroupAggProcessFunction.scala:39)
> at 
> org.apache.flink.streaming.api.operators.LegacyKeyedProcessOperator.processElement(LegacyKeyedProcessOperator.java:88)
> at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  
>  
> View DistinctAccumulator.remove
>  !image-2018-10-25-14-46-03-373.png!
>  
> This NPE is likely caused by currentCnt being null, so we simply handle it 
> like this:
> {code:java}
> def remove(params: Row): Boolean = {
>   if(!distinctValueMap.contains(params)){
> true
>   }else{
> val currentCnt = distinctValueMap.get(params)
> // 
> if (currentCnt == null || currentCnt == 1) {
>   distinctValueMap.remove(params)
>   true
> } else {
>   var value = currentCnt - 1L
>   if(value < 0){
> value = 1
>   }
>   distinctValueMap.put(params, value)
>   false
> }
>   }
> }{code}
>  
> Update:
> Because state clean up happens in processing time, it might be
>  the case that retractions are arriving after the state has
>  been cleaned up. Before these changes, a new accumulator was
>  created and invalid retraction messages were emitted. This
>  change drops retraction messages for which no accumulator
>  exists. 





[jira] [Commented] (FLINK-18681) The jar package version conflict causes the task to continue to increase and grab resources

2020-07-26 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165397#comment-17165397
 ] 

Xintong Song commented on FLINK-18681:
--

Hi [~apach...@163.com],

It seems I misunderstood what you meant. I thought you were complaining about 
the dependency conflict, but actually you were complaining about the resource 
increase caused by the dependency conflict. I apologize for that 
misunderstanding.

I think you are right. It is indeed a problem that one problematic Flink 
application can impact the entire cluster by taking lots of resources. Could you 
provide a bit more information to help us understand the problem better? I'm 
particularly interested in how exactly the resources increase.
- Does the application keep requesting new containers but never release them?
- What is the status of the requested containers? Are they actually allocated or 
reserved?
- Are there AM failovers? (In other words, do you see many application attempts 
on Yarn?)

It would also be helpful if you can share the complete JM logs, if there's no 
sensitive information.

> The jar package version conflict causes the task to continue to increase and 
> grab resources
> ---
>
> Key: FLINK-18681
> URL: https://issues.apache.org/jira/browse/FLINK-18681
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: wangtaiyang
>Priority: Major
>
> When I submit a Flink task to Yarn, the default resource configuration is 
> 1G & 1 core, but in fact this task keeps increasing its resources: 2 cores, 
> 3 cores, and so on, up to 200 cores. Then I looked at the JM log and found the 
> following error:
> {code:java}
> // code placeholder
> java.lang.NoSuchMethodError: 
> org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;java.lang.NoSuchMethodError:
>  
> org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
>  at 
> org.apache.flink.runtime.entrypoint.parser.CommandLineOptions.(CommandLineOptions.java:28)
>  ~[flink-dist_2.11-1.11.1.jar:1.11.1] at 
> org.apache.flink.runtime.clusterframework.BootstrapTools.lambda$getDynamicPropertiesAsString$0(BootstrapTools.java:648)
>  ~[flink-dist_2.11-1.11.1.jar:1.11.1] at 
> java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) 
> ~[?:1.8.0_191]
> ...
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.flink.runtime.entrypoint.parser.CommandLineOptionsjava.lang.NoClassDefFoundError:
>  Could not initialize class 
> org.apache.flink.runtime.entrypoint.parser.CommandLineOptions at 
> org.apache.flink.runtime.clusterframework.BootstrapTools.lambda$getDynamicPropertiesAsString$0(BootstrapTools.java:648)
>  ~[flink-dist_2.11-1.11.1.jar:1.11.1] at 
> java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) 
> ~[?:1.8.0_191] at 
> java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553) 
> ~[?:1.8.0_191] at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) 
> ~[?:1.8.0_191]{code}
> Finally, it was confirmed to be caused by the commons-cli version conflict, 
> but the failing task has not stopped and continues to grab more and more 
> resources. Is this a bug?





[jira] [Commented] (FLINK-17767) Tumble and Hop window support offset

2020-07-26 Thread hailong wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165396#comment-17165396
 ] 

hailong wang commented on FLINK-17767:
--

Hi [~jark], which type do you think would be better for the window offset:

SqlTypeFamily.TIME or SqlTypeFamily.DATETIME_INTERVAL? Calcite has a window 
table function with an offset of DATETIME_INTERVAL [1].

[1]https://issues.apache.org/jira/browse/CALCITE-4000

> Tumble and Hop window support offset
> 
>
> Key: FLINK-17767
> URL: https://issues.apache.org/jira/browse/FLINK-17767
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.10.0
>Reporter: hailong wang
>Assignee: hailong wang
>Priority: Major
> Fix For: 1.12.0
>
>
> TUMBLE and HOP windows with alignment are not supported yet. We can 
> support this by 
> (, , )





[GitHub] [flink] flinkbot edited a comment on pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12958:
URL: https://github.com/apache/flink/pull/12958#issuecomment-662378507


   
   ## CI report:
   
   * deb09ce58822dda5b7388c2928b8a119879d9749 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4912)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18438) TaskManager start failed

2020-07-26 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165393#comment-17165393
 ] 

Xintong Song commented on FLINK-18438:
--

[~JohnSiro],

I cannot tell what the problem is either. Everything looks fine to me. 
Sorry for not being more helpful.

Maybe you can try a newer build of JDK8?

> TaskManager start failed
> 
>
> Key: FLINK-18438
> URL: https://issues.apache.org/jira/browse/FLINK-18438
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Affects Versions: 1.10.1
> Environment: Java:   java version "1.8.0_101"
>           Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
>           Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode) 
> Flink: 1.10.1 (flink-1.10.1-bin-scala_2.12.tgz)
> OS: Windows 10 (1903) / 64-bits
>Reporter: JohnSiro
>Priority: Major
>
>  
> Error:  in file  xxx-taskexecutor-0-xxx.out is:
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit.
> Improperly specified VM option 'MaxMetaspaceSize=268435456 '





[jira] [Updated] (FLINK-18719) Define the interfaces and introduce ActiveResourceManager

2020-07-26 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-18719:
-
Fix Version/s: 1.12.0

> Define the interfaces and introduce ActiveResourceManager
> -
>
> Key: FLINK-18719
> URL: https://issues.apache.org/jira/browse/FLINK-18719
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Xintong Song
>Priority: Major
> Fix For: 1.12.0
>
>
> * Define the interface ResourceManagerDriver and ResourceEventHandler
> * Rename the original ActiveResourceManager to LegacyActiveResourceManager.
> * Introduce the new ActiveResourceManager.





[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-07-26 Thread Congxian Qiu(klion26) (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165387#comment-17165387
 ] 

Congxian Qiu(klion26) commented on FLINK-18675:
---

Hi [~raviratnakar], from the git history, the code at line number 1512 was added 
in 1.9.0 (and it seems that change would not affect this problem). IIUC, we 
check somewhere whether the checkpoint can be triggered; I need to check it 
carefully as the code has changed a lot. I will reply here if I find anything.

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an unusual behaviour. *When a checkpoint completes successfully 
> and its end-to-end duration is almost equal to or greater than the minimum 
> pause duration, the next checkpoint gets triggered immediately without 
> maintaining the minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot.
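For reference, the checkpoint settings listed in the report can be expressed as 
a flink-conf.yaml fragment. This is only a sketch, assuming Flink 1.11's 
execution.checkpointing.* option keys; verify the exact key names against the 
configuration documentation for your version:

```yaml
# flink-conf.yaml sketch of the reported checkpoint configuration
execution.checkpointing.interval: 3min
execution.checkpointing.min-pause: 3min
execution.checkpointing.timeout: 10min
execution.checkpointing.mode: EXACTLY_ONCE
execution.checkpointing.max-concurrent-checkpoints: 1
```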





[jira] [Updated] (FLINK-18620) Unify behaviors of active resource managers

2020-07-26 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-18620:
-
Fix Version/s: 1.12.0

> Unify behaviors of active resource managers
> ---
>
> Key: FLINK-18620
> URL: https://issues.apache.org/jira/browse/FLINK-18620
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Major
> Fix For: 1.12.0
>
>
> Flink supports various deployment modes: standalone, Kubernetes, Yarn & 
> Mesos. For each deployment mode, a resource manager is implemented for 
> managing the resources.
> While StandaloneResourceManager is quite different from the others by not 
> being able to dynamically request and release resources, the other three 
> (KubernetesResourceManager, YarnResourceManager and MesosResourceManager) 
> share a great deal of logic. This common logic is currently implemented 
> separately by each of the active resource managers. Such duplication adds 
> maintenance overhead and amplifies stability risks.
> This ticket proposes a refactoring design for the resource managers, with 
> better abstractions that deduplicate the common logic and minimize the 
> deployment-specific behaviors.
> This proposal is a pure refactor effort. It does not intend to change any of 
> the current resource management behaviors.
> A detailed design doc and a simplified proof-of-concept implementation for 
> the Kubernetes deployment are linked to this ticket.





[jira] [Assigned] (FLINK-18719) Define the interfaces and introduce ActiveResourceManager

2020-07-26 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-18719:


Assignee: Xintong Song

> Define the interfaces and introduce ActiveResourceManager
> -
>
> Key: FLINK-18719
> URL: https://issues.apache.org/jira/browse/FLINK-18719
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Major
> Fix For: 1.12.0
>
>
> * Define the interface ResourceManagerDriver and ResourceEventHandler
> * Rename the original ActiveResourceManager to LegacyActiveResourceManager.
> * Introduce the new ActiveResourceManager.





[jira] [Created] (FLINK-18722) Migrate MesosResourceManager to the new MesosResourceManagerDriver

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18722:


 Summary: Migrate MesosResourceManager to the new 
MesosResourceManagerDriver
 Key: FLINK-18722
 URL: https://issues.apache.org/jira/browse/FLINK-18722
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Mesos, Runtime / Coordination
Reporter: Xintong Song








[jira] [Created] (FLINK-18721) Migrate YarnResourceManager to the new YarnResourceManagerDriver

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18721:


 Summary: Migrate YarnResourceManager to the new 
YarnResourceManagerDriver
 Key: FLINK-18721
 URL: https://issues.apache.org/jira/browse/FLINK-18721
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / YARN, Runtime / Coordination
Reporter: Xintong Song
 Fix For: 1.12.0








[jira] [Created] (FLINK-18720) Migrate KubernetesResourceManager to the new KubernetesResourceManagerDriver

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18720:


 Summary: Migrate KubernetesResourceManager to the new 
KubernetesResourceManagerDriver
 Key: FLINK-18720
 URL: https://issues.apache.org/jira/browse/FLINK-18720
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes, Runtime / Coordination
Reporter: Xintong Song
 Fix For: 1.12.0


* Introduce KubernetesResourceManagerDriver
* Switch to ActiveResourceManager and KubernetesResourceManagerDriver for 
Kubernetes deployment
* Remove KubernetesResourceManager





[jira] [Created] (FLINK-18719) Define the interfaces and introduce ActiveResourceManager

2020-07-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-18719:


 Summary: Define the interfaces and introduce ActiveResourceManager
 Key: FLINK-18719
 URL: https://issues.apache.org/jira/browse/FLINK-18719
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Xintong Song


* Define the interface ResourceManagerDriver and ResourceEventHandler
* Rename the original ActiveResourceManager to LegacyActiveResourceManager.
* Introduce the new ActiveResourceManager.





[jira] [Commented] (FLINK-18702) Flink elasticsearch connector leaks threads and classloaders thereof

2020-07-26 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165385#comment-17165385
 ] 

Yangze Guo commented on FLINK-18702:


I think it is indeed the same issue as FLINK-11205. I am not sure whether it 
makes sense to backport FLINK-16408. As a workaround, you could increase 
taskmanager.memory.jvm-metaspace.size [1].

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#taskmanager-memory-jvm-metaspace-size
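A minimal sketch of that workaround as a flink-conf.yaml fragment; the value 
below is only an example, not a recommendation, and should be sized to how many 
restarts the job typically survives:

```yaml
# flink-conf.yaml: enlarge the JVM metaspace as a stopgap while the
# leaked classloaders accumulate across ES sink restarts
taskmanager.memory.jvm-metaspace.size: 512m
```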

> Flink elasticsearch connector leaks threads and classloaders thereof
> 
>
> Key: FLINK-18702
> URL: https://issues.apache.org/jira/browse/FLINK-18702
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / ElasticSearch
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Jun Qin
>Assignee: Jun Qin
>Priority: Major
>
> The Flink Elasticsearch connector leaks threads and their classloaders. This 
> results in a metaspace OOM when the ES sink fails and is restarted many times. 
> This issue is visible in Flink 1.10 but not in 1.11 because Flink 1.11 does 
> not create new class loaders in case of recoveries (FLINK-16408)
>  
> Reproduction:
>  * Start a job with ES sink in a Flink 1.10 cluster, without starting the ES 
> cluster.
>  
>  





[jira] [Commented] (FLINK-18080) Translate the "Kerberos Authentication Setup and Configuration" page into Chinese

2020-07-26 Thread pengweibo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165384#comment-17165384
 ] 

pengweibo commented on FLINK-18080:
---

Jark Wu, could you show me the original link to this page?

> Translate the "Kerberos Authentication Setup and Configuration" page into 
> Chinese
> -
>
> Key: FLINK-18080
> URL: https://issues.apache.org/jira/browse/FLINK-18080
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Yangze Guo
>Assignee: pengweibo
>Priority: Major
>






[jira] [Commented] (FLINK-18080) Translate the "Kerberos Authentication Setup and Configuration" page into Chinese

2020-07-26 Thread pengweibo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165382#comment-17165382
 ] 

pengweibo commented on FLINK-18080:
---

Hello Jark Wu

I hadn't seen these changes until today, and I am willing to translate this 
page. Thanks a lot!

 

Weibo PENG

 

> Translate the "Kerberos Authentication Setup and Configuration" page into 
> Chinese
> -
>
> Key: FLINK-18080
> URL: https://issues.apache.org/jira/browse/FLINK-18080
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Yangze Guo
>Assignee: pengweibo
>Priority: Major
>






[jira] [Commented] (FLINK-17730) HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart times out

2020-07-26 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165380#comment-17165380
 ] 

Dian Fu commented on FLINK-17730:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4911=logs=e9af9cde-9a65-5281-a58e-2c8511d36983=603cb7fd-6f38-5c99-efca-877e1439232f

> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
>  times out
> 
>
> Key: FLINK-17730
> URL: https://issues.apache.org/jira/browse/FLINK-17730
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, FileSystems, Tests
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0, 1.12.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1374=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=34f486e1-e1e4-5dd2-9c06-bfdd9b9c74a8
> After 5 minutes 
> {code}
> 2020-05-15T06:56:38.1688341Z "main" #1 prio=5 os_prio=0 
> tid=0x7fa10800b800 nid=0x1161 runnable [0x7fa110959000]
> 2020-05-15T06:56:38.1688709Zjava.lang.Thread.State: RUNNABLE
> 2020-05-15T06:56:38.1689028Z  at 
> java.net.SocketInputStream.socketRead0(Native Method)
> 2020-05-15T06:56:38.1689496Z  at 
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> 2020-05-15T06:56:38.1689921Z  at 
> java.net.SocketInputStream.read(SocketInputStream.java:171)
> 2020-05-15T06:56:38.1690316Z  at 
> java.net.SocketInputStream.read(SocketInputStream.java:141)
> 2020-05-15T06:56:38.1690723Z  at 
> sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> 2020-05-15T06:56:38.1691196Z  at 
> sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
> 2020-05-15T06:56:38.1691608Z  at 
> sun.security.ssl.InputRecord.read(InputRecord.java:532)
> 2020-05-15T06:56:38.1692023Z  at 
> sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
> 2020-05-15T06:56:38.1692558Z  - locked <0xb94644f8> (a 
> java.lang.Object)
> 2020-05-15T06:56:38.1692946Z  at 
> sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
> 2020-05-15T06:56:38.1693371Z  at 
> sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
> 2020-05-15T06:56:38.1694151Z  - locked <0xb9464d20> (a 
> sun.security.ssl.AppInputStream)
> 2020-05-15T06:56:38.1694908Z  at 
> org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> 2020-05-15T06:56:38.1695475Z  at 
> org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198)
> 2020-05-15T06:56:38.1696007Z  at 
> org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
> 2020-05-15T06:56:38.1696509Z  at 
> org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
> 2020-05-15T06:56:38.1696993Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
> 2020-05-15T06:56:38.1697466Z  at 
> com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> 2020-05-15T06:56:38.1698069Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
> 2020-05-15T06:56:38.1698567Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
> 2020-05-15T06:56:38.1699041Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
> 2020-05-15T06:56:38.1699624Z  at 
> com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> 2020-05-15T06:56:38.1700090Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
> 2020-05-15T06:56:38.1700584Z  at 
> com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
> 2020-05-15T06:56:38.1701282Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
> 2020-05-15T06:56:38.1701800Z  at 
> com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
> 2020-05-15T06:56:38.1702328Z  at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
> 2020-05-15T06:56:38.1702804Z  at 
> org.apache.hadoop.fs.s3a.S3AInputStream.lambda$read$3(S3AInputStream.java:445)
> 2020-05-15T06:56:38.1703270Z  at 
> org.apache.hadoop.fs.s3a.S3AInputStream$$Lambda$42/1204178174.execute(Unknown 
> Source)
> 2020-05-15T06:56:38.1703677Z  at 
> org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
> 2020-05-15T06:56:38.1704090Z  at 
> org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
> 2020-05-15T06:56:38.1704607Z  at 
> org.apache.hadoop.fs.s3a.Invoker$$Lambda$23/1991724700.execute(Unknown Source)
> 2020-05-15T06:56:38.1705115Z  at 
> 

[GitHub] [flink] flinkbot edited a comment on pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12958:
URL: https://github.com/apache/flink/pull/12958#issuecomment-662378507


   
   ## CI report:
   
   * ca4d714f273259e0c1eb42238d455d331ad93996 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4747)
 
   * deb09ce58822dda5b7388c2928b8a119879d9749 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4912)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18496) Anchors are not generated based on ZH characters

2020-07-26 Thread Zhilong Hong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165379#comment-17165379
 ] 

Zhilong Hong commented on FLINK-18496:
--

I tried to upgrade the version of Jekyll from 3.0.5 to 4.0.1. When I attempted 
to build the website after upgrading, the bundler reported errors like the 
following:


{code:bash}
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-multiple-languages-2.0.3/lib/jekyll-multiple-languages/multilang.rb:55:in
 `append_data_for_liquid'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-multiple-languages-2.0.3/lib/jekyll-multiple-languages/document.rb:71:in
 `to_liquid'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/drops/document_drop.rb:40:in
 `next'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/drops/drop.rb:47:in
 `public_send'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/drops/drop.rb:47:in
 `[]'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/drops/drop.rb:168:in
 `block in each'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/drops/drop.rb:167:in
 `each'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/drops/drop.rb:167:in
 `each'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/drops/drop.rb:167:in
 `each'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/utils.rb:336:in
 `duplicate_frozen_values'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/utils.rb:44:in 
`deep_merge_hashes!'
/srv/flink-web/.rubydeps/ruby/2.5.0/gems/jekyll-4.0.1/lib/jekyll/utils.rb:29:in 
`deep_merge_hashes'
{code}

I think some docs may be incompatible with the jekyll-multiple-languages 
plugin, because flink/docs uses Jekyll 4.0.1 and jekyll-multiple-languages 
2.0.3 and builds fine.

I also attempted to build only the Chinese version, and the anchors were 
generated correctly from the ZH characters. So I think the major challenge is 
fixing the docs to make them compatible with the new Jekyll and the 
jekyll-multiple-languages plugin.

I searched for a way to get more detailed logs from Jekyll and found nothing, 
which makes this issue harder to debug.
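The anchor behavior under discussion can be illustrated with a small slugify sketch in Python (a hypothetical sketch of the two behaviors, not Jekyll's actual implementation; `anchor_slug` is an invented name):

```python
import re

def anchor_slug(heading, keep_cjk=True):
    """Build an HTML anchor id from a heading.

    keep_cjk=True mimics anchor generation that preserves Chinese characters;
    keep_cjk=False mimics the failing behavior, where a pure-ZH heading yields
    nothing usable and the generator must fall back to a positional name such
    as 'section-1'. (Hypothetical sketch, not Jekyll's real code.)
    """
    if keep_cjk:
        # \w is Unicode-aware in Python 3, so CJK characters are kept
        slug = re.sub(r"[^\w]+", "-", heading.lower()).strip("-")
    else:
        # ASCII-only slugs drop every CJK character
        slug = re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")
    return slug or None

assert anchor_slug("贡献代码") == "贡献代码"
assert anchor_slug("Contribute Code 贡献代码") == "contribute-code-贡献代码"
# With ASCII-only slugs, a pure-ZH heading produces no usable anchor:
assert anchor_slug("贡献代码", keep_cjk=False) is None
```

This is why a heading with no EN characters ends up with a generic `section-N` anchor when CJK characters are stripped.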

> Anchors are not generated based on ZH characters
> 
>
> Key: FLINK-18496
> URL: https://issues.apache.org/jira/browse/FLINK-18496
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Zhu Zhu
>Assignee: Zhilong Hong
>Priority: Major
>  Labels: starter
>
> In ZH version pages of flink-web, the anchors are not generated based on ZH 
> characters. The anchor name would be like 'section-1', 'section-2' if there 
> is no EN characters. An example can be the links in the navigator of 
> https://flink.apache.org/zh/contributing/contribute-code.html
> This makes it impossible to ref an anchor from the content because the anchor 
> name might change unexpectedly if a new section is added.
> Note that it is a problem for flink-web only. The docs generated from the 
> flink repo can properly generate ZH anchors.





[GitHub] [flink] flinkbot edited a comment on pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12958:
URL: https://github.com/apache/flink/pull/12958#issuecomment-662378507


   
   ## CI report:
   
   * ca4d714f273259e0c1eb42238d455d331ad93996 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4747)
 
   * deb09ce58822dda5b7388c2928b8a119879d9749 UNKNOWN
   
   







[GitHub] [flink] eogramow edited a comment on pull request #12791: [FLINK-18362][FLINK-13838][yarn] Add yarn.ship-archives to support LocalResourceType.ARCHIVE

2020-07-26 Thread GitBox


eogramow edited a comment on pull request #12791:
URL: https://github.com/apache/flink/pull/12791#issuecomment-664063296


   @kl0u Thanks. The test passed on my centos. I'll check it on ~Travis~Azure. 







[GitHub] [flink] eogramow commented on pull request #12791: [FLINK-18362][FLINK-13838][yarn] Add yarn.ship-archives to support LocalResourceType.ARCHIVE

2020-07-26 Thread GitBox


eogramow commented on pull request #12791:
URL: https://github.com/apache/flink/pull/12791#issuecomment-664063296


   @kl0u Thanks. The test passed on my centos. I'll check it on Travis. 







[GitHub] [flink] Myracle commented on a change in pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


Myracle commented on a change in pull request #12958:
URL: https://github.com/apache/flink/pull/12958#discussion_r460593997



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManagerImpl.java
##
@@ -299,6 +303,8 @@ public void start(ResourceManagerId newResourceManagerId, 
Executor newMainThread
TimeUnit.MILLISECONDS);
 
registerSlotManagerMetrics();
+
+   allocateRedundantSlots(redundantSlotNum);

Review comment:
   After offline discussion, SlotManagerImpl will not allocate redundant 
taskmanagers at start, because it is hard to know whether enough redundant 
taskmanagers already exist.









[jira] [Created] (FLINK-18718) Support for using yarn-cluster CLI parameters on application mode

2020-07-26 Thread Felipe Lolas (Jira)
Felipe Lolas created FLINK-18718:


 Summary: Support for using yarn-cluster CLI parameters on 
application mode
 Key: FLINK-18718
 URL: https://issues.apache.org/jira/browse/FLINK-18718
 Project: Flink
  Issue Type: Improvement
  Components: Command Line Client
Affects Versions: 1.11.1, 1.11.0
 Environment: *Flink 1.11*

*CDH6.2.1*
Reporter: Felipe Lolas


Hi!

Just a minor improvement here:

With the new Application Mode, we need to use generic CLI configurations for 
setting up YARN deployments.

*Example*
{code:java}
 -Djobmanager.memory.process.size=1024m 
 -Dtaskmanager.memory.process.size=1024m 
 -Dyarn.application.queue=root.test
 -Dyarn.application.name=flink-hbase 
 -Dyarn.application.type=flink {code}
It would be nice to directly use the CLI options available for yarn-cluster. 
This works in per-job and session mode *but not in application mode.*
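For illustration, the equivalent per-job submission using the yarn-cluster short options looks like this (option names as in the pre-1.11 YARN CLI; the queue/name values and the jar path are placeholders):

{code:bash}
flink run -m yarn-cluster \
  -yjm 1024m \
  -ytm 1024m \
  -yqu root.test \
  -ynm flink-hbase \
  ./my-job.jar
{code}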

 

Thanks

 





[GitHub] [flink] flinkbot edited a comment on pull request #12872: [FLINK-18550][sql-client] use TableResult#collect to get select result for sql client

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12872:
URL: https://github.com/apache/flink/pull/12872#issuecomment-656681765


   
   ## CI report:
   
   * 71ee80061eebe7ee3dc50b2b73e991cc074d3e26 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4907)
 
   
   







[GitHub] [flink] kl0u commented on pull request #12791: [FLINK-18362][FLINK-13838][yarn] Add yarn.ship-archives to support LocalResourceType.ARCHIVE

2020-07-26 Thread GitBox


kl0u commented on pull request #12791:
URL: https://github.com/apache/flink/pull/12791#issuecomment-664001248


   Hi @eogramow , Azure fails on your PR and the same happens after I rebased 
your branch on the current master. This is why I have not merged it yet. They 
both fail on the Kerberized YARN test. Could you investigate what the root 
cause may be? BTW the master seems to be passing. 
   
   If you want to have also a look, this is my branch with the fix 
https://github.com/kl0u/flink/tree/yarn-archives-rebased







[jira] [Closed] (FLINK-18710) ResourceProfileInfo is not serializable

2020-07-26 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-18710.
-
Resolution: Fixed

Fixed via

master: ec8a08eb7aefb837b8c9befddad825ba2459500a
1.11.2: b86ddff71c71452102f3ee908db6c0ddc23240c4

> ResourceProfileInfo is not serializable
> ---
>
> Key: FLINK-18710
> URL: https://issues.apache.org/jira/browse/FLINK-18710
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0, 1.11.2
>
>
> {{ResourceProfileInfo}} should be {{Serializable}} because it is sent as an 
> RPC payload.





[GitHub] [flink] tillrohrmann closed pull request #12991: [FLINK-18710] Make ResourceProfileInfo serializable

2020-07-26 Thread GitBox


tillrohrmann closed pull request #12991:
URL: https://github.com/apache/flink/pull/12991


   







[GitHub] [flink] flinkbot edited a comment on pull request #12872: [FLINK-18550][sql-client] use TableResult#collect to get select result for sql client

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12872:
URL: https://github.com/apache/flink/pull/12872#issuecomment-656681765


   
   ## CI report:
   
   * 913aaebdeacc913d6b9f8498d55547aa8d480f10 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4525)
 
   * 71ee80061eebe7ee3dc50b2b73e991cc074d3e26 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4907)
 
   
   







[jira] [Commented] (FLINK-7129) Support dynamically changing CEP patterns

2020-07-26 Thread Konstantin Knauf (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165212#comment-17165212
 ] 

Konstantin Knauf commented on FLINK-7129:
-

[~lukefilwalker] Yes, the complexity of the implementation depends on your 
requirements (mostly the complexity of the patterns and the latency, I'd say). 
I think Alex provided a simple example of this in his blog post series. With 
respect to this ticket, I am not aware of anyone working on it.
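The co-stream idea behind this ticket can be sketched generically: one input carries events, the other carries pattern updates that replace the active pattern at runtime. (Hypothetical names; this is not Flink CEP's API.)

```python
class DynamicPatternOperator:
    """Two-input operator sketch: pattern updates arrive on one input and
    swap the active pattern; events on the other input are matched against
    whatever pattern is currently active."""

    def __init__(self, initial_pattern):
        self.pattern = initial_pattern  # predicate: event -> bool
        self.matches = []

    def on_pattern_update(self, new_pattern):
        # Injected via the second (control) stream
        self.pattern = new_pattern

    def on_event(self, event):
        if self.pattern(event):
            self.matches.append(event)


op = DynamicPatternOperator(lambda e: e > 10)
op.on_event(5)                         # no match under the initial pattern
op.on_event(42)                        # matches the initial pattern
op.on_pattern_update(lambda e: e % 2 == 0)
op.on_event(4)                         # matches the updated pattern
assert op.matches == [42, 4]
```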

> Support dynamically changing CEP patterns
> -
>
> Key: FLINK-7129
> URL: https://issues.apache.org/jira/browse/FLINK-7129
> Project: Flink
>  Issue Type: New Feature
>  Components: Library / CEP
>Reporter: Dawid Wysakowicz
>Priority: Major
>
> An umbrella task for introducing mechanism for injecting patterns through 
> coStream





[GitHub] [flink] flinkbot edited a comment on pull request #12872: [FLINK-18550][sql-client] use TableResult#collect to get select result for sql client

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12872:
URL: https://github.com/apache/flink/pull/12872#issuecomment-656681765


   
   ## CI report:
   
   * 913aaebdeacc913d6b9f8498d55547aa8d480f10 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4525)
 
   * 71ee80061eebe7ee3dc50b2b73e991cc074d3e26 UNKNOWN
   
   







[jira] [Created] (FLINK-18717) reuse MiniCluster in table integration test class ?

2020-07-26 Thread godfrey he (Jira)
godfrey he created FLINK-18717:
--

 Summary: reuse MiniCluster in table integration test class ? 
 Key: FLINK-18717
 URL: https://issues.apache.org/jira/browse/FLINK-18717
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Legacy Planner, Table SQL / Planner
Affects Versions: 1.11.0
Reporter: godfrey he


Before 1.11, {{MiniCluster}} could be reused across each integration test class 
(see TestStreamEnvironment#setAsContext).
In 1.11, after we corrected the execution behavior of TableEnvironment, 
StreamTableEnvironment and BatchTableEnvironment (see 
[FLINK-16363|https://issues.apache.org/jira/browse/FLINK-16363], 
[FLINK-17126|https://issues.apache.org/jira/browse/FLINK-17126]), a MiniCluster 
is created for each test case even within the same test class (see 
{{org.apache.flink.client.deployment.executors.LocalExecutor}}). It would be 
better to reuse {{MiniCluster}} as before. One approach is to provide a new 
kind of MiniCluster factory (such as a SessionMiniClusterFactory) instead of 
using 
{{PerJobMiniClusterFactory}}. WDYT ?
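The proposed reuse pattern can be sketched independently of Flink's actual classes (the name SessionClusterFactory and the create callback are hypothetical; this is not the real PerJobMiniClusterFactory API):

```python
class SessionClusterFactory:
    """Session-style cluster factory sketch: unlike a per-job factory,
    it lazily creates one cluster and hands the same instance to every
    test case instead of starting a fresh cluster per job."""

    def __init__(self, create_cluster):
        self._create_cluster = create_cluster  # callable that starts a cluster
        self._cluster = None

    def get(self):
        # Create the cluster on first use, then reuse it for all later cases
        if self._cluster is None:
            self._cluster = self._create_cluster()
        return self._cluster


# A per-job factory would call create_cluster() on every get(); here it
# happens once, no matter how many test cases request a cluster.
creations = []
factory = SessionClusterFactory(lambda: creations.append("cluster") or object())
a = factory.get()
b = factory.get()
assert a is b               # same cluster instance reused
assert len(creations) == 1  # only one cluster was started
```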
  





[GitHub] [flink] flinkbot edited a comment on pull request #12992: [FLINK-18716][python][docs] Remove the deprecated execute and insert_into calls in PyFlink Table API docs.

2020-07-26 Thread GitBox


flinkbot edited a comment on pull request #12992:
URL: https://github.com/apache/flink/pull/12992#issuecomment-663982442


   
   ## CI report:
   
   * 7c53366ee4c2dd53f6689ea7caf9f819502d8426 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4906)
 
   
   







[GitHub] [flink] flinkbot commented on pull request #12992: [FLINK-18716][python][docs] Remove the deprecated execute and insert_into calls in PyFlink Table API docs.

2020-07-26 Thread GitBox


flinkbot commented on pull request #12992:
URL: https://github.com/apache/flink/pull/12992#issuecomment-663982442


   
   ## CI report:
   
   * 7c53366ee4c2dd53f6689ea7caf9f819502d8426 UNKNOWN
   
   







[GitHub] [flink] flinkbot commented on pull request #12992: [FLINK-18716][python][docs] Remove the deprecated execute and insert_into calls in PyFlink Table API docs.

2020-07-26 Thread GitBox


flinkbot commented on pull request #12992:
URL: https://github.com/apache/flink/pull/12992#issuecomment-663980522


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 7c53366ee4c2dd53f6689ea7caf9f819502d8426 (Sun Jul 26 
12:14:10 UTC 2020)
   
   **Warnings:**
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-18716).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Updated] (FLINK-18716) Remove the deprecated "execute" and "insert_into" calls in PyFlink Table API docs

2020-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18716:
---
Labels: pull-request-available  (was: )

> Remove the deprecated "execute" and "insert_into" calls in PyFlink Table API 
> docs
> -
>
> Key: FLINK-18716
> URL: https://issues.apache.org/jira/browse/FLINK-18716
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python, Documentation
>Affects Versions: 1.11.0
>Reporter: Wei Zhong
>Priority: Minor
>  Labels: pull-request-available
>
> Currently the TableEnvironment#execute and the Table#insert_into are 
> deprecated, but the docs of the PyFlink Table API still use them. We should 
> replace them with the recommended API.





[GitHub] [flink] WeiZhong94 opened a new pull request #12992: [FLINK-18716][python][docs] Remove the deprecated execute and insert_into calls in PyFlink Table API docs.

2020-07-26 Thread GitBox


WeiZhong94 opened a new pull request #12992:
URL: https://github.com/apache/flink/pull/12992


   ## What is the purpose of the change
   
   *This pull request removes the deprecated execute and insert_into calls in 
PyFlink Table API docs.*
   
   
   ## Brief change log
   
 - *Remove the deprecated execute and insert_into calls in PyFlink Table 
API docs.*
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (docs)
   







[jira] [Created] (FLINK-18716) Remove the deprecated "execute" and "insert_into" calls in PyFlink Table API docs

2020-07-26 Thread Wei Zhong (Jira)
Wei Zhong created FLINK-18716:
-

 Summary: Remove the deprecated "execute" and "insert_into" calls 
in PyFlink Table API docs
 Key: FLINK-18716
 URL: https://issues.apache.org/jira/browse/FLINK-18716
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Affects Versions: 1.11.0
Reporter: Wei Zhong


Currently the TableEnvironment#execute and the Table#insert_into are 
deprecated, but the docs of the PyFlink Table API still use them. We should 
replace them with the recommended API.





[GitHub] [flink] Myracle commented on a change in pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


Myracle commented on a change in pull request #12958:
URL: https://github.com/apache/flink/pull/12958#discussion_r460508736



##
File path: 
flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java
##
@@ -67,6 +67,12 @@
"for streaming workloads, which may fail if there are 
not enough slots. Note that this configuration option does not take " +
"effect for standalone clusters, where how many slots 
are allocated is not controlled by Flink.");
 
+   public static final ConfigOption REDUNDANT_SLOT_NUM = 
ConfigOptions

Review comment:
   Got it. I will expose slotmanager.redundant-taskmanager-num to users. In 
Flink, the calculation will be based on slots.
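That conversion is trivial to sketch (a hypothetical helper, not the actual SlotManagerImpl code):

```python
def redundant_slot_count(redundant_taskmanager_num, slots_per_taskmanager):
    """Convert the user-facing redundant-taskmanager count into the number
    of redundant slots the slot manager should keep available."""
    if redundant_taskmanager_num < 0 or slots_per_taskmanager <= 0:
        raise ValueError("invalid redundancy configuration")
    return redundant_taskmanager_num * slots_per_taskmanager


assert redundant_slot_count(2, 4) == 8   # 2 spare TMs with 4 slots each
assert redundant_slot_count(0, 4) == 0   # redundancy disabled
```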









[GitHub] [flink] Myracle commented on a change in pull request #12958: [FLINK-18625][runtime] Maintain redundant taskmanagers to speed up failover

2020-07-26 Thread GitBox


Myracle commented on a change in pull request #12958:
URL: https://github.com/apache/flink/pull/12958#discussion_r460508614



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManagerImplTest.java
##
@@ -976,6 +976,7 @@ public void testTaskManagerTimeoutDoesNotRemoveSlots() 
throws Exception {
 
try (final SlotManager slotManager = createSlotManagerBuilder()
.setTaskManagerTimeout(taskManagerTimeout)
+   .setRedundantSlotNum(0)

Review comment:
   If the default value is changed for a certain organization or special 
situation, developers would need to fix this test. A unit test should pass 
under any valid config.









[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue

2020-07-26 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165194#comment-17165194
 ] 

Yun Tang commented on FLINK-18712:
--

[~lio_sy], do you have a simple job to reproduce this and what happened if we 
move this job to YARN as it could also kill container once memory exceed the 
memory limit. I ask this is because I just wonder whether k8s would take os 
cache into account as the memory usage for that pod.

To know how much memory is used by RocksDB, there are two ways:
 # Turn on the block cache memory usage metric: 
[https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#state-backend-rocksdb-metrics-block-cache-usage]
 when managed memory for RocksDB is enabled. And remember that all RocksDB 
instances within one slot report the same value.
 # Use jemalloc and jeprof to see how much memory RocksDB allocates from the OS. 
This is much more precise than the 1st solution, and you could refer to 
[https://github.com/jeffgriffith/native-jvm-leaks/#going-native-with-jemalloc] 
to re-build and preload the related ".so" file. For Flink, you could refer to 
[https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#forwarding-environment-variables]
 to learn how to pass the environment variables {{LD_PRELOAD}} and {{MALLOC_CONF}}.

I think by doing this, you could know whether RocksDB has used more memory than 
expected.
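As a rough illustration of the two options above, a flink-conf.yaml fragment might look like the following. The config keys are the real Flink 1.11 options referenced in the links; the jemalloc library path and the {{MALLOC_CONF}} profiling values are assumptions for a typical Debian-based container image, not prescribed settings.

```yaml
# Option 1: expose RocksDB block cache usage as a metric
# (all RocksDB instances within one slot report the same value).
state.backend.rocksdb.metrics.block-cache-usage: true

# Option 2: forward environment variables so the TaskManager preloads
# jemalloc and periodically dumps heap profiles that jeprof can inspect.
# The .so path and MALLOC_CONF values below are illustrative assumptions.
containerized.taskmanager.env.LD_PRELOAD: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
containerized.taskmanager.env.MALLOC_CONF: prof:true,lg_prof_interval:30,lg_prof_sample:17
```
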

 

 

> Flink RocksDB statebackend memory leak issue 
> -
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0
>Reporter: Farnight
>Priority: Critical
>
> When using RocksDB as our statebackend, we found it leads to a memory leak 
> when restarting the job (manually or in a recovery case).
>  
> How to reproduce:
>  # increase the RocksDB block cache size (e.g. 1G); it is easier to monitor and 
> reproduce.
>  # start a job using the RocksDB statebackend.
>  # when the RocksDB block cache reaches its maximum size, restart the job, and 
> monitor the memory usage (k8s pod working set) of the TM.
>  # go through steps 2-3 a few more times, and memory will keep rising.
>  
> Any solution or suggestion for this? Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-18663) Fix Flink On YARN AM not exit

2020-07-26 Thread tartarus (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165176#comment-17165176
 ] 

tartarus edited comment on FLINK-18663 at 7/26/20, 7:48 AM:


[~trohrmann] hello, you said the {{terminationFuture}} is 
{{AbstractHandler.terminationFuture}}?

{{AbstractHandler.terminationFuture}} waits for 
{{AbstractHandler.inFlightRequestTracker}}.


was (Author: tartarus):
[~trohrmann] hello, you said the \{{ terminationFuture }} is \{{ 
AbstractHandler.terminationFuture }} ?

{\{ AbstractHandler.terminationFuture }} wait for \{{ 
AbstractHandler.inFlightRequestTracker }}.
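For readers following along, the pattern under discussion can be sketched as below: a termination future that completes only once every in-flight request has been deregistered, which is why a request that is never deregistered (as in the NPE case reported in this issue) blocks {{closeAsync}} forever. Class and method names here are hypothetical simplifications, not Flink's actual implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch (hypothetical names) of an in-flight request tracker whose
// termination future completes only when every registered request has been
// deregistered -- mirroring why a leaked request blocks closeAsync().
class InFlightTracker {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();
    private volatile boolean closed = false;

    void registerRequest() {
        inFlight.incrementAndGet();
    }

    void deregisterRequest() {
        // Complete the termination future once the last request finishes
        // after close has been requested.
        if (inFlight.decrementAndGet() == 0 && closed) {
            terminationFuture.complete(null);
        }
    }

    CompletableFuture<Void> closeAsync() {
        closed = true;
        if (inFlight.get() == 0) {
            terminationFuture.complete(null);
        }
        return terminationFuture;
    }
}
```

If a handler's exception path skips {{deregisterRequest}}, the future returned by {{closeAsync}} never completes, which matches the hanging-AM symptom described in this issue.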

> Fix Flink On YARN AM not exit
> -
>
> Key: FLINK-18663
> URL: https://issues.apache.org/jira/browse/FLINK-18663
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.10.0, 1.10.1, 1.11.0
>Reporter: tartarus
>Assignee: tartarus
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 110.png, 111.png, 
> C49A7310-F932-451B-A203-6D17F3140C0D.png, e18e00dd6664485c2ff55284fe969474.png
>
>
> AbstractHandler throws an NPE caused by FlinkHttpObjectAggregator being null.
> When REST throws an exception, it executes this code:
> {code:java}
> private CompletableFuture<Void> handleException(Throwable throwable, 
> ChannelHandlerContext ctx, HttpRequest httpRequest) {
>   FlinkHttpObjectAggregator flinkHttpObjectAggregator = 
> ctx.pipeline().get(FlinkHttpObjectAggregator.class);
>   int maxLength = flinkHttpObjectAggregator.maxContentLength() - 
> OTHER_RESP_PAYLOAD_OVERHEAD;
>   if (throwable instanceof RestHandlerException) {
>   RestHandlerException rhe = (RestHandlerException) throwable;
>   String stackTrace = ExceptionUtils.stringifyException(rhe);
>   String truncatedStackTrace = Ascii.truncate(stackTrace, 
> maxLength, "...");
>   if (log.isDebugEnabled()) {
>   log.error("Exception occurred in REST handler.", rhe);
>   } else {
>   log.error("Exception occurred in REST handler: {}", 
> rhe.getMessage());
>   }
>   return HandlerUtils.sendErrorResponse(
>   ctx,
>   httpRequest,
>   new ErrorResponseBody(truncatedStackTrace),
>   rhe.getHttpResponseStatus(),
>   responseHeaders);
>   } else {
>   log.error("Unhandled exception.", throwable);
>   String stackTrace = String.format("<Exception on server side:%n%s%nEnd of exception on server side>",
>   ExceptionUtils.stringifyException(throwable));
>   String truncatedStackTrace = Ascii.truncate(stackTrace, 
> maxLength, "...");
>   return HandlerUtils.sendErrorResponse(
>   ctx,
>   httpRequest,
>   new ErrorResponseBody(Arrays.asList("Internal server 
> error.", truncatedStackTrace)),
>   HttpResponseStatus.INTERNAL_SERVER_ERROR,
>   responseHeaders);
>   }
> }
> {code}
> but flinkHttpObjectAggregator is null in some cases, so this will throw an NPE; 
> this method is called by AbstractHandler#respondAsLeader
> {code:java}
> requestProcessingFuture
>   .whenComplete((Void ignored, Throwable throwable) -> {
>   if (throwable != null) {
>   
> handleException(ExceptionUtils.stripCompletionException(throwable), ctx, 
> httpRequest)
>   .whenComplete((Void ignored2, Throwable 
> throwable2) -> finalizeRequestProcessing(finalUploadedFiles));
>   } else {
>   finalizeRequestProcessing(finalUploadedFiles);
>   }
>   });
> {code}
>  the result is that InFlightRequestTracker cannot be cleared,
> so the CompletableFuture returned by the handler's closeAsync does not complete
> !C49A7310-F932-451B-A203-6D17F3140C0D.png!
> !e18e00dd6664485c2ff55284fe969474.png!
>  
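The crash path described above is the unconditional dereference of {{flinkHttpObjectAggregator}}. A minimal sketch of the guard being discussed might look like the following; the class, constants, and fallback value are hypothetical illustrations, not Flink's actual fix.

```java
// Hypothetical sketch of guarding against a missing aggregator: if the
// FlinkHttpObjectAggregator has already been removed from the pipeline,
// fall back to a default content length instead of throwing an NPE.
public class MaxLengthGuard {
    // Both constants are illustrative assumptions, not Flink's real values.
    static final int DEFAULT_MAX_CONTENT_LENGTH = 104_857_600;
    static final int OTHER_RESP_PAYLOAD_OVERHEAD = 1024;

    /**
     * Returns the truncation budget for the error payload, tolerating a
     * missing aggregator (null stands in for pipeline().get(...) == null).
     */
    static int maxErrorPayloadLength(Integer aggregatorMaxContentLength) {
        int max = aggregatorMaxContentLength != null
                ? aggregatorMaxContentLength
                : DEFAULT_MAX_CONTENT_LENGTH;
        return max - OTHER_RESP_PAYLOAD_OVERHEAD;
    }
}
```

With such a guard, handleException could still send a (possibly more aggressively truncated) error response, and the request would be deregistered normally.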





[jira] [Commented] (FLINK-18663) Fix Flink On YARN AM not exit

2020-07-26 Thread tartarus (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165176#comment-17165176
 ] 

tartarus commented on FLINK-18663:
--

[~trohrmann] hello, you said the {{terminationFuture}} is 
{{AbstractHandler.terminationFuture}}?

{{AbstractHandler.terminationFuture}} waits for 
{{AbstractHandler.inFlightRequestTracker}}.

> Fix Flink On YARN AM not exit
> -
>
> Key: FLINK-18663
> URL: https://issues.apache.org/jira/browse/FLINK-18663
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.10.0, 1.10.1, 1.11.0
>Reporter: tartarus
>Assignee: tartarus
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 110.png, 111.png, 
> C49A7310-F932-451B-A203-6D17F3140C0D.png, e18e00dd6664485c2ff55284fe969474.png
>
>
> AbstractHandler throws an NPE caused by FlinkHttpObjectAggregator being null.
> When REST throws an exception, it executes this code:
> {code:java}
> private CompletableFuture<Void> handleException(Throwable throwable, 
> ChannelHandlerContext ctx, HttpRequest httpRequest) {
>   FlinkHttpObjectAggregator flinkHttpObjectAggregator = 
> ctx.pipeline().get(FlinkHttpObjectAggregator.class);
>   int maxLength = flinkHttpObjectAggregator.maxContentLength() - 
> OTHER_RESP_PAYLOAD_OVERHEAD;
>   if (throwable instanceof RestHandlerException) {
>   RestHandlerException rhe = (RestHandlerException) throwable;
>   String stackTrace = ExceptionUtils.stringifyException(rhe);
>   String truncatedStackTrace = Ascii.truncate(stackTrace, 
> maxLength, "...");
>   if (log.isDebugEnabled()) {
>   log.error("Exception occurred in REST handler.", rhe);
>   } else {
>   log.error("Exception occurred in REST handler: {}", 
> rhe.getMessage());
>   }
>   return HandlerUtils.sendErrorResponse(
>   ctx,
>   httpRequest,
>   new ErrorResponseBody(truncatedStackTrace),
>   rhe.getHttpResponseStatus(),
>   responseHeaders);
>   } else {
>   log.error("Unhandled exception.", throwable);
>   String stackTrace = String.format("<Exception on server side:%n%s%nEnd of exception on server side>",
>   ExceptionUtils.stringifyException(throwable));
>   String truncatedStackTrace = Ascii.truncate(stackTrace, 
> maxLength, "...");
>   return HandlerUtils.sendErrorResponse(
>   ctx,
>   httpRequest,
>   new ErrorResponseBody(Arrays.asList("Internal server 
> error.", truncatedStackTrace)),
>   HttpResponseStatus.INTERNAL_SERVER_ERROR,
>   responseHeaders);
>   }
> }
> {code}
> but flinkHttpObjectAggregator is null in some cases, so this will throw an NPE; 
> this method is called by AbstractHandler#respondAsLeader
> {code:java}
> requestProcessingFuture
>   .whenComplete((Void ignored, Throwable throwable) -> {
>   if (throwable != null) {
>   
> handleException(ExceptionUtils.stripCompletionException(throwable), ctx, 
> httpRequest)
>   .whenComplete((Void ignored2, Throwable 
> throwable2) -> finalizeRequestProcessing(finalUploadedFiles));
>   } else {
>   finalizeRequestProcessing(finalUploadedFiles);
>   }
>   });
> {code}
>  the result is that InFlightRequestTracker cannot be cleared,
> so the CompletableFuture returned by the handler's closeAsync does not complete
> !C49A7310-F932-451B-A203-6D17F3140C0D.png!
> !e18e00dd6664485c2ff55284fe969474.png!
>  





[jira] [Comment Edited] (FLINK-18202) Introduce Protobuf format

2020-07-26 Thread Zou (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165170#comment-17165170
 ] 

Zou edited comment on FLINK-18202 at 7/26/20, 7:31 AM:
---

Hi [~libenchao] , here's my point:
 1.1 I think it is convenient for both users and devs if we could derive table 
schema according to proto definition.

1.2 I prefer 'compiled proto class' , and maybe we should restrict the protobuf 
version used to compile the class.

1.3 Shall we do a simple performance test for both options? I am inclined to 
DynamicMessage if there is no significant performance difference.

Furthermore, maybe we should discuss the default value for missing fields in 
protobuf (such as oneof fields): should we use `null` or the PB default value to 
handle this? I prefer the former.


was (Author: frankzou):
Hi [~libenchao] , here's my point:
 1.1 I think it is convenient for both users and devs if we could derive table 
schema according to proto definition.

1.2 I prefer 'compiled proto class' , and maybe we should restrict the protobuf 
version used to compile the class.

1.3 Shall we do a simple performance test for both options? I am inclined to 
DynamicMessage if there is no significant performance difference.

Futher more, maybe we shoule discuss the default value for miss fields in 
protobuf (such as oneof fields), should we use `null` or PB default value to 
handle this. I prefer the former.

> Introduce Protobuf format
> -
>
> Key: FLINK-18202
> URL: https://issues.apache.org/jira/browse/FLINK-18202
> Project: Flink
>  Issue Type: New Feature
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / API
>Reporter: Benchao Li
>Priority: Major
> Attachments: image-2020-06-15-17-18-03-182.png
>
>
> PB[1] is a very famous and widely used (de)serialization framework. The ML[2] 
> also has some discussions about this. It's a useful feature.
> This issue may need some design, or a FLIP.
> [1] [https://developers.google.com/protocol-buffers]
> [2] [http://apache-flink.147419.n8.nabble.com/Flink-SQL-UDF-td3725.html]





[jira] [Commented] (FLINK-18202) Introduce Protobuf format

2020-07-26 Thread Zou (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165170#comment-17165170
 ] 

Zou commented on FLINK-18202:
-

Hi [~libenchao] , here's my point:
 1.1 I think it is convenient for both users and devs if we could derive table 
schema according to proto definition.

1.2 I prefer 'compiled proto class' , and maybe we should restrict the protobuf 
version used to compile the class.

1.3 Shall we do a simple performance test for both options? I am inclined to 
DynamicMessage if there is no significant performance difference.

Furthermore, maybe we should discuss the default value for missing fields in 
protobuf (such as oneof fields): should we use `null` or the PB default value to 
handle this? I prefer the former.

> Introduce Protobuf format
> -
>
> Key: FLINK-18202
> URL: https://issues.apache.org/jira/browse/FLINK-18202
> Project: Flink
>  Issue Type: New Feature
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / API
>Reporter: Benchao Li
>Priority: Major
> Attachments: image-2020-06-15-17-18-03-182.png
>
>
> PB[1] is a very famous and widely used (de)serialization framework. The ML[2] 
> also has some discussions about this. It's a useful feature.
> This issue may need some design, or a FLIP.
> [1] [https://developers.google.com/protocol-buffers]
> [2] [http://apache-flink.147419.n8.nabble.com/Flink-SQL-UDF-td3725.html]


