Re: [DISCUSS] Flink Kerberos Improvement

2018-12-17 Thread Shuyi Chen
Hi Rong, thanks a lot for the proposal. Currently, Flink assumes the keytab
is located in a remote DFS. Pre-installing keytabs statically in the local
filesystem of YARN nodes is a common approach, so I think we should support
this mode in Flink natively. As an optimization to reduce the KDC access
frequency, we should also support method 3 (the delegation token approach) as
discussed in [1]. One question: why do we need to implement impersonation in
Flink? I assume the superuser can impersonate 'joe', and 'joe' can then invoke
the Flink client to deploy the job. Thanks a lot.

Shuyi

[1]
https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit
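
As an illustration of the impersonation pattern above, here is a minimal
sketch using the Hadoop UserGroupInformation API (the principal, keytab path
and the deployment step are placeholders, not Flink APIs):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.security.UserGroupInformation;

    public class ImpersonatedDeploy {
        public static void main(String[] args) throws Exception {
            // Log in as the superuser from a keytab pre-installed on the node.
            UserGroupInformation superUser = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    "superuser@EXAMPLE.COM", "/etc/security/keytabs/superuser.keytab");

            // Impersonate 'joe'; requires hadoop.proxyuser.superuser.* to be configured.
            UserGroupInformation joe = UserGroupInformation.createProxyUser("joe", superUser);

            joe.doAs((PrivilegedExceptionAction<Void>) () -> {
                // Invoke the Flink client to deploy the job as 'joe' here (placeholder).
                return null;
            });
        }
    }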

On Mon, Dec 17, 2018 at 5:49 PM Rong Rong  wrote:

> Hi All,
>
> We have been experimenting with integrating Kerberos with Flink in our corp
> environment and found some limitations in the current Flink-Kerberos
> security mechanism when running with Apache YARN.
>
> Based on the Hadoop Kerberos security guide [1], only a subset of the
> suggested security mechanisms for long-running services is supported in
> Flink. Furthermore, the current model does not work well with a superuser
> impersonating actual users [2] for deployment purposes, which is a widely
> adopted way to launch applications in corp environments.
>
> We would like to propose an improvement [3] to introduce the other common
> methods [1] for securing long-running applications on YARN and to enable an
> impersonation mode. Any comments and suggestions are highly appreciated.
>
> Many thanks,
> Rong
>
> [1]
>
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Securing_Long-lived_YARN_Services
> [2]
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
> [3]
>
> https://docs.google.com/document/d/1rBLCpyQKg6Ld2P0DEgv4VIOMTwv4sitd7h7P5r202IE/edit?usp=sharing
>


-- 
"So you have to trust that the dots will somehow connect in your future."


Re: [VOTE] Release 1.7.1, release candidate #2

2018-12-17 Thread jincheng sun
Hi Chesnay,

+1(non-binding)

- ran `mvn clean install` locally with success
- ran all test cases (both Java and Scala), except NettyEpollITCase, in the
flink-test module in IDEA locally (the errors found are tracked in FLINK-11178)
Hequn Cheng, tison and I have checked the test case errors in
https://issues.apache.org/jira/browse/FLINK-11178. None of them are blockers
for release-1.7.1.
- removed FLINK-10543 from the Improvement list of the release notes, due to
a compatibility problem
- ran sh bin/start-cluster.sh and submitted both the stream and batch
word-count examples with parallelism 1 [success]


Cheers,
Jincheng

Chesnay Schepler wrote on Mon, Dec 17, 2018 at 3:39 AM:

> Hi everyone,
> Please review and vote on the release candidate #2 for the version
> 1.7.1, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to
> be deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 11D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.7.1-rc2" [5].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Chesnay
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344412
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.7.1-rc2/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1198
> [5]
>
> https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.7.1-rc2
>
>
>


[VOTE] Release 1.5.6, release candidate #1

2018-12-17 Thread Thomas Weise
Hi everyone,
Please review and vote on the release candidate #1 for the version
1.5.6, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to
be deployed to dist.apache.org [2], which are signed with the key with
fingerprint D920A98C [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.5.6-rc1" [5].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Thomas

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344315
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.5.6-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1199/
[5]
https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.5.6-rc1


Re: [DISCUSS] Flink SQL DDL Design

2018-12-17 Thread Jark Wu
Hi Timo,

I think I get your point on why it would be better to put Table Update Mode in
the MVP. But because this is a sophisticated problem, we need to think about it
carefully and have some discussions offline. We will report back here when
we have a clear design.


8). Support row/map/array data type
Do you mean how to distinguish int[] and Integer[]? Yes, maybe we need to
support NULL/NOT NULL just for array elements, e.g. ARRAY<INT NOT NULL>
is int[] and ARRAY<INT> is Integer[].
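
For reference, the existing distinction on the Java side that such DDL syntax
would have to map onto can be sketched with Flink's Types utility (the
DDL-to-type mapping in the comments is only an assumption from this
discussion, not a decided syntax):

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;

    public class ArrayTypeMapping {
        public static void main(String[] args) {
            // ARRAY<INT NOT NULL> would map to a primitive array: int[]
            TypeInformation<?> primitive = Types.PRIMITIVE_ARRAY(Types.INT);

            // ARRAY<INT> would map to an object array: Integer[]
            TypeInformation<Integer[]> boxed = Types.OBJECT_ARRAY(Types.INT);

            System.out.println(primitive); // primitive int[] type info
            System.out.println(boxed);     // boxed Integer[] type info
        }
    }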


Cheers,
Jark

On Fri, 14 Dec 2018 at 19:46, Timo Walther  wrote:

> Hi all,
>
> I think we should discuss what we consider an MVP DDL. For me, an MVP
> DDL was to just focus on a CREATE TABLE statement. It would be great to
> come up with a solution that finally solves the issue of connecting
> different kind of systems. One reason why we postponed DDL statements
> for quite some time is that we cannot change it easily once released.
>
> However, the current state of the discussion can be summarized by the
> following functionality:
>
> 1. Only support append source tables (because the distinction of
> update/retract table is not clear).
> 2. Only support append and update sink tables (because a changeflag is
> missing).
> 3. Don't support outputting to Kafka with time attributes (because we
> cannot set a timestamp).
>
> Personally, I would like to have more use cases enabled by solving the
> header timestamps and change flag discussion. And I don't see a reason
> why we have to rush here.
>
> 8). Support row/map/array data type
> How do we want to support object arrays vs. primitive arrays? Currently,
> we need to make this clear distinction between external systems and
> Java [1] (e.g. byte[] arrays vs. object arrays), and users can choose
> between Types.PRIMITIVE_ARRAY and Types.OBJECT_ARRAY. Otherwise we need
> to support NULL/NOT NULL for array elements.
>
> 4) Event-Time Attributes and Watermarks
> I completely agree with Rong here. `ts AS SYSTEMROWTIME()` indicates
> that the system takes care of this column and for unification this would
> mean both for sources and sinks. It is still a computed column but gives
> hints to connectors. Implementing connectors can choose if they want to
> use this hint or not. The Flink Kafka connector would make use of it.
> @Jark: I think a PERSISTED keyword would confuse users (as shown by your
> Stackoverflow question) and would only make sense for SYSTEMROWTIME and
> no other computed column.
>
> 3) SOURCE / SINK / BOTH
> @Jark: My initial suggestion was to make the SOURCE/SINK optional such
> that users can only use CREATE TABLE depending on the use case. But as I
> said before, since I cannot find support here, we can drop the keywords.
>
> 7) Table Update Mode
> @Jark: The questions that you posted are exactly the ones that we should
> find an answer for. Because a DDL should just be the front end to the
> characteristics of an engine. After thinking about it again a change
> flag is actually more similar to a PARTITION BY clause because it
> defines a field that is not in the table's schema but in the schema of
> the physical format. However, the columns defined by a PARTITION BY are
> shown when describing/projecting a table whereas a change flag column
> must not be shown.
>
> If a table source supports append, upserts, and retractions, we need a
> way to express how we want to connect to the system.
>
> hasPrimaryKey() && !hasChangeFlag() -> append mode
> hasPrimaryKey() && hasChangeFlag() -> upsert mode
> !hasPrimaryKey() && hasChangeFlag() -> retract mode
>
> Are we fine with this?
>
> Regarding reading `topic`, `partition`, `offset` or custom properties
> from message headers: I already discussed this in my unified connector
> document. We don't need built-in functions for all these properties.
> Those things depend on the connector and format; it is their
> responsibility to extend the table schema in order to expose those
> properties (e.g. by providing a Map column for all these kinds
> of properties).
>
> Example:
>
> CREATE TABLE myTopic (
>  col1 INT,
>  col2 VARCHAR,
>  col3 MAP,
>  col4 AS SYSTEMROWTIME()
> )
> PARTITION BY (col0 LONG)
> WITH (
>connector.type = kafka
>format.type = key-value-metadata
>format.key-format.type = avro
>format.value-format.type = json
> )
>
> The format defines the use of a KeyedDeserializationSchema that extends the
> schema with a metadata column. The PARTITION BY clause declares the columns for
> Kafka's key in Avro format. col1 to col2 are Kafka's JSON columns.
>
> Thanks for your feedback,
> Timo
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#type-strings
>
>
> Am 13.12.18 um 09:50 schrieb Jark Wu:
> > Hi all,
> >
> > Here are a bunch of my thoughts:
> >
> > 8). support row/map/array data type
> > That's fine with me if we want to support them in the MVP. In my mind, we
> > can have the field type syntax like this:
> >
> > ```
> > fieldType ::=
> >  {
> >  
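
For the update-mode mapping quoted above (hasPrimaryKey/hasChangeFlag), a
minimal sketch of how such a decision could look in code; the UpdateMode enum
and helper method are hypothetical, not existing Flink APIs:

    public class UpdateModes {
        enum UpdateMode { APPEND, UPSERT, RETRACT }

        // Mirrors the mapping proposed in the quoted discussion.
        static UpdateMode updateMode(boolean hasPrimaryKey, boolean hasChangeFlag) {
            if (hasPrimaryKey && !hasChangeFlag) return UpdateMode.APPEND;   // append mode
            if (hasPrimaryKey && hasChangeFlag)  return UpdateMode.UPSERT;   // upsert mode
            if (!hasPrimaryKey && hasChangeFlag) return UpdateMode.RETRACT;  // retract mode
            throw new IllegalArgumentException("neither a primary key nor a change flag is defined");
        }

        public static void main(String[] args) {
            System.out.println(updateMode(true, false));  // APPEND
            System.out.println(updateMode(true, true));   // UPSERT
            System.out.println(updateMode(false, true));  // RETRACT
        }
    }
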

[jira] [Created] (FLINK-11188) Bounded over should not enable state retention time

2018-12-17 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-11188:
---

 Summary: Bounded over should not enable state retention time 
 Key: FLINK-11188
 URL: https://issues.apache.org/jira/browse/FLINK-11188
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Hequn Cheng
Assignee: Hequn Cheng


As discussed in FLINK-11172, time-based operations (GROUP BY windows, OVER 
windows, time-windowed join, etc.) are inherently bound by time and 
automatically clean up their state. We should not add state cleanup or TTL for 
these operators.

If I understand correctly, we should not add the retention logic for 
rows-bounded operations either. I think we should disable state retention logic 
for:
 - ProcTimeBoundedRangeOver
 - ProcTimeBoundedRowsOver
 - RowTimeBoundedRangeOver
 - RowTimeBoundedRowsOver

Any suggestions are appreciated!





[DISCUSS] Flink Kerberos Improvement

2018-12-17 Thread Rong Rong
Hi All,

We have been experimenting with integrating Kerberos with Flink in our corp
environment and found some limitations in the current Flink-Kerberos
security mechanism when running with Apache YARN.

Based on the Hadoop Kerberos security guide [1], only a subset of the
suggested security mechanisms for long-running services is supported in
Flink. Furthermore, the current model does not work well with a superuser
impersonating actual users [2] for deployment purposes, which is a widely
adopted way to launch applications in corp environments.

We would like to propose an improvement [3] to introduce the other common
methods [1] for securing long-running applications on YARN and to enable an
impersonation mode. Any comments and suggestions are highly appreciated.

Many thanks,
Rong

[1]
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Securing_Long-lived_YARN_Services
[2]
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
[3]
https://docs.google.com/document/d/1rBLCpyQKg6Ld2P0DEgv4VIOMTwv4sitd7h7P5r202IE/edit?usp=sharing


Re: StreamingFileSink causing AmazonS3Exception

2018-12-17 Thread Addison Higham
Issue opened here: https://issues.apache.org/jira/browse/FLINK-11187

On Mon, Dec 17, 2018 at 2:37 PM Addison Higham  wrote:

> Oh this is timely!
>
> I hope I can save you some pain Kostas! (cc-ing to flink dev to get
> feedback there for what I believe to be a confirmed bug)
>
>
> I was just about to open up a flink issue for this after digging (really)
> deep and figuring out the issue over the weekend.
>
> The problem arises due to how Flink hands input streams to the
> S3AccessHelper. If you turn on debug logs for S3, you will eventually see
> this stack trace:
>
> 2018-12-17 05:55:46,546 DEBUG
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient  -
> FYI: failed to reset content inputstream before throwing up
> java.io.IOException: Resetting to invalid mark
>   at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.lastReset(AmazonHttpClient.java:1145)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1070)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3306)
>   at
> org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3291)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1576)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$uploadPart$8(WriteOperationHelper.java:474)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:231)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.retry(WriteOperationHelper.java:123)
>   at
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.uploadPart(WriteOperationHelper.java:471)
>   at
> org.apache.flink.fs.s3hadoop.HadoopS3AccessHelper.uploadPart(HadoopS3AccessHelper.java:74)
>   at
> org.apache.flink.fs.s3.common.writer.RecoverableMultiPartUploadImpl$UploadTask.run(RecoverableMultiPartUploadImpl.java:319)
>   at
> org.apache.flink.fs.s3.common.utils.BackPressuringExecutor$SemaphoreReleasingRunnable.run(BackPressuringExecutor.java:92)
>   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>
> From this, you can see that (for some reason) AWS fails to write a
> multi-part chunk and then tries to reset the input stream in order to retry,
> but fails (because the InputStream is not mark-able).
>
> That exception is swallowed (it seems like it should be raised up to the
> client, but isn't, for an unknown reason). The s3-client then tries to
> repeat the request using its built-in retry logic; however, because the
> InputStream is consumed
> and has no more bytes 

[jira] [Created] (FLINK-11187) StreamingFileSink with S3 backend transient socket timeout issues

2018-12-17 Thread Addison Higham (JIRA)
Addison Higham created FLINK-11187:
--

 Summary: StreamingFileSink with S3 backend transient socket 
timeout issues 
 Key: FLINK-11187
 URL: https://issues.apache.org/jira/browse/FLINK-11187
 Project: Flink
  Issue Type: Bug
  Components: FileSystem, Streaming Connectors
Affects Versions: 1.7.0, 1.7.1
Reporter: Addison Higham
Assignee: Addison Higham
 Fix For: 1.7.2


When using the StreamingFileSink with the S3A backend, errors like this will 
occasionally occur:
{noformat}
Caused by: 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
 Your socket connection to the server was not read from or written to within 
the timeout period. Idle connections will be closed. (Service: Amazon S3; 
Status Code: 400; Error Code: RequestTimeout; Request ID: xxx; S3 Extended 
Request ID: xxx, S3 Extended Request ID: xxx
   at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
   at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
   at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056){noformat}
This causes a restart of the Flink job, which the job is often able to recover 
from, but under heavy load this can become very frequent.

Turning on debug logs, you can find the following relevant stack trace:
{noformat}
2018-12-17 05:55:46,546 DEBUG 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient  - FYI: 
failed to reset content inputstream before throwing up
java.io.IOException: Resetting to invalid mark
  at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.lastReset(AmazonHttpClient.java:1145)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1070)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3306)
  at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3291)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1576)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$uploadPart$8(WriteOperationHelper.java:474)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:231)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.retry(WriteOperationHelper.java:123)
  at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.uploadPart(WriteOperationHelper.java:471)
  at 

Re: StreamingFileSink causing AmazonS3Exception

2018-12-17 Thread Addison Higham
Oh this is timely!

I hope I can save you some pain Kostas! (cc-ing to flink dev to get
feedback there for what I believe to be a confirmed bug)


I was just about to open up a flink issue for this after digging (really)
deep and figuring out the issue over the weekend.

The problem arises due to how Flink hands input streams to the S3AccessHelper.
If you turn on debug logs for S3, you will eventually see this stack trace:

2018-12-17 05:55:46,546 DEBUG
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient  -
FYI: failed to reset content inputstream before throwing up
java.io.IOException: Resetting to invalid mark
  at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:112)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.lastReset(AmazonHttpClient.java:1145)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1070)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3306)
  at
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3291)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1576)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$uploadPart$8(WriteOperationHelper.java:474)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:231)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.retry(WriteOperationHelper.java:123)
  at
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.uploadPart(WriteOperationHelper.java:471)
  at
org.apache.flink.fs.s3hadoop.HadoopS3AccessHelper.uploadPart(HadoopS3AccessHelper.java:74)
  at
org.apache.flink.fs.s3.common.writer.RecoverableMultiPartUploadImpl$UploadTask.run(RecoverableMultiPartUploadImpl.java:319)
  at
org.apache.flink.fs.s3.common.utils.BackPressuringExecutor$SemaphoreReleasingRunnable.run(BackPressuringExecutor.java:92)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

From this, you can see that (for some reason) AWS fails to write a
multi-part chunk and then tries to reset the input stream in order to retry,
but fails (because the InputStream is not mark-able).

That exception is swallowed (it seems like it should be raised up to the
client, but isn't, for an unknown reason). The s3-client then tries to
repeat the request using its built-in retry logic; however, because the
InputStream is consumed and has no more bytes to write, we never fill the
content-length that the S3 PUT request expects. Eventually, after it
hits the max number of retries, it fails and you get the error above.
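
A minimal standalone sketch of the plain java.io behaviour behind that
"Resetting to invalid mark" message, independent of Flink or the AWS SDK:
once more bytes than the mark's read limit have been consumed, reset() fails,
so a retry cannot rewind the stream.

    import java.io.BufferedInputStream;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;

    public class MarkResetDemo {
        public static void main(String[] args) throws IOException {
            BufferedInputStream in =
                    new BufferedInputStream(new ByteArrayInputStream(new byte[1024]), 16);

            in.mark(16);              // remember at most 16 bytes for a later reset
            in.read(new byte[512]);   // consuming past that limit invalidates the mark
            in.reset();               // throws java.io.IOException: Resetting to invalid mark
        }
    }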

I just started running a fix for this (which is a hack, not the real
solution) here:

[jira] [Created] (FLINK-11186) Support for event-time balancing for multiple Kafka consumer partitions

2018-12-17 Thread Tom Schamberger (JIRA)
Tom Schamberger created FLINK-11186:
---

 Summary: Support for event-time balancing for multiple Kafka 
consumer partitions
 Key: FLINK-11186
 URL: https://issues.apache.org/jira/browse/FLINK-11186
 Project: Flink
  Issue Type: New Feature
  Components: DataStream API, Kafka Connector
Reporter: Tom Schamberger


Currently, it is not possible in Flink to back-pressure individual Kafka 
partitions that are ahead in terms of event time. This leads to unnecessary 
memory consumption and can lead to deadlocks in the case of back-pressure.

When multiple Kafka topics are consumed, downstream event-time window operators 
have to wait until the last Kafka partition has produced a sufficient watermark 
before they can be triggered. If individual Kafka partitions differ in read 
performance, or the event time of messages within partitions is not 
monotonically distributed, this can lead to a situation where 'fast' partitions 
(whose event time makes fast progress) outrun slower partitions until 
back-pressure prevents all partitions from being consumed further. This leads 
to a deadlock of the application.

I suggest that windows should be able to back-pressure individual partitions 
that progress faster in terms of event time, so that slow partitions can keep 
up.
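
For context, a minimal sketch of the existing per-partition watermark
assignment inside the Kafka consumer that such an alignment/back-pressure
feature would build on (the topic, properties and timestamp extraction are
placeholders):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class PerPartitionWatermarks {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "demo");

            FlinkKafkaConsumer<String> consumer =
                    new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props);

            // Watermarks are tracked per Kafka partition inside the consumer; the proposed
            // feature would additionally slow down partitions whose event time runs ahead.
            consumer.assignTimestampsAndWatermarks(
                    new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(10)) {
                        @Override
                        public long extractTimestamp(String record) {
                            return Long.parseLong(record); // placeholder: event time encoded in the record
                        }
                    });

            env.addSource(consumer).print();
            env.execute("per-partition watermarks sketch");
        }
    }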





[jira] [Created] (FLINK-11185) StreamSourceOperatorWatermarksTest failed with InterruptedException

2018-12-17 Thread Biao Liu (JIRA)
Biao Liu created FLINK-11185:


 Summary: StreamSourceOperatorWatermarksTest failed with 
InterruptedException
 Key: FLINK-11185
 URL: https://issues.apache.org/jira/browse/FLINK-11185
 Project: Flink
  Issue Type: Test
Reporter: Biao Liu
 Fix For: 1.8.0


{{StreamSourceOperatorWatermarksTest}} may be an unstable test case. It failed 
in the Travis integration tests.

See details in: [https://api.travis-ci.org/v3/job/467336457/log.txt]





[jira] [Created] (FLINK-11184) Rework TableSource and TableSink interfaces

2018-12-17 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11184:


 Summary: Rework TableSource and TableSink interfaces
 Key: FLINK-11184
 URL: https://issues.apache.org/jira/browse/FLINK-11184
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther
Assignee: Timo Walther


There are a couple of shortcomings with the current {{TableSource}} and 
{{TableSink}} interface design. Some of the issues are covered in a [basic 
design 
document|https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.41fd6rs7b3cf]
 that was published a while ago.

The design document has not been updated for some time and partially overlaps 
with the [current SQL DDL 
discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-SQL-DDL-Design-td25006.html]
 for the {{CREATE TABLE}} statement on the ML.

What needs to be solved:
- How to unify sources and sinks in regards of schema and time attributes?
- How to define watermarks, timestamp extractors or timestamp ingestion?
- How to define primary keys and partitioning keys?
- How to differentiate between update modes for tables (i.e. how to read from an 
append, retraction, or update table)?
- How to express all of the above without pulling in too many dependencies on 
other Flink modules if the source and sink interfaces are located in the 
{{flink-table-spi}} package?

As of the current state of the discussion, it seems that we might extend 
{{TableSchema}} to allow for returning the information above and remove current 
interfaces such as {{DefinedRowtimeAttribute}} or {{DefinedFieldMapping}}.





Re: [DISCUSS] Releasing Flink 1.6.3

2018-12-17 Thread Tzu-Li (Gordon) Tai
Thanks.

I'll proceed to create a release candidate for 1.6.3 now.
Hopefully we can have the release out by the end of this week / early next
week.

Cheers,
Gordon

On Thu, Dec 13, 2018 at 9:36 AM Jeff Zhang  wrote:

> +1 for a more stable flink 1.6 release.
>
> Hequn Cheng wrote on Thu, Dec 13, 2018 at 9:32 AM:
>
> > Hi Gordon,
> >
> > Thanks for the discussion!
> > +1 for creating the 1.6.3 release. It would be nice if we have a new fix
> > for the previous release.
> >
> > Best, Hequn
> >
> > On Wed, Dec 12, 2018 at 11:05 PM Till Rohrmann 
> > wrote:
> >
> > > Thanks for starting this discussion Gordon. +1 for creating the 1.6.3
> > > release since we have already quite some good fixes in the release-1.6
> > > branch.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Dec 12, 2018 at 12:58 PM vino yang 
> > wrote:
> > >
> > > > Hi Gordon,
> > > >
> > > > +1 to release Flink 1.6.3.
> > > >
> > > > Best,
> > > > Vino
> > > >
> > > > Tzu-Li (Gordon) Tai wrote on Wed, Dec 12, 2018 at 7:35 PM:
> > > >
> > > > > Hi Flink community,
> > > > >
> > > > > I would like to ask what you think about releasing 1.6.3.
> > > > >
> > > > > There are a few critical fixes in the 1.6 branch that users would
> > > > > benefit from with another bugfix release for the series.
> > > > >
> > > > > Some are already mentioned by Till in the recent 1.5.6 release
> > > discussion
> > > > > thread, and their fixes are also present in the 1.6 branch [1] -
> > > > > Serializer duplication problems: FLINK-10839, FLINK-10639
> > > > > Fixing retraction: FLINK-10674
> > > > > Deadlock during spilling data in SpillableSubpartition: FLINK-10491
> > > > >
> > > > > Some 1.6 specific fixes are -
> > > > > Problem with restoring 1.5 broadcast state: FLINK-11087
> > > > > Problem with restoring FlinkKafkaProducer: FLINK-10353
> > > > > LockableTypeSerializer duplication problem: FLINK-10816
> > > > >
> > > > > Another issue whose fix isn't merged yet, but which we should probably
> > > > > include, is FLINK-11083 (a problem with restoring table state; a PR is
> > > > > available).
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Cheers,
> > > > > Gordon
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Creating-last-bug-fix-release-for-1-5-branch-td25794.html
> > > > >
> > > >
> > >
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [VOTE] Release 1.7.1, release candidate #2

2018-12-17 Thread jincheng sun
Hi Chesnay,

Thanks for bringing up the 1.7.1-rc2 release vote!

I did some test checks as follows:

1. Local source code compilation [mvn clean install] [success]
2. Ran all test cases (both Java and Scala), except NettyEpollITCase, in the
flink-test module in IDEA [found some errors]
3. Removed FLINK-10543 from the Improvement list of the release notes, due to
a compatibility problem.
4. sh bin/start-cluster.sh and submitted both the stream and batch word-count
examples with parallelism 1 [success]

About 4 test errors were found when running all the flink-test cases. I have
created the JIRAs and will check them one by one; of course, anyone is
welcome to help check. For details please see:
https://issues.apache.org/jira/browse/FLINK-11178

Cheers,
Jincheng



Chesnay Schepler wrote on Mon, Dec 17, 2018 at 3:39 AM:

> Hi everyone,
> Please review and vote on the release candidate #2 for the version
> 1.7.1, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to
> be deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 11D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.7.1-rc2" [5].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Chesnay
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344412
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.7.1-rc2/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1198
> [5]
>
> https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.7.1-rc2
>
>
>


[jira] [Created] (FLINK-11183) Flink 1.7 wrong memory graphite metrics

2018-12-17 Thread Mario Georgiev (JIRA)
 Mario Georgiev created FLINK-11183:
---

 Summary: Flink 1.7 wrong memory graphite metrics
 Key: FLINK-11183
 URL: https://issues.apache.org/jira/browse/FLINK-11183
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.7.0
Reporter:  Mario Georgiev


Hello, 

After upgrading from Flink 1.6 to Flink 1.7, the Graphite metrics for memory 
started reporting wrong numbers. All the jobs are reporting the same memory 
used.

Was there any major change to the metrics collection in flink-graphite-1.7.0? 





[jira] [Created] (FLINK-11182) JobManagerHACheckpointRecoveryITCase needs to be improved

2018-12-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-11182:
---

 Summary: JobManagerHACheckpointRecoveryITCase needs to be improved
 Key: FLINK-11182
 URL: https://issues.apache.org/jira/browse/FLINK-11182
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: sunjincheng


Warning: An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been 
shutdown
 at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
 at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
 at 
org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
 at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
 at 
org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
 at 
org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
 at 
org.jboss.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
 at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
 at 
org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
 at 
org.jboss.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
 at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
 at 
org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
 at 
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
 at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:784)
 at 
org.jboss.netty.channel.SimpleChannelHandler.disconnectRequested(SimpleChannelHandler.java:320)
 at 
org.jboss.netty.channel.SimpleChannelHandler.handleDownstream(SimpleChannelHandler.java:274)
 at 
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
 at 
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
 at org.jboss.netty.channel.Channels.disconnect(Channels.java:781)
 at org.jboss.netty.channel.AbstractChannel.disconnect(AbstractChannel.java:219)
 at 
akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:241)
 at 
akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:240)
 at scala.util.Success.foreach(Try.scala:236)
 at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
 at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
 at 
akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
 at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
 at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
 at 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
 at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
 at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
 at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
 at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)





[jira] [Created] (FLINK-11180) ProcessFailureCancelingITCase#testCancelingOnProcessFailure

2018-12-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-11180:
---

 Summary: 
ProcessFailureCancelingITCase#testCancelingOnProcessFailure
 Key: FLINK-11180
 URL: https://issues.apache.org/jira/browse/FLINK-11180
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: sunjincheng


tag: release-1.7.1-rc2

org.apache.flink.util.FlinkException: Could not create the 
DispatcherResourceManagerComponent.

at 
org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:242)
 at 
org.apache.flink.test.recovery.ProcessFailureCancelingITCase.testCancelingOnProcessFailure(ProcessFailureCancelingITCase.java:148)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
 at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
 at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
 at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:433)
 at sun.nio.ch.Net.bind(Net.java:425)
 at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
 at 
org.apache.flink.shaded.netty4.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
 at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
 at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
 at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
 at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
 at java.lang.Thread.run(Thread.java:748)





[jira] [Created] (FLINK-11181) SimpleRecoveryITCaseBase test error

2018-12-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-11181:
---

 Summary: SimpleRecoveryITCaseBase test error
 Key: FLINK-11181
 URL: https://issues.apache.org/jira/browse/FLINK-11181
 Project: Flink
  Issue Type: Sub-task
Reporter: sunjincheng


Running it many times, it always fails.

at 
org.apache.flink.test.recovery.SimpleRecoveryITCaseBase.executeAndRunAssertions(SimpleRecoveryITCaseBase.java:124)
 at 
org.apache.flink.test.recovery.SimpleRecoveryITCaseBase.testRestart(SimpleRecoveryITCaseBase.java:150)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.junit.runners.Suite.runChild(Suite.java:128)
 at org.junit.runners.Suite.runChild(Suite.java:27)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
 at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
 at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
 at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)





[jira] [Created] (FLINK-11179) JoinCancelingITCase#testCancelSortMatchWhileDoingHeavySorting test error

2018-12-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-11179:
---

 Summary:  
JoinCancelingITCase#testCancelSortMatchWhileDoingHeavySorting test error
 Key: FLINK-11179
 URL: https://issues.apache.org/jira/browse/FLINK-11179
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: sunjincheng


tag: release-1.7.1-rc2

java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find 
Flink job (f8abcfa2bf2f9bf13024075e51891d2e)

at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
 at 
org.apache.flink.client.program.MiniClusterClient.cancel(MiniClusterClient.java:118)
 at 
org.apache.flink.test.cancelling.CancelingTestBase.runAndCancelJob(CancelingTestBase.java:109)
 at 
org.apache.flink.test.cancelling.JoinCancelingITCase.executeTaskWithGenerator(JoinCancelingITCase.java:94)
 at 
org.apache.flink.test.cancelling.JoinCancelingITCase.testCancelSortMatchWhileDoingHeavySorting(JoinCancelingITCase.java:99)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
 at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
 at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
 at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could 
not find Flink job (f8abcfa2bf2f9bf13024075e51891d2e)
 at 
org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGatewayFuture(Dispatcher.java:766)





[jira] [Created] (FLINK-11178) Check flink-test module for 1.7.1-rc2

2018-12-17 Thread sunjincheng (JIRA)
sunjincheng created FLINK-11178:
---

 Summary: Check flink-test module for 1.7.1-rc2
 Key: FLINK-11178
 URL: https://issues.apache.org/jira/browse/FLINK-11178
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: sunjincheng


Will create some sub-JIRAs for the flink-test test errors.





[jira] [Created] (FLINK-11177) Node health tracker

2018-12-17 Thread ryantaocer (JIRA)
ryantaocer created FLINK-11177:
--

 Summary: Node health tracker
 Key: FLINK-11177
 URL: https://issues.apache.org/jira/browse/FLINK-11177
 Project: Flink
  Issue Type: Sub-task
  Components: JobManager
Reporter: ryantaocer


* A NodeHealthTracker module to monitor the status of nodes, including machine 
state and task manager workload.
 * It will score the task managers to help task scheduling avoid the 
problematic nodes.





Re: CEP - Support for multiple pattern

2018-12-17 Thread Dawid Wysakowicz
Hi Jiayi,

There is a ticket [1] for supporting dynamic patterns, which I would say
is a superset of what you are actually suggesting.

[1] https://issues.apache.org/jira/browse/FLINK-7129

On 17/12/2018 03:17, fudian.fd wrote:
> Hi Jiayi,
>
> As far as I know, there is no plan to support this feature. But I think it 
> may be a very useful feature, as it can eliminate the redundant network 
> transmission compared to using multiple operators to support multiple patterns. 
> You can create an issue and we can discuss it further on the JIRA page. CC 
> @Dawid
>
> Regards,
> Dian
>
>> On Dec 15, 2018, at 5:07 PM, bupt_ljy wrote:
>>
>> Hi all,
>> It's actually very common to construct more than one rule on the same 
>> data source. I'm developing some features of this kind for our 
>> business, and some ideas have come up.
>>
>>
>> Do we have any plans for supporting multiple patterns in CEP?
>>
>>
>> Best,
>> Jiayi Liao





[jira] [Created] (FLINK-11176) Improve the harness tests to use the code-generated operator

2018-12-17 Thread Dian Fu (JIRA)
Dian Fu created FLINK-11176:
---

 Summary: Improve the harness tests to use the code-generated 
operator
 Key: FLINK-11176
 URL: https://issues.apache.org/jira/browse/FLINK-11176
 Project: Flink
  Issue Type: Test
  Components: Table API & SQL
Reporter: Dian Fu
Assignee: Dian Fu


As a follow-up of FLINK-11074, we need to update all the harness tests to 
use the code-generated operators instead of hard-coded ones.





[jira] [Created] (FLINK-11175) The sample code describing the Processing time used in TableSource is incorrect

2018-12-17 Thread vinoyang (JIRA)
vinoyang created FLINK-11175:


 Summary: The sample code describing the Processing time used in 
TableSource is incorrect
 Key: FLINK-11175
 URL: https://issues.apache.org/jira/browse/FLINK-11175
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Table API & SQL
Reporter: vinoyang
Assignee: TANG Wen-hui


[https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/streaming/time_attributes.html#using-a-tablesource]

In the sample code, we must explicitly specify the field name and type of the 
processing time attribute.


