[jira] [Commented] (FLINK-7243) Add ParquetInputFormat

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605279#comment-16605279
 ] 

ASF GitHub Bot commented on FLINK-7243:
---

lvhuyen commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input 
format
URL: https://github.com/apache/flink/pull/6483#issuecomment-418967608
 
 
   @HuangZhenQiu 
   Here is the schema of that parquet file, printed in Zeppelin.
   > root
   >  |-- metrics_date: timestamp (nullable = true)
   >  |-- counter: long (nullable = true)
   >  |-- meter: double (nullable = true)
   >  |-- customer_id: string (nullable = true)
   I also attach that sample file here: 
   https://github.com/lvhuyen/flink/blob/parquet_input_format(7243)/flink-formats/flink-parquet/src/test/resources/test.parquet
   
   I tried debugging in IntelliJ: that column is in fact stored as the 
primitive type int96 (not int64), and in Apache's 
https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnReaderImpl.java 
int96 is treated as a String (line 274). The conversion from a byte array 
into a String at line 393 of 
https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/io/api/Binary.java 
seems to be irreversible and leads to data loss: my data has metrics_date = 
2018-09-01 15:02:55.0, which was read as the byte array [0, 118, -95, -103, 
69, 49, 0, 0, -5, -126, 37, 0]. After that line 393, I got a string of length 
12 with the same character at the 3rd, 4th, 9th, and 10th positions. 
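   For reference, a minimal decoding sketch, assuming the common Impala-style 
int96 timestamp layout (first 8 bytes: little-endian nanoseconds within the 
day; last 4 bytes: little-endian Julian day number), which matches the byte 
array above:

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

public class Int96Decode {

    // Julian day number of the Unix epoch (1970-01-01).
    private static final long JULIAN_EPOCH_DAY = 2440588L;

    static LocalDateTime fromInt96(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();   // first 8 bytes
        long julianDay = buf.getInt();     // last 4 bytes
        LocalDate date = LocalDate.ofEpochDay(julianDay - JULIAN_EPOCH_DAY);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    public static void main(String[] args) {
        byte[] raw = {0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0};
        // Prints 2018-09-01T15:02:55 for the sample bytes above.
        System.out.println(fromInt96(raw));
    }
}
{code}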


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>
> Add a {{ParquetInputFormat}} to read data from an Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7243) Add ParquetInputFormat

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605208#comment-16605208
 ] 

ASF GitHub Bot commented on FLINK-7243:
---

HuangZhenQiu commented on issue #6483: [FLINK-7243][flink-formats] Add parquet 
input format
URL: https://github.com/apache/flink/pull/6483#issuecomment-418951266
 
 
   @lvhuyen 
   Thanks for using the parquet input format and giving me feedback. The type 
timestamp is a logical type in Parquet; it is internally stored as the 
primitive type int64, so it should be read out as a long. From the error, it 
looks like the timestamp is read as a String and then set on a field of type 
BigInteger. Would you please paste the Parquet schema of the file? Thanks
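   As a minimal illustration, assuming the TIMESTAMP_MILLIS logical type, 
where the int64 value is the number of milliseconds since the Unix epoch:

{code:java}
// Hypothetical value read from the Parquet column as a long;
// 1535814175000L corresponds to 2018-09-01 15:02:55 UTC.
long millis = 1535814175000L;
java.sql.Timestamp ts = new java.sql.Timestamp(millis);
{code}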
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>
> Add a {{ParquetInputFormat}} to read data from an Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10273) Access composite type fields after a function

2018-09-05 Thread Rong Rong (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605112#comment-16605112
 ] 

Rong Rong commented on FLINK-10273:
---

I was able to reproduce it with SQL but not with the Table API. See: 
https://github.com/apache/flink/compare/master...walterddr:test-FLINK-10273
The SQL API throws an exception:

{code}
org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered 
"." at line 1, column 19.
Was expecting one of:
 
"ORDER" ...

at 
org.apache.flink.table.calcite.FlinkPlannerImpl.parse(FlinkPlannerImpl.scala:79)
at 
org.apache.flink.table.api.TableEnvironment.sqlQuery(TableEnvironment.scala:649)
at 
org.apache.flink.table.runtime.stream.sql.SqlITCase.testMyTest(SqlITCase.scala:89)

Caused by: org.apache.calcite.sql.parser.SqlParseException: Encountered "." at 
line 1, column 19.
Was expecting one of:
 
"ORDER" ...

at 
org.apache.calcite.sql.parser.impl.SqlParserImpl.convertException(SqlParserImpl.java:347)
at 
org.apache.calcite.sql.parser.impl.SqlParserImpl.normalizeException(SqlParserImpl.java:128)
at 
org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:137)
at org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:162)
at 
org.apache.flink.table.calcite.FlinkPlannerImpl.parse(FlinkPlannerImpl.scala:75)
... 30 more
Caused by: org.apache.calcite.sql.parser.impl.ParseException: Encountered "." 
at line 1, column 19.
Was expecting one of:
 
"ORDER" ...

at 
org.apache.calcite.sql.parser.impl.SqlParserImpl.generateParseException(SqlParserImpl.java:23019)
at 
org.apache.calcite.sql.parser.impl.SqlParserImpl.jj_consume_token(SqlParserImpl.java:22836)
at 
org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlStmtEof(SqlParserImpl.java:870)
at 
org.apache.calcite.sql.parser.impl.SqlParserImpl.parseSqlStmtEof(SqlParserImpl.java:184)
at 
org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:130)
... 32 more
{code}

while the Scala Table API gets the field out just fine. 
[~twalthr] does this match your finding? I can take a quick look; it doesn't 
seem to be related to FLINK-10019, but it feels similar to what we dealt with 
in FLINK-7923.

> Access composite type fields after a function
> -
>
> Key: FLINK-10273
> URL: https://issues.apache.org/jira/browse/FLINK-10273
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.7.0
>Reporter: Timo Walther
>Priority: Major
>
> If a function returns a composite type, for example {{Row(lon: Float, lat: 
> Float)}}, there is currently no way of accessing its fields.
> Both of the following queries fail with exceptions:
> {code}
> select t.c.lat, t.c.lon FROM (select toCoords(12) as c) AS t
> {code}
> {code}
> select toCoords(12).lat
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605106#comment-16605106
 ] 

ASF GitHub Bot commented on FLINK-10205:


isunjin commented on issue #6657: [FLINK-10205] [JobManager] Batch Job: 
InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6657#issuecomment-418925630
 
 
   I have put the document here; note that it covers not only this issue but 
also other failover improvements: 
   https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> ---
>
> Key: FLINK-10205
> URL: https://issues.apache.org/jira/browse/FLINK-10205
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: JIN SUN
>Assignee: JIN SUN
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve 
> better performance. However, when a DataSourceTask fails and is rerun, it 
> will not get the same splits as its previous attempt; this can introduce 
> inconsistent results or even data corruption.
> Furthermore, if two executions run at the same time (in a batch scenario), 
> the two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask 
> deterministic. The proposal is to save all splits into the ExecutionVertex, 
> and the DataSourceTask will pull splits from there.
>  
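For illustration, a minimal sketch of the proposed idea (hypothetical names, 
not the actual Flink implementation): record the splits handed to a vertex so 
that a restarted attempt replays exactly the same sequence instead of pulling 
fresh splits.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class ReplayableSplitAssignment<S> {

    private final List<S> assigned = new ArrayList<>(); // splits already handed out
    private int cursor = 0;                             // replay position of the current attempt

    // On failover, the restarted attempt replays the recorded sequence from the start.
    synchronized void resetForNewAttempt() {
        cursor = 0;
    }

    // Return the recorded split while replaying; otherwise pull a new split
    // from the JobManager and record it for future attempts.
    synchronized S nextSplit(Supplier<S> pullFromJobManager) {
        if (cursor < assigned.size()) {
            return assigned.get(cursor++);
        }
        S fresh = pullFromJobManager.get();
        if (fresh != null) {
            assigned.add(fresh);
            cursor++;
        }
        return fresh;
    }
}
{code}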



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8035) Unable to submit job when HA is enabled

2018-09-05 Thread Jason Kania (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605052#comment-16605052
 ] 

Jason Kania commented on FLINK-8035:


This issue affects more than just submission. Many of the Flink command-line 
calls also time out because of this issue. In my case, I am unable to upgrade 
ZooKeeper because of other components, so I have had to abandon HA mode.

> Unable to submit job when HA is enabled
> ---
>
> Key: FLINK-8035
> URL: https://issues.apache.org/jira/browse/FLINK-8035
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.4.0
> Environment: Mac OS X
>Reporter: Robert Metzger
>Priority: Critical
>
> Steps to reproduce:
> - Get Flink 1.4 (f5a0b4bdfb)
> - Get ZK (3.3.6 in this case)
> - Put the following flink-conf.yaml:
> {code}
> high-availability: zookeeper
> high-availability.storageDir: file:///tmp/flink-ha
> high-availability.zookeeper.quorum: localhost:2181
> high-availability.zookeeper.path.cluster-id: /my-namespace
> {code}
> - Start Flink, submit a job (any streaming example will do)
> The job submission will time out. On the JobManager, it seems that the job 
> submission gets stuck when trying to submit something to ZooKeeper.
> In the JM UI, the job will sit there in status "CREATED"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zentol closed pull request #6664: [hotfix][docs] Remove redundant word in javadocs

2018-09-05 Thread GitBox
zentol closed pull request #6664: [hotfix][docs] Remove redundant word in 
javadocs
URL: https://github.com/apache/flink/pull/6664
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ParallelSourceFunction.java
 
b/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ParallelSourceFunction.java
index d3ad0f9220f..591ebccea9f 100644
--- 
a/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ParallelSourceFunction.java
+++ 
b/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ParallelSourceFunction.java
@@ -21,7 +21,7 @@
 
 /**
  * A stream data source that is executed in parallel. Upon execution, the 
runtime will
- * execute as many parallel instances of this function function as configured 
parallelism
+ * execute as many parallel instances of this function as configured 
parallelism
  * of the source.
  *
  * This interface acts only as a marker to tell the system that this source 
may
diff --git 
a/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/RichParallelSourceFunction.java
 
b/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/RichParallelSourceFunction.java
index 94b85b69aba..46c1443664d 100644
--- 
a/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/RichParallelSourceFunction.java
+++ 
b/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/RichParallelSourceFunction.java
@@ -22,7 +22,7 @@
 
 /**
  * Base class for implementing a parallel data source. Upon execution, the 
runtime will
- * execute as many parallel instances of this function function as configured 
parallelism
+ * execute as many parallel instances of this function as configured 
parallelism
  * of the source.
  *
  * The data source has access to context information (such as the number of 
parallel


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (FLINK-9150) Prepare for Java 10

2018-09-05 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-9150.
---
Resolution: Duplicate

The issue raised here is a duplicate of 
https://issues.apache.org/jira/browse/FLINK-10209.

There's also little point in having (and especially bumping) a Java 10 issue 
if Java 9 isn't supported yet.

> Prepare for Java 10
> ---
>
> Key: FLINK-9150
> URL: https://issues.apache.org/jira/browse/FLINK-9150
> Project: Flink
>  Issue Type: Task
>  Components: Build System
>Reporter: Ted Yu
>Priority: Major
>
> Java 9 is not an LTS release.
> When compiling with Java 10, I see the following compilation error:
> {code}
> [ERROR] Failed to execute goal on project flink-shaded-hadoop2: Could not 
> resolve dependencies for project 
> org.apache.flink:flink-shaded-hadoop2:jar:1.6-SNAPSHOT: Could not find 
> artifact jdk.tools:jdk.tools:jar:1.6 at specified path 
> /a/jdk-10/../lib/tools.jar -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10193) Default RPC timeout is used when triggering savepoint via JobMasterGateway

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604897#comment-16604897
 ] 

ASF GitHub Bot commented on FLINK-10193:


GJL commented on issue #6601: [FLINK-10193] Add @RpcTimeout to 
JobMasterGateway.triggerSavepoint
URL: https://github.com/apache/flink/pull/6601#issuecomment-418868217
 
 
   @jelmerk Should be merged soon. We did not have much time recently due to 
Flink Forward.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default RPC timeout is used when triggering savepoint via JobMasterGateway
> --
>
> Key: FLINK-10193
> URL: https://issues.apache.org/jira/browse/FLINK-10193
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
> Attachments: SlowToCheckpoint.java
>
>
> When calling {{JobMasterGateway#triggerSavepoint(String, boolean, Time)}}, 
> the default timeout is used because the time parameter of the method is not 
> annotated with {{@RpcTimeout}}. 
> *Expected behavior*
> * timeout for the RPC should be {{RpcUtils.INF_TIMEOUT}}
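A minimal sketch of the fix idea (paraphrased signature, not the exact Flink 
source): annotating the {{Time}} parameter tells the RPC framework to take 
the call timeout from the argument instead of falling back to the default.

{code:java}
import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.time.Time;
import org.apache.flink.runtime.rpc.RpcTimeout;

public interface JobMasterGateway {

    // @RpcTimeout makes the RPC framework use this argument as the timeout
    // for the call rather than the default RPC timeout.
    CompletableFuture<String> triggerSavepoint(
            String targetDirectory,
            boolean cancelJob,
            @RpcTimeout Time timeout);
}
{code}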



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9150) Prepare for Java 10

2018-09-05 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473198#comment-16473198
 ] 

Ted Yu edited comment on FLINK-9150 at 9/5/18 8:09 PM:
---

A similar error is encountered when building against JDK 11.


was (Author: yuzhih...@gmail.com):
Similar error is encountered when building against jdk 11.

> Prepare for Java 10
> ---
>
> Key: FLINK-9150
> URL: https://issues.apache.org/jira/browse/FLINK-9150
> Project: Flink
>  Issue Type: Task
>  Components: Build System
>Reporter: Ted Yu
>Priority: Major
>
> Java 9 is not an LTS release.
> When compiling with Java 10, I see the following compilation error:
> {code}
> [ERROR] Failed to execute goal on project flink-shaded-hadoop2: Could not 
> resolve dependencies for project 
> org.apache.flink:flink-shaded-hadoop2:jar:1.6-SNAPSHOT: Could not find 
> artifact jdk.tools:jdk.tools:jar:1.6 at specified path 
> /a/jdk-10/../lib/tools.jar -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-5315) Support distinct aggregations in table api

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604772#comment-16604772
 ] 

ASF GitHub Bot commented on FLINK-5315:
---

walterddr commented on issue #6521: [FLINK-5315][table] Adding support for 
distinct operation for table API on DataStream
URL: https://github.com/apache/flink/pull/6521#issuecomment-418836009
 
 
   @fhueske @hequn8128 just updated based on the feedback. Please kindly take 
another look when you have time


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support distinct aggregations in table api
> --
>
> Key: FLINK-5315
> URL: https://issues.apache.org/jira/browse/FLINK-5315
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: Rong Rong
>Priority: Major
>  Labels: pull-request-available
>
> Support distinct aggregations in Table API in the following format:
> For Expressions:
> {code:scala}
> 'a.count.distinct // Expressions distinct modifier
> {code}
> For User-defined Function:
> {code:scala}
> singleArgUdaggFunc.distinct('a) // FunctionCall distinct modifier
> multiArgUdaggFunc.distinct('a, 'b) // FunctionCall distinct modifier
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10267) [State] Fix arbitrary iterator access on RocksDBMapIterator

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604673#comment-16604673
 ] 

ASF GitHub Bot commented on FLINK-10267:


Myasuka commented on a change in pull request #6638: [FLINK-10267][State] Fix 
arbitrary iterator access on RocksDBMapIterator
URL: https://github.com/apache/flink/pull/6638#discussion_r215349665
 
 

 ##
 File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBMapState.java
 ##
 @@ -595,6 +595,8 @@ private void loadCache() {
 */
if (lastEntry != null && !lastEntry.deleted) {
iterator.next();
+   cacheEntries.add(lastEntry);
+   cacheIndex = 1;
 
 Review comment:
   Thanks for your comments; I agree with them, except that I prefer to call 
the entry returned by `#nextEntry()` `currentEntry`, since this entry points 
to the current position until another `#nextEntry()` is invoked.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [State] Fix arbitrary iterator access on RocksDBMapIterator
> ---
>
> Key: FLINK-10267
> URL: https://issues.apache.org/jira/browse/FLINK-10267
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.3, 1.6.0
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.5.4
>
>
> Currently, RocksDBMapIterator loads up to 128 entries into its local 
> cacheEntries whenever needed. Both RocksDBMapIterator#next() and 
> RocksDBMapIterator#hasNext() may trigger loading RocksDBEntry instances into 
> cacheEntries.
> However, if the iterator spans more than 128 entries and we access the 
> iterator in the following order: hasNext() -> next() -> hasNext() -> 
> remove(), we hit an unexpected exception when trying to remove the 128th 
> element:
> {code:java}
> java.lang.IllegalStateException: The remove operation must be called after a 
> valid next operation.
> {code}
> Since we cannot control how users access the iterator, we should fix this 
> bug to avoid the unexpected exception.
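A minimal illustration of the reported access order (assuming a populated 
{{MapState<Integer, String>}} with more than 128 entries):

{code:java}
import java.util.Iterator;
import java.util.Map;

import org.apache.flink.api.common.state.MapState;

class IteratorAccessPattern {

    // Assumes mapState holds more than 128 entries, so the iterator must
    // reload its internal 128-entry cache at least once.
    static void reproduce(MapState<Integer, String> mapState) throws Exception {
        Iterator<Map.Entry<Integer, String>> it = mapState.iterator();
        while (it.hasNext()) {                            // 1) hasNext()
            Map.Entry<Integer, String> entry = it.next(); // 2) next()
            if (it.hasNext()) {                           // 3) hasNext() again (may reload the cache)
                it.remove();                              // 4) remove() threw IllegalStateException
            }                                             //    at the 128th entry before the fix
        }
    }
}
{code}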



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-9172) Support external catalog factory that comes default with SQL-Client

2018-09-05 Thread Eron Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eron Wright  reassigned FLINK-9172:
---

Assignee: Eron Wright   (was: Rong Rong)

> Support external catalog factory that comes default with SQL-Client
> ---
>
> Key: FLINK-9172
> URL: https://issues.apache.org/jira/browse/FLINK-9172
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Rong Rong
>Assignee: Eron Wright 
>Priority: Major
>
> It would be great to have the SQL Client support some external catalogs 
> out of the box for SQL users to configure and utilize easily. I am currently 
> thinking of having an external catalog factory that spins up both streaming 
> and batch external catalog table sources and sinks. This could greatly unify 
> and provide easy access for SQL users. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] tragicjun opened a new pull request #6664: [hotfix][docs] Remove redundant word in javadocs

2018-09-05 Thread GitBox
tragicjun opened a new pull request #6664: [hotfix][docs] Remove redundant word 
in javadocs
URL: https://github.com/apache/flink/pull/6664
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-10261) INSERT INTO does not work with ORDER BY clause

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604589#comment-16604589
 ] 

ASF GitHub Bot commented on FLINK-10261:


yanghua commented on a change in pull request #6648: [FLINK-10261][table] fix 
insert into with order by
URL: https://github.com/apache/flink/pull/6648#discussion_r215327517
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/stream/sql/validation/SortValidationTest.scala
 ##
 @@ -54,4 +53,12 @@ class SortValidationTest extends TableTestBase {
 val sqlQuery = "SELECT a FROM MyTable LIMIT 3"
 streamUtil.verifySql(sqlQuery, "")
   }
+
+  // test should fail because time is not order field
+  @Test(expected = classOf[TableException])
+  def testNonTimeSorting(): Unit = {
+
 
 Review comment:
   Removing this empty line would look better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> INSERT INTO does not work with ORDER BY clause
> --
>
> Key: FLINK-10261
> URL: https://issues.apache.org/jira/browse/FLINK-10261
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: xueyu
>Priority: Major
>  Labels: pull-request-available
>
> It seems that INSERT INTO and ORDER BY do not work well together.
> An AssertionError is thrown and the ORDER BY clause is duplicated. I guess 
> this is a Calcite issue.
> Example:
> {code}
> @Test
>   def testInsertIntoMemoryTable(): Unit = {
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
> val tEnv = TableEnvironment.getTableEnvironment(env)
> MemoryTableSourceSinkUtil.clear()
> val t = StreamTestData.getSmall3TupleDataStream(env)
> .assignAscendingTimestamps(x => x._2)
>   .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
> tEnv.registerTable("sourceTable", t)
> val fieldNames = Array("d", "e", "f", "t")
> val fieldTypes = Array(Types.INT, Types.LONG, Types.STRING, 
> Types.SQL_TIMESTAMP)
>   .asInstanceOf[Array[TypeInformation[_]]]
> val sink = new MemoryTableSourceSinkUtil.UnsafeMemoryAppendTableSink
> tEnv.registerTableSink("targetTable", fieldNames, fieldTypes, sink)
> val sql = "INSERT INTO targetTable SELECT a, b, c, rowtime FROM 
> sourceTable ORDER BY a"
> tEnv.sqlUpdate(sql)
> env.execute()
> {code}
> Error:
> {code}
> java.lang.AssertionError: not a query: SELECT `sourceTable`.`a`, 
> `sourceTable`.`b`, `sourceTable`.`c`, `sourceTable`.`rowtime`
> FROM `sourceTable` AS `sourceTable`
> ORDER BY `a`
> ORDER BY `a`
>   at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3069)
>   at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:557)
>   at 
> org.apache.flink.table.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:104)
>   at 
> org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:717)
>   at 
> org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:683)
>   at 
> org.apache.flink.table.runtime.stream.sql.SqlITCase.testInsertIntoMemoryTable(SqlITCase.scala:735)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10261) INSERT INTO does not work with ORDER BY clause

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604591#comment-16604591
 ] 

ASF GitHub Bot commented on FLINK-10261:


yanghua commented on a change in pull request #6648: [FLINK-10261][table] fix 
insert into with order by
URL: https://github.com/apache/flink/pull/6648#discussion_r215327571
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/sql/SortITCase.scala
 ##
 @@ -105,6 +107,36 @@ class SortITCase extends StreamingWithStateTestBase {
   "20")
 assertEquals(expected, SortITCase.testResults)
   }
+
+  @Test
+  def testInsertIntoMemoryTableOrderBy(): Unit = {
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
+val tEnv = TableEnvironment.getTableEnvironment(env)
+MemoryTableSourceSinkUtil.clear()
+
+val t = StreamTestData.getSmall3TupleDataStream(env)
+.assignAscendingTimestamps(x => x._2)
 
 Review comment:
   The "." aligns with the next line looks better?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> INSERT INTO does not work with ORDER BY clause
> --
>
> Key: FLINK-10261
> URL: https://issues.apache.org/jira/browse/FLINK-10261
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: xueyu
>Priority: Major
>  Labels: pull-request-available
>
> It seems that INSERT INTO and ORDER BY do not work well together.
> An AssertionError is thrown and the ORDER BY clause is duplicated. I guess 
> this is a Calcite issue.
> Example:
> {code}
> @Test
>   def testInsertIntoMemoryTable(): Unit = {
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
> val tEnv = TableEnvironment.getTableEnvironment(env)
> MemoryTableSourceSinkUtil.clear()
> val t = StreamTestData.getSmall3TupleDataStream(env)
> .assignAscendingTimestamps(x => x._2)
>   .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
> tEnv.registerTable("sourceTable", t)
> val fieldNames = Array("d", "e", "f", "t")
> val fieldTypes = Array(Types.INT, Types.LONG, Types.STRING, 
> Types.SQL_TIMESTAMP)
>   .asInstanceOf[Array[TypeInformation[_]]]
> val sink = new MemoryTableSourceSinkUtil.UnsafeMemoryAppendTableSink
> tEnv.registerTableSink("targetTable", fieldNames, fieldTypes, sink)
> val sql = "INSERT INTO targetTable SELECT a, b, c, rowtime FROM 
> sourceTable ORDER BY a"
> tEnv.sqlUpdate(sql)
> env.execute()
> {code}
> Error:
> {code}
> java.lang.AssertionError: not a query: SELECT `sourceTable`.`a`, 
> `sourceTable`.`b`, `sourceTable`.`c`, `sourceTable`.`rowtime`
> FROM `sourceTable` AS `sourceTable`
> ORDER BY `a`
> ORDER BY `a`
>   at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3069)
>   at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:557)
>   at 
> org.apache.flink.table.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:104)
>   at 
> org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:717)
>   at 
> org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:683)
>   at 
> org.apache.flink.table.runtime.stream.sql.SqlITCase.testInsertIntoMemoryTable(SqlITCase.scala:735)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10261) INSERT INTO does not work with ORDER BY clause

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604590#comment-16604590
 ] 

ASF GitHub Bot commented on FLINK-10261:


yanghua commented on a change in pull request #6648: [FLINK-10261][table] fix 
insert into with order by
URL: https://github.com/apache/flink/pull/6648#discussion_r215327606
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/sql/SortITCase.scala
 ##
 @@ -105,6 +107,36 @@ class SortITCase extends StreamingWithStateTestBase {
   "20")
 assertEquals(expected, SortITCase.testResults)
   }
+
+  @Test
+  def testInsertIntoMemoryTableOrderBy(): Unit = {
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
+val tEnv = TableEnvironment.getTableEnvironment(env)
+MemoryTableSourceSinkUtil.clear()
+
+val t = StreamTestData.getSmall3TupleDataStream(env)
+.assignAscendingTimestamps(x => x._2)
+  .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
+tEnv.registerTable("sourceTable", t)
+
+val fieldNames = Array("d", "e", "f", "t")
+val fieldTypes = Array(Types.INT, Types.LONG, Types.STRING, 
Types.SQL_TIMESTAMP)
+  .asInstanceOf[Array[TypeInformation[_]]]
+val sink = new MemoryTableSourceSinkUtil.UnsafeMemoryAppendTableSink
+tEnv.registerTableSink("targetTable", fieldNames, fieldTypes, sink)
+
+val sql = "INSERT INTO targetTable SELECT a, b, c, rowtime " +
+  "FROM sourceTable order by rowtime, a desc"
 
 Review comment:
   Upper-casing the SQL keyword (`ORDER BY`) looks better to me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> INSERT INTO does not work with ORDER BY clause
> --
>
> Key: FLINK-10261
> URL: https://issues.apache.org/jira/browse/FLINK-10261
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: xueyu
>Priority: Major
>  Labels: pull-request-available
>
> It seems that INSERT INTO and ORDER BY do not work well together.
> An AssertionError is thrown and the ORDER BY clause is duplicated. I guess 
> this is a Calcite issue.
> Example:
> {code}
> @Test
>   def testInsertIntoMemoryTable(): Unit = {
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
> val tEnv = TableEnvironment.getTableEnvironment(env)
> MemoryTableSourceSinkUtil.clear()
> val t = StreamTestData.getSmall3TupleDataStream(env)
> .assignAscendingTimestamps(x => x._2)
>   .toTable(tEnv, 'a, 'b, 'c, 'rowtime.rowtime)
> tEnv.registerTable("sourceTable", t)
> val fieldNames = Array("d", "e", "f", "t")
> val fieldTypes = Array(Types.INT, Types.LONG, Types.STRING, 
> Types.SQL_TIMESTAMP)
>   .asInstanceOf[Array[TypeInformation[_]]]
> val sink = new MemoryTableSourceSinkUtil.UnsafeMemoryAppendTableSink
> tEnv.registerTableSink("targetTable", fieldNames, fieldTypes, sink)
> val sql = "INSERT INTO targetTable SELECT a, b, c, rowtime FROM 
> sourceTable ORDER BY a"
> tEnv.sqlUpdate(sql)
> env.execute()
> {code}
> Error:
> {code}
> java.lang.AssertionError: not a query: SELECT `sourceTable`.`a`, 
> `sourceTable`.`b`, `sourceTable`.`c`, `sourceTable`.`rowtime`
> FROM `sourceTable` AS `sourceTable`
> ORDER BY `a`
> ORDER BY `a`
>   at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3069)
>   at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:557)
>   at 
> org.apache.flink.table.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:104)
>   at 
> org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:717)
>   at 
> org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:683)
>   at 
> org.apache.flink.table.runtime.stream.sql.SqlITCase.testInsertIntoMemoryTable(SqlITCase.scala:735)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> 


[jira] [Commented] (FLINK-10267) [State] Fix arbitrary iterator access on RocksDBMapIterator

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604586#comment-16604586
 ] 

ASF GitHub Bot commented on FLINK-10267:


Myasuka commented on issue #6638: [FLINK-10267][State] Fix arbitrary iterator 
access on RocksDBMapIterator
URL: https://github.com/apache/flink/pull/6638#issuecomment-418780838
 
 
   @azagrebin Many thanks for your review, and sorry for the late reply; I 
have had a couple of busy days since the weekend.
   I'll check your review comments and add a new commit soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [State] Fix arbitrary iterator access on RocksDBMapIterator
> ---
>
> Key: FLINK-10267
> URL: https://issues.apache.org/jira/browse/FLINK-10267
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.3, 1.6.0
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.5.4
>
>
> Currently, RocksDBMapIterator loads up to 128 entries into its local 
> cacheEntries whenever needed. Both RocksDBMapIterator#next() and 
> RocksDBMapIterator#hasNext() may trigger loading RocksDBEntry instances into 
> cacheEntries.
> However, if the iterator spans more than 128 entries and we access the 
> iterator in the following order: hasNext() -> next() -> hasNext() -> 
> remove(), we hit an unexpected exception when trying to remove the 128th 
> element:
> {code:java}
> java.lang.IllegalStateException: The remove operation must be called after a 
> valid next operation.
> {code}
> Since we cannot control how users access the iterator, we should fix this 
> bug to avoid the unexpected exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604577#comment-16604577
 ] 

ASF GitHub Bot commented on FLINK-10205:


yanghua commented on issue #6657: [FLINK-10205] [JobManager] Batch Job: 
InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6657#issuecomment-418776835
 
 
   @isunjin It seems that you don't need permission to edit the wiki right 
away. You can write the document in a Google Doc and give others access, and 
then they can discuss and review it. Once the community feels that the design 
document is OK, it won't be too late to move it to the wiki. @zentol right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> ---
>
> Key: FLINK-10205
> URL: https://issues.apache.org/jira/browse/FLINK-10205
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: JIN SUN
>Assignee: JIN SUN
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve 
> better performance. However, when a DataSourceTask fails and is rerun, it 
> will not get the same splits as its previous attempt; this can introduce 
> inconsistent results or even data corruption.
> Furthermore, if two executions run at the same time (in a batch scenario), 
> the two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask 
> deterministic. The proposal is to save all splits into the ExecutionVertex, 
> and the DataSourceTask will pull splits from there.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9172) Support external catalog factory that comes default with SQL-Client

2018-09-05 Thread Rong Rong (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604542#comment-16604542
 ] 

Rong Rong commented on FLINK-9172:
--

[~eronwright] oh absolutely! Please feel free to assign it to yourself. We 
already have an internal implementation, but it would probably take a while 
for me to carve it out into a PR. 
I think the ultimate question is still what the sql-client configuration is 
going to look like for the {{ExternalCatalog}}. The contribution is much 
appreciated! Please feel free to share the PR.

> Support external catalog factory that comes default with SQL-Client
> ---
>
> Key: FLINK-9172
> URL: https://issues.apache.org/jira/browse/FLINK-9172
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Rong Rong
>Assignee: Rong Rong
>Priority: Major
>
> It would be great to have the SQL Client support some external catalogs 
> out of the box for SQL users to configure and utilize easily. I am currently 
> thinking of having an external catalog factory that spins up both streaming 
> and batch external catalog table sources and sinks. This could greatly unify 
> and provide easy access for SQL users. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9735) Potential resource leak in RocksDBStateBackend#getDbOptions

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604528#comment-16604528
 ] 

ASF GitHub Bot commented on FLINK-9735:
---

yanghua commented on issue #6660: [FLINK-9735] Potential resource leak in 
RocksDBStateBackend#getDbOptions
URL: https://github.com/apache/flink/pull/6660#issuecomment-418767941
 
 
   @azagrebin and @StefanRRichter can you review this PR? thanks~


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Potential resource leak in RocksDBStateBackend#getDbOptions
> ---
>
> Key: FLINK-9735
> URL: https://issues.apache.org/jira/browse/FLINK-9735
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>  Labels: pull-request-available
>
> Here is the related code:
> {code}
> if (optionsFactory != null) {
>   opt = optionsFactory.createDBOptions(opt);
> }
> {code}
> opt, a DBOptions instance, should be closed before being overwritten.
> getColumnOptions has a similar issue.
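A minimal sketch of a fix, assuming RocksDB's {{org.rocksdb.DBOptions}} and 
that the factory may return a new instance instead of mutating the one passed 
in:

{code:java}
// Close the previous DBOptions before replacing it so that its native
// handle is released; skip the close when the factory returned the same object.
if (optionsFactory != null) {
    org.rocksdb.DBOptions newOptions = optionsFactory.createDBOptions(opt);
    if (newOptions != opt) {
        opt.close();
    }
    opt = newOptions;
}
{code}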



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] TisonKun commented on issue #6644: [hotfix] check correct parameter

2018-09-05 Thread GitBox
TisonKun commented on issue #6644: [hotfix] check correct parameter
URL: https://github.com/apache/flink/pull/6644#issuecomment-418762782
 
 
   thanks @GJL !
   
   cc @zentol @NicoK 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-10287) Flink HA Persist Cancelled Job in Zookeeper

2018-09-05 Thread Sayat Satybaldiyev (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604505#comment-16604505
 ] 

Sayat Satybaldiyev commented on FLINK-10287:


Might be related to https://issues.apache.org/jira/browse/FLINK-10286

> Flink HA Persist Cancelled Job in Zookeeper
> ---
>
> Key: FLINK-10287
> URL: https://issues.apache.org/jira/browse/FLINK-10287
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.6.0
>Reporter: Sayat Satybaldiyev
>Priority: Major
> Attachments: Screenshot from 2018-09-05 16-48-34.png
>
>
> Flink HA persists a canceled job in ZooKeeper, which makes HA mode quite 
> fragile. If the JM gets restarted, it tries to recover the canceled job and after 
> some time fails completely, being unable to recover it. 
>  
> How to reproduce:
>  # Have a Flink HA 1.6 cluster
>  # Cancel a running Flink job
>  # Observe that Flink didn't remove the ZK metadata.
> !Screenshot from 2018-09-05 16-48-34.png!
> {code:java}
> ls /flink/flink_ns/jobgraphs/46d8d3555936c0d8e6b6ec21cc02bb11
> [7f392fd9-cedc-4978-9186-1f54b98eeeb7]{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10287) Flink HA Persist Cancelled Job in Zookeeper

2018-09-05 Thread Sayat Satybaldiyev (JIRA)
Sayat Satybaldiyev created FLINK-10287:
--

 Summary: Flink HA Persist Cancelled Job in Zookeeper
 Key: FLINK-10287
 URL: https://issues.apache.org/jira/browse/FLINK-10287
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.6.0
Reporter: Sayat Satybaldiyev
 Attachments: Screenshot from 2018-09-05 16-48-34.png

Flink HA persists a canceled job in ZooKeeper, which makes HA mode quite 
fragile. If the JM gets restarted, it tries to recover the canceled job and after 
some time fails completely, being unable to recover it. 

 

How to reproduce:
 # Have a Flink HA 1.6 cluster
 # Cancel a running Flink job
 # Observe that Flink didn't remove the ZK metadata.

!Screenshot from 2018-09-05 16-48-34.png!
{code:java}
ls /flink/flink_ns/jobgraphs/46d8d3555936c0d8e6b6ec21cc02bb11
[7f392fd9-cedc-4978-9186-1f54b98eeeb7]{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7964) Add Apache Kafka 1.0/1.1 connectors

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604489#comment-16604489
 ] 

ASF GitHub Bot commented on FLINK-7964:
---

pnowojski commented on issue #6577: [FLINK-7964] Add Apache Kafka 1.0/1.1 
connectors
URL: https://github.com/apache/flink/pull/6577#issuecomment-418755545
 
 
   I’m not sure if auto-detecting support for transactions is worth the 
trouble. It might be as easy as switching to at-least-once mode, but it might 
just be easier to support only Kafka 0.11+ with this unified connector. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Apache Kafka 1.0/1.1 connectors
> ---
>
> Key: FLINK-7964
> URL: https://issues.apache.org/jira/browse/FLINK-7964
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.4.0
>Reporter: Hai Zhou
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> Kafka 1.0.0 is no mere bump of the version number. The Apache Kafka Project 
> Management Committee has packed a number of valuable enhancements into the 
> release. Here is a summary of a few of them:
> * Since its introduction in version 0.10, the Streams API has become hugely 
> popular among Kafka users, including the likes of Pinterest, Rabobank, 
> Zalando, and The New York Times. In 1.0, the API continues to evolve at a 
> healthy pace. To begin with, the builder API has been improved (KIP-120). A 
> new API has been added to expose the state of active tasks at runtime 
> (KIP-130). The new cogroup API makes it much easier to deal with partitioned 
> aggregates with fewer StateStores and fewer moving parts in your code 
> (KIP-150). Debuggability gets easier with enhancements to the print() and 
> writeAsText() methods (KIP-160). And if that’s not enough, check out KIP-138 
> and KIP-161 too. For more on streams, check out the Apache Kafka Streams 
> documentation, including some helpful new tutorial videos.
> * Operating Kafka at scale requires that the system remain observable, and to 
> make that easier, we’ve made a number of improvements to metrics. These are 
> too many to summarize without becoming tedious, but Connect metrics have been 
> significantly improved (KIP-196), a litany of new health check metrics are 
> now exposed (KIP-188), and we now have a global topic and partition count 
> (KIP-168). Check out KIP-164 and KIP-187 for even more.
> * We now support Java 9, leading, among other things, to significantly faster 
> TLS and CRC32C implementations. Over-the-wire encryption will be faster now, 
> which will keep Kafka fast and compute costs low when encryption is enabled.
> * In keeping with the security theme, KIP-152 cleans up the error handling on 
> Simple Authentication Security Layer (SASL) authentication attempts. 
> Previously, some authentication error conditions were indistinguishable from 
> broker failures and were not logged in a clear way. This is cleaner now.
> * Kafka can now tolerate disk failures better. Historically, JBOD storage 
> configurations have not been recommended, but the architecture has 
> nevertheless been tempting: after all, why not rely on Kafka’s own 
> replication mechanism to protect against storage failure rather than using 
> RAID? With KIP-112, Kafka now handles disk failure more gracefully. A single 
> disk failure in a JBOD broker will not bring the entire broker down; rather, 
> the broker will continue serving any log files that remain on functioning 
> disks.
> * Since release 0.11.0, the idempotent producer (which is the producer used 
> in the presence of a transaction, which of course is the producer we use for 
> exactly-once processing) required max.in.flight.requests.per.connection to be 
> equal to one. As anyone who has written or tested a wire protocol can attest, 
> this put an upper bound on throughput. Thanks to KAFKA-5949, this can now be 
> as large as five, relaxing the throughput constraint quite a bit.
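To make that last point concrete, a hedged sketch of the producer settings involved (standard Kafka producer property keys; the broker address and serializer choices are placeholders):

{code:java}
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092"); // placeholder address
// Idempotent producer, as used for exactly-once processing.
props.put("enable.idempotence", "true");
// Before KAFKA-5949 this had to be 1; since Kafka 1.0 it may be up to 5.
props.put("max.in.flight.requests.per.connection", "5");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
{code}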



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10286) Flink Persist Invalid Job Graph in Zookeeper

2018-09-05 Thread Sayat Satybaldiyev (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604477#comment-16604477
 ] 

Sayat Satybaldiyev commented on FLINK-10286:


[~GJL] I believe it causes the error mentioned on the mailing list, as the whole 
cluster failed when Flink tried to recover the misconfigured job.  

 

http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/HA-stand-alone-cluster-error-td20380.html

> Flink Persist Invalid Job Graph in Zookeeper
> 
>
> Key: FLINK-10286
> URL: https://issues.apache.org/jira/browse/FLINK-10286
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.6.0
>Reporter: Sayat Satybaldiyev
>Priority: Major
>
> In HA mode, Flink 1.6 persists the job graph in ZooKeeper even if the job was 
> not accepted by the Job Manager. This is particularly bad: if the JM later dies 
> and restarts, it tries to recover the job, obviously fails, and dies completely.
>  
> How to reproduce:
> 1. Have an HA Flink 1.6 cluster
> 2. Submit an invalid job; in my case I put an invalid file-system scheme for the 
> RocksDB state backend
>  
>  
> {code:java}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
> env.enableCheckpointing(5000);
> RocksDBStateBackend backend = new 
> RocksDBStateBackend("hddd:///tmp/flink/rocksdb");
> backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
> env.setStateBackend(backend);
> {code}
>  
> Client returns:
>  
>  
> {code:java}
> The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not submit 
> job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)
> {code}
>  
>  
> The JM does not accept the job; this is a truncated error log from the JM:
>  
>  
> {code:java}
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
> submit job.
> ... 24 more
> Caused by: java.util.concurrent.CompletionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobExecutionException: Could not set up 
> JobManager
>  
> Caused by: java.lang.RuntimeException: Failed to start checkpoint ID counter: 
> Could not find a file system implementation for scheme 'hddd'. The scheme is 
> not directly supported by Flink and no Hadoop file system to support this 
> scheme could be loaded.
> {code}
>  
>  
>  
> 4. Go to ZK and observe that the JM has saved the job to ZK
> ls /flink/flink_ns/jobgraphs/9680f02ae2f3806c3b4da25bfacd0749
>  [7f392fd9-cedc-4978-9186-1f54b98eeeb7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10286) Flink Persist Invalid Job Graph in Zookeeper

2018-09-05 Thread Sayat Satybaldiyev (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sayat Satybaldiyev updated FLINK-10286:
---
Description: 
In HA mode, Flink 1.6 persists the job graph in ZooKeeper even if the job was 
not accepted by the Job Manager. This is particularly bad: if the JM later dies 
and restarts, it tries to recover the job, obviously fails, and dies completely.

 

How to reproduce:

1. Have an HA Flink 1.6 cluster

2. Submit an invalid job; in my case I put an invalid file-system scheme for the 
RocksDB state backend

 

 
{code:java}
StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
env.enableCheckpointing(5000);
RocksDBStateBackend backend = new 
RocksDBStateBackend("hddd:///tmp/flink/rocksdb");
backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
env.setStateBackend(backend);
{code}
 

Client returns:

 

 
{code:java}
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not submit 
job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)
{code}
 

 

The JM does not accept the job; this is a truncated error log from the JM:

 

 
{code:java}
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
submit job.
... 24 more
Caused by: java.util.concurrent.CompletionException: 
java.lang.RuntimeException: 
org.apache.flink.runtime.client.JobExecutionException: Could not set up 
JobManager
 
Caused by: java.lang.RuntimeException: Failed to start checkpoint ID counter: 
Could not find a file system implementation for scheme 'hddd'. The scheme is 
not directly supported by Flink and no Hadoop file system to support this 
scheme could be loaded.
{code}
 

 

 

4. Go to ZK and observe that the JM has saved the job to ZK

ls /flink/flink_ns/jobgraphs/9680f02ae2f3806c3b4da25bfacd0749
 [7f392fd9-cedc-4978-9186-1f54b98eeeb7]

  was:
In HA mode, Flink 1.6 persists the job graph in ZooKeeper even if the job was 
not accepted by the Job Manager. This is particularly bad: if the JM later dies 
and restarts, it tries to recover the job, obviously fails, and dies completely.

 

How to reproduce:

1. Have an HA Flink 1.6 cluster

2. Submit an invalid job; in my case I put an invalid file-system scheme for the 
RocksDB state backend

```

StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
env.enableCheckpointing(5000);
RocksDBStateBackend backend = new 
RocksDBStateBackend("hddd:///tmp/flink/rocksdb");

backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
env.setStateBackend(backend);

```

Client returns:

```

The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not submit 
job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)

```

The JM does not accept the job; this is a truncated error log from the JM:

```

Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
submit job.
... 24 more
Caused by: java.util.concurrent.CompletionException: 
java.lang.RuntimeException: 
org.apache.flink.runtime.client.JobExecutionException: Could not set up 
JobManager

 

Caused by: java.lang.RuntimeException: Failed to start checkpoint ID counter: 
Could not find a file system implementation for scheme 'hddd'. The scheme is 
not directly supported by Flink and no Hadoop file system to support this 
scheme could be loaded.

 

```

4. Go to ZK and observe that the JM has saved the job to ZK

ls /flink/flink_ns/jobgraphs/9680f02ae2f3806c3b4da25bfacd0749
[7f392fd9-cedc-4978-9186-1f54b98eeeb7]


> Flink Persist Invalid Job Graph in Zookeeper
> 
>
> Key: FLINK-10286
> URL: https://issues.apache.org/jira/browse/FLINK-10286
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.6.0
>Reporter: Sayat Satybaldiyev
>Priority: Major
>
> In HA mode, Flink 1.6 persists the job graph in ZooKeeper even if the job was 
> not accepted by the Job Manager. This is particularly bad: if the JM later dies 
> and restarts, it tries to recover the job, obviously fails, and dies completely.
>  
> How to reproduce:
> 1. Have an HA Flink 1.6 cluster
> 2. Submit an invalid job; in my case I put an invalid file-system scheme for the 
> RocksDB state backend
>  
>  
> {code:java}
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
> env.enableCheckpointing(5000);
> RocksDBStateBackend backend = new 
> RocksDBStateBackend("hddd:///tmp/flink/rocksdb");
> backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
> env.setStateBackend(backend);
> {code}
>  
> Client returns:
>  
>  
> {code:java}
> The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not submit 
> job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)
> {code}

[jira] [Created] (FLINK-10286) Flink Persist Invalid Job Graph in Zookeeper

2018-09-05 Thread Sayat Satybaldiyev (JIRA)
Sayat Satybaldiyev created FLINK-10286:
--

 Summary: Flink Persist Invalid Job Graph in Zookeeper
 Key: FLINK-10286
 URL: https://issues.apache.org/jira/browse/FLINK-10286
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.6.0
Reporter: Sayat Satybaldiyev


In HA mode, Flink 1.6 persists the job graph in ZooKeeper even if the job was 
not accepted by the Job Manager. This is particularly bad: if the JM later dies 
and restarts, it tries to recover the job, obviously fails, and dies completely.

 

How to reproduce:

1. Have an HA Flink 1.6 cluster

2. Submit an invalid job; in my case I put an invalid file-system scheme for the 
RocksDB state backend

```

StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
env.enableCheckpointing(5000);
RocksDBStateBackend backend = new 
RocksDBStateBackend("hddd:///tmp/flink/rocksdb");

backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);
env.setStateBackend(backend);

```

Client returns:

```

The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not submit 
job (JobID: 9680f02ae2f3806c3b4da25bfacd0749)

```

The JM does not accept the job; this is a truncated error log from the JM:

```

Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
submit job.
... 24 more
Caused by: java.util.concurrent.CompletionException: 
java.lang.RuntimeException: 
org.apache.flink.runtime.client.JobExecutionException: Could not set up 
JobManager

 

Caused by: java.lang.RuntimeException: Failed to start checkpoint ID counter: 
Could not find a file system implementation for scheme 'hddd'. The scheme is 
not directly supported by Flink and no Hadoop file system to support this 
scheme could be loaded.

 

```

4. Go to ZK and observe that the JM has saved the job to ZK

ls /flink/flink_ns/jobgraphs/9680f02ae2f3806c3b4da25bfacd0749
[7f392fd9-cedc-4978-9186-1f54b98eeeb7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10193) Default RPC timeout is used when triggering savepoint via JobMasterGateway

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604432#comment-16604432
 ] 

ASF GitHub Bot commented on FLINK-10193:


jelmerk commented on issue #6601: [FLINK-10193] Add @RpcTimeout to 
JobMasterGateway.triggerSavepoint
URL: https://github.com/apache/flink/pull/6601#issuecomment-418739345
 
 
   Any ETA on this? For non-trivial jobs with a lot of state, this is critical. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Default RPC timeout is used when triggering savepoint via JobMasterGateway
> --
>
> Key: FLINK-10193
> URL: https://issues.apache.org/jira/browse/FLINK-10193
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.5.3, 1.6.0, 1.7.0
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
> Attachments: SlowToCheckpoint.java
>
>
> When calling {{JobMasterGateway#triggerSavepoint(String, boolean, Time)}}, 
> the default timeout is used because the time parameter of the method is not 
> annotated with {{@RpcTimeout}}. 
> *Expected behavior*
> * timeout for the RPC should be {{RpcUtils.INF_TIMEOUT}}
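A sketch of the gateway method after the fix, following the description above (the real Flink interface has more members and may differ in detail):

{code:java}
public interface JobMasterGateway {

    // Annotating the Time parameter tells the RPC layer to use it instead
    // of the default timeout; the caller can then pass RpcUtils.INF_TIMEOUT.
    CompletableFuture<String> triggerSavepoint(
            String targetDirectory,
            boolean cancelJob,
            @RpcTimeout Time timeout);
}
{code}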



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604402#comment-16604402
 ] 

ASF GitHub Bot commented on FLINK-10205:


isunjin commented on issue #6657: [FLINK-10205] [JobManager] Batch Job: 
InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6657#issuecomment-418730339
 
 
   I have my document ready, but I don’t have permission to edit the wiki. 
Stephan, could you grant me the permission?
   
   
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals 
 
   
   Jin
   
   > On Sep 5, 2018, at 8:47 PM, Chesnay Schepler wrote:
   > 
   > In the JIRA a design document was explicitly requested. Is this accessible 
anywhere?
   > 
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or mute the thread.
   > 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> ---
>
> Key: FLINK-10205
> URL: https://issues.apache.org/jira/browse/FLINK-10205
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: JIN SUN
>Assignee: JIN SUN
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve better 
> performance; however, when a DataSourceTask fails and reruns, it will not get 
> the same splits as its previous attempt. This will introduce inconsistent 
> results or even data corruption.
> Furthermore, if two executions run at the same time (in the batch 
> scenario), these two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask 
> deterministic. The proposal is to save all splits into the ExecutionVertex so 
> that the DataSourceTask will pull its splits from there.
>  
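A minimal sketch of that proposal with invented names (Flink's real classes differ): the vertex records every split it hands out, so a rerun attempt replays exactly the same sequence.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SplitReplayer<S> {
    private final List<S> assigned = new ArrayList<>();
    private final Iterator<S> source;

    SplitReplayer(Iterator<S> source) { this.source = source; }

    // Returns the index-th split, assuming indices are requested in order.
    // A rerun gets the cached split; a first-time request pulls a fresh one
    // from the source and records it.
    synchronized S splitAt(int index) {
        if (index < assigned.size()) {
            return assigned.get(index);
        }
        S next = source.hasNext() ? source.next() : null;
        if (next != null) {
            assigned.add(next);
        }
        return next;
    }
}
{code}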



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604351#comment-16604351
 ] 

ASF GitHub Bot commented on FLINK-10205:


zentol commented on issue #6657: [FLINK-10205] [JobManager] Batch Job: 
InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6657#issuecomment-418717042
 
 
   @isunjin Please fill out the PR template.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> ---
>
> Key: FLINK-10205
> URL: https://issues.apache.org/jira/browse/FLINK-10205
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: JIN SUN
>Assignee: JIN SUN
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve better 
> performance; however, when a DataSourceTask fails and reruns, it will not get 
> the same splits as its previous attempt. This will introduce inconsistent 
> results or even data corruption.
> Furthermore, if two executions run at the same time (in the batch 
> scenario), these two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask 
> deterministic. The proposal is to save all splits into the ExecutionVertex so 
> that the DataSourceTask will pull its splits from there.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604353#comment-16604353
 ] 

ASF GitHub Bot commented on FLINK-10205:


zentol commented on issue #6657: [FLINK-10205] [JobManager] Batch Job: 
InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6657#issuecomment-418717373
 
 
   In the JIRA a design document was explicitly requested. Is this accessible 
anywhere?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> ---
>
> Key: FLINK-10205
> URL: https://issues.apache.org/jira/browse/FLINK-10205
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager
>Reporter: JIN SUN
>Assignee: JIN SUN
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve better 
> performance; however, when a DataSourceTask fails and reruns, it will not get 
> the same splits as its previous attempt. This will introduce inconsistent 
> results or even data corruption.
> Furthermore, if two executions run at the same time (in the batch 
> scenario), these two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask 
> deterministic. The proposal is to save all splits into the ExecutionVertex so 
> that the DataSourceTask will pull its splits from there.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10285) Bump shade-plugin to 3.1.1

2018-09-05 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-10285:
-
Summary: Bump shade-plugin to 3.1.1  (was: Bum shade-plugin to 3.1.1)

> Bump shade-plugin to 3.1.1
> --
>
> Key: FLINK-10285
> URL: https://issues.apache.org/jira/browse/FLINK-10285
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.7.0
>
>
> The {{maven-shade-plugin}} fails on java 9. The official java9-compatible 
> version is 3.1.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10285) Bum shade-plugin to 3.1.1

2018-09-05 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10285:


 Summary: Bum shade-plugin to 3.1.1
 Key: FLINK-10285
 URL: https://issues.apache.org/jira/browse/FLINK-10285
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Affects Versions: 1.7.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.7.0


The {{maven-shade-plugin}} fails on java 9. The official java9-compatible 
version is 3.1.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10284) TumblingEventTimeWindows's offset should can be less than zero.

2018-09-05 Thread Jiayi Liao (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liao updated FLINK-10284:
---
Component/s: Streaming

> TumblingEventTimeWindows's offset should can be less than zero.
> ---
>
> Key: FLINK-10284
> URL: https://issues.apache.org/jira/browse/FLINK-10284
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Major
>
> My goal is to create a window from 0:00 to the next day's 0:00 in the GMT+8 
> timezone, so I chose to use TumblingEventTimeWindows.of(Time.of(1, 
> TimeUnit.DAYS)), which uses the UTC timezone to define the range of a "day". 
> However, it doesn't allow me to pass an offset of -8 to TumblingEventTimeWindows, 
> which would let me correct for the timezone offset.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10284) TumblingEventTimeWindows's offset should can be less than zero.

2018-09-05 Thread Jiayi Liao (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liao updated FLINK-10284:
---
Affects Version/s: 1.6.0

> TumblingEventTimeWindows's offset should can be less than zero.
> ---
>
> Key: FLINK-10284
> URL: https://issues.apache.org/jira/browse/FLINK-10284
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Major
>
> My goal is to create a window from 0:00 to the next day's 0:00 in the GMT+8 
> timezone, so I chose to use TumblingEventTimeWindows.of(Time.of(1, 
> TimeUnit.DAYS)), which uses the UTC timezone to define the range of a "day". 
> However, it doesn't allow me to pass an offset of -8 to TumblingEventTimeWindows, 
> which would let me correct for the timezone offset.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-10284) TumblingEventTimeWindows's offset should can be less than zero.

2018-09-05 Thread Jiayi Liao (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liao reassigned FLINK-10284:
--

Assignee: Jiayi Liao

> TumblingEventTimeWindows's offset should can be less than zero.
> ---
>
> Key: FLINK-10284
> URL: https://issues.apache.org/jira/browse/FLINK-10284
> Project: Flink
>  Issue Type: Bug
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Major
>
> My goal is to create a window from 0:00 to the next day's 0:00 in the GMT+8 
> timezone, so I chose to use TumblingEventTimeWindows.of(Time.of(1, 
> TimeUnit.DAYS)), which uses the UTC timezone to define the range of a "day". 
> However, it doesn't allow me to pass an offset of -8 to TumblingEventTimeWindows, 
> which would let me correct for the timezone offset.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10284) TumblingEventTimeWindows's offset should can be less than zero.

2018-09-05 Thread Jiayi Liao (JIRA)
Jiayi Liao created FLINK-10284:
--

 Summary: TumblingEventTimeWindows's offset should can be less than 
zero.
 Key: FLINK-10284
 URL: https://issues.apache.org/jira/browse/FLINK-10284
 Project: Flink
  Issue Type: Bug
Reporter: Jiayi Liao


My goal is to create a window from 0:00 to the next day's 0:00 in the GMT+8 
timezone, so I chose to use TumblingEventTimeWindows.of(Time.of(1, 
TimeUnit.DAYS)), which uses the UTC timezone to define the range of a "day". 
However, it doesn't allow me to pass an offset of -8 to TumblingEventTimeWindows, 
which would let me correct for the timezone offset.
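What the reporter wants to express might look like the following sketch; in Flink 1.6 the negative offset is rejected (the stream and key accessor are invented for illustration):

{code:java}
DataStream<Event> events = ...; // some event-time stream
events
    .keyBy(e -> e.getKey())
    // One-day tumbling windows aligned to midnight GMT+8, i.e. a -8h offset.
    .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
    .reduce((a, b) -> a);
{code}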



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10111) Test failure because of HAQueryableStateFsBackendITCase#testMapState throws NPE

2018-09-05 Thread lyndldeng (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lyndldeng updated FLINK-10111:
--
Summary: Test failure because of 
HAQueryableStateFsBackendITCase#testMapState throws NPE  (was: Test failure 
because of HAQueryableStateFsBackendITCase#testMapState thorws NPE)

> Test failure because of HAQueryableStateFsBackendITCase#testMapState throws 
> NPE
> ---
>
> Key: FLINK-10111
> URL: https://issues.apache.org/jira/browse/FLINK-10111
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: vinoyang
>Assignee: lyndldeng
>Priority: Critical
>
> stack trace : 
> {code:java}
> testMapState(org.apache.flink.queryablestate.itcases.HAQueryableStateFsBackendITCase)
>   Time elapsed: 0.528 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.flink.queryablestate.itcases.AbstractQueryableStateTestBase.testMapState(AbstractQueryableStateTestBase.java:840)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> travis log : https://travis-ci.org/apache/flink/jobs/412636761



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-10111) Test failure because of HAQueryableStateFsBackendITCase#testMapState thorws NPE

2018-09-05 Thread lyndldeng (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lyndldeng reassigned FLINK-10111:
-

Assignee: lyndldeng

> Test failure because of HAQueryableStateFsBackendITCase#testMapState thorws 
> NPE
> ---
>
> Key: FLINK-10111
> URL: https://issues.apache.org/jira/browse/FLINK-10111
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: vinoyang
>Assignee: lyndldeng
>Priority: Critical
>
> stack trace : 
> {code:java}
> testMapState(org.apache.flink.queryablestate.itcases.HAQueryableStateFsBackendITCase)
>   Time elapsed: 0.528 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.flink.queryablestate.itcases.AbstractQueryableStateTestBase.testMapState(AbstractQueryableStateTestBase.java:840)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> travis log : https://travis-ci.org/apache/flink/jobs/412636761



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10209) Exclude jdk.tools dependency from hadoop when running with java 9

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604280#comment-16604280
 ] 

ASF GitHub Bot commented on FLINK-10209:


zentol opened a new pull request #6663:  [FLINK-10209][build] Exclude jdk.tools 
dependency from hadoop 
URL: https://github.com/apache/flink/pull/6663
 
 
   ## What is the purpose of the change
   
   This PR excludes the `jdk.tools` dependency from hadoop when building with 
java 9. This dependency is for the `tools.jar` that was distributed in jdk8 and 
below, but no longer present in jdk9+.
   
   ## Brief change log
   
   * add `java9` profile that is automatically activated when using java 9
   * add `dependencyManagement` entry for `hadoop-common` that excludes the 
`jdk.tools` dependency
   * restrict `japicmp` plugin to flink artifacts
 * This change was necessary since the plugin apparently looks at all 
artifacts defined in the pom. The added `hadoop-common` entry in the 
`dependencyManagement` has no version attached and thus can't be resolved on 
its own in non-hadoop modules, which always caused the build to fail. 
Explicitly filtering for flink modules appears to fix this.
   
   ## Verifying this change
   
   Use `mvn help:active-profiles` to verify that the profile is not activated 
on java 8 but on java 9.
   
   I ran this commit (along with other prerequisites for java 9) here: 
https://travis-ci.org/zentol/flink/builds/423911810



This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Exclude jdk.tools dependency from hadoop when running with java 9
> -
>
> Key: FLINK-10209
> URL: https://issues.apache.org/jira/browse/FLINK-10209
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> {{hadoop-common}} has a {{jdk.tools}} dependency which cannot be resolved on 
> java 9. At least for compiling we have to exclude this dependency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10209) Exclude jdk.tools dependency from hadoop when running with java 9

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10209:
---
Labels: pull-request-available  (was: )

> Exclude jdk.tools dependency from hadoop when running with java 9
> -
>
> Key: FLINK-10209
> URL: https://issues.apache.org/jira/browse/FLINK-10209
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> {{hadoop-common}} has a {{jdk.tools}} dependency which cannot be resolved on 
> java 9. At least for compiling we have to exclude this dependency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10281) Table function parse regular expression contains backslash failed

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604274#comment-16604274
 ] 

ASF GitHub Bot commented on FLINK-10281:


yanghua commented on issue #6659: [FLINK-10281] Table function parse regular 
expression contains backslash failed
URL: https://github.com/apache/flink/pull/6659#issuecomment-418688875
 
 
   cc @twalthr @fhueske 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Table function parse regular expression contains backslash failed
> -
>
> Key: FLINK-10281
> URL: https://issues.apache.org/jira/browse/FLINK-10281
> Project: Flink
>  Issue Type: Bug
>  Components: Table API  SQL
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> For example, a regular expression that matches word characters ("\w") or digits ("\d"):
> {code:java}
> testAllApis(
>   "foothebar".regexExtract("foo([\\w]+)", 1),   //OK, the method got 
> 'foo([\w]+)'
>   "'foothebar'.regexExtract('foo([w]+)', 1)",   //failed, the method got 
> 'foo([\\w]+)' returns "", but if pass 'foo([\\w]+)' would get compile error.
>   "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)", //OK, the method got 
> 'foo([\w]+)' but must pass four '\'
>   "thebar"
> )
> {code}
> the "similar to" function has the same issue.
>  
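For reference, a plain-Java illustration of the escaping layers involved; the Table API expression parser consumes one more level of backslashes on top of Java string escaping, which is what this ticket is about:

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        // The Java literal "foo([\\w]+)" holds the regex foo([\w]+).
        Matcher m = Pattern.compile("foo([\\w]+)").matcher("foothebar");
        if (m.matches()) {
            System.out.println(m.group(1)); // prints "thebar"
        }
    }
}
{code}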



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Updated] (FLINK-10283) FileCache logs unnecessary warnings

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10283:
---
Labels: pull-request-available  (was: )

> FileCache logs unnecessary warnings
> ---
>
> Key: FLINK-10283
> URL: https://issues.apache.org/jira/browse/FLINK-10283
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.6.0, 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0
>
>
> When running any job the {{FileCache}} will log the following warning:
> {code}
> improper use of releaseJob() without a matching number of createTmpFiles() 
> calls for jobId 7d288909946d5eb4c8d40616ad1b20d2
> {code}
> This warning is misleading as {{createTmpFiles}} is only called when the 
> {{DistributedCache}} was used, but {{releaseJob}} is always called as part of 
> the task cleanup routine.
> The warning should simply be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10283) FileCache logs unnecessary warnings

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604273#comment-16604273
 ] 

ASF GitHub Bot commented on FLINK-10283:


zentol opened a new pull request #6662: [FLINK-10283][runtime] Remove 
misleading logging in FileCache
URL: https://github.com/apache/flink/pull/6662
 
 
   ## What is the purpose of the change
   
   Removes a misleading WARN message, as createTmpFiles is only called when the 
DistributedCache is used, but releaseJob is always called as part of the task 
cleanup routine.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FileCache logs unnecessary warnings
> ---
>
> Key: FLINK-10283
> URL: https://issues.apache.org/jira/browse/FLINK-10283
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.6.0, 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0
>
>
> When running any job the {{FileCache}} will log the following warning:
> {code}
> improper use of releaseJob() without a matching number of createTmpFiles() 
> calls for jobId 7d288909946d5eb4c8d40616ad1b20d2
> {code}
> This warning is misleading as {{createTmpFiles}} is only called when the 
> {{DistributedCache}} was used, but {{releaseJob}} is always called as part of 
> the task cleanup routine.
> The warning should simply be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Created] (FLINK-10283) FileCache logs unnecessary warnings

2018-09-05 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10283:


 Summary: FileCache logs unnecessary warnings
 Key: FLINK-10283
 URL: https://issues.apache.org/jira/browse/FLINK-10283
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 1.6.0, 1.7.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.6.1, 1.7.0


When running any job the {{FileCache}} will log the following warning:

{code}
improper use of releaseJob() without a matching number of createTmpFiles() 
calls for jobId 7d288909946d5eb4c8d40616ad1b20d2
{code}

This warning is misleading as {{createTmpFiles}} is only called when the 
{{DistributedCache}} was used, but {{releaseJob}} is always called as part of 
the task cleanup routine.
The warning should simply be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9172) Support external catalog factory that comes default with SQL-Client

2018-09-05 Thread Eron Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604258#comment-16604258
 ] 

Eron Wright  commented on FLINK-9172:
-

[~walterddr] I had some time and put together a PR based on the above 
discussion.  Would you be OK with assigning this to me?  Thanks!

> Support external catalog factory that comes default with SQL-Client
> ---
>
> Key: FLINK-9172
> URL: https://issues.apache.org/jira/browse/FLINK-9172
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Rong Rong
>Assignee: Rong Rong
>Priority: Major
>
> It will be great to have SQL-Client support some external catalogs 
> out-of-the-box for SQL users to configure and utilize easily. I am currently 
> thinking of having an external catalog factory that spins up both streaming and 
> batch external catalog table sources and sinks. This could greatly unify and 
> provide easy access for SQL users. 
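A hypothetical sketch of what such a factory could look like (the interface name and method are invented for illustration, not existing Flink API):

{code:java}
import java.util.Map;

public interface ExternalCatalogFactory {

    // Properties would come from the SQL Client environment file and
    // describe which catalog to instantiate (streaming or batch).
    ExternalCatalog createExternalCatalog(Map<String, String> properties);
}
{code}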



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zentol closed pull request #6647: [hotfix][docs] Fix error html tag in batch index documentatition

2018-09-05 Thread GitBox
zentol closed pull request #6647: [hotfix][docs] Fix error html tag in batch 
index documentatition
URL: https://github.com/apache/flink/pull/6647
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/dev/batch/index.md b/docs/dev/batch/index.md
index c624fce8954..62e41174c1b 100644
--- a/docs/dev/batch/index.md
+++ b/docs/dev/batch/index.md
@@ -592,7 +592,7 @@ val output: DataSet[(Int, String, Double)] = 
input.sum(0).min(2)
   
 
 
-
+
   Join
   
 Joins two data sets by creating all pairs of elements that are equal 
on their keys.
@@ -608,7 +608,7 @@ val result = input1.join(input2).where(0).equalTo(1)
 describe whether the join happens through partitioning or 
broadcasting, and whether it uses
 a sort-based or a hash-based algorithm. Please refer to the
 Transformations 
Guide for
-a list of possible hints and an example.
+a list of possible hints and an example.
 If no hint is specified, the system will try to make an estimate of 
the input sizes and
 pick the best strategy according to those estimates.
 {% highlight scala %}
@@ -700,7 +700,6 @@ val result = in.partitionByRange(0).mapPartition { ... }
 {% endhighlight %}
   
 
-
 
   Custom Partitioning
   


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-10282) Provide separate thread-pool for REST endpoint

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604213#comment-16604213
 ] 

ASF GitHub Bot commented on FLINK-10282:


zentol opened a new pull request #6661: [FLINK-10282][runtime] Separate RPC and 
REST thread-pools
URL: https://github.com/apache/flink/pull/6661
 
 
   ## What is the purpose of the change
   
   With this PR the REST endpoints are given their own thread-pool and don't 
share it with the Dispatcher's RPC system.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Provide separate thread-pool for REST endpoint
> --
>
> Key: FLINK-10282
> URL: https://issues.apache.org/jira/browse/FLINK-10282
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime, REST
>Affects Versions: 1.5.1, 1.6.0, 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
>
> The REST endpoints currently share their thread-pools with the RPC system, 
> which can cause the Dispatcher to become unresponsive if the REST parts are 
> overloaded.
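>
> For illustration only (hypothetical names and plain JDK classes, not the
> actual Flink internals), the separation amounts to giving each subsystem its
> own executor, so that a flood of REST work cannot starve RPC handling:
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> public class SeparatePools {
>     public static void main(String[] args) {
>         ExecutorService rpcPool  = Executors.newFixedThreadPool(4);  // serves RPC only
>         ExecutorService restPool = Executors.newFixedThreadPool(4);  // serves REST only
>
>         // Even if restPool's queue backs up under heavy REST load,
>         // rpcPool keeps draining the Dispatcher's RPC messages.
>         restPool.submit(() -> System.out.println("handling a REST request"));
>         rpcPool.submit(() -> System.out.println("handling an RPC message"));
>
>         restPool.shutdown();
>         rpcPool.shutdown();
>     }
> }
> {code}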



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10282) Provide separate thread-pool for REST endpoint

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10282:
---
Labels: pull-request-available  (was: )

> Provide separate thread-pool for REST endpoint
> --
>
> Key: FLINK-10282
> URL: https://issues.apache.org/jira/browse/FLINK-10282
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime, REST
>Affects Versions: 1.5.1, 1.6.0, 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
>
> The REST endpoints currently share their thread-pools with the RPC system, 
> which can cause the Dispatcher to become unresponsive if the REST parts are 
> overloaded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zentol opened a new pull request #6661: [FLINK-10282][runtime] Separate RPC and REST thread-pools

2018-09-05 Thread GitBox
zentol opened a new pull request #6661: [FLINK-10282][runtime] Separate RPC and 
REST thread-pools
URL: https://github.com/apache/flink/pull/6661
 
 
   ## What is the purpose of the change
   
   With this PR the REST endpoints are given their own thread-pool and no 
longer share it with the Dispatcher's RPC system.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (FLINK-10282) Provide separate thread-pool for REST endpoint

2018-09-05 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10282:


 Summary: Provide separate thread-pool for REST endpoint
 Key: FLINK-10282
 URL: https://issues.apache.org/jira/browse/FLINK-10282
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime, REST
Affects Versions: 1.6.0, 1.5.1, 1.7.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.6.1, 1.7.0, 1.5.4


The REST endpoints currently share their thread-pools with the RPC system, 
which can cause the Dispatcher to become unresponsive if the REST parts are 
overloaded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9735) Potential resource leak in RocksDBStateBackend#getDbOptions

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604175#comment-16604175
 ] 

ASF GitHub Bot commented on FLINK-9735:
---

yanghua opened a new pull request #6660: [FLINK-9735] Potential resource leak 
in RocksDBStateBackend#getDbOptions
URL: https://github.com/apache/flink/pull/6660
 
 
   ## What is the purpose of the change
   
   *This pull request fixes a potential resource leak in 
RocksDBStateBackend#getDbOptions*
   
   
   ## Brief change log
   
 - *Close DBOption object before rewriting it*
   
   ## Verifying this change
   
   This change is already covered by existing tests.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ **not documented**)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Potential resource leak in RocksDBStateBackend#getDbOptions
> ---
>
> Key: FLINK-9735
> URL: https://issues.apache.org/jira/browse/FLINK-9735
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>  Labels: pull-request-available
>
> Here is related code:
> {code}
> if (optionsFactory != null) {
>   opt = optionsFactory.createDBOptions(opt);
> }
> {code}
> opt, a DBOptions instance, should be closed before being overwritten.
> getColumnOptions has a similar issue.
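>
> A minimal sketch of the guard being asked for (a fragment continuing the
> snippet above; createBaseOptions() is a hypothetical helper, and
> DBOptions#close() releases the native RocksDB handle):
> {code:java}
> DBOptions opt = createBaseOptions(); // hypothetical helper
> if (optionsFactory != null) {
>     DBOptions fromFactory = optionsFactory.createDBOptions(opt);
>     if (fromFactory != opt) {
>         // the factory returned a new instance; close the old one
>         // so its native resources are not leaked
>         opt.close();
>     }
>     opt = fromFactory;
> }
> {code}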



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9735) Potential resource leak in RocksDBStateBackend#getDbOptions

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-9735:
--
Labels: pull-request-available  (was: )

> Potential resource leak in RocksDBStateBackend#getDbOptions
> ---
>
> Key: FLINK-9735
> URL: https://issues.apache.org/jira/browse/FLINK-9735
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>  Labels: pull-request-available
>
> Here is related code:
> {code}
> if (optionsFactory != null) {
>   opt = optionsFactory.createDBOptions(opt);
> }
> {code}
> opt, a DBOptions instance, should be closed before being overwritten.
> getColumnOptions has a similar issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] yanghua opened a new pull request #6660: [FLINK-9735] Potential resource leak in RocksDBStateBackend#getDbOptions

2018-09-05 Thread GitBox
yanghua opened a new pull request #6660: [FLINK-9735] Potential resource leak 
in RocksDBStateBackend#getDbOptions
URL: https://github.com/apache/flink/pull/6660
 
 
   ## What is the purpose of the change
   
   *This pull request fixes a potential resource leak in 
RocksDBStateBackend#getDbOptions*
   
   
   ## Brief change log
   
 - *Close DBOption object before rewriting it*
   
   ## Verifying this change
   
   This change is already covered by existing tests.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ **not documented**)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-10215) Add configuration of java option for historyserver

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604150#comment-16604150
 ] 

ASF GitHub Bot commented on FLINK-10215:


liuxianjiao commented on issue #6612: [FLINK-10215] Add configuration of java 
option for historyserver
URL: https://github.com/apache/flink/pull/6612#issuecomment-418655574
 
 
   @tillrohrmann @twalthr can you help review this PR? Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add configuration of java option for historyserver 
> 
>
> Key: FLINK-10215
> URL: https://issues.apache.org/jira/browse/FLINK-10215
> Project: Flink
>  Issue Type: Improvement
>  Components: History Server
>Affects Versions: 1.6.0, 2.0.0
>Reporter: liuxianjiao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>
> We have configuration options to set Java options for the taskmanager and 
> jobmanager, but there is no equivalent for the historyserver. So I add an 
> "env.java.opts.historyserver", which works just like "env.java.opts.jobmanager" 
> and "env.java.opts.taskmanager".
>  
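>
> For illustration (assuming the proposed key is adopted as described; the JVM
> flags shown are only examples), usage in flink-conf.yaml would look like:
> {code}
> # flink-conf.yaml
> env.java.opts.jobmanager: "-Xmx2g"
> env.java.opts.taskmanager: "-Xmx4g"
> # proposed: dedicated JVM options for the history server process
> env.java.opts.historyserver: "-Xmx512m -XX:+UseG1GC"
> {code}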



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] liuxianjiao commented on issue #6612: [FLINK-10215] Add configuration of java option for historyserver

2018-09-05 Thread GitBox
liuxianjiao commented on issue #6612: [FLINK-10215] Add configuration of java 
option for historyserver
URL: https://github.com/apache/flink/pull/6612#issuecomment-418655574
 
 
   @tillrohrmann @twalthr can you help review this PR? Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-7243) Add ParquetInputFormat

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604138#comment-16604138
 ] 

ASF GitHub Bot commented on FLINK-7243:
---

lvhuyen commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input 
format
URL: https://github.com/apache/flink/pull/6483#issuecomment-418650688
 
 
   Hi Zhenqiu,
   
   I tried to read a parquet file using the code in your PR, and got the 
following error when reading a column of type timestamp (nullable = true).
   
   Thanks and best regards,
   
   ```
   Caused by: java.lang.IllegalArgumentException: Can not set 
java.math.BigInteger field ... to java.lang.String
   at 
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
   at 
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
   at 
sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:81)
   at java.lang.reflect.Field.set(Field.java:764)
   at 
org.apache.flink.formats.parquet.ParquetPojoInputFormat.convert(ParquetPojoInputFormat.java:96)
   at 
org.apache.flink.formats.parquet.ParquetInputFormat.nextRecord(ParquetInputFormat.java:159)
   at 
org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator$SplitReader.run(ContinuousFileReaderOperator.java:323)
   ```
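   
   For context (not part of the PR): files written by Spark or Hive typically
   store timestamps as Parquet's legacy INT96 physical type, whose 12 bytes are
   8 bytes of little-endian nanoseconds-of-day followed by a 4-byte
   little-endian Julian day number. If that is the case here, the value can be
   decoded losslessly instead of being routed through a String. A
   self-contained sketch (names are mine, not Flink's):
   
   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   import java.sql.Timestamp;
   
   public final class Int96Timestamps {
       // Julian day number of the Unix epoch, 1970-01-01.
       private static final long JULIAN_EPOCH_DAY = 2_440_588L;
       private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;
   
       public static Timestamp fromInt96(byte[] int96) {
           ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
           long nanosOfDay = buf.getLong(); // first 8 bytes
           int julianDay = buf.getInt();    // last 4 bytes
           long millis = (julianDay - JULIAN_EPOCH_DAY) * MILLIS_PER_DAY
                   + nanosOfDay / 1_000_000L;
           return new Timestamp(millis);
       }
   
       public static void main(String[] args) {
           // 1970-01-02T00:00:00Z: Julian day 2_440_589, zero nanos-of-day.
           byte[] b = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                   .putLong(0L).putInt(2_440_589).array();
           System.out.println(fromInt96(b).getTime()); // prints 86400000
       }
   }
   ```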


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add ParquetInputFormat
> --
>
> Key: FLINK-7243
> URL: https://issues.apache.org/jira/browse/FLINK-7243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: Zhenqiu Huang
>Priority: Major
>  Labels: pull-request-available
>
> Add a {{ParquetInputFormat}} to read data from an Apache Parquet file. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] lvhuyen commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input format

2018-09-05 Thread GitBox
lvhuyen commented on issue #6483: [FLINK-7243][flink-formats] Add parquet input 
format
URL: https://github.com/apache/flink/pull/6483#issuecomment-418650688
 
 
   Hi Zhenqiu,
   
   I tried to read a parquet file using the code in your PR, and got the 
following error when reading a column of type timestamp (nullable = true).
   
   Thanks and best regards,
   
   ```
   Caused by: java.lang.IllegalArgumentException: Can not set 
java.math.BigInteger field ... to java.lang.String
   at 
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
   at 
sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
   at 
sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:81)
   at java.lang.reflect.Field.set(Field.java:764)
   at 
org.apache.flink.formats.parquet.ParquetPojoInputFormat.convert(ParquetPojoInputFormat.java:96)
   at 
org.apache.flink.formats.parquet.ParquetInputFormat.nextRecord(ParquetInputFormat.java:159)
   at 
org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator$SplitReader.run(ContinuousFileReaderOperator.java:323)
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (FLINK-10125) Unclosed ByteArrayDataOutputView in RocksDBMapState

2018-09-05 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-10125.

Resolution: Won't Fix

> Unclosed ByteArrayDataOutputView in RocksDBMapState
> ---
>
> Key: FLINK-10125
> URL: https://issues.apache.org/jira/browse/FLINK-10125
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>   ByteArrayDataOutputView dov = new ByteArrayDataOutputView(1);
> {code}
> dov is used in a try block but is not closed if an exception is thrown.
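>
> For reference, the guard the report asks for is the standard
> try-with-resources idiom; it only applies as long as the type implements
> AutoCloseable, which its replacement no longer does (hence the resolution):
> {code:java}
> try (ByteArrayDataOutputView dov = new ByteArrayDataOutputView(1)) {
>     // ... writes that may throw ...
> } // dov is closed even when an exception is thrown
> {code}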



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10125) Unclosed ByteArrayDataOutputView in RocksDBMapState

2018-09-05 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604128#comment-16604128
 ] 

vinoyang commented on FLINK-10125:
--

[~azagrebin] *ByteArrayDataOutputView* has been replaced with 
*DataOutputSerializer*, which has no _close_ method. So I suggest we just 
close this issue. cc [~yuzhih...@gmail.com]

> Unclosed ByteArrayDataOutputView in RocksDBMapState
> ---
>
> Key: FLINK-10125
> URL: https://issues.apache.org/jira/browse/FLINK-10125
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>   ByteArrayDataOutputView dov = new ByteArrayDataOutputView(1);
> {code}
> dov is used in a try block but is not closed if an exception is thrown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10281) Table function parse regular expression contains backslash failed

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10281:
---
Labels: pull-request-available  (was: )

> Table function parse regular expression contains backslash failed
> -
>
> Key: FLINK-10281
> URL: https://issues.apache.org/jira/browse/FLINK-10281
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> For example, a regular expression that matches word characters ("\w") or digits ("\d"):
> {code:java}
> testAllApis(
>   "foothebar".regexExtract("foo([\\w]+)", 1),       // OK: the method gets 'foo([\w]+)'
>   "'foothebar'.regexExtract('foo([\\w]+)', 1)",     // fails: the method gets 'foo([\\w]+)' and returns ""; passing 'foo([\w]+)' would not compile
>   "REGEX_EXTRACT('foothebar', 'foo([\\\\w]+)', 1)", // OK: the method gets 'foo([\w]+)', but four '\' must be passed
>   "thebar"
> )
> {code}
> The "similar to" function has the same issue.
>  
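>
> To see the two unescaping levels at work outside Flink (plain Java, for
> illustration only):
> {code:java}
> // One escape level: the Java compiler turns "\\" into a single backslash,
> // so the regex engine receives foo([\w]+).
> String direct = "foo([\\w]+)";
>
> // Two escape levels: if the runtime string is fed through another parser
> // that also consumes backslash escapes, each backslash must be doubled once
> // more, ending up as four backslashes in the source literal.
> String doubleParsed = "foo([\\\\w]+)";
> {code}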



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10281) Table function parse regular expression contains backslash failed

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604103#comment-16604103
 ] 

ASF GitHub Bot commented on FLINK-10281:


yanghua opened a new pull request #6659: [FLINK-10281] Table function parse 
regular expression contains backslash failed
URL: https://github.com/apache/flink/pull/6659
 
 
   
   ## What is the purpose of the change
   
   *This pull request fixes the failure of Table API functions to parse 
regular expressions containing backslashes*
   
   
   ## Brief change log
   
 - *Changed `singleQuoteStringLiteral` in `ExpressionParser`*
   
   ## Verifying this change
   
   
   This change is already covered by existing tests, such as 
*LiteralTest#testParseRegularExpression*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ **not documented**)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Table function parse regular expression contains backslash failed
> -
>
> Key: FLINK-10281
> URL: https://issues.apache.org/jira/browse/FLINK-10281
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>
> For example, a regular expression that matches word characters ("\w") or digits ("\d"):
> {code:java}
> testAllApis(
>   "foothebar".regexExtract("foo([\\w]+)", 1),       // OK: the method gets 'foo([\w]+)'
>   "'foothebar'.regexExtract('foo([\\w]+)', 1)",     // fails: the method gets 'foo([\\w]+)' and returns ""; passing 'foo([\w]+)' would not compile
>   "REGEX_EXTRACT('foothebar', 'foo([\\\\w]+)', 1)", // OK: the method gets 'foo([\w]+)', but four '\' must be passed
>   "thebar"
> )
> {code}
> The "similar to" function has the same issue.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] yanghua opened a new pull request #6659: [FLINK-10281] Table function parse regular expression contains backslash failed

2018-09-05 Thread GitBox
yanghua opened a new pull request #6659: [FLINK-10281] Table function parse 
regular expression contains backslash failed
URL: https://github.com/apache/flink/pull/6659
 
 
   
   ## What is the purpose of the change
   
   *This pull request fixes the failure of Table API functions to parse 
regular expressions containing backslashes*
   
   
   ## Brief change log
   
 - *Changed `singleQuoteStringLiteral` in `ExpressionParser`*
   
   ## Verifying this change
   
   
   This change is already covered by existing tests, such as 
*LiteralTest#testParseRegularExpression*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ **not documented**)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-10209) Exclude jdk.tools dependency from hadoop when running with java 9

2018-09-05 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604088#comment-16604088
 ] 

Chesnay Schepler commented on FLINK-10209:
--

Travis passes even if we exclude the dependency unconditionally. However, I'm 
not comfortable making this change for JDK 8 builds, since I don't know what 
repercussions it might have in different environments.

> Exclude jdk.tools dependency from hadoop when running with java 9
> -
>
> Key: FLINK-10209
> URL: https://issues.apache.org/jira/browse/FLINK-10209
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.7.0
>
>
> {{hadoop-common}} has a {{jdk.tools}} dependency which cannot be resolved on 
> java 9. At least for compiling we have to exclude this dependency.
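>
> A sketch of what the exclusion looks like in a pom.xml (version omitted;
> jdk.tools is a system-scoped dependency on tools.jar, which Java 9 no
> longer ships):
> {code:xml}
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <exclusions>
>     <exclusion>
>       <groupId>jdk.tools</groupId>
>       <artifactId>jdk.tools</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> {code}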



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10150) Chained batch operators interfere with each other

2018-09-05 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-10150:
-
Summary: Chained batch operators interfere with each other  (was: 
Inconsistent number of "Records received" / "Records sent")

> Chained batch operators interfere with each other
> ---
>
> Key: FLINK-10150
> URL: https://issues.apache.org/jira/browse/FLINK-10150
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics, Webfrontend
>Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
>Reporter: Helmut Zechmann
>Assignee: Chesnay Schepler
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
> Attachments: record_counts_flink_1_3.png, record_counts_flink_1_4.png
>
>
> The Flink web UI displays an inconsistent number of "Records received" / 
> "Records sent" in the job overview "Subtasks" view.
> When I run the example wordcount batch job with a small input file on flink 
> 1.3.2 I get
>  * 3 records sent by the first subtask and
>  * 3 records received by the second subtask
> This is the result I would expect.
>  
> If I run the same job on flink 1.4.0 / 1.5.2 / 1.6.0 I get
>  * 13 records sent by the first subtask and
>  * 3 records received by the second subtask
> In real-life jobs the numbers are much stranger.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-10150) Chained batch operators interfere with each other

2018-09-05 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-10150.

Resolution: Fixed

master: 93ac95866e4473982a89e563d58ef0e374b3c0ba
1.6: 38f75527335a711ee0374a0fd3d28087d8568fb9
1.5: 37b93a38c722648e26e655e8513ed0d6d76dc7e0

> Chained batch operators interfere with each other
> ---
>
> Key: FLINK-10150
> URL: https://issues.apache.org/jira/browse/FLINK-10150
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics, Webfrontend
>Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
>Reporter: Helmut Zechmann
>Assignee: Chesnay Schepler
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0, 1.5.4
>
> Attachments: record_counts_flink_1_3.png, record_counts_flink_1_4.png
>
>
> The Flink web UI displays an inconsistent number of "Records received" / 
> "Records sent" in the job overview "Subtasks" view.
> When I run the example wordcount batch job with a small input file on flink 
> 1.3.2 I get
>  * 3 records sent by the first subtask and
>  * 3 records received by the second subtask
> This is the result I would expect.
>  
> If I run the same job on flink 1.4.0 / 1.5.2 / 1.6.0 I get
>  * 13 records sent by the first subtask and
>  * 3 records received by the second subtask
> In real-life jobs the numbers are much stranger.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-10243) Add option to reduce latency metrics granularity

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-10243:
---
Labels: pull-request-available  (was: )

> Add option to reduce latency metrics granularity
> 
>
> Key: FLINK-10243
> URL: https://issues.apache.org/jira/browse/FLINK-10243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Configuration, Metrics
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> The latency is currently tracked separately from each operator subtask to 
> each source subtask. The total number of latency metrics in the cluster is 
> thus {{(# of sources) * (# of operators) * parallelism²}}, i.e. quadratic 
> scaling.
> If we ignored the source subtask, the scaling would be a lot more manageable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10243) Add option to reduce latency metrics granularity

2018-09-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604079#comment-16604079
 ] 

ASF GitHub Bot commented on FLINK-10243:


zentol opened a new pull request #6658: [FLINK-10243][metrics] Make latency 
metrics granularity configurable
URL: https://github.com/apache/flink/pull/6658
 
 
   ## What is the purpose of the change
   
   This PR makes the latency metric granularity configurable.
   Let's say we have 2 sources S1 S2 and one operator O, each with a 
parallelism of 2.
   
   In SUBTASK mode the latencies are tracked from each source subtask to each 
operator subtask, which is the current behavior.
   
   In OPERATOR mode we no longer differentiate between source subtasks. This 
mode is the new default since it is significantly more stable than SUBTASK, as 
it scales linearly with the number of operators and the parallelism.
   
   In SINGLE mode we no longer differentiate between different sources. This 
mode is a bit questionable, but it comes at virtually no cost, so why not.
   
   ## Brief change log
   
   * add `MetricOptions#LATENCY_SOURCE_GRANULARITY` config option
   * add `LatencyStats#Granularity` enum that contains granularity-dependent 
behavior
   * extend `LatencyStats` to accept a `Granularity` argument
   * adjust `AbstractStreamOperator` to read configured granularity
   * update documentation
   
   
   ## Verifying this change
   
   The `LatencyStats` are currently untested. This PR adds 
the `LatencyStatsTest` class, covering all aspects of this PR.
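   
   For illustration, a sketch of how such an enum can own the key derivation
   (method and constant bodies are mine, not the actual implementation):
   
   ```java
   enum Granularity {
       SINGLE {   // one latency metric for all sources combined
           @Override String metricKey(int sourceId, int sourceSubtask) {
               return "source";
           }
       },
       OPERATOR { // one metric per source, ignoring the source subtask
           @Override String metricKey(int sourceId, int sourceSubtask) {
               return "source_" + sourceId;
           }
       },
       SUBTASK {  // one metric per source subtask (the old behavior)
           @Override String metricKey(int sourceId, int sourceSubtask) {
               return "source_" + sourceId + "_subtask_" + sourceSubtask;
           }
       };
   
       abstract String metricKey(int sourceId, int sourceSubtask);
   }
   ```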
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add option to reduce latency metrics granularity
> 
>
> Key: FLINK-10243
> URL: https://issues.apache.org/jira/browse/FLINK-10243
> Project: Flink
>  Issue Type: Sub-task
>  Components: Configuration, Metrics
>Affects Versions: 1.7.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> The latency is currently tracked separately from each operator subtask to 
> each source subtask. The total number of latency metrics in the cluster is 
> thus {{(# of sources) * (# of operators) * parallelism²}}, i.e. quadratic 
> scaling.
> If we ignored the source subtask, the scaling would be a lot more manageable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zentol opened a new pull request #6658: [FLINK-10243][metrics] Make latency metrics granularity configurable

2018-09-05 Thread GitBox
zentol opened a new pull request #6658: [FLINK-10243][metrics] Make latency 
metrics granularity configurable
URL: https://github.com/apache/flink/pull/6658
 
 
   ## What is the purpose of the change
   
   This PR makes the latency metric granularity configurable.
   Let's say we have 2 sources S1 S2 and one operator O, each with a 
parallelism of 2.
   
   In SUBTASK mode the latencies are tracked from each source subtask to each 
operator subtask, which is the current behavior.
   
   In OPERATOR mode we no longer differentiate between source subtasks. This 
mode is the new default since it is significantly more stable than SUBTASK, as 
it scales linearly with the number of operators and the parallelism.
   
   In SINGLE mode we no longer differentiate between different sources. This 
mode is a bit questionable, but it comes at virtually no cost, so why not.
   
   ## Brief change log
   
   * add `MetricOptions#LATENCY_SOURCE_GRANULARITY` config option
   * add `LatencyStats#Granularity` enum that contains granularity-dependent 
behavior
   * extend `LatencyStats` to accept a `Granularity` argument
   * adjust `AbstractStreamOperator` to read configured granularity
   * update documentation
   
   
   ## Verifying this change
   
   The `LatencyStats` are currently untested. This PR adds 
the `LatencyStatsTest` class, covering all aspects of this PR.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services