[jira] [Created] (FLINK-20578) Cannot create empty array using ARRAY[]

2020-12-11 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-20578:
-

 Summary: Cannot create empty array using ARRAY[]
 Key: FLINK-20578
 URL: https://issues.apache.org/jira/browse/FLINK-20578
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Affects Versions: 1.11.2
Reporter: Fabian Hueske


Calling the ARRAY function without any elements (`ARRAY[]`) results in an error message.

Is that the expected behavior?

How can users create empty arrays?
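
A minimal example (the exact error message depends on the planner):
{code:java}
-- fails with a validation error
SELECT ARRAY[];

-- works
SELECT ARRAY[1, 2, 3];
{code}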



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19790) Writing MAP&lt;STRING, STRING&gt; to Kafka with JSON format produces incorrect data.

2020-10-23 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-19790:
-

 Summary: Writing MAP&lt;STRING, STRING&gt; to Kafka with JSON format produces incorrect data.
 Key: FLINK-19790
 URL: https://issues.apache.org/jira/browse/FLINK-19790
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Ecosystem
Affects Versions: 1.11.2
Reporter: Fabian Hueske


Running the following SQL script writes incorrect data to Kafka:
{code:java}
CREATE TEMPORARY TABLE tmp_1 (m MAP&lt;STRING, STRING&gt;) WITH (
  'connector' = 'kafka',
  'format' = 'json',
  'properties.bootstrap.servers' = '...',
  'properties.group.id' = '...',
  'topic' = 'tmp-1'
);

CREATE TEMPORARY TABLE gen (k STRING, v STRING) WITH (
  'connector' = 'datagen'
);

CREATE TEMPORARY VIEW gen_short AS
SELECT SUBSTR(k, 0, 4) AS k, SUBSTR(v, 0, 4) AS v FROM gen;

INSERT INTO tmp_1
SELECT MAP[k, v] FROM gen_short; {code}
Printing the content of the {{tmp-1}} topic results in the following output:
{code:java}
$ kafka-console-consumer --bootstrap-server ... --from-beginning --topic tmp-1 
| head -n 5
{"m":{"8a93":"6102"}}
{"m":{"8a93":"6102","7922":"f737"}}
{"m":{"8a93":"6102","7922":"f737","9b63":"15b0"}}
{"m":{"8a93":"6102","7922":"f737","9b63":"15b0","c38b":"b55c"}}
{"m":{"8a93":"6102","7922":"f737","9b63":"15b0","c38b":"b55c","222c":"f3e2"}}
{code}
As the output shows, each record contains all entries of the previously emitted maps, i.e., the maps are not correctly encoded as JSON before being written to Kafka.

I've run the query with the Blink planner, with object reuse and operator pipelining disabled.
Writing with Avro works as expected.

Hence, I assume that the JSON encoder/serializer reuses the Map object when encoding the JSON.
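
A hypothetical illustration of the suspected pattern (simplified, not the actual JsonRowSerializationSchema code): a node object that is reused across records but never cleared would accumulate the entries of all previous records, matching the output above.
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the suspected object-reuse bug (simplified).
public class ReuseBugSketch {
    private final Map<String, String> reusedNode = new LinkedHashMap<>();

    String serialize(String key, String value) {
        reusedNode.put(key, value); // BUG: entries of earlier records remain
        return reusedNode.toString();
    }

    public static void main(String[] args) {
        ReuseBugSketch s = new ReuseBugSketch();
        System.out.println(s.serialize("8a93", "6102")); // {8a93=6102}
        System.out.println(s.serialize("7922", "f737")); // {8a93=6102, 7922=f737}
    }
}
{code}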




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19398) Hive connector fails with IllegalAccessError if submitted as usercode

2020-09-24 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-19398:
-

 Summary: Hive connector fails with IllegalAccessError if submitted 
as usercode
 Key: FLINK-19398
 URL: https://issues.apache.org/jira/browse/FLINK-19398
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.11.2, 1.12.0
Reporter: Fabian Hueske


Using Flink's Hive connector fails with the following exception if the dependency is loaded with the user-code classloader.


{code:java}
java.lang.IllegalAccessError: tried to access method 
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.&lt;init&gt;(Lorg/apache/flink/core/fs/Path;Lorg/apache/flink/streaming/api/functions/sink/filesystem/BucketAssigner;Lorg/apache/flink/streaming/api/functions/sink/filesystem/BucketFactory;Lorg/apache/flink/streaming/api/functions/sink/filesystem/BucketWriter;Lorg/apache/flink/streaming/api/functions/sink/filesystem/RollingPolicy;ILorg/apache/flink/streaming/api/functions/sink/filesystem/OutputFileConfig;)V
 from class 
org.apache.flink.streaming.api.functions.sink.filesystem.HadoopPathBasedBulkFormatBuilder
at 
org.apache.flink.streaming.api.functions.sink.filesystem.HadoopPathBasedBulkFormatBuilder.createBuckets(HadoopPathBasedBulkFormatBuilder.java:127)
 
~[flink-connector-hive_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.table.filesystem.stream.StreamingFileWriter.initializeState(StreamingFileWriter.java:81)
 ~[flink-table-blink_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:106)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:258)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
 ~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528) 
~[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) 
[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) 
[flink-dist_2.12-1.11.2-stream2-SNAPSHOT.jar:1.11.2-stream2-SNAPSHOT]
{code}

The problem is the constructor of {{Buckets}}, which has default (package-private) visibility and is called from {{HadoopPathBasedBulkFormatBuilder}}. This works as long as both classes are loaded with the same classloader, but when they are loaded by different classloaders, the access fails.

{{Buckets}} is loaded with the system classloader because it is part of flink-streaming-java.
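
A minimal sketch of the failure mode (hypothetical stand-in classes; the real call is the {{Buckets}} constructor in the stack trace above). Java considers two classes to be members of the same runtime package only if they share both the package name and the defining classloader, so package-private access fails across classloaders:
{code:java}
package org.apache.flink.streaming.api.functions.sink.filesystem;

// Hypothetical stand-ins for Buckets and HadoopPathBasedBulkFormatBuilder.
class PackagePrivateTarget {
    PackagePrivateTarget() { } // default (package-private) visibility
}

class UserCodeCaller {
    static PackagePrivateTarget create() {
        // Compiles, but throws IllegalAccessError at runtime when the two
        // classes are defined by different classloaders (system CL vs.
        // user-code CL), because they then live in different runtime packages.
        return new PackagePrivateTarget();
    }
}
{code}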


To solve this issue, we should change the visibility of the {{Buckets}} 
constructor to {{public}}.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19321) CollectSinkFunction does not define serialVersionUID

2020-09-21 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-19321:
-

 Summary: CollectSinkFunction does not define serialVersionUID
 Key: FLINK-19321
 URL: https://issues.apache.org/jira/browse/FLINK-19321
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.11.2, 1.12.0
Reporter: Fabian Hueske


The {{org.apache.flink.streaming.api.operators.collect.CollectSinkFunction}} function does not define a {{serialVersionUID}}.

Function objects are serialized using Java Serialization and should define a {{serialVersionUID}}.

If no serialVersionUID is defined, Java automatically generates IDs to check the compatibility of objects during deserialization. However, the generation depends on the environment (JVM, class version, etc.) and can hence lead to a {{java.io.InvalidClassException}} even if the classes are compatible.
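
The fix is a one-liner per class (sketch with a hypothetical class name; the concrete ID value is arbitrary but must stay stable across compatible versions):
{code:java}
import java.io.Serializable;

// Sketch: pinning the serialVersionUID decouples deserialization checks
// from the JVM-generated default, which varies across environments.
public class CollectSinkFunctionSketch implements Serializable {
    private static final long serialVersionUID = 1L;
}
{code}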



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18988) Continuous query with LATERAL and LIMIT produces wrong result

2020-08-18 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18988:
-

 Summary: Continuous query with LATERAL and LIMIT produces wrong 
result
 Key: FLINK-18988
 URL: https://issues.apache.org/jira/browse/FLINK-18988
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.11.1
Reporter: Fabian Hueske


I was trying out the example queries provided in this blog post: 
[https://materialize.io/lateral-joins-and-demand-driven-queries/] to check if 
Flink supports the same and found that the queries were translated and executed 
but produced the wrong result.

I used the SQL Client and Kafka (running at kafka:9092) to store the table 
data. I executed the following statements:
{code:java}
-- create cities table
CREATE TABLE cities (
  name STRING NOT NULL,
  state STRING NOT NULL,
  pop INT NOT NULL
) WITH (
  'connector' = 'kafka',
  'topic' = 'cities',
  'properties.bootstrap.servers' = 'kafka:9092',
  'properties.group.id' = 'mygroup', 
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- fill cities table
INSERT INTO cities VALUES
  ('Los_Angeles', 'CA', 3979576),
  ('Phoenix', 'AZ', 1680992),
  ('Houston', 'TX', 2320268),
  ('San_Diego', 'CA', 1423851),
  ('San_Francisco', 'CA', 881549),
  ('New_York', 'NY', 8336817),
  ('Dallas', 'TX', 1343573),
  ('San_Antonio', 'TX', 1547253),
  ('San_Jose', 'CA', 1021795),
  ('Chicago', 'IL', 2695598),
  ('Austin', 'TX', 978908);

-- execute query
SELECT state, name 
FROM
  (SELECT DISTINCT state FROM cities) states,
  LATERAL (
    SELECT name, pop
    FROM cities
    WHERE state = states.state
    ORDER BY pop DESC
    LIMIT 3
  );

-- result
state | name
------+------------
CA    | Los_Angeles
NY    | New_York
IL    | Chicago

-- expected result
state | name
------+------------
TX    | Dallas
AZ    | Phoenix
IL    | Chicago
TX    | Houston
CA    | San_Jose
NY    | New_York
CA    | San_Diego
CA    | Los_Angeles
TX    | San_Antonio

{code}
As you can see from the query result, Flink computes the top-3 cities over all states instead of the top-3 cities per state. Hence, I assume that this is a bug in the query optimizer or one of the rewriting rules.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18704) Support DECIMAL types in datagen Table source connector

2020-07-24 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18704:
-

 Summary: Support DECIMAL types in datagen Table source connector
 Key: FLINK-18704
 URL: https://issues.apache.org/jira/browse/FLINK-18704
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Ecosystem
Affects Versions: 1.11.1, 1.11.0
Reporter: Fabian Hueske


It would be great if the [{{datagen}} source 
connector|https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/datagen.html]
 would support {{DECIMAL}} types.

Since data is randomly generated and FLOAT and DOUBLE are already supported, we could implement this feature by creating a {{BigDecimal}} from a random float.
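
A sketch of the idea (hypothetical helper, not the actual datagen code):
{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: generate a random DECIMAL(precision, scale) value
// from a random double.
public class RandomDecimalSketch {

    static BigDecimal randomDecimal(int precision, int scale) {
        double bound = Math.pow(10, precision - scale); // max magnitude that fits
        double v = ThreadLocalRandom.current().nextDouble(-bound, bound);
        // Round toward zero so the result never exceeds the declared precision.
        return BigDecimal.valueOf(v).setScale(scale, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        System.out.println(randomDecimal(10, 2)); // e.g. 12345678.91
    }
}
{code}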



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18683) Support @DataTypeHint for TableFunction output types

2020-07-23 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18683:
-

 Summary: Support @DataTypeHint for TableFunction output types
 Key: FLINK-18683
 URL: https://issues.apache.org/jira/browse/FLINK-18683
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Fabian Hueske


For ScalarFunctions, the return type of an eval method can be declared with a 
{{@DataTypeHint}}:


{code:java}
@DataTypeHint("INT")
public Integer eval(Integer value) {
  return value * 2;
}{code}

This does not work for TableFunctions because the {{@DataTypeHint}} annotation refers to the {{void}} return type. Hence, {{TableFunction}} {{eval()}} methods must always be annotated with the more complex {{@FunctionHint}} annotation. However, I think that in this context, it is clear that the {{@DataTypeHint}} annotation refers to the actual return type of the table function (the type parameter of {{TableFunction}}).



We could consider allowing {{@DataTypeHint}} annotations also on {{TableFunction}} classes (defining the output type of all eval methods) and on {{eval()}} methods, as sketched below.
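
A sketch of the proposed usage (hypothetical, since this is exactly the feature being requested):
{code:java}
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.functions.TableFunction;

// Proposed: the hint describes the type emitted via collect(), i.e. the
// type parameter of TableFunction, not the void return type of eval().
@DataTypeHint("INT")
public class DuplicatorFunction extends TableFunction<Integer> {
    public void eval(Integer value) {
        collect(value);
        collect(value);
    }
}
{code}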



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18673) Calling ROW() in a UDF results in UnsupportedOperationException

2020-07-22 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18673:
-

 Summary: Calling ROW() in a UDF results in 
UnsupportedOperationException
 Key: FLINK-18673
 URL: https://issues.apache.org/jira/browse/FLINK-18673
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Fabian Hueske


Given a UDF {{func}} that accepts a {{ROW(INT, STRING)}} as parameter, it 
cannot be called like this:


{code:java}
SELECT func(ROW(a, b)) FROM t{code}
while this works:

{code:java}
SELECT func(r) FROM (SELECT ROW(a, b) FROM t){code}

The exception thrown is:
{quote}
org.apache.flink.table.api.ValidationException: SQL validation failed. null
{quote}
with an empty {{UnsupportedOperationException}} as its cause.
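
For reference, a minimal sketch of such a function (hypothetical definition; the parameter type is declared via a hint):
{code:java}
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.functions.ScalarFunction;
import org.apache.flink.types.Row;

// Hypothetical UDF that accepts a ROW(INT, STRING) parameter.
public class Func extends ScalarFunction {
    public String eval(@DataTypeHint("ROW<i INT, s STRING>") Row r) {
        return r.getField(0) + "-" + r.getField(1);
    }
}
{code}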



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18672) Fix Scala code examples for UDF type inference annotations

2020-07-22 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18672:
-

 Summary: Fix Scala code examples for UDF type inference annotations
 Key: FLINK-18672
 URL: https://issues.apache.org/jira/browse/FLINK-18672
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / API
Reporter: Fabian Hueske


The Scala code examples for the [UDF type inference 
annotations|https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/functions/udfs.html#type-inference]
 are not correct.

For example, the following {{FunctionHint}} annotation

{code:scala}
@FunctionHint(
  input = Array(@DataTypeHint("INT"), @DataTypeHint("INT")),
  output = @DataTypeHint("INT")
)
{code}


needs to be changed to

{code:scala}
@FunctionHint(
  input = Array(new DataTypeHint("INT"), new DataTypeHint("INT")),
  output = new DataTypeHint("INT")
)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18657) Change semantics of DESCRIBE SELECT statement

2020-07-21 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18657:
-

 Summary: Change semantics of DESCRIBE SELECT statement
 Key: FLINK-18657
 URL: https://issues.apache.org/jira/browse/FLINK-18657
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.11.0
Reporter: Fabian Hueske


Running {{DESCRIBE SELECT ...}} returns the same as {{EXPLAIN PLAN FOR SELECT ...}}, i.e., an unstructured, textual description of the execution plan.

I think it would be more consistent and also more useful if {{DESCRIBE SELECT ...}} displayed the result schema of the {{SELECT}} query, just like {{DESCRIBE TABLE xxx}} shows the schema of the table {{xxx}}. This would be useful for users who need to create a sink table for the result of a SELECT query and therefore have to figure out the query's schema.
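
A hypothetical example of the proposed semantics (table and output format are illustrative):
{code:java}
-- proposed: DESCRIBE SELECT shows the result schema instead of the plan
DESCRIBE SELECT user_id, COUNT(*) AS cnt FROM clicks GROUP BY user_id;

-- user_id  STRING
-- cnt      BIGINT NOT NULL
{code}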

This would be a breaking change, but {{DESCRIBE SELECT ...}} seems to be an 
undocumented feature (at least it is not listed on the [DESCRIBE 
docs|https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/describe.html])
 and the original behavior would be still available via {{EXPLAIN PLAN FOR 
SELECT ...}} (which is documented).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18623) CREATE TEMPORARY TABLE not documented

2020-07-17 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18623:
-

 Summary: CREATE TEMPORARY TABLE not documented
 Key: FLINK-18623
 URL: https://issues.apache.org/jira/browse/FLINK-18623
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / API
Affects Versions: 1.11.0
Reporter: Fabian Hueske


The {{CREATE TEMPORARY TABLE}} syntax that was added with FLINK-15591 is not 
included in the [{{CREATE TABLE}} 
documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/create.html#create-table]
 and therefore not visible to our users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18378) CatalogManager checks for CatalogTableImpl instead of CatalogTable

2020-06-19 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18378:
-

 Summary: CatalogManager checks for CatalogTableImpl instead of 
CatalogTable
 Key: FLINK-18378
 URL: https://issues.apache.org/jira/browse/FLINK-18378
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.11.0
Reporter: Fabian Hueske


The {{CatalogManager}} checks for {{CatalogTableImpl}} instead of 
{{CatalogTable}} to decide whether to resolve the table schema. See 
https://github.com/apache/flink/blob/release-1.11/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/CatalogManager.java#L369

Resolving the table schema adjusts the type of fields that are referenced by watermarks, i.e., it changes their type from {{TIMESTAMP(3)}} to {{TIMESTAMP(3) ROWTIME}}. If the table schema is not properly resolved, some queries involving time attributes will fail during type validation.

However, {{CatalogTableImpl}} is an internal implementation of the public {{CatalogTable}} interface. External {{Catalog}} implementations typically provide their own {{CatalogTable}} implementations rather than {{CatalogTableImpl}} and might hence fail to work correctly with queries that involve event-time attributes.
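
A sketch of the suggested change (simplified; hypothetical helper, the real check sits in {{CatalogManager}}):
{code:java}
import org.apache.flink.table.catalog.CatalogBaseTable;
import org.apache.flink.table.catalog.CatalogTable;

// Simplified sketch: resolve the schema for any CatalogTable (the public
// interface) instead of only for the internal CatalogTableImpl class.
class SchemaResolutionSketch {
    static boolean needsResolution(CatalogBaseTable table) {
        return table instanceof CatalogTable; // was: table instanceof CatalogTableImpl
    }
}
{code}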



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-18298) Rename TableResult headers of SHOW statements

2020-06-15 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-18298:
-

 Summary: Rename TableResult headers of SHOW statements
 Key: FLINK-18298
 URL: https://issues.apache.org/jira/browse/FLINK-18298
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.11.0
Reporter: Fabian Hueske


The SHOW TABLES and SHOW FUNCTIONS commands list all tables and functions of the currently selected database.
With FLIP-84, the result is passed back as a TableResult object that includes the schema of the result as a TableSchema.

The column name for the result of SHOW TABLES and SHOW FUNCTIONS is "result":

{code:java}
SHOW TABLES;
result
myTable1
myTable2
{code}

I think this name is not very descriptive and too generic.
IMO it would be nice to change it to "table name" and "function name", respectively.
{code:java}
SHOW TABLES;
table name
myTable1
myTable2
{code}
Would be nice to get this little improvement in before the 1.11 release.

cc [~godfreyhe], [~twalthr]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17845) Can't remove a table connector property with ALTER TABLE

2020-05-20 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-17845:
-

 Summary: Can't remove a table connector property with ALTER TABLE
 Key: FLINK-17845
 URL: https://issues.apache.org/jira/browse/FLINK-17845
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Fabian Hueske


It is not possible to remove an existing table property from a table.
Looking at the [source 
code|https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/sqlexec/SqlToOperationConverter.java#L295]
 this seems to be the intended semantics, but it seems counter-intuitive to me.

If I create a table with the following statement:

{code}
CREATE TABLE `testTable` (
  id INT
)
WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topicX' = 'test',  -- Woops, I made a typo here
[...]
)
{code}
The statement will be successfully executed. However, the table cannot be used 
due to the typo.

Fixing the typo with the following DDL is not possible:

{code}
ALTER TABLE `testTable` SET (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'test'  -- Fixing the typo
)
{code}

because the key {{connector.topicX}} is not removed.

Right now it seems that the only way to fix a table with an invalid key is to 
DROP and CREATE it. I think that this use case should be supported by ALTER 
TABLE.
I would even argue that the expected behavior is that previous properties are 
removed and replaced by the new properties.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16716) Update Roadmap after Flink 1.10 release

2020-03-23 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-16716:
-

 Summary: Update Roadmap after Flink 1.10 release
 Key: FLINK-16716
 URL: https://issues.apache.org/jira/browse/FLINK-16716
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Fabian Hueske


The roadmap on the Flink website needs to be updated to reflect the new 
features of Flink 1.10 and the planned features and improvements of future 
releases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16540) Fully specify bugfix version of Docker containers in Flink Playground docker-compose.yaml files

2020-03-11 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-16540:
-

 Summary: Fully specify bugfix version of Docker containers in 
Flink Playground docker-compose.yaml files
 Key: FLINK-16540
 URL: https://issues.apache.org/jira/browse/FLINK-16540
 Project: Flink
  Issue Type: Improvement
Reporter: Fabian Hueske
Assignee: Fabian Hueske


I recently noticed that we do not guarantee client-jobmanager compatibility 
among different bugfix versions of the same minor version (see 
https://github.com/ververica/sql-training/issues/8 for details).

In this case, a job submitted via a Flink 1.9.0 client could not be executed on 
a Flink 1.9.2 cluster.

For the playgrounds this can easily happen because we build a custom Docker image with a fixed bugfix version, while the stock Flink images are pulled at the latest bugfix version of the same minor version.

We can avoid such problems by fixing the bugfix version of the Flink images 
that we run in the playground.
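
A sketch of the change in the docker-compose.yaml files (image tags are illustrative):
{code}
services:
  jobmanager:
    # before: flink:1.9-scala_2.11 (floating tag, resolves to the latest bugfix release)
    image: flink:1.9.0-scala_2.11  # after: fully pinned bugfix version
{code}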



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16309) ElasticSearch 7 connector is missing in SQL connector list

2020-02-27 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-16309:
-

 Summary: ElasticSearch 7 connector is missing in SQL connector list
 Key: FLINK-16309
 URL: https://issues.apache.org/jira/browse/FLINK-16309
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Fabian Hueske


The ES7 connector is not listed on 
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16308) SQL connector download links are broken

2020-02-27 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-16308:
-

 Summary: SQL connector download links are broken
 Key: FLINK-16308
 URL: https://issues.apache.org/jira/browse/FLINK-16308
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Fabian Hueske


The download links for the SQL connectors on 
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html
 are broken because central.maven.org is down.

The URLs should be updated to 
https://repo.maven.apache.org/maven2/org/apache/flink/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16067) Flink's CalciteParser swallows error position information

2020-02-14 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-16067:
-

 Summary: Flink's CalciteParser swallows error position information
 Key: FLINK-16067
 URL: https://issues.apache.org/jira/browse/FLINK-16067
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.10.0
Reporter: Fabian Hueske


The parser used for SQL queries submitted through {{TableEnvironmentImpl.sqlUpdate}} does not add the original exception from Calcite as a cause to Flink's {{org.apache.flink.table.api.SqlParserException}}.

However, Calcite's exception contains the position in the SQL query at which the parser failed.
This info would help users to fix their queries.

This used to work with Flink 1.9.x.
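
A hypothetical sketch of the fix: forward Calcite's exception (which carries the error position) as the cause.
{code:java}
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.flink.table.api.SqlParserException;

// Hypothetical sketch, simplified from what CalciteParser would need to do.
class ParserSketch {
    SqlNode parse(SqlParser parser) {
        try {
            return parser.parseStmt();
        } catch (SqlParseException e) {
            // e.getPos() carries the line/column where parsing failed;
            // keeping e as the cause preserves that information.
            throw new SqlParserException("SQL parse failed. " + e.getMessage(), e);
        }
    }
}
{code}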

CC [~dwysakowicz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15826) Add renameFunction() to Catalog

2020-01-31 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-15826:
-

 Summary: Add renameFunction() to Catalog
 Key: FLINK-15826
 URL: https://issues.apache.org/jira/browse/FLINK-15826
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.11.0
Reporter: Fabian Hueske


The {{Catalog}} interface lacks a method to rename a function.

It is possible to change all properties of a function (via {{alterFunction()}}) but it is not possible to rename it.

A {{renameTable()}} method already exists.
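
A sketch of the proposed addition (hypothetical signature, mirroring {{renameTable()}}):
{code:java}
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.catalog.exceptions.CatalogException;
import org.apache.flink.table.catalog.exceptions.FunctionAlreadyExistException;
import org.apache.flink.table.catalog.exceptions.FunctionNotExistException;

// Hypothetical sketch of the proposed Catalog method.
interface CatalogRenameAddition {
    void renameFunction(ObjectPath functionPath, String newFunctionName, boolean ignoreIfNotExists)
        throws FunctionNotExistException, FunctionAlreadyExistException, CatalogException;
}
{code}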



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15825) Add renameDatabase() to Catalog

2020-01-31 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-15825:
-

 Summary: Add renameDatabase() to Catalog
 Key: FLINK-15825
 URL: https://issues.apache.org/jira/browse/FLINK-15825
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.11.0
Reporter: Fabian Hueske


The {{Catalog}} interface lacks a method to rename a database.

It is possible to change all properties of a database (via {{alterDatabase()}}) but it is not possible to rename it.

A {{renameTable()}} method already exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15644) Add support for SQL query validation

2020-01-17 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-15644:
-

 Summary: Add support for SQL query validation 
 Key: FLINK-15644
 URL: https://issues.apache.org/jira/browse/FLINK-15644
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Fabian Hueske


It would be good if the {{TableEnvironment}} would offer methods to check the 
validity of SQL queries. Such a method could be used by services (CLI query 
shells, notebooks, SQL UIs) that are backed by Flink and execute their queries 
on Flink.

Validation should be available in two levels:
 # Validation of syntax and semantics: This includes parsing the query, 
checking the catalog for dbs, tables, fields, type checks for expressions and 
functions, etc. This will check if the query is a valid SQL query.
 # Validation that query is supported: Checks if Flink can execute the given 
query. Some syntactically and semantically valid SQL queries are not supported, 
esp. in a streaming context. This requires running the optimizer. If the 
optimizer generates an execution plan, the query can be executed. This check 
includes the first step and is more expensive.

The reason for this separation is that the first check can be done much faster, as it does not involve calling the optimizer. Hence, it would be suitable for fast checks in an interactive query editor. The second check might take more time (depending on the complexity of the query) and might not be suitable for rapid checks but only for explicit user requests.
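
A hypothetical sketch of what such an API could look like (method names are placeholders, not a design proposal):
{code:java}
import org.apache.flink.table.api.ValidationException;

// Hypothetical API sketch; names are placeholders.
interface SqlValidator {

    /** Level 1: parse and semantically validate against the catalog (fast). */
    void validate(String query) throws ValidationException;

    /** Level 2: additionally run the optimizer to check that Flink can execute the query. */
    void validateExecutable(String query) throws ValidationException;
}
{code}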

Requirements:
 * validation does not modify the state of the {{TableEnvironment}}, i.e. it 
does not add plan operators
 * validation does not require connector dependencies
 * validation can identify the update mode of a continuous query result 
(append-only, upsert, retraction).

Out of scope for this issue:
 * better error messages for unsupported features as suggested by FLINK-7217



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14313) Add Gojek to Chinese Powered By page

2019-10-02 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14313:
-

 Summary: Add Gojek to Chinese Powered By page
 Key: FLINK-14313
 URL: https://issues.apache.org/jira/browse/FLINK-14313
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Project Website
Reporter: Fabian Hueske


Add Gojek to Chinese Powered By Flink page.
The relevant commit is: 7fc857030998ea8ce6366bfec63850e08e24c563



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14227) Add Razorpay to Chinese Powered By page

2019-09-26 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14227:
-

 Summary: Add Razorpay to Chinese Powered By page
 Key: FLINK-14227
 URL: https://issues.apache.org/jira/browse/FLINK-14227
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Project Website
Reporter: Fabian Hueske


Razorpay was added to the English Powered By page with commit: 
[87a034140e97be42616e1a3dbe58e4f7a014e560|https://github.com/apache/flink-web/commit/87a034140e97be42616e1a3dbe58e4f7a014e560].

It should be added to the Chinese Powered By (and index.html) page as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14213) Link from Flink website to Getting Started Overview page

2019-09-25 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14213:
-

 Summary: Link from Flink website to Getting Started Overview page
 Key: FLINK-14213
 URL: https://issues.apache.org/jira/browse/FLINK-14213
 Project: Flink
  Issue Type: Sub-task
  Components: Project Website
Reporter: Fabian Hueske
Assignee: Fabian Hueske


The "Tutorials" link on the Flink website currently links to the "Local Setup 
Tutorial" of the documentation.

We should replace that link with a "Getting Started" link that points to the 
Getting Started Overview page (which lists code walkthroughs and playgrounds).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14117) Translate changes on documentation index page to Chinese

2019-09-18 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14117:
-

 Summary: Translate changes on documentation index page to Chinese
 Key: FLINK-14117
 URL: https://issues.apache.org/jira/browse/FLINK-14117
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Affects Versions: 1.10.0
Reporter: Fabian Hueske


The changes of commit 
[ee0d6fdf0604d74bd1cf9a6eb9cf5338ac1aa4f9|https://github.com/apache/flink/commit/ee0d6fdf0604d74bd1cf9a6eb9cf5338ac1aa4f9#diff-1a523bd9fa0dbf998008b37579210e12]
 on the documentation index page should be translated to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14116) Translate changes on Getting Started Overview to Chinese

2019-09-18 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14116:
-

 Summary: Translate changes on Getting Started Overview to Chinese
 Key: FLINK-14116
 URL: https://issues.apache.org/jira/browse/FLINK-14116
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Affects Versions: 1.10.0
Reporter: Fabian Hueske


The changes of commit [ee0d6fdf0604d74bd1cf9a6eb9cf5338ac1aa4f9|https://github.com/apache/flink/commit/ee0d6fdf0604d74bd1cf9a6eb9cf5338ac1aa4f9#diff-759f29741e3adc9d9cdc95c996f25869] on the Getting Started Overview should be translated to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14115) Translate DataStream Code Walkthrough to Chinese

2019-09-18 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14115:
-

 Summary: Translate DataStream Code Walkthrough to Chinese
 Key: FLINK-14115
 URL: https://issues.apache.org/jira/browse/FLINK-14115
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Affects Versions: 1.10.0
Reporter: Fabian Hueske


The new DataStream Code Walkthrough should be translated to Chinese:

https://github.com/apache/flink/blob/master/docs/getting-started/walkthroughs/datastream_api.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-13994) Translate "Getting Started" overview to Chinese

2019-09-06 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-13994:
-

 Summary: Translate "Getting Started" overview to Chinese
 Key: FLINK-13994
 URL: https://issues.apache.org/jira/browse/FLINK-13994
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Reporter: Fabian Hueske


The "Getting Started" overview page needs to be translated to Chinese: 

https://github.com/apache/flink/blob/master/docs/getting-started/index.zh.md



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13975) Translate "Upcoming Events" on Chinese index.html

2019-09-05 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-13975:
-

 Summary: Translate "Upcoming Events" on Chinese index.html
 Key: FLINK-13975
 URL: https://issues.apache.org/jira/browse/FLINK-13975
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Project Website
Reporter: Fabian Hueske


We recently added a section for "Upcoming Events" to the index page of the 
Flink website.

We need to translate "Upcoming Events" on the Chinese version of the main page.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13942) Add Overview page for Getting Started section

2019-09-02 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-13942:
-

 Summary: Add Overview page for Getting Started section
 Key: FLINK-13942
 URL: https://issues.apache.org/jira/browse/FLINK-13942
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 1.9.0, 1.10.0
Reporter: Fabian Hueske
Assignee: Fabian Hueske


The Getting Started section provides different types of tutorials that target users with different interests and backgrounds.

We should add a brief overview page that describes the different tutorials such 
that users easily find the material that they need to get started with Flink.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13872) Translate Operations Playground to Chinese

2019-08-27 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-13872:
-

 Summary: Translate Operations Playground to Chinese
 Key: FLINK-13872
 URL: https://issues.apache.org/jira/browse/FLINK-13872
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Affects Versions: 1.9.0
Reporter: Fabian Hueske


The [Operations 
Playground|https://ci.apache.org/projects/flink/flink-docs-release-1.9/getting-started/docker-playgrounds/flink_operations_playground.html]
 is a quick and convenient way to learn about Flink's operational features (job 
submission, failure recovery, job updates, scaling, metrics).

We should translate it to Chinese as well.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13863) Update Operations Playground to Flink 1.9.0

2019-08-26 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-13863:
-

 Summary: Update Operations Playground to Flink 1.9.0
 Key: FLINK-13863
 URL: https://issues.apache.org/jira/browse/FLINK-13863
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 1.9.0
Reporter: Fabian Hueske
Assignee: Fabian Hueske


Update the operations playground to Flink 1.9.0.
This includes:
* Updating the flink-playgrounds repository
* Updating the "Getting Started/Docker Playgrounds" section in the 
documentation.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13583) Add SK telecom to Chinese Powered By page

2019-08-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-13583:
-

 Summary: Add SK telecom to Chinese Powered By page
 Key: FLINK-13583
 URL: https://issues.apache.org/jira/browse/FLINK-13583
 Project: Flink
  Issue Type: Task
  Components: Project Website
Reporter: Fabian Hueske


SK telecom was added to English Powered By page with commit 
8e73be1df9d3af875e2abcd3610fe20b8e7415ac

It should also be added to the Chinese Powered By page.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-12833) Add Klaviyo to Chinese PoweredBy page

2019-06-13 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-12833:
-

 Summary: Add Klaviyo to Chinese PoweredBy page
 Key: FLINK-12833
 URL: https://issues.apache.org/jira/browse/FLINK-12833
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Project Website
Reporter: Fabian Hueske


Commit b54ecfa930653bcfecd60df3414deca5291c6cb3 added Klaviyo to the English 
PoweredBy page.

It should be added to the Chinese page as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12486) RocksDB State Backend docs don't explain that operator state is kept on heap

2019-05-10 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-12486:
-

 Summary: RocksDB State Backend docs don't explain that operator 
state is kept on heap
 Key: FLINK-12486
 URL: https://issues.apache.org/jira/browse/FLINK-12486
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Runtime / State Backends
Affects Versions: 1.8.0, 1.7.2, 1.9.0
Reporter: Fabian Hueske


The documentation of the [RocksDB State 
Backend|https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/state_backends.html#the-rocksdbstatebackend]
 does not explain that operator state is maintained on the JVM heap and not in 
RocksDB (on disk).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12460) Change taskmanager.tmp.dirs to io.tmp.dirs in configuration docs

2019-05-09 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-12460:
-

 Summary: Change taskmanager.tmp.dirs to io.tmp.dirs in 
configuration docs
 Key: FLINK-12460
 URL: https://issues.apache.org/jira/browse/FLINK-12460
 Project: Flink
  Issue Type: Task
  Components: Documentation, Runtime / Configuration
Affects Versions: 1.8.0, 1.7.2
Reporter: Fabian Hueske


The [configuration 
documentation|https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html]
 explains the deprecated parameter {{taskmanager.tmp.dirs}} which should be 
updated to {{io.tmp.dirs}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12204) Improve JDBCOutputFormat ClassCastException

2019-04-15 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-12204:
-

 Summary: Improve JDBCOutputFormat ClassCastException
 Key: FLINK-12204
 URL: https://issues.apache.org/jira/browse/FLINK-12204
 Project: Flink
  Issue Type: Task
  Components: Connectors / JDBC
Affects Versions: 1.8.0
Reporter: Fabian Hueske
 Fix For: 1.9.0, 1.8.1


ClassCastExceptions thrown by JDBCOutputFormat are not very helpful because 
they do not provide information for which input field the cast failed.

We should catch the exception and enrich it with information about the affected 
field to make it more useful.
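
A hypothetical sketch of the improvement ({{setField()}} stands in for the actual {{PreparedStatement}} writer):
{code:java}
import org.apache.flink.types.Row;

// Hypothetical sketch: report which field failed the cast.
class EnrichedCastErrorSketch {

    void writeRecord(Row row) {
        for (int i = 0; i < row.getArity(); i++) {
            try {
                setField(i, row.getField(i));
            } catch (ClassCastException e) {
                throw new RuntimeException(
                    "Casting the field at position " + i + " failed: " + row.getField(i), e);
            }
        }
    }

    private void setField(int pos, Object value) {
        // placeholder for the PreparedStatement.setXxx(...) calls
    }
}
{code}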



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12125) Add OVH to poweredby.zh.md and index.zh.md

2019-04-08 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-12125:
-

 Summary: Add OVH to poweredby.zh.md and index.zh.md
 Key: FLINK-12125
 URL: https://issues.apache.org/jira/browse/FLINK-12125
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Project Website
Reporter: Fabian Hueske


OVH was added to the {{poweredby.md}} and {{index.md}} pages in commits 55ae4426d5b91695e1e5629b1d0a16b7a1e010f0 and d4a160ab336c5ae1b2f772fbeff7e003478e274b. See also PR https://github.com/apache/flink-web/pull/193.

The corresponding Chinese pages should be updated accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11825) Resolve name clash of StateTTL TimeCharacteristic class

2019-03-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-11825:
-

 Summary: Resolve name clash of StateTTL TimeCharacteristic class
 Key: FLINK-11825
 URL: https://issues.apache.org/jira/browse/FLINK-11825
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Affects Versions: 1.7.2
Reporter: Fabian Hueske


The StateTTL feature introduced the class {{org.apache.flink.api.common.state.TimeCharacteristic}} which clashes with {{org.apache.flink.streaming.api.TimeCharacteristic}}.

This is a problem for two reasons:

1. Users get confused because they mistakenly import {{org.apache.flink.api.common.state.TimeCharacteristic}}.
2. When using the StateTTL feature, users need to spell out the package name for {{org.apache.flink.api.common.state.TimeCharacteristic}} because the other class is most likely already imported.

Since {{org.apache.flink.streaming.api.TimeCharacteristic}} is one of the most used classes of the DataStream API, we should make sure that users can use it without import problems.
These errors are hard to spot and confusing for many users.

I see two ways to resolve the issue:

1. Drop {{org.apache.flink.api.common.state.TimeCharacteristic}} and use {{org.apache.flink.streaming.api.TimeCharacteristic}} instead, throwing an exception if an incorrect characteristic is used.
2. Rename the class {{org.apache.flink.api.common.state.TimeCharacteristic}} to some other name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11824) Event-time attribute cannot have same name as in original format

2019-03-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-11824:
-

 Summary: Event-time attribute cannot have same name as in original 
format
 Key: FLINK-11824
 URL: https://issues.apache.org/jira/browse/FLINK-11824
 Project: Flink
  Issue Type: Bug
  Components: API / Table SQL
Affects Versions: 1.7.2, 1.8.0
Reporter: Fabian Hueske


When a table is defined, event-time attributes are typically defined by linking 
them to an existing field in the original format (e.g., CSV, Avro, JSON, ...). 
However, right now, the event-time attribute in the defined table cannot have 
the same name as the original field.

The following table definition fails with an exception:

{code}
// set up execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val names: Array[String] = Array("name", "t")
val types: Array[TypeInformation[_]] = Array(Types.STRING, Types.LONG)

tEnv.connect(new Kafka()
.version("universal")
.topic("namesTopic")
.property("zookeeper.connect", "localhost:2181")
.property("bootstrap.servers", "localhost:9092")
.property("group.id", "testGroup"))
  .withFormat(new Csv()
.schema(Types.ROW(names, types)))
  .withSchema(new Schema()
.field("name", Types.STRING)
.field("t", Types.SQL_TIMESTAMP) // changing "t" to "t2" works
  .rowtime(new Rowtime()
.timestampsFromField("t")
.watermarksPeriodicAscending()))
  .inAppendMode()
  .registerTableSource("Names")
{code}

{code}
Exception in thread "main" org.apache.flink.table.api.ValidationException: 
Field 't' could not be resolved by the field mapping.
at 
org.apache.flink.table.sources.TableSourceUtil$.org$apache$flink$table$sources$TableSourceUtil$$resolveInputField(TableSourceUtil.scala:491)
at 
org.apache.flink.table.sources.TableSourceUtil$$anonfun$org$apache$flink$table$sources$TableSourceUtil$$resolveInputFields$1.apply(TableSourceUtil.scala:521)
at 
org.apache.flink.table.sources.TableSourceUtil$$anonfun$org$apache$flink$table$sources$TableSourceUtil$$resolveInputFields$1.apply(TableSourceUtil.scala:521)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at 
org.apache.flink.table.sources.TableSourceUtil$.org$apache$flink$table$sources$TableSourceUtil$$resolveInputFields(TableSourceUtil.scala:521)
at 
org.apache.flink.table.sources.TableSourceUtil$.validateTableSource(TableSourceUtil.scala:127)
at 
org.apache.flink.table.plan.schema.StreamTableSourceTable.&lt;init&gt;(StreamTableSourceTable.scala:33)
at 
org.apache.flink.table.api.StreamTableEnvironment.registerTableSourceInternal(StreamTableEnvironment.scala:150)
at 
org.apache.flink.table.api.TableEnvironment.registerTableSource(TableEnvironment.scala:541)
at 
org.apache.flink.table.descriptors.ConnectTableDescriptor.registerTableSource(ConnectTableDescriptor.scala:47)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11054) Ingest Long value as TIMESTAMP attribute

2018-12-03 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-11054:
-

 Summary: Ingest Long value as TIMESTAMP attribute
 Key: FLINK-11054
 URL: https://issues.apache.org/jira/browse/FLINK-11054
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Fabian Hueske


When ingesting streaming tables, a {{Long}} value that is marked as event-time timestamp is automatically converted into a {{TIMESTAMP}} attribute.
However, batch table scans do not have similar functionality, i.e., converting a {{Long}} into a {{TIMESTAMP}} during ingestion / table scan. This is relevant because features like GROUP BY windows require a {{TIMESTAMP}} parameter. Hence, batch queries would need to use a UDF (or, later, a built-in function) to manually convert a {{Long}} attribute to {{TIMESTAMP}}.

Flink separates the concepts of format schema and table schema.
I propose to automatically convert values that are defined as {{long}} in the format schema and as {{TIMESTAMP}} in the table schema (both for streaming and batch scans).
Since the conversion is only done if explicitly requested (right now, such a definition yields an error message), we should not break existing behavior.

What do you think, [~twalthr]?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10543) Leverage efficient timer deletion in relational operators

2018-10-14 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10543:
-

 Summary: Leverage efficient timer deletion in relational operators
 Key: FLINK-10543
 URL: https://issues.apache.org/jira/browse/FLINK-10543
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Fabian Hueske


FLINK-9423 added support for efficient timer deletions. This feature has been available since Flink 1.6 and should be used by the relational operators of SQL and the Table API.

Currently, we use a few workarounds to handle situations when deleting timers 
would be the better solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10528) Remove deprecated APIs from Table API for Flink 1.7.0

2018-10-11 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10528:
-

 Summary: Remove deprecated APIs from Table API for Flink 1.7.0
 Key: FLINK-10528
 URL: https://issues.apache.org/jira/browse/FLINK-10528
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Affects Versions: 1.7.0
Reporter: Fabian Hueske
Assignee: Fabian Hueske


There are a few APIs that have been deprecated for a while (since Flink 1.4.0) and that could be removed:
 * {{TableEnvironment.sql()}} as of FLINK-6442
 * {{StreamTableEnvironment.toDataStream()}} and {{TableConversions.toDataStream()}} as of FLINK-6543
 * {{Table.limit()}} as of FLINK-7821



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10488) Add DISTINCT operator for streaming tables that leverages time attributes

2018-10-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10488:
-

 Summary: Add DISTINCT operator for streaming tables that leverages 
time attributes
 Key: FLINK-10488
 URL: https://issues.apache.org/jira/browse/FLINK-10488
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Fabian Hueske


We can add a special DISTINCT operator that leverages time attributes and 
thereby avoids materializing the complete input table.

This operator can also be used to evaluate UNION by rewriting UNION to UNION 
ALL + DISTINCT.
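
The UNION rewrite mentioned above is a standard SQL equivalence (table and column names are illustrative):
{code}
SELECT a, b FROM t1
UNION
SELECT a, b FROM t2;

-- is equivalent to:
SELECT DISTINCT a, b FROM (
  SELECT a, b FROM t1
  UNION ALL
  SELECT a, b FROM t2
);
{code}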



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10474) Don't translate IN to JOIN with VALUES for streaming queries

2018-10-01 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10474:
-

 Summary: Don't translate IN to JOIN with VALUES for streaming 
queries
 Key: FLINK-10474
 URL: https://issues.apache.org/jira/browse/FLINK-10474
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Affects Versions: 1.6.1, 1.7.0
Reporter: Fabian Hueske


IN clauses are translated to a JOIN with VALUES if the number of elements in the IN clause exceeds a certain threshold. This should not be done, because a streaming join is very heavy and materializes both inputs (which is fine for the VALUES input but not for the other one).

There are two ways to solve this:
 # don't translate IN to a JOIN at all
 # translate it to a JOIN but have a special join strategy if one input is 
bound and final (non-updating)

Option 1. should be easy to do, option 2. requires much more effort.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10359) Scala example in DataSet docs is broken

2018-09-17 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10359:
-

 Summary: Scala example in DataSet docs is broken
 Key: FLINK-10359
 URL: https://issues.apache.org/jira/browse/FLINK-10359
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.6.0, 1.5.3, 1.7.0
Reporter: Fabian Hueske


The Scala example of 
[https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/batch/dataset_transformations.html#combinable-groupreducefunctions]
 is broken.

The {{asScala}} and the {{reduce}} calls fetch the Java {{Iterator}}, which may only be fetched once.
{quote}The Iterable can be iterated over only once. Only the first call to 'iterator()' will succeed.{quote}

While we are on it, it would make sense to check the other examples as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10263) User-defined function with LITERAL parameters yields CompileException

2018-08-30 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10263:
-

 Summary: User-defined function with LITERAL parameters yields CompileException
 Key: FLINK-10263
 URL: https://issues.apache.org/jira/browse/FLINK-10263
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.7.0
Reporter: Fabian Hueske
Assignee: Timo Walther


When using a user-defined scalar function only with literal parameters, a 
{{CompileException}} is thrown. For example

{code}
SELECT myFunc(CAST(40.750444 AS FLOAT), CAST(-73.993475 AS FLOAT))

public class MyFunc extends ScalarFunction {

    public int eval(float lon, float lat) {
        // do something
        return 0; // placeholder
    }
}
{code}

results in 

{code}
[ERROR] Could not execute SQL statement. Reason:
org.codehaus.commons.compiler.CompileException: Line 5, Column 10: Cannot 
determine simple type name "com"
{code}

The problem is probably caused by the expression reducer, because the error disappears if a regular attribute is added to a parameter expression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10259) Key validation for GroupWindowAggregate is broken

2018-08-30 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10259:
-

 Summary: Key validation for GroupWindowAggregate is broken
 Key: FLINK-10259
 URL: https://issues.apache.org/jira/browse/FLINK-10259
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Fabian Hueske
 Fix For: 1.6.1, 1.7.0, 1.5.4


WindowGroups have multiple equivalent keys (start, end) that should be handled 
differently from other keys. The {{UpdatingPlanChecker}} uses equivalence 
groups to identify equivalent keys but the keys of WindowGroups are not 
correctly assigned to groups.

This means that we cannot correctly extract keys from queries that use group 
windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10192) SQL Client table visualization mode does not update correctly

2018-08-21 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10192:
-

 Summary: SQL Client table visualization mode does not update 
correctly
 Key: FLINK-10192
 URL: https://issues.apache.org/jira/browse/FLINK-10192
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.6.0
Reporter: Fabian Hueske


The table visualization mode does not seem to update correctly.
When I run a query that groups and aggregates on a few (6) distinct keys, the client visualizes some keys multiple times. Also, the aggregated values do not seem to be correct.
Due to the small number of keys, these get updated frequently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10156) Drop the Table.writeToSink() method

2018-08-15 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10156:
-

 Summary: Drop the Table.writeToSink() method
 Key: FLINK-10156
 URL: https://issues.apache.org/jira/browse/FLINK-10156
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Fabian Hueske


I am proposing to drop the {{Table.writeToSink()}} method.
 
*What is the method doing?*
The {{Table.writeToSink(TableSink)}} method emits a {{Table}} via a 
{{TableSink}}, for example to a Kafka topic, a file, or a database.
 
*Why should it be removed?*
The {{writeToSink()}} method was introduced before the Table API supported the 
{{Table.insertInto(String)}} method. The {{insertInto()}} method writes a table 
into a table that was previously registered with a {{TableSink}} in the 
catalog. It is the inverse method to the {{scan()}} method and the equivalent 
to an {{INSERT INTO ... SELECT}} SQL query.
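
A sketch contrasting the two methods (Flink 1.6-era API; sink, path, and names are illustrative):
{code:java}
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.sinks.CsvTableSink;

class WriteToSinkVsInsertInto {
    static void emit(StreamTableEnvironment tEnv, Table table,
                     String[] fieldNames, TypeInformation<?>[] fieldTypes) {
        // 1) writeToSink(): the TableSink instance leaks into the query part.
        table.writeToSink(new CsvTableSink("/tmp/out1", "|"));

        // 2) insertInto(): the sink is registered in the catalog first.
        tEnv.registerTableSink("OutTable", fieldNames, fieldTypes,
                new CsvTableSink("/tmp/out2", "|"));
        table.insertInto("OutTable");
    }
}
{code}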
 
I think we should remove {{writeToSink()}} for the following reasons:
1. It offers the same functionality as {{insertInto()}}. Removing it would 
reduce duplicated API.
2. {{writeToSink()}} requires a {{TableSink}} instance. I think TableSinks (and 
TableSources) should only be registered with the {{TableEnvironment}} and not 
be exposed to the "query part" of the Table API / SQL.
3. Registering tables in a catalog and using them for input and output is more 
aligned with SQL.
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10153) Add tutorial section to documentation

2018-08-15 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10153:
-

 Summary: Add tutorial section to documentation
 Key: FLINK-10153
 URL: https://issues.apache.org/jira/browse/FLINK-10153
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Fabian Hueske
Assignee: Fabian Hueske


The current documentation does not feature a dedicated tutorials section and has a few issues that should be fixed in order to help our (future) users getting started with Flink.

I propose to add a single "Tutorials" section to the documentation where users find step-by-step guides. The tutorials section helps users with different goals:

  * Get a quick idea of the overall system
  * Implement a DataStream/DataSet/Table API/SQL job
  * Set up Flink on a local machine (or run a Docker container)

There are already a few guides to get started but they are located at different 
places and should be moved into the Tutorials section. Moreover, some sections 
such as "Project Setup" contain content that addresses users with very 
different intentions.

I propose to
* add a new Tutorials section and move all existing tutorials there (and later 
add new ones).
* move the "Quickstart" section to "Tutorials".
* remove the "Project Setup" section and move the pages to other sections (some 
pages will be split up or adjusted).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10100) Optimizer pushes partitioning past Null-Filter

2018-08-08 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10100:
-

 Summary: Optimizer pushes partitioning past Null-Filter
 Key: FLINK-10100
 URL: https://issues.apache.org/jira/browse/FLINK-10100
 Project: Flink
  Issue Type: Bug
  Components: DataSet API, Optimizer
Affects Versions: 1.5.2, 1.4.2, 1.3.3, 1.6.0, 1.7.0
Reporter: Fabian Hueske


The DataSet optimizer pushes certain operations like partitioning or sorting 
past Filter operators.
It does that because it knows that a {{FilterFunction}} cannot modify the 
records but only indicate whether a record should be forwarded or not. 

However, this causes problems if the filter should remove records with null 
keys. In that case, the partitioning can be pushed past the filter such that 
the partitioner has to deal with null keys. This can fail with a 
{{NullPointerException}}.

The following code produces an affected plan.

{code}
List<Row> rowList = new ArrayList<>();
rowList.add(Row.of(null, 1L));
rowList.add(Row.of(2L, 2L));
rowList.add(Row.of(2L, 2L));
rowList.add(Row.of(3L, 3L));
rowList.add(Row.of(null, 3L));

DataSet<Row> rows = env.fromCollection(rowList, Types.ROW(Types.LONG, Types.LONG));

DataSet<Long> result = rows
  .filter(r -> r.getField(0) != null)
  .setParallelism(4)
  .groupBy(0)
  .reduceGroup((Iterable<Row> vals, Collector<Long> out) -> {
    long cnt = 0L;
    for (Row v : vals) { cnt++; }
    out.collect(cnt);
  }).returns(Types.LONG)
  .setParallelism(4);

result.output(new DiscardingOutputFormat<Long>());
System.out.println(env.getExecutionPlan());
{code}

To resolve the problem, we could remove the field-forward property of 
{{FilterFunction}}. In general, it is typically more efficient to filter before 
shipping or sorting data. So this might also improve the performance of certain 
plans.

As a *workaround* until this bug is fixed, users can implement the filter with a 
{{FlatMapFunction}}. {{FlatMapFunction}} is a more generic interface, so the 
optimizer cannot automatically infer how the function behaves and won't push 
partitionings or sorts past the function.
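
A minimal sketch of that workaround, reusing the example above:

{code}
// Workaround sketch: the same null filter as a FlatMapFunction; the optimizer
// cannot infer forwarded fields and keeps the partitioning after the filter.
DataSet<Row> filtered = rows
  .flatMap((Row r, Collector<Row> out) -> {
    if (r.getField(0) != null) {
      out.collect(r);
    }
  })
  .returns(Types.ROW(Types.LONG, Types.LONG))
  .setParallelism(4);
{code}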



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10007) Security vulnerability in website build infrastructure

2018-07-31 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-10007:
-

 Summary: Security vulnerability in website build infrastructure
 Key: FLINK-10007
 URL: https://issues.apache.org/jira/browse/FLINK-10007
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Fabian Hueske


We've got a notification from Apache INFRA about a potential security 
vulnerability:

{quote}
We found a potential security vulnerability in a repository for which you have 
been granted security alert access.
@apache apache/flink-web
Known high severity security vulnerability detected in yajl-ruby < 1.3.1 
defined in Gemfile.
Gemfile update suggested: yajl-ruby ~> 1.3.1. 
{quote}

This is a problem with the build environment of the website, i.e., this 
dependency is not distributed or executed with Flink but only run when the 
website is updated.

Nonetheless, we should of course update the dependency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9673) Improve State efficiency of bounded OVER window operators

2018-06-27 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9673:


 Summary: Improve State efficiency of bounded OVER window operators
 Key: FLINK-9673
 URL: https://issues.apache.org/jira/browse/FLINK-9673
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Fabian Hueske


Currently, the implementations of bounded OVER window aggregations store the 
complete input for the bound interval. For example for the query:

{code}
SELECT user_id, count(action) OVER (PARTITION BY user_id ORDER BY rowtime RANGE 
INTERVAL '14' DAY PRECEDING) action_count, rowtime
FROM (SELECT rowtime, user_id, action, val1, val2, val3, val4 FROM user)
{code}

Whole records with schema {{(rowtime, user_id, action, val1, val2, val3, 
val4)}} are stored for 14 days in order to retract them after 14 days from the 
accumulators.

However, it would be sufficient to only store those fields that are required 
for the aggregations, i.e., {{action}} in the example above. All other fields 
could be set to {{null}} and hence significantly reduce the amount of data that 
needs to be stored in state.
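
A sketch of the proposed state layout, with the field index assumed from the example schema:

{code}
// Hedged sketch: store a projected copy of the input row in the retraction
// state; only 'action' (assumed at index 2) is needed by the aggregates.
Row stateRow = new Row(input.getArity());
stateRow.setField(2, input.getField(2));
// all other fields stay null and no longer blow up the state size
{code}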





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-06-06 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9541:


 Summary: Add robots.txt and sitemap.xml to Flink website
 Key: FLINK-9541
 URL: https://issues.apache.org/jira/browse/FLINK-9541
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Fabian Hueske


From the [dev mailing 
list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:

{quote}
It would help to add a sitemap (and the robots.txt required to reference it) 
for flink.apache.org and ci.apache.org (for /projects/flink)

You can see what Tomcat did along these lines - 
http://tomcat.apache.org/robots.txt references 
http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing to 
http://tomcat.apache.org/sitemap-main.xml

By doing this, you can emphasize more recent versions of docs. There are other 
benefits, but reducing poor Google search results (to me) is the biggest win.

E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) Javadocs 
(hit #2), and then many pages of other results.

Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
{quote}
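
For reference, a minimal robots.txt along the lines of the Tomcat setup could look like this (sitemap URL assumed):

{code}
User-agent: *
Disallow:

Sitemap: https://flink.apache.org/sitemap.xml
{code}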



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9528) Incorrect results: Filter does not treat Upsert messages correctly.

2018-06-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9528:


 Summary: Incorrect results: Filter does not treat Upsert messages 
correctly.
 Key: FLINK-9528
 URL: https://issues.apache.org/jira/browse/FLINK-9528
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.4.2, 1.5.0, 1.3.3
Reporter: Fabian Hueske


Currently, Filters (i.e., Calcs with predicates) do not distinguish between 
retraction and upsert mode. A Calc looks at a record (regardless of its update 
semantics) and either discards it (predicate evaluates to false) or passes it on 
(predicate evaluates to true).

This works fine for messages with retraction semantics but is not correct for 
upsert messages.

The following test case (can be pasted into {{TableSinkITCase}}) shows the 
problem:
{code:java}
  @Test
  def testUpsertsWithFilter(): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.getConfig.enableObjectReuse()
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val t = StreamTestData.get3TupleDataStream(env)
      .assignAscendingTimestamps(_._1.toLong)
      .toTable(tEnv, 'id, 'num, 'text)

    t.select('text.charLength() as 'len)
      .groupBy('len)
      .select('len, 'len.count as 'cnt)
      // .where('cnt < 7)
      .writeToSink(new TestUpsertSink(Array("len"), false))

    env.execute()
    val results = RowCollector.getAndClearValues

    val retracted = RowCollector.upsertResults(results, Array(0)).sorted
    val expectedWithoutFilter = List(
      "2,1", "5,1", "9,9", "10,7", "11,1", "14,1", "25,1").sorted
    val expectedWithFilter = List(
      "2,1", "5,1", "11,1", "14,1", "25,1").sorted

    assertEquals(expectedWithoutFilter, retracted)
    // assertEquals(expectedWithFilter, retracted)
  }
{code}
When we add a filter on the aggregation result, we would expect that all rows 
that do not fulfill the condition are removed from the result. However, the 
filter only removes the upsert message such that the previous version remains 
in the result.

One solution could be to make a filter aware of the update semantics (retract 
or upsert) and convert the upsert message into a delete message if the 
predicate evaluates to false.
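
A sketch of that solution, using the internal change-flag representation (shape assumed):

{code}
// Hedged sketch: an upsert-aware filter that emits a delete instead of
// silently dropping a filtered-out upsert message.
if (predicate(row)) {
  out.collect(new CRow(row, true));    // forward the upsert
} else if (upsertMode) {
  out.collect(new CRow(row, false));   // convert to a delete for the key
}
// in retraction mode the old behavior remains: drop on false predicate
{code}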



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9522) Rework Flink website

2018-06-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9522:


 Summary: Rework Flink website
 Key: FLINK-9522
 URL: https://issues.apache.org/jira/browse/FLINK-9522
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Fabian Hueske


Flink's website (flink.apache.org) represents and informs about the Flink 
system and the Flink community.

I would like to propose to rework some parts of the website to improve its 
structure, provide more valuable information about Flink, and present new 
features.

In particular, I propose to:
 * Restructure the menu into three sections:
 ## *About Flink*: Explains what Flink is and which use cases it addresses 
(What is it? Does it solve my problem? Is it working / who uses it?)
 ## *Users*: helps users of Flink (Download, Docs, How to get help)
 ## *Contributors*: Community, How to contribute, Github
 * Rework the start page: add an overview of the features, add a new figure, 
move powered-by users above the blog
 * Add a detailed description about Flink discussing architectures, 
applications, and operations. This will replace the outdated Features page.
 * Rework the Use Cases page
 * Add a Getting Help page pointing to the mailing list, Stack Overflow, and 
common error messages (moved from FAQ)
 * Remove the information about the inactive IRC channel




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9422) Dedicated operator for UNION on streaming tables with time attributes

2018-05-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9422:


 Summary: Dedicated operator for UNION on streaming tables with 
time attributes
 Key: FLINK-9422
 URL: https://issues.apache.org/jira/browse/FLINK-9422
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Fabian Hueske


We can implement a dedicated operator for a {{UNION}} operator on tables with 
time attributes. Currently, {{UNION}} is translated into a {{UNION ALL}} and a 
subsequent {{GROUP BY}} on all attributes without aggregation functions. The 
state of the grouping operator is only cleaned up using state retention timers. 

The dedicated operator would leverage the monotonicity property of the time 
attribute and watermarks to automatically clean up its state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9419) UNION should not be treated as retraction producing operator

2018-05-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9419:


 Summary: UNION should not be treated as retraction producing 
operator
 Key: FLINK-9419
 URL: https://issues.apache.org/jira/browse/FLINK-9419
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Fabian Hueske


The following query fails 

{code}
SELECT
  user_id,
  count(msg),
  HOP_END(rowtime, INTERVAL '1' second, INTERVAL '1' minute)
FROM (
  SELECT rowtime, user_id, action_name AS msg
  FROM event_client_action
  WHERE /* various clauses */
  UNION
  SELECT rowtime, user_id, action_type AS msg
  FROM event_server_action
  WHERE /* various clauses */
)
GROUP BY
  HOP(rowtime, INTERVAL '1' second, INTERVAL '1' minute), user_id
{code}

with 

{quote}Retraction on windowed GroupBy aggregation is not supported yet. Note: 
Windowed GroupBy aggregation should not follow a non-windowed GroupBy 
aggregation.{quote}

The problem is that the {{UNION}} operator is translated into a {{UNION ALL}} 
and a subsequent {{GROUP BY}} on all attributes without an aggregation 
function. Currently, all {{GROUP BY}} operators are treated as 
retraction-producing operators. However, this is only true for grouping 
operators with aggregation functions. If the operator groups on all attributes 
and has no aggregation functions, it does not produce retractions but only 
forwards them (similar to a filter operator).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9309) Recommend HA setup on Production Readiness Checklist

2018-05-07 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9309:


 Summary: Recommend HA setup on Production Readiness Checklist
 Key: FLINK-9309
 URL: https://issues.apache.org/jira/browse/FLINK-9309
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Fabian Hueske


It would be good to recommend the [HA 
setup|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/jobmanager_high_availability.html]
 on the [Production Readiness 
Checklist|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/production_ready.html]
 to ensure that users are aware of this feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9296) Support distinct aggregation on non-windowed grouped streaming tables

2018-05-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9296:


 Summary: Support distinct aggregation on non-windowed grouped 
streaming tables
 Key: FLINK-9296
 URL: https://issues.apache.org/jira/browse/FLINK-9296
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Fabian Hueske






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9289) Parallelism of generated operators should have max parallelism of input

2018-05-02 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9289:


 Summary: Parallelism of generated operators should have max 
parallelism of input
 Key: FLINK-9289
 URL: https://issues.apache.org/jira/browse/FLINK-9289
 Project: Flink
  Issue Type: Bug
  Components: DataSet API
Affects Versions: 1.4.2, 1.5.0, 1.6.0
Reporter: Fabian Hueske


The DataSet API aims to chain generated operators such as key extraction 
mappers to their predecessor. This is done by assigning the same parallelism as 
the input operator.

If a generated operator has more than two inputs, the operator cannot be 
chained anymore and the operator is generated with default parallelism. This 
can lead to a {code}NoResourceAvailableException: Not enough free slots 
available to run the job.{code} as reported by a user on the mailing list: 
https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E

I suggest setting the parallelism of a generated operator to the maximum 
parallelism of all of its inputs to fix this problem.
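
A sketch of the suggested rule (accessor names hypothetical):

{code}
// Hedged sketch: take the maximum parallelism of all inputs instead of the
// default parallelism when the generated operator cannot be chained.
int parallelism = -1;
for (Operator<?> input : inputs) {
  parallelism = Math.max(parallelism, input.getParallelism());
}
generatedOperator.setParallelism(parallelism > 0 ? parallelism : defaultParallelism);
{code}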



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9135) Remove AggregateReduceFunctionsRule once CALCITE-2216 is fixed

2018-04-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9135:


 Summary: Remove AggregateReduceFunctionsRule once CALCITE-2216 is 
fixed
 Key: FLINK-9135
 URL: https://issues.apache.org/jira/browse/FLINK-9135
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.5.0, 1.6.0
Reporter: Fabian Hueske


We had to copy and slightly modify {{AggregateReduceFunctionsRule}} from 
Calcite to fix FLINK-8903.

We proposed the changes to Calcite as CALCITE-2216. Once this issue is fixed 
and we update the Calcite dependency to a version that includes the fix, we can 
remove our custom rule.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9130) Add cancel-job option to SavepointHandlers JavaDoc and regenerate REST docs

2018-04-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9130:


 Summary: Add cancel-job option to SavepointHandlers JavaDoc and 
regenerate REST docs
 Key: FLINK-9130
 URL: https://issues.apache.org/jira/browse/FLINK-9130
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.5.0, 1.6.0
Reporter: Fabian Hueske


The Savepoint REST documentation is missing the {{cancel-job}} option.

See discussion on ML here: 
https://lists.apache.org/thread.html/dc05751fa6507388dcefc0c845facef6b36b086e256b52c3e37d71dc@%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9067) End-to-end test: Stream SQL query with failure and retry

2018-03-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9067:


 Summary: End-to-end test: Stream SQL query with failure and retry
 Key: FLINK-9067
 URL: https://issues.apache.org/jira/browse/FLINK-9067
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL, Tests
Affects Versions: 1.5.0
Reporter: Fabian Hueske
Assignee: Fabian Hueske


Implement a test job that runs a streaming SQL query with a temporary failure 
and recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9056) Job submission fails with AskTimeoutException if not enough slots are available

2018-03-22 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9056:


 Summary: Job submission fails with AskTimeoutException if not 
enough slots are available
 Key: FLINK-9056
 URL: https://issues.apache.org/jira/browse/FLINK-9056
 Project: Flink
  Issue Type: Improvement
  Components: Job-Submission
Affects Versions: 1.5.0
 Environment: * FLIP-6 enabled
 * Local Flink instance with fixed number of TMs
 * Job parallelism exceeds available slots
Reporter: Fabian Hueske


The error message if a job submission fails due to lack of available slots is 
not helpful:
{code:java}
Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka://flink/user/8f0fabba-4021-45b6-a1f7-b8afd6627640#-574617182]]
 after [30 ms]. Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.LocalRpcInvocation".
     at 
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
     at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
     at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
     at 
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
     at 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
     at 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
     at 
akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
     at 
akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
     at 
akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
     at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9055) WebUI shows job as Running although not enough resources are available

2018-03-22 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9055:


 Summary: WebUI shows job as Running although not enough resources 
are available
 Key: FLINK-9055
 URL: https://issues.apache.org/jira/browse/FLINK-9055
 Project: Flink
  Issue Type: Bug
  Components: JobManager, Webfrontend
Affects Versions: 1.5.0
 Environment: * FLIP-6 enabled
 * Local Flink instance with fixed number of TMs
 * Job parallelism exceeds available slots
Reporter: Fabian Hueske


The WebUI shows a (batch) job as "Running" although not enough resources have 
been allocated to actually run the job with the requested parallelism.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9031) DataSet Job result changes when adding rebalance after union

2018-03-20 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-9031:


 Summary: DataSet Job result changes when adding rebalance after 
union
 Key: FLINK-9031
 URL: https://issues.apache.org/jira/browse/FLINK-9031
 Project: Flink
  Issue Type: Bug
  Components: DataSet API, Local Runtime, Optimizer
Affects Versions: 1.3.1
Reporter: Fabian Hueske
 Attachments: Person.java, RunAll.java, newplan.txt, oldplan.txt

A user [reported this issue on the user mailing 
list|https://lists.apache.org/thread.html/075f1a487b044079b5d61f199439cb77dd4174bd425bcb3327ed7dfc@%3Cuser.flink.apache.org%3E].
{quote}I am using Flink 1.3.1 and I have found a strange behavior on running 
the following logic:
 # Read data from file and store into DataSet
 # Split dataset in two, by checking if "field1" of POJOs is empty or not, so 
that the first dataset contains only elements with non empty "field1", and the 
second dataset will contain the other elements.
 # Each dataset is then grouped by, one by "field1" and other by another field, 
and subsequently reduced.
 # The 2 datasets are merged together by union.
 # The final dataset is written as json.

What I was expected, from output, was to find only one element with a specific 
value of "field1" because:
 # Reducing the first dataset grouped by "field1" should generate only one 
element with a specific value of "field1".
 # The second dataset should contain only elements with empty "field1".
 # Making an union of them should not duplicate any record.

This does not happen. When i read the generated jsons i see some duplicate (non 
empty) values of "field1".
 Strangely this does not happen when the union between the two datasets is not 
computed. In this case the first dataset produces elements only with distinct 
values of "field1", while second dataset produces only records with empty field 
"value1".
{quote}
The user has not enabled object reuse.

Later he reports that injecting a rebalance() after the union resolves the 
problem. I had a look at the execution plans for both cases (attached to this 
issue) but could not identify a problem.

Hence I assume, this might be an issue with the runtime code but we need to 
look deeper into this. The user also provided an example program consisting of 
two classes which are attached to the issue as well.

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8952) Support setting the parallelism of individual operators of a query

2018-03-15 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8952:


 Summary: Support setting the parallelism of individual operators 
of a query
 Key: FLINK-8952
 URL: https://issues.apache.org/jira/browse/FLINK-8952
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Fabian Hueske


Right now it is not possible to set the parallelism of individual operators 
that were generated for a SQL or Table API query.

We could expose the optimized plan before it is translated to a {{DataStream}} 
or {{DataSet}} program to annotate operators with parallelism.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8951) Support OVER windows PARTITION BY (rounded) timestamp

2018-03-15 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8951:


 Summary: Support OVER windows PARTITION BY (rounded) timestamp
 Key: FLINK-8951
 URL: https://issues.apache.org/jira/browse/FLINK-8951
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Fabian Hueske


There are a few interesting use cases that can be addressed by queries that 
follow the following pattern

{code:sql}
SELECT sensorId, COUNT(*) OVER (PARTITION BY CEIL(rowtime TO HOUR) ORDER BY temp 
ROWS BETWEEN UNBOUNDED preceding AND CURRENT ROW) FROM sensors
{code}

Such queries can be used to compute rolling cascading (tumbling) windows with 
aggregates that are reset in regular intervals. This can be useful for TOP-K 
per minute/hour/day queries.

Right now, such {{OVER}} windows are not supported, because we require that the 
{{ORDER BY}} clause is defined on a timestamp (time indicator) attribute. In 
order to support this kind of queries, we would require that the {{PARTITION 
BY}} clause contains a timestamp (time indicator) attribute or a function that 
is defined on it and which is monotonicity preserving. Once the optimizer 
identifies this case, it could translate the query into a special 
time-partitioned OVER window operator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8950) "Materialize" Tables to avoid recomputation.

2018-03-15 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8950:


 Summary: "Materialize" Tables to avoid recomputation.
 Key: FLINK-8950
 URL: https://issues.apache.org/jira/browse/FLINK-8950
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske


Currently, {{Table}} objects of the Table API / SQL are treated like virtual 
views, i.e., all relational operators that have been applied on them are 
recorded and translated when a {{Table}} is emitted to a {{TableSink}} or 
converted into a {{DataSet}} or {{DataStream}}.

In case a {{Table}} is accessed twice, the (sub-)query that it represents is 
translated twice into a {{DataSet}} or {{DataStream}} program and hence also 
executed twice which is inefficient. Currently, the only way to avoid this is 
to convert the {{Table}} into a {{DataSet}} or {{DataStream}}, which will cause 
the optimizer to generate a plan and register it back as a new {{Table}}.

We should offer a method to internally "materialize" a {{Table}} object, i.e., 
to optimize, generate a plan, and register the plan as an internal table. All 
queries / operations that are evaluated on the materialized {{Table}} will 
start from the same {{DataSet}} or {{DataStream}} such that it is not computed 
multiple times.
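
A sketch of how such a method could be used (the {{materialize()}} name is hypothetical):

{code}
// Hedged sketch of the proposed API:
Table base = tEnv.scan("Orders").filter("amount > 10");
Table cached = base.materialize();  // hypothetical: optimize, translate, and register once

// both results reuse the same translated DataStream/DataSet program
Table byUser = cached.groupBy("user").select("user, amount.sum");
Table byItem = cached.groupBy("item").select("item, amount.count");
{code}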



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8855) SQL client result serving gets stuck in result-mode=table

2018-03-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8855:


 Summary: SQL client result serving gets stuck in result-mode=table
 Key: FLINK-8855
 URL: https://issues.apache.org/jira/browse/FLINK-8855
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


The result serving of a query in {{result-mode=table}} gets stuck after some 
time when serving an updating result.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8854) Mapping of SchemaValidator.deriveFieldMapping() is incorrect.

2018-03-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8854:


 Summary: Mapping of SchemaValidator.deriveFieldMapping() is 
incorrect.
 Key: FLINK-8854
 URL: https://issues.apache.org/jira/browse/FLINK-8854
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


The field mapping returned by {{SchemaValidator.deriveFieldMapping()}} is not 
correct.

It should not only include all fields of the table schema, but also all fields 
of the format schema (mapped to themselves). Otherwise, it is not possible to 
use a timestamp extractor on a field that is not in the table schema. 

For example this configuration would fail:

{code}
sources:
  - name: TaxiRides
    schema:
      - name: rideId
        type: LONG
      - name: rowTime
        type: TIMESTAMP
        rowtime:
          timestamps:
            type: "from-field"
            from: "rideTime"
          watermarks:
            type: "periodic-bounded"
            delay: "6"
    connector:
      
    format:
      property-version: 1
      type: json
      schema: "ROW(rideId LONG, rideTime TIMESTAMP)"
{code}

because {{rideTime}} is not in the table schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8853) SQL Client cannot emit query results that contain a rowtime attribute

2018-03-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8853:


 Summary: SQL Client cannot emit query results that contain a 
rowtime attribute
 Key: FLINK-8853
 URL: https://issues.apache.org/jira/browse/FLINK-8853
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


Emitting a query result that contains a rowtime attribute fails with the 
following exception:
{code:java}
Caused by: java.lang.ClassCastException: java.sql.Timestamp cannot be cast to 
java.lang.Long
    at 
org.apache.flink.api.common.typeutils.base.LongSerializer.serialize(LongSerializer.java:27)
    at 
org.apache.flink.api.java.typeutils.runtime.RowSerializer.serialize(RowSerializer.java:160)
    at 
org.apache.flink.api.java.typeutils.runtime.RowSerializer.serialize(RowSerializer.java:46)
    at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:125)
    at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30)
    at 
org.apache.flink.streaming.experimental.CollectSink.invoke(CollectSink.java:66)
    ... 44 more{code}
The problem is caused by the {{ResultStore}}, which configures the 
{{CollectSink}} with the field types obtained from the {{TableSchema}}. The 
type of the rowtime field is a {{TimeIndicatorTypeInfo}}, which is serialized as 
Long. However, in the query result it is represented as a Timestamp. Hence, the 
type must be replaced by a {{SqlTimeTypeInfo}}.
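
A sketch of the fix in the {{ResultStore}} (helper shapes assumed):

{code}
// Hedged sketch: swap time indicator types for SQL timestamp types before
// configuring the collect sink.
TypeInformation<?>[] fieldTypes = schema.getTypes();
for (int i = 0; i < fieldTypes.length; i++) {
  if (fieldTypes[i] instanceof TimeIndicatorTypeInfo) {
    fieldTypes[i] = SqlTimeTypeInfo.TIMESTAMP;  // matches java.sql.Timestamp values
  }
}
{code}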



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8852) SQL Client does not work with new FLIP-6 mode

2018-03-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8852:


 Summary: SQL Client does not work with new FLIP-6 mode
 Key: FLINK-8852
 URL: https://issues.apache.org/jira/browse/FLINK-8852
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


The SQL client does not submit queries to local Flink cluster that runs in 
FLIP-6 mode. It doesn't throw an exception either.

Job submission works if the legacy Flink cluster mode is used (`mode: old`).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8851) SQL Client fails if same file is used as default and env configuration

2018-03-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8851:


 Summary: SQL Client fails if same file is used as default and env 
configuration
 Key: FLINK-8851
 URL: https://issues.apache.org/jira/browse/FLINK-8851
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


Specifying the same file as default and environment configuration yields the 
following exception
{code:java}
Exception in thread "main" org.apache.flink.table.client.SqlClientException: 
Unexpected exception. This is a bug. Please consider filing an issue.
    at org.apache.flink.table.client.SqlClient.main(SqlClient.java:156)
Caused by: java.lang.UnsupportedOperationException
    at java.util.AbstractMap.put(AbstractMap.java:209)
    at java.util.AbstractMap.putAll(AbstractMap.java:281)
    at 
org.apache.flink.table.client.config.Environment.merge(Environment.java:107)
    at 
org.apache.flink.table.client.gateway.local.LocalExecutor.createEnvironment(LocalExecutor.java:461)
    at 
org.apache.flink.table.client.gateway.local.LocalExecutor.listTables(LocalExecutor.java:203)
    at 
org.apache.flink.table.client.cli.CliClient.callShowTables(CliClient.java:270)
    at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:198)
    at org.apache.flink.table.client.SqlClient.start(SqlClient.java:97)
    at org.apache.flink.table.client.SqlClient.main(SqlClient.java:146){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8850) SQL Client does not support Event-time

2018-03-04 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8850:


 Summary: SQL Client does not support Event-time
 Key: FLINK-8850
 URL: https://issues.apache.org/jira/browse/FLINK-8850
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


The SQL client fails with an exception if a table includes a rowtime attribute.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8820) FlinkKafkaConsumer010 reads too many bytes

2018-03-01 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8820:


 Summary: FlinkKafkaConsumer010 reads too many bytes
 Key: FLINK-8820
 URL: https://issues.apache.org/jira/browse/FLINK-8820
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: 1.4.0
Reporter: Fabian Hueske


A user reported that the FlinkKafkaConsumer010 very rarely consumes too many 
bytes, i.e., the returned message is too large. The application is running for 
about a year and the problem started to occur after upgrading to Flink 1.4.0.

The user made a good effort in debugging the problem but was not able to 
reproduce it in a controlled environment. It seems that the data is correctly 
stored in Kafka.

Here's the thread on the user mailing list for a detailed 
description of the problem and analysis so far: 
https://lists.apache.org/thread.html/1d62f616d275e9e23a5215ddf7f5466051be7ea96897d827232fcb4e@%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8652) Reduce log level of QueryableStateClient.getKvState() to DEBUG

2018-02-14 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8652:


 Summary: Reduce log level of QueryableStateClient.getKvState() to 
DEBUG
 Key: FLINK-8652
 URL: https://issues.apache.org/jira/browse/FLINK-8652
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


The {{QueryableStateClient}} logs each state request on {{INFO}} log level. 
This results in very verbose logging in most applications that use queryable 
state.

I propose to reduce the log level to {{DEBUG}}.

What do you think [~kkl0u], [~aljoscha]?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8651) Add support for different event-time OVER windows in a query

2018-02-14 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8651:


 Summary: Add support for different event-time OVER windows in a 
query
 Key: FLINK-8651
 URL: https://issues.apache.org/jira/browse/FLINK-8651
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske


Right now, Table API and SQL queries only support multiple OVER window 
aggregations, but all OVER windows must be of the same type.

For example the following query is currently supported:
{code:java}
SELECT c, b, 
  COUNT(a) OVER (PARTITION BY c ORDER BY rowtime RANGE BETWEEN INTERVAL '1' 
SECOND PRECEDING AND CURRENT ROW),
  SUM(a) OVER (PARTITION BY c ORDER BY rowtime RANGE BETWEEN INTERVAL '1' 
SECOND PRECEDING AND CURRENT ROW)
FROM T1
{code}
If we changed the interval or the partitioning attribute of the {{SUM(a)}} 
aggregation's window, the query could not be executed.

We can add support for multiple different windows by splitting the query and 
joining it back. This would require an optimizer rule that rewrites a plan from
{code:java}
IN -> OverAgg(window-A, window-B) -> OUT
{code}
to
{code:java}
                     /-OverAgg(window-A)-\
IN -> Calc(uniq-id)-<                     >-WindowJoin(uniq-id, rowtime) -> OUT
                     \-OverAgg(window-B)-/
{code}

The unique id should consist of three components: the timestamp, the parallel 
index of the function instance, and a counter that just wraps around. One of 
the aggregates can be projected to only the required fields and the window join 
would join on uniq-id and timestamp equality (when we support FOLLOWING 
boundaries, we would have to join on a time range).
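
One possible encoding of such an id (bit widths arbitrary, purely for illustration):

{code:java}
// Hedged sketch: pack timestamp, parallel subtask index, and a wrap-around
// counter into one long; the widths below are not a concrete proposal.
long uniqId = (timestamp << 22)
    | ((long) subtaskIndex << 12)  // parallel index of the function instance
    | (counter++ & 0xFFFL);        // wraps around every 4096 records
{code}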



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8566) Replace retract/insert of same record for state retention timer resets

2018-02-06 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8566:


 Summary: Replace retract/insert of same record for state retention 
timer resets
 Key: FLINK-8566
 URL: https://issues.apache.org/jira/browse/FLINK-8566
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske


Currently a simple query like {{SELECT DISTINCT a, b, c FROM tableX}} is 
translated into a plan that generates a retraction stream. However, one would 
assume that an append stream should be possible as well. In fact, the plan 
doesn't produce actual updates.

Internally, the {{DISTINCT}} is translated into a {{GROUP BY}} with all 
distinct fields being keys and no aggregation functions. The corresponding 
operator produces updates, because aggregation function might update their 
results as new records are received. So we could just implement a dedicated 
operator for {{DISTINCT}}. However, this would not work if a user configures a 
state retention time. In this case, we emit retraction/insert messages for the 
same (distinct) record whenever a new row is received to reset the state 
clean-up timers of the downstream operators. 

One way to solve this issue is to implement a dedicated mechanism to update state 
clean-up timers for unchanged records instead of sending out retraction/insert 
messages with identical records. This mechanism would just be used to reset the 
timers and could also be used for append streams. For example, we could replace 
the boolean flag in CRow with a byte that can take more than two values. 
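
A sketch of such a flag (constant names and values hypothetical):

{code}
// Hedged sketch: a byte-valued change flag with room for a timer-reset kind.
public static final byte INSERT      = 0;
public static final byte RETRACT     = 1;
public static final byte TIMER_RESET = 2;  // resets downstream clean-up timers
                                           // without re-emitting the record
{code}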



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8489) Data is not emitted by second ElasticSearch connector

2018-01-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8489:


 Summary: Data is not emitted by second ElasticSearch connector
 Key: FLINK-8489
 URL: https://issues.apache.org/jira/browse/FLINK-8489
 Project: Flink
  Issue Type: Bug
  Components: ElasticSearch Connector
Affects Versions: 1.4.0
Reporter: Fabian Hueske


A user reported [this 
issue|https://lists.apache.org/thread.html/e91c71beb45d6df879effa16c52f2c71aa6ef1a54ef0a8ac4ccc72ee@%3Cuser.flink.apache.org%3E]
 on the user@f.a.o mailing list.

*Setup:*
 * A program with two pipelines that write to ElasticSearch. The pipelines can 
be connected or completely separate.
 * ElasticSearch 5.6.4, connector {{flink-connector-elasticsearch5_2.11}}

*Problem:*
 Only one of the ES connectors correctly emits data. The other connector writes 
a single record and then stops emitting data (or does not write any data at 
all). The problem does not exist, if the second ES connector is replaced by a 
different connector (for example Cassandra).

Below is a program to reproduce the issue:
{code:java}
public class ElasticSearchTest1 {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // set elasticsearch connection details
        Map<String, String> config = new HashMap<>();
        config.put("bulk.flush.max.actions", "1");
        config.put("cluster.name", "");
        List<InetSocketAddress> transports = new ArrayList<>();
        transports.add(new InetSocketAddress(InetAddress.getByName(""), 9300));

        // Set properties for Kafka Streaming
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "" + ":9092");
        properties.setProperty("group.id", "testGroup");
        properties.setProperty("auto.offset.reset", "latest");

        // Create consumer for log records
        FlinkKafkaConsumer011<ObjectNode> inputConsumer1 =
            new FlinkKafkaConsumer011<>("elastic_test1", new JSONDeserializationSchema(), properties);

        DataStream<RecordOne> firstStream = env
            .addSource(inputConsumer1)
            .flatMap(new CreateRecordOne());

        firstStream
            .addSink(new ElasticsearchSink<>(config,
                transports,
                new ElasticSearchOutputRecord("elastic_test_index1", "elastic_test_index1")));

        FlinkKafkaConsumer011<ObjectNode> inputConsumer2 =
            new FlinkKafkaConsumer011<>("elastic_test2", new JSONDeserializationSchema(), properties);

        DataStream<RecordTwo> secondStream = env
            .addSource(inputConsumer2)
            .flatMap(new CreateRecordTwo());

        secondStream
            .addSink(new ElasticsearchSink<>(config,
                transports,
                new ElasticSearchOutputRecord2("elastic_test_index2", "elastic_test_index2")));

        env.execute("Elastic Search Test");
    }
}

public class ElasticSearchOutputRecord implements ElasticsearchSinkFunction<RecordOne> {

    String index;
    String type;

    // Initialize filter function
    public ElasticSearchOutputRecord(String index, String type) {
        this.index = index;
        this.type = type;
    }

    // construct index request
    @Override
    public void process(
            RecordOne record,
            RuntimeContext ctx,
            RequestIndexer indexer) {

        // construct JSON document to index
        Map<String, Object> json = new HashMap<>();

        json.put("item_one", record.item1);
        json.put("item_two", record.item2);

        IndexRequest rqst = Requests.indexRequest()
            .index(index)   // index name
            .type(type)     // mapping name
            .source(json);

        indexer.add(rqst);
    }
}

public class ElasticSearchOutputRecord2 implements ElasticsearchSinkFunction<RecordTwo> {

    String index;
    String type;

    // Initialize filter function
    public ElasticSearchOutputRecord2(String index, String type) {
        this.index = index;
        this.type = type;
    }

    // construct index request
    @Override
    public void process

[jira] [Created] (FLINK-8487) State loss after multiple restart attempts

2018-01-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8487:


 Summary: State loss after multiple restart attempts
 Key: FLINK-8487
 URL: https://issues.apache.org/jira/browse/FLINK-8487
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.3.2
Reporter: Fabian Hueske
 Fix For: 1.5.0, 1.4.1


A user [reported this 
issue|https://lists.apache.org/thread.html/9dc9b719cf8449067ad01114fedb75d1beac7b4dff171acdcc24903d@%3Cuser.flink.apache.org%3E]
 on the user@f.a.o mailing list and analyzed the situation.

Scenario:
- A program that reads from Kafka and computes counts in a keyed 15 minute 
tumbling window.  StateBackend is RocksDB and checkpointing is enabled.

{code}
keyBy(0)
.timeWindow(Time.of(window_size, TimeUnit.MINUTES))
.allowedLateness(Time.of(late_by, TimeUnit.SECONDS))
.reduce(new ReduceFunction(), new WindowFunction())
{code}

- At some point HDFS went into a safe mode due to NameNode issues
- The following exception was thrown

{code}
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category WRITE is not supported in state standby. Visit 
https://s.apache.org/sbnn-error
..

at 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.mkdirs(HadoopFileSystem.java:453)
at 
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.mkdirs(SafetyNetWrapperFileSystem.java:111)
at 
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory.createBasePath(FsCheckpointStreamFactory.java:132)
{code}

- The pipeline came back after a few restarts and checkpoint failures, after 
the HDFS issues were resolved.

- It was evident that operator state was lost. Either it was the Kafka consumer 
that kept on advancing its offset between a start and the next checkpoint 
failure (a minute's worth), or the operator that had partial aggregates was 
lost. 

The user did some in-depth analysis (see [mail 
thread|https://lists.apache.org/thread.html/9dc9b719cf8449067ad01114fedb75d1beac7b4dff171acdcc24903d@%3Cuser.flink.apache.org%3E])
 and might have (according to [~aljoscha]) identified the problem.

[~stefanrichte...@gmail.com], can you have a look at this issue and check if it 
is relevant?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8474) Add documentation for HBaseTableSource

2018-01-22 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8474:


 Summary: Add documentation for HBaseTableSource
 Key: FLINK-8474
 URL: https://issues.apache.org/jira/browse/FLINK-8474
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Table API & SQL
Affects Versions: 1.4.0, 1.3.0, 1.5.0
Reporter: Fabian Hueske


The {{HBaseTableSource}} is not documented in the [Table Source and Sinks 
documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/sourceSinks.html].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8457) Documentation for Building from Source is broken

2018-01-19 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8457:


 Summary: Documentation for Building from Source is broken
 Key: FLINK-8457
 URL: https://issues.apache.org/jira/browse/FLINK-8457
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.0, 1.3.0
 Environment: The documentation for how to build Flink from source is 
broken for all released versions.

It only explains how to build the latest master branch which is only correct 
for the docs of the latest master but not for the docs of a release version. 
For example the [build docs for Flink 
1.4|https://ci.apache.org/projects/flink/flink-docs-release-1.4/start/building.html]
 say

{quote}This page covers how to build Flink 1.4.0 from sources.
{quote}

but explain how to build the {{master}} branch. 

I think we should rewrite this page to explain how to build specific versions 
and also explain how to build the SNAPSHOT branches of released versions (for 
example {{release-1.4}}, the latest dev branch for Flink 1.4 with all merged 
bug fixes).

I guess the same holds for Flink 1.3 as well.
Reporter: Fabian Hueske






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8433) Update code example for "Managed Operator State" documentation

2018-01-14 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8433:


 Summary: Update code example for "Managed Operator State" 
documentation
 Key: FLINK-8433
 URL: https://issues.apache.org/jira/browse/FLINK-8433
 Project: Flink
  Issue Type: Bug
  Components: Documentation, State Backends, Checkpointing
Affects Versions: 1.4.0, 1.5.0
Reporter: Fabian Hueske


The code example for "Managed Operator State"  
(https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html#using-managed-operator-state)
 refers to the {{CheckpointedRestoring}} interface which was removed in Flink 
1.4.0.

The example must be updated and the reference to the removed interface dropped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8391) Improve dependency documentation for queryable state

2018-01-09 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8391:


 Summary: Improve dependency documentation for queryable state
 Key: FLINK-8391
 URL: https://issues.apache.org/jira/browse/FLINK-8391
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.5.0, 1.4.1
Reporter: Fabian Hueske


The documentation of the queryable state does not explicitly mention that the 
program that holds and updates the queryable state requires the 
{{flink-queryable-state-runtime}} dependency.

Right now, the section "Activating Queryable State" mentions that 
{{flink-queryable-state-runtime}} needs to be copied into the {{./lib}} folder. 
However, this does not work if the program is started from an IDE, for example 
for local testing.

The docs should be updated accordingly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8331) FieldParsers do not correctly set EMPTY_COLUMN error state

2017-12-29 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8331:


 Summary: FieldParsers do not correctly set EMPTY_COLUMN error state
 Key: FLINK-8331
 URL: https://issues.apache.org/jira/browse/FLINK-8331
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.5.0, 1.4.1
Reporter: Fabian Hueske


Some {{FieldParser}} do not correctly set the EMPTY_COLUMN error state if a 
field is empty.
Instead, they try to parse the field value from an empty String which fails, 
e.g., in case of the {{DoubleParser}} with a {{NumberFormatException}}.

The {{RowCsvInputFormat}} has a flag to interpret empty fields as {{null}} 
values. The implementation requires that all {{FieldParser}} correctly return 
the EMPTY_COLUMN error state in case of an empty field.

Affected {{FieldParser}}:

- BigDecParser
- BigIntParser
- DoubleParser
- FloatParser
- SqlDateParser
- SqlTimeParser
- SqlTimestampParser
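
A sketch of the expected check at the beginning of a parser's parse method (shape follows the {{FieldParser}} contract loosely):

{code}
// Hedged sketch: detect the empty field before parsing and set the error
// state so that RowCsvInputFormat can map the field to null.
if (startPos == limit || bytes[startPos] == delimiter[0]) {
  setErrorState(ParseErrorState.EMPTY_COLUMN);
  return -1;
}
{code}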



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8308) Update yajl-ruby dependency to 1.3.1 or higher

2017-12-22 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8308:


 Summary: Update yajl-ruby dependency to 1.3.1 or higher
 Key: FLINK-8308
 URL: https://issues.apache.org/jira/browse/FLINK-8308
 Project: Flink
  Issue Type: Task
  Components: Project Website
Reporter: Fabian Hueske
Priority: Critical
 Fix For: 1.5.0, 1.4.1


We got notified that yajl-ruby < 1.3.1, a dependency which is used to build the 
Flink website, has a security vulnerability of high severity.

We should update yajl-ruby to 1.3.1 or higher.

Since the website is built offline and served as static HTML, I don't think 
this is a super critical issue (please correct me if I'm wrong), but we should 
resolve this soon.







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8303) Update Savepoint Compatibility Table for 1.4

2017-12-21 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8303:


 Summary: Update Savepoint Compatibility Table for 1.4
 Key: FLINK-8303
 URL: https://issues.apache.org/jira/browse/FLINK-8303
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.0
Reporter: Fabian Hueske
 Fix For: 1.4.1


The savepoint compatibility table of the [upgrading applications 
documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/upgrading.html#compatibility-table]
 needs to be extended for 1.4.

Also, the whole page should be double checked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8278) Scala examples in Metric documentation do not compile

2017-12-18 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8278:


 Summary: Scala examples in Metric documentation do not compile
 Key: FLINK-8278
 URL: https://issues.apache.org/jira/browse/FLINK-8278
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.3.2, 1.4.0, 1.5.0
Reporter: Fabian Hueske


The Scala examples in the [Metrics 
documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html]
 do not compile.

The line 

{code}
@transient private var counter: Counter
{code}

needs to be extended to

{code}
@transient private var counter: Counter = _
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8260) Document API of Kafka 0.11 Producer

2017-12-14 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8260:


 Summary: Document API of Kafka 0.11 Producer
 Key: FLINK-8260
 URL: https://issues.apache.org/jira/browse/FLINK-8260
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.4.0
Reporter: Fabian Hueske
Priority: Critical
 Fix For: 1.5.0, 1.4.1


The API of the Flink Kafka Producer changed for Kafka 0.11; for example, there 
is no {{writeToKafkaWithTimestamps}} method anymore.

This needs to be added to the [Kafka connector 
documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-producer],
 i.e., a new tab with a code snippet needs to be added for Kafka 0.11.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8243) OrcTableSource should recursively read all files in nested directories of the input path.

2017-12-12 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8243:


 Summary: OrcTableSource should recursively read all files in 
nested directories of the input path.
 Key: FLINK-8243
 URL: https://issues.apache.org/jira/browse/FLINK-8243
 Project: Flink
  Issue Type: Improvement
  Components: Batch Connectors and Input/Output Formats
Affects Versions: 1.4.0
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Critical
 Fix For: 1.5.0, 1.4.1


The {{OrcTableSource}} only reads files on the first level of the provided 
input path. 

Hive's default behavior is to recursively read all nested files in the input 
path. We should follow this behavior and add a switch to disable recursive 
reading.
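
Since the {{OrcTableSource}} is based on a {{FileInputFormat}}, a sketch of the change could be as small as:

{code}
// Hedged sketch: enable recursive file enumeration on the ORC input format
// (FileInputFormat already supports it) and expose a switch on the source.
orcInputFormat.setNestedFileEnumeration(true);  // mirrors Hive's default
{code}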



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8242) ClassCastException in OrcTableSource.toOrcPredicate

2017-12-12 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8242:


 Summary: ClassCastException in OrcTableSource.toOrcPredicate
 Key: FLINK-8242
 URL: https://issues.apache.org/jira/browse/FLINK-8242
 Project: Flink
  Issue Type: Bug
  Components: Batch Connectors and Input/Output Formats
Affects Versions: 1.4.0
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Critical
 Fix For: 1.5.0, 1.4.1


The {{OrcTableSource}} tries to cast all predicate literals to {{Serializable}} 
in its {{toOrcPredicate()}} method. This fails with a {{ClassCastException}} if 
a literal is not serializable.

Instead of failing, we should ignore the predicate and log a WARN message.
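
A sketch of the proposed handling inside {{toOrcPredicate()}}:

{code}
// Hedged sketch: skip non-serializable literals instead of failing.
if (!(literal instanceof Serializable)) {
  LOG.warn("Encountered a non-serializable literal {}. " +
      "Predicate is skipped and not pushed into the OrcTableSource.", literal);
  return null;  // the caller ignores null predicates
}
{code}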



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8203) Make schema definition of DataStream/DataSet to Table conversion more flexible

2017-12-05 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8203:


 Summary: Make schema definition of DataStream/DataSet to Table 
conversion more flexible
 Key: FLINK-8203
 URL: https://issues.apache.org/jira/browse/FLINK-8203
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.4.0, 1.5.0
Reporter: Fabian Hueske


When converting or registering a {{DataStream}} or {{DataSet}} as {{Table}}, 
the schema of the table can be defined (by default it is extracted from the 
{{TypeInformation}}).

The schema needs to be manually specified to select (project) fields, rename 
fields, or define time attributes. Right now, there are several limitations how 
the fields can be defined that also depend on the type of the {{DataStream}} / 
{{DataSet}}. Types with explicit field ordering (e.g., tuples, case classes, 
Row) require schema definition based on the position of fields. Pojo types 
which have no fixed order of fields, require to refer to fields by name. 
Moreover, there are several restrictions on how time attributes can be defined, 
e.g., event time attribute must replace an existing field or be appended and 
proctime attributes must be appended.

I think we can make the schema definition more flexible and provide two modes:

1. Reference input fields by name: All fields in the schema definition are 
referenced by name (and possibly renamed using an alias ({{as}})). In this mode, 
fields can be reordered and projected out. Moreover, we can define proctime and 
event-time attributes at arbitrary positions using arbitrary names (except those 
that already exist in the result schema). This mode can be used for any input 
type, including POJOs. This mode is used if all field references exist in the 
input type.

2. Reference input fields by position: Field references might not refer to 
existing fields in the input type. In this mode, fields are simply renamed. 
Event-time attributes can replace the field on their position in the input data 
(if it is of correct type) or be appended at the end. Proctime attributes must 
be appended at the end. This mode can only be used if the input type has a 
defined field order (tuple, case class, Row).
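
A sketch of how the two modes could look from a user's perspective (field names made up):

{code}
// Mode 1, reference by name (works for POJOs): reorder, project, rename,
// and add a proctime attribute under a new name.
Table t1 = tEnv.fromDataStream(pojoStream, "age as years, name, ptime.proctime");

// Mode 2, reference by position (tuples, case classes, Row): plain renaming
// with a proctime attribute appended at the end.
Table t2 = tEnv.fromDataStream(tupleStream, "name, years, ptime.proctime");
{code}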

We need to add more tests the check for all combinations of input types and 
schema definition modes.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8179) Convert CharSequence to String when registering / converting a Table

2017-11-30 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8179:


 Summary: Convert CharSequence to String when registering / 
converting a Table
 Key: FLINK-8179
 URL: https://issues.apache.org/jira/browse/FLINK-8179
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske


Avro objects store text values as `java.lang.CharSequence`. Right now, these 
fields are treated as `ANY` type by Calcite (`GenericType` by Flink). So, 
importing a `DataStream` or `DataSet` of Avro objects results in a Table where 
the text values cannot be used as String fields.

We should convert `CharSequence` fields to `String` when importing / converting 
to a Table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8177) Flink cannot be built for Hadoop 2.9.0

2017-11-30 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8177:


 Summary: Flink cannot be built for Hadoop 2.9.0
 Key: FLINK-8177
 URL: https://issues.apache.org/jira/browse/FLINK-8177
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.4.0, 1.5.0
Reporter: Fabian Hueske
Priority: Critical


Flink cannot be built for Hadoop 2.9.0 which was released on Nov. 17th, 2017.

When running 

{code}
mvn clean install -DskipTests -Dhadoop.version=2.9.0
{code}

Maven fails with the following error:

{code}
[ERROR] 
/Users/fhueske/Development/flink/flink-yarn/src/test/java/org/apache/flink/yarn/UtilsTest.java:[239,16]
 org.apache.flink.yarn.UtilsTest.TestingContainer is not abstract and does not 
override abstract method 
setExecutionType(org.apache.hadoop.yarn.api.records.ExecutionType) in 
org.apache.hadoop.yarn.api.records.Container
{code}

The problem is caused by Hadoop 2.9.0 extending the {{Container}} 
interface with a getter ({{getExecutionType()}}) and a setter method 
({{setExecutionType()}}), which adds a class dependency on {{ExecutionType}}.

Flink's {{UtilsTest}} class defines a class {{TestingContainer}} that 
implements the {{Container}} interface.

We cannot simply update the implementation of {{TestingContainer}} because 
previous versions of Hadoop do not provide {{ExecutionType}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8140) Add AppendStreamTableSink for bucketed CSV files

2017-11-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8140:


 Summary: Add AppendStreamTableSink for bucketed CSV files
 Key: FLINK-8140
 URL: https://issues.apache.org/jira/browse/FLINK-8140
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Fabian Hueske
Priority: Minor


It would be good to have an {{AppendStreamTableSink}} that writes to bucketed 
CSV files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8141) Add AppendStreamTableSink for bucketed ORC files

2017-11-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8141:


 Summary: Add AppendStreamTableSink for bucketed ORC files
 Key: FLINK-8141
 URL: https://issues.apache.org/jira/browse/FLINK-8141
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Fabian Hueske
Priority: Minor


It would be good to have an {{AppendStreamTableSink}} that writes to bucketed 
ORC files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

