[jira] [Commented] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask

2016-04-10 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234552#comment-15234552
 ] 

Aljoscha Krettek commented on FLINK-3669:
-----------------------------------------

Hi,
I'll be looking at your code today.

For the tests, the idea is to put tests that verify the interplay of the 
complete system into an *ITCase and smaller, self-contained tests into a *Test. 
ITCases normally also take a lot longer to execute.
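As a hedged illustration of that convention (class and test names here are made up):

{code}
// Small, self-contained check of one component: belongs in a *Test class.
class WindowOperatorTest {
    @org.junit.Test
    public void testSingleOperatorInIsolation() {
        // exercises the operator directly, no cluster required
    }
}

// End-to-end check of a complete job: belongs in an *ITCase class.
class WindowingITCase {
    @org.junit.Test
    public void testCompletePipeline() {
        // submits a whole topology and verifies the interplay of the system
    }
}
{code}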

> WindowOperator registers a lot of timers at StreamTask
> ------------------------------------------------------
>
> Key: FLINK-3669
> URL: https://issues.apache.org/jira/browse/FLINK-3669
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.1
>Reporter: Aljoscha Krettek
>Assignee: Konstantin Knauf
>Priority: Blocker
>
> Right now, the WindowOperator registers a timer at the StreamTask for every 
> processing-time timer that a Trigger registers. We should coalesce several 
> registered trigger timers into a single low-level timer (timer coalescing).
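A minimal sketch of the coalescing idea (illustrative only, not the actual 
WindowOperator code; the interaction with the StreamTask timer service is left 
as comments):

{code}
import java.util.TreeSet;

class TimerCoalescer {
    private final TreeSet<Long> triggerTimers = new TreeSet<>();
    private long registeredTimer = Long.MAX_VALUE; // the single low-level timer

    /** Called for every processing-time timer a Trigger registers. */
    void registerTriggerTimer(long timestamp) {
        triggerTimers.add(timestamp);
        if (timestamp < registeredTimer) {
            // Only now touch the StreamTask: the new timer fires earlier than
            // the low-level timer we already registered.
            registeredTimer = timestamp;
            // streamTask.registerTimer(timestamp, ...)  // hypothetical call
        }
    }

    /** Called when the single low-level timer fires. */
    void onLowLevelTimer(long now) {
        while (!triggerTimers.isEmpty() && triggerTimers.first() <= now) {
            long due = triggerTimers.pollFirst();
            // fire the Trigger logic for `due` ...
        }
        registeredTimer = triggerTimers.isEmpty() ? Long.MAX_VALUE : triggerTimers.first();
        // if timers remain, re-register one low-level timer for `registeredTimer`
    }
}
{code}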



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-04-10 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234428#comment-15234428
 ] 

ramkrishna.s.vasudevan commented on FLINK-3579:
-----------------------------------------------

[~till.rohrmann] - You can merge and close this PR. Thank you. 

> Improve String concatenation
> ----------------------------
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g., {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.
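A minimal repro sketch, assuming the 1.0-era Java Table API layout (imports and 
setup are assumptions; only the two expressions come from the report):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.table.TableEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.table.Table;

class ConcatRepro {
    // `ds` has a String field f0 and an Integer field f1.
    static void repro(TableEnvironment tableEnv, DataSet<Tuple2<String, Integer>> ds) {
        Table t = tableEnv.fromDataSet(ds);
        t.select("f0 + 42");               // fails with a RelBuilder exception
        t.select("f0 + 42.cast(STRING)");  // the ExpressionParser rejects this, too
    }
}
{code}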



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-04-10 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234427#comment-15234427
 ] 

ramkrishna.s.vasudevan commented on FLINK-3650:
-----------------------------------------------

Ping for reviews, [~trohrm...@apache.org]!

> Add maxBy/minBy to Scala DataSet API
> ------------------------------------
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API and should be added 
> in order to keep the two APIs consistent.
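For context, a hedged usage sketch of the existing Java API whose Scala 
counterpart is requested here (the data is made up):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MaxByExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<String, Integer>> data = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("b", 3), Tuple2.of("a", 2));
        // Selects the single tuple whose field 1 is maximal; minBy works analogously.
        data.maxBy(1).print();
    }
}
{code}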



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...

2016-04-10 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-208083609
  
We've summarized the use case around the hash aggregation experiments in [a 
blog post on the Peel 
webpage](http://peel-framework.org/2016/04/07/hash-aggregations-in-flink.html).

If you follow the instructions in the **Repeatability** section, you should 
be able to reproduce the results in other environments without too much 
hassle.

I hope that this will be the first of many Flink-related public Peel 
bundles.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

2016-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234309#comment-15234309
 ] 

ASF GitHub Bot commented on FLINK-3477:
---------------------------------------

Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-208083609
  
We've summarized the use case around the hash aggregation experiments in [a 
blog post on the Peel 
webpage](http://peel-framework.org/2016/04/07/hash-aggregations-in-flink.html).

If you follow the instructions in the **Repeatability** section, you should 
be able to reproduce the results in other environments without too much 
hassle.

I hope that this will be the first of many Flink-related public Peel 
bundles.


> Add hash-based combine strategy for ReduceFunction
> ---------------------------------------------------
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>Assignee: Gabor Gevay
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A ReduceFunction is applied incrementally to compute the final 
> aggregated value, which makes it possible to hold the pre-aggregated value 
> in a hash table and update it with each function call.
> The hash-based strategy requires a special implementation of an in-memory 
> hash table. The hash table should support in-place updates of elements (if 
> the new value has the same binary length as the old value) as well as 
> appending updates that invalidate the old value (if the binary lengths 
> differ). The hash table also needs to be able to evict and emit all elements 
> when it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to let users pick the 
> execution strategy.
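To make the strategy concrete, here is a minimal, hedged sketch of the combine 
idea in plain Java (a toy stand-in, not the proposed managed-memory hash table):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class HashCombiner<K, T> {
    private final Map<K, T> table = new HashMap<>();
    private final BiFunction<T, T, T> reduce; // mirrors `public T reduce(T v1, T v2)`
    private final int maxEntries;             // crude stand-in for the memory budget

    HashCombiner(BiFunction<T, T, T> reduce, int maxEntries) {
        this.reduce = reduce;
        this.maxEntries = maxEntries;
    }

    /** Folds one record into its key's pre-aggregate; evicts everything when "memory" is full. */
    void add(K key, T value, Consumer<T> out) {
        table.merge(key, value, reduce); // in-place update of the pre-aggregated value
        if (table.size() > maxEntries) { // out of memory: evict and emit all elements
            table.values().forEach(out);
            table.clear();
        }
    }

    /** Emits the remaining pre-aggregated values at the end of the input. */
    void close(Consumer<T> out) {
        table.values().forEach(out);
        table.clear();
    }
}
{code}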



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders

2016-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234216#comment-15234216
 ] 

ASF GitHub Bot commented on FLINK-3665:
---------------------------------------

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/1848#discussion_r59137767
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/PartitionOperator.java
 ---
@@ -98,6 +101,14 @@ public PartitionOperator(DataSet<T> input, Keys<T> pKeys, Partitioner<?> customP
this.customPartitioner = customPartitioner;
this.distribution = distribution;
}
+
+   public PartitionOperator<T> withOrders(Order... orders) {
--- End diff --

Hi. I started working on this change, but I don't quite know how I should 
treat key expressions (especially with wildcards).

Let's take a complex example:

```
TypeInformation<Tuple3<Integer, Pojo1, PojoWithMultiplePojos>> ti =
    new TupleTypeInfo<>(
        BasicTypeInfo.INT_TYPE_INFO,
        TypeExtractor.getForClass(Pojo1.class),
        TypeExtractor.getForClass(PojoWithMultiplePojos.class)
    );

ek = new ExpressionKeys<>(new String[] {"f2.p1.*", "f0"}, ti);

public static class Pojo1 {
    public String a;
    public String b;
}

public static class Pojo2 {
    public String a2;
    public String b2;
}

public static class PojoWithMultiplePojos {
    public Pojo1 p1;
    public Pojo2 p2;
    public Integer i0;
}
```

What should be the output of `ek.getOriginalKeyFieldTypes`?


> Range partitioning lacks support to define sort orders
> -------------------------------------------------------
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Fabian Hueske
> Fix For: 1.1.0
>
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of 
> fields. This is fine if range partitioning is used to mitigate skewed 
> partitioning. 
> However, it is not sufficient if range partitioning is used to sort a data 
> set in parallel. 
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot easily be 
> changed, I propose to add a method {{withOrders(Order... orders)}} to 
> {{PartitionOperator}}. The method should throw an exception if the 
> partitioning method of the {{PartitionOperator}} is not range partitioning.
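A hedged sketch of what the proposed method could look like (the {{pMethod}} 
and {{orders}} fields are assumptions for illustration, not the merged 
implementation):

{code}
// Illustrative only: reject sort orders unless range partitioning is used.
public PartitionOperator<T> withOrders(Order... orders) {
    if (pMethod != PartitionMethod.RANGE) {
        throw new IllegalStateException(
                "Sort orders can only be defined for range partitioning.");
    }
    this.orders = orders;
    return this;
}
{code}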



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3665] Implemented sort orders support i...

2016-04-10 Thread dawidwys
Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/1848#discussion_r59137767
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/PartitionOperator.java
 ---
@@ -98,6 +101,14 @@ public PartitionOperator(DataSet<T> input, Keys<T> pKeys, Partitioner<?> customP
this.customPartitioner = customPartitioner;
this.distribution = distribution;
}
+
+   public PartitionOperator<T> withOrders(Order... orders) {
--- End diff --

Hi. I started working on this change, but I don't quite know how I should 
treat key expressions (especially with wildcards).

Let's take a complex example:

```
TypeInformation<Tuple3<Integer, Pojo1, PojoWithMultiplePojos>> ti =
    new TupleTypeInfo<>(
        BasicTypeInfo.INT_TYPE_INFO,
        TypeExtractor.getForClass(Pojo1.class),
        TypeExtractor.getForClass(PojoWithMultiplePojos.class)
    );

ek = new ExpressionKeys<>(new String[] {"f2.p1.*", "f0"}, ti);

public static class Pojo1 {
    public String a;
    public String b;
}

public static class Pojo2 {
    public String a2;
    public String b2;
}

public static class PojoWithMultiplePojos {
    public Pojo1 p1;
    public Pojo2 p2;
    public Integer i0;
}
```

What should be the output of `ek.getOriginalKeyFieldTypes`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3723) Aggregate Functions and scalar expressions shouldn't be mixed in select

2016-04-10 Thread Yijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yijie Shen updated FLINK-3723:
------------------------------
Description: 
When we type {code}select deptno, name, max(age) from dept group by 
deptno;{code} in Calcite or Oracle, it complains {code}Expression 'NAME' is 
not being grouped{code} or {code}Column 'dept.name' is invalid in the select 
list because it is not contained in either an aggregate function or the GROUP 
BY clause.{code}, because the result would be nondeterministic.

Therefore, I suggest splitting the current functionality of `select` into two 
APIs: the new `select` handles only scalar expressions, while a new `agg` 
accepts aggregations.

{code}
def select(exprs: Expression*)
def agg(aggs: Aggregation*)

tbl.groupBy('deptno)
   .agg('age.max, 'age.min)
{code}

  was:
When we type {code}select deptno, name, max(age) from dept group by 
deptno;{code} in Calcite or Oracle, it complains {code}Expression 'NAME' is 
not being grouped{code} or {code}Column 'dept.name' is invalid in the select 
list because it is not contained in either an aggregate function or the GROUP 
BY clause.{code}, because the result would be nondeterministic.

Therefore, I suggest splitting the current functionality of `select` into two 
APIs: the new `select` handles only scalar expressions, while a new `agg` 
accepts aggregations.

{code}
def select(exprs: Expression*)
def agg(aggs: Aggregation*)
{code}


> Aggregate Functions and scalar expressions shouldn't be mixed in select
> ------------------------------------------------------------------------
>
> Key: FLINK-3723
> URL: https://issues.apache.org/jira/browse/FLINK-3723
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.1
>Reporter: Yijie Shen
>
> When we type {code}select deptno, name, max(age) from dept group by 
> deptno;{code} in Calcite or Oracle, it complains {code}Expression 'NAME' 
> is not being grouped{code} or {code}Column 'dept.name' is invalid in the 
> select list because it is not contained in either an aggregate function or 
> the GROUP BY clause.{code}, because the result would be nondeterministic.
> Therefore, I suggest splitting the current functionality of `select` into 
> two APIs: the new `select` handles only scalar expressions, while a new 
> `agg` accepts aggregations.
> {code}
> def select(exprs: Expression*)
> def agg(aggs: Aggregation*)
> 
> tbl.groupBy('deptno)
>.agg('age.max, 'age.min)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3723) Aggregate Functions and scalar expressions shouldn't be mixed in select

2016-04-10 Thread Yijie Shen (JIRA)
Yijie Shen created FLINK-3723:
-----------------------------

 Summary: Aggregate Functions and scalar expressions shouldn't be 
mixed in select
 Key: FLINK-3723
 URL: https://issues.apache.org/jira/browse/FLINK-3723
 Project: Flink
  Issue Type: Improvement
  Components: Table API
Affects Versions: 1.0.1
Reporter: Yijie Shen


When we type {code}select deptno, name, max(age) from dept group by 
deptno;{code} in Calcite or Oracle, it complains {code}Expression 'NAME' is 
not being grouped{code} or {code}Column 'dept.name' is invalid in the select 
list because it is not contained in either an aggregate function or the GROUP 
BY clause.{code}, because the result would be nondeterministic.

Therefore, I suggest splitting the current functionality of `select` into two 
APIs: the new `select` handles only scalar expressions, while a new `agg` 
accepts aggregations.

{code}
def select(exprs: Expression*)
def agg(aggs: Aggregation*)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)