[jira] [Commented] (FLINK-5792) Improve "UDTF" to support constructors with parameters

2017-02-13 Thread Zhuoluo Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865221#comment-15865221
 ] 

Zhuoluo Yang commented on FLINK-5792:
-

[~sunjincheng121] Could you please attach some design documents to this ticket?

> Improve "UDTF" to support constructors with parameters
> 
>
> Key: FLINK-5792
> URL: https://issues.apache.org/jira/browse/FLINK-5792
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: sunjincheng
>Assignee: sunjincheng
>
> Improve "UDTF" to support constructors with parameters.
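
For illustration, a minimal sketch of a UDTF whose behavior is configured
through its constructor, assuming the Table API's TableFunction base class;
the class name, separator parameter, and eval signature are hypothetical:

    import org.apache.flink.table.functions.TableFunction;

    // Hypothetical UDTF configured through its constructor; the improvement
    // tracked by this ticket would let users register instances built this way.
    public class SplitFunction extends TableFunction<String> {

        private final String separator;

        public SplitFunction(String separator) {
            this.separator = separator;
        }

        // Called once per input row; emits one row per token.
        public void eval(String line) {
            for (String token : line.split(separator)) {
                collect(token);
            }
        }
    }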





[jira] [Commented] (FLINK-5793) Running slot may not be added to AllocatedMap in SlotPool

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865219#comment-15865219
 ] 

ASF GitHub Bot commented on FLINK-5793:
---

GitHub user shuai-xu opened a pull request:

https://github.com/apache/flink/pull/3306

[FLINK-5793] [runtime] Fix bug where a running slot may not be added to
AllocatedMap in SlotPool

This PR is for JIRA issue
[FLINK-5793](https://issues.apache.org/jira/browse/FLINK-5793).

In SlotPool, when a slot is returned by a finished task, the pool tries to
find a pending request matching it. If one is found, the slot is given to the
request, but it is not added to the AllocatedMap.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shuai-xu/flink jira-5793

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3306.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3306


commit 11fe5ae90bb912f88b86aa84124b7b453177a819
Author: shuai.xus 
Date:   2017-02-14T06:56:41Z

[FLINK-5793] [runtime] Fix bug where a running slot may not be added to
AllocatedMap in SlotPool




> Running slot may not be added to AllocatedMap in SlotPool
> ---
>
> Key: FLINK-5793
> URL: https://issues.apache.org/jira/browse/FLINK-5793
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Reporter: shuai.xu
>Assignee: shuai.xu
>
> In SlotPool, when a slot is returned by a finished task, the pool tries to
> find a pending request matching it. If one is found, the slot is given to
> the request, but it is not added to the AllocatedMap.
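
For illustration, a minimal sketch of the intended bookkeeping, with
hypothetical names standing in for SlotPool's internal fields and types:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative stand-in for SlotPool; all names are hypothetical.
    class SlotPoolSketch {

        static class Slot {
            final String allocationId;
            Slot(String allocationId) { this.allocationId = allocationId; }
        }

        static class PendingRequest {
            void fulfill(Slot slot) { /* hand the slot to the waiting task */ }
        }

        private final Map<String, Slot> allocatedSlots = new HashMap<>();
        private final Map<String, Slot> availableSlots = new HashMap<>();

        /** Called when a finished task returns its slot to the pool. */
        void returnSlot(Slot slot) {
            PendingRequest request = pollMatchingRequest(slot);
            if (request != null) {
                // The reported bug: without this put, the slot was handed to
                // the request but the pool no longer tracked it as allocated.
                allocatedSlots.put(slot.allocationId, slot);
                request.fulfill(slot);
            } else {
                availableSlots.put(slot.allocationId, slot);
            }
        }

        private PendingRequest pollMatchingRequest(Slot slot) {
            return null; // matching logic elided in this sketch
        }
    }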






[jira] [Created] (FLINK-5793) Running slot may not be added to AllocatedMap in SlotPool

2017-02-13 Thread shuai.xu (JIRA)
shuai.xu created FLINK-5793:
---

 Summary: Running slot may not be added to AllocatedMap in SlotPool
 Key: FLINK-5793
 URL: https://issues.apache.org/jira/browse/FLINK-5793
 Project: Flink
  Issue Type: Bug
  Components: Core
Reporter: shuai.xu
Assignee: shuai.xu


In SlotPool, when a slot is returned by a finished task, the pool tries to
find a pending request matching it. If one is found, the slot is given to the
request, but it is not added to the AllocatedMap.





[jira] [Commented] (FLINK-5133) Add new setResource API for DataStream and DataSet

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865200#comment-15865200
 ] 

ASF GitHub Bot commented on FLINK-5133:
---

Github user wangzhijiang999 commented on the issue:

https://github.com/apache/flink/pull/3303
  
@StephanEwen, this PR includes the new API that would be visible to users,
but it cannot work completely yet because the corresponding runtime code has
not been submitted. To avoid confusing users, this PR could be updated to hide
the API temporarily before merging into master. What do you think?


> Add new setResource API for DataStream and DataSet
> --
>
> Key: FLINK-5133
> URL: https://issues.apache.org/jira/browse/FLINK-5133
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataSet API, DataStream API
>Reporter: Zhijiang Wang
>Assignee: Zhijiang Wang
>
> This is part of the fine-grained resource configuration.
> For *DataStream*, the *setResource* API will be set on
> *SingleOutputStreamOperator*, similar to other existing properties like
> parallelism, name, etc.
> For *DataSet*, the *setResource* API will be set on *Operator* in a
> similar way.
> The API takes two parameters, a minimum *ResourceSpec* and a maximum
> *ResourceSpec*, to allow for resource resizing in future improvements.
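
For context, a sketch of how the proposed API might be used from a job;
ResourceSpec and setResource are the proposal under discussion, not existing
API, so this snippet is illustrative and will not compile against current
Flink:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SetResourceSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Proposed API: the constructor shape of ResourceSpec is assumed.
            ResourceSpec min = new ResourceSpec(1.0 /* cpu cores */, 512 /* heap MB */);
            ResourceSpec max = new ResourceSpec(2.0, 1024);

            DataStream<String> upper = env
                .socketTextStream("localhost", 9999)
                .map(String::toUpperCase)
                .setResource(min, max)  // proposed: min/max resources per operator
                .name("uppercase-mapper");

            upper.print();
            env.execute("setResource sketch");
        }
    }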







[jira] [Resolved] (FLINK-4930) Implement FLIP-6 YARN client

2017-02-13 Thread shuai.xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuai.xu resolved FLINK-4930.
-
Resolution: Fixed

Fixed via 3695a8e92e9c3deee368d9cc3ce89a5ab117d6a1

> Implement FLIP-6 YARN client
> 
>
> Key: FLINK-4930
> URL: https://issues.apache.org/jira/browse/FLINK-4930
> Project: Flink
>  Issue Type: Sub-task
>  Components: YARN
> Environment: {{flip-6}} feature branch
>Reporter: Stephan Ewen
>Assignee: shuai.xu
>
> The FLIP-6 YARN client can follow parts of the existing YARN client.
> The main difference is that it does not wait for the cluster to be fully
> started and for all TaskManagers to register; it simply submits the
> application. Its steps are:
>   - Set up all configurations and environment variables
>   - Set up the resources: Flink jar, utility jars (logging), user jar
>   - Set up attached tokens / certificates
>   - Submit the YARN application
>   - Listen for the leader / attempt to connect to the JobManager to
> subscribe to updates
>   - Integration with the Flink CLI (command line interface)





[jira] [Resolved] (FLINK-4929) Implement FLIP-6 YARN TaskManager Runner

2017-02-13 Thread shuai.xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuai.xu resolved FLINK-4929.
-
Resolution: Fixed

Fixed via 371997a81a7035b68cf47e7bca64ce01e3ac36a6

> Implement FLIP-6 YARN TaskManager Runner
> 
>
> Key: FLINK-4929
> URL: https://issues.apache.org/jira/browse/FLINK-4929
> Project: Flink
>  Issue Type: Sub-task
>  Components: YARN
> Environment: {{flip-6}} feature branch
>Reporter: Stephan Ewen
>Assignee: shuai.xu
>
> The YARN TaskManager Runner has the following responsibilities:
>   - Read the configuration and all environment variables and compute the 
> effective configuration
>   - Start all services (Rpc, High Availability, Security, etc)
>   - Instantiate and start the Task Manager Runner





[jira] [Resolved] (FLINK-4928) Implement FLIP-6 YARN Application Master Runner

2017-02-13 Thread shuai.xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuai.xu resolved FLINK-4928.
-
Resolution: Fixed

Fixed via 0113e5a467858f9cd435e80df2c2626170e5de62

> Implement FLIP-6 YARN Application Master Runner
> ---
>
> Key: FLINK-4928
> URL: https://issues.apache.org/jira/browse/FLINK-4928
> Project: Flink
>  Issue Type: Sub-task
>  Components: YARN
> Environment: {{flip-6}} feature branch
>Reporter: Stephan Ewen
>Assignee: shuai.xu
>
> The Application Master Runner is the master process started in a YARN 
> container when submitting the Flink-on-YARN job to YARN.
> It has the following data available:
>   - Flink jars
>   - Job jars
>   - JobGraph
>   - Environment variables
>   - Contextual information like security tokens and certificates
> Its responsibility is the following:
>   - Read all configuration and environment variables, computing the effective 
> configuration
>   - Start all shared components (Rpc, HighAvailability Services)
>   - Start the ResourceManager
>   - Start the JobManager Runner





[jira] [Resolved] (FLINK-4927) Implement FLIP-6 YARN Resource Manager

2017-02-13 Thread shuai.xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuai.xu resolved FLINK-4927.
-
Resolution: Fixed

Fixed via e2922add100338776db765a62deb02f556845cf9

> Implement FLIP-6 YARN Resource Manager
> -
>
> Key: FLINK-4927
> URL: https://issues.apache.org/jira/browse/FLINK-4927
> Project: Flink
>  Issue Type: Sub-task
>  Components: YARN
> Environment: {{flip-6}} feature branch
>Reporter: Stephan Ewen
>Assignee: shuai.xu
>
> The Flink YARN Resource Manager communicates with YARN's Resource Manager to 
> acquire and release containers.
> It is also responsible for eagerly notifying the JobManager about container
> failures.





[jira] [Closed] (FLINK-5023) Add get() method in State interface

2017-02-13 Thread Xiaogang Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaogang Shi closed FLINK-5023.
---
Resolution: Won't Fix

The updates to the `State` interface will affect existing user code. We will 
not update the interface before Flink 2.0.

> Add get() method in State interface
> ---
>
> Key: FLINK-5023
> URL: https://issues.apache.org/jira/browse/FLINK-5023
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Xiaogang Shi
>Assignee: Xiaogang Shi
>
> Currently, the only method provided by the State interface is `clear()`. I 
> think we should provide another method called `get()` to return the 
> structured value (e.g., value, list, or map) under the current key. 
> In fact, the functionality of `get()` has already been implemented in all
> types of states: e.g., `value()` in ValueState and `get()` in ListState. This
> modification would let the interface better abstract these states.
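
For illustration, a sketch of the suggested change; Flink's actual State
interface defines only clear(), and the type parameter and get() below are
the ticket's proposal rather than existing API:

    // Sketch of the proposal: a common accessor on the State interface.
    public interface StateSketch<T> {

        /** Removes the value under the current key (the existing method). */
        void clear();

        /**
         * Proposed: returns the structured value under the current key,
         * e.g. the single value of a ValueState or the list of a ListState.
         */
        T get();
    }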





[jira] [Closed] (FLINK-5024) Add SimpleStateDescriptor to clarify the concepts

2017-02-13 Thread Xiaogang Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaogang Shi closed FLINK-5024.
---
Resolution: Invalid

The state descriptors have now been refactored with the introduction of
composite serializers (e.g. {{ArrayListSerializer}}).

> Add SimpleStateDescriptor to clarify the concepts
> -
>
> Key: FLINK-5024
> URL: https://issues.apache.org/jira/browse/FLINK-5024
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Xiaogang Shi
>Assignee: Xiaogang Shi
>
> Currently, StateDescriptors accept two type arguments: the first is the type
> of the created state and the second is the type of the values in the state.
> The concept, however, is a little confusing because in ListStates the
> arguments passed to the StateDescriptors are the types of the list elements
> instead of the lists. It also makes the implementation of MapStates difficult.
> I suggest not putting the type serializer in StateDescriptors, making
> StateDescriptors independent of the data structures of the values.
> A new type of StateDescriptor named SimpleStateDescriptor can be provided to
> abstract those states (namely ValueState, ReducingState, and FoldingState)
> whose values are not composite.
> The states (e.g. ListStates and MapStates) can implement their own
> descriptors according to their data structures.
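
For illustration, a sketch of the proposed descriptor hierarchy; the class
names follow the ticket, while the fields and constructors are assumptions:

    import org.apache.flink.api.common.typeutils.TypeSerializer;

    // Base descriptor carries no serializer, staying independent of the
    // value's data structure, per the proposal.
    abstract class StateDescriptorSketch<S> {
        final String name;
        StateDescriptorSketch(String name) { this.name = name; }
    }

    // For non-composite states (ValueState, ReducingState, FoldingState):
    // a single serializer for the plain value type T.
    class SimpleStateDescriptorSketch<S, T> extends StateDescriptorSketch<S> {
        final TypeSerializer<T> valueSerializer;
        SimpleStateDescriptorSketch(String name, TypeSerializer<T> valueSerializer) {
            super(name);
            this.valueSerializer = valueSerializer;
        }
    }

    // Composite states define their own descriptors, e.g. a list descriptor
    // that knows about its element type.
    class ListStateDescriptorSketch<E> extends StateDescriptorSketch<Iterable<E>> {
        final TypeSerializer<E> elementSerializer;
        ListStateDescriptorSketch(String name, TypeSerializer<E> elementSerializer) {
            super(name);
            this.elementSerializer = elementSerializer;
        }
    }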





[jira] [Commented] (FLINK-5023) Add get() method in State interface

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865167#comment-15865167
 ] 

ASF GitHub Bot commented on FLINK-5023:
---

Github user shixiaogang closed the pull request at:

https://github.com/apache/flink/pull/2768


> Add get() method in State interface
> ---
>
> Key: FLINK-5023
> URL: https://issues.apache.org/jira/browse/FLINK-5023
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Xiaogang Shi
>Assignee: Xiaogang Shi
>
> Currently, the only method provided by the State interface is `clear()`. I 
> think we should provide another method called `get()` to return the 
> structured value (e.g., value, list, or map) under the current key. 
> In fact, the functionality of `get()` has already been implemented in all
> types of states: e.g., `value()` in ValueState and `get()` in ListState. This
> modification would let the interface better abstract these states.





[jira] [Commented] (FLINK-5023) Add get() method in State interface

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865166#comment-15865166
 ] 

ASF GitHub Bot commented on FLINK-5023:
---

Github user shixiaogang commented on the issue:

https://github.com/apache/flink/pull/2768
  
Closing the pull request because the state descriptors have now been
refactored with the introduction of composite serializers (see
[FLINK-5790](https://issues.apache.org/jira/browse/FLINK-5790)).


> Add get() method in State interface
> ---
>
> Key: FLINK-5023
> URL: https://issues.apache.org/jira/browse/FLINK-5023
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Xiaogang Shi
>Assignee: Xiaogang Shi
>
> Currently, the only method provided by the State interface is `clear()`. I 
> think we should provide another method called `get()` to return the 
> structured value (e.g., value, list, or map) under the current key. 
> In fact, the functionality of `get()` has already been implemented in all
> types of states: e.g., `value()` in ValueState and `get()` in ListState. This
> modification would let the interface better abstract these states.









[jira] [Updated] (FLINK-5792) Improve "UDTF" to support constructors with parameters

2017-02-13 Thread sunjincheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-5792:
---
Description: Improve "UDTF" to support constructors with parameters.  (was:
Improve "UDF/UDTF" to support constructors with parameters.)

> Improve "UDTF" to support constructors with parameters
> 
>
> Key: FLINK-5792
> URL: https://issues.apache.org/jira/browse/FLINK-5792
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: sunjincheng
>Assignee: sunjincheng
>
> Improve "UDTF" to support constructors with parameters.





[jira] [Updated] (FLINK-5792) Improve "UDTF" to support constructors with parameters

2017-02-13 Thread sunjincheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated FLINK-5792:
---
Summary: Improve "UDTF" to support constructors with parameters  (was:
Improve "UDF/UDTF" to support constructors with parameters)

> Improve "UDTF" to support constructors with parameters
> 
>
> Key: FLINK-5792
> URL: https://issues.apache.org/jira/browse/FLINK-5792
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: sunjincheng
>Assignee: sunjincheng
>
> Improve "UDF/UDTF" to support constructors with parameters.





[jira] [Commented] (FLINK-5790) Use list types when ListStateDescriptor extends StateDescriptor

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865162#comment-15865162
 ] 

ASF GitHub Bot commented on FLINK-5790:
---

Github user shixiaogang commented on a diff in the pull request:

https://github.com/apache/flink/pull/3305#discussion_r100967108
  
--- Diff: flink-core/src/main/java/org/apache/flink/migration/util/MigrationInstantiationUtil.java ---
@@ -47,9 +47,16 @@ public ClassLoaderObjectInputStream(InputStream in, ClassLoader classLoader) thr

    // the flink package may be at position 0 (regular class) or position 2 (array)
    final int flinkPackagePos;
-   if ((flinkPackagePos = className.indexOf(FLINK_BASE_PACKAGE)) == 0 ||
-       (flinkPackagePos == 2 && className.startsWith(ARRAY_PREFIX)))
-   {
+   if (className.contains("org.apache.flink.runtime.state.ArrayListSerializer")) {
--- End diff --

The code here is a little tricky. I think we should use a Map to record all 
modified classes and their corresponding backups.
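
For illustration, a sketch of the suggested Map-based approach; the class and
method names are hypothetical, and the single entry mirrors the relocation
discussed above:

    import java.util.HashMap;
    import java.util.Map;

    final class MigratedClasses {

        // Maps relocated class names to their backups in the migration package.
        private static final Map<String, String> MIGRATED = new HashMap<>();

        static {
            MIGRATED.put(
                "org.apache.flink.runtime.state.ArrayListSerializer",
                "org.apache.flink.migration.runtime.state.ArrayListSerializer");
        }

        /** Returns the backup class name if one exists, else the original. */
        static String resolve(String className) {
            return MIGRATED.getOrDefault(className, className);
        }

        private MigratedClasses() {}
    }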


> Use list types when ListStateDescriptor extends StateDescriptor
> ---
>
> Key: FLINK-5790
> URL: https://issues.apache.org/jira/browse/FLINK-5790
> Project: Flink
>  Issue Type: Improvement
>Reporter: Xiaogang Shi
>Assignee: Xiaogang Shi
>
> Flink keeps the state serializer in {{StateDescriptor}}, but it's the
> serializer of the list elements that is put in {{ListStateDescriptor}}. The
> implementation is a little confusing. Some backends need to construct the
> state serializer from the element serializer by themselves.
> We should use an {{ArrayListSerializer}}, which is composed of the element
> serializer, in the {{ListStateDescriptor}}. This saves the backend from
> constructing the state serializer itself.
> If a backend needs customized serialization of the state (e.g.
> {{RocksDBStateBackend}}), it can still obtain the element serializer from the
> {{ArrayListSerializer}}.
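
For illustration, a minimal sketch of the composition described above; Flink's
real ArrayListSerializer extends TypeSerializer<ArrayList<T>> with full
(de)serialization logic, which is elided here:

    import org.apache.flink.api.common.typeutils.TypeSerializer;

    // The descriptor would build one of these once, so backends receive a
    // ready-made state serializer instead of assembling it themselves.
    class ArrayListSerializerSketch<T> {

        private final TypeSerializer<T> elementSerializer;

        ArrayListSerializerSketch(TypeSerializer<T> elementSerializer) {
            this.elementSerializer = elementSerializer;
        }

        // Backends that need custom state serialization (e.g. RocksDB)
        // can still unwrap the element serializer.
        TypeSerializer<T> getElementSerializer() {
            return elementSerializer;
        }
    }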







[jira] [Commented] (FLINK-5790) Use list types when ListStateDescriptor extends StateDescriptor

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865156#comment-15865156
 ] 

ASF GitHub Bot commented on FLINK-5790:
---

GitHub user shixiaogang opened a pull request:

https://github.com/apache/flink/pull/3305

[FLINK-5790][StateBackend] Use list types when ListStateDescriptor extends 
StateDescriptor

1. The state serializer, instead of the element serializer, is now stored
in `ListStateDescriptor`.
2. `ArrayListTypeInfo` is introduced to help create serializers from the
element type.
3. `ArrayListSerializer` is moved to the package
`org.apache.flink.api.common.typeutils.base` to avoid cyclic dependencies.
4. The old implementation of `ListStateDescriptor` is kept in the migration
package for backward compatibility.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alibaba/flink flink-5790

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3305.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3305


commit e8e11b7965365178453ab6eab78c6d5ac98f3537
Author: xiaogang.sxg 
Date:   2017-02-14T05:39:30Z

Use list types when ListStateDescriptor extends StateDescriptor

commit ba8cdc919fc2e66b3e81f6f8566140bef53a9b96
Author: xiaogang.sxg 
Date:   2017-02-14T06:22:25Z

Support back compatibility for ListStateDescriptor




> Use list types when ListStateDescriptor extends StateDescriptor
> ---
>
> Key: FLINK-5790
> URL: https://issues.apache.org/jira/browse/FLINK-5790
> Project: Flink
>  Issue Type: Improvement
>Reporter: Xiaogang Shi
>Assignee: Xiaogang Shi
>
> Flink keeps the state serializer in {{StateDescriptor}}, but it's the
> serializer of the list elements that is put in {{ListStateDescriptor}}. The
> implementation is a little confusing. Some backends need to construct the
> state serializer from the element serializer by themselves.
> We should use an {{ArrayListSerializer}}, which is composed of the element
> serializer, in the {{ListStateDescriptor}}. This saves the backend from
> constructing the state serializer itself.
> If a backend needs customized serialization of the state (e.g.
> {{RocksDBStateBackend}}), it can still obtain the element serializer from the
> {{ArrayListSerializer}}.







[jira] [Commented] (FLINK-5406) add normalization phase for predicate logical plan rewriting between decorrelate query phase and volcano optimization phase

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865136#comment-15865136
 ] 

ASF GitHub Bot commented on FLINK-5406:
---

Github user godfreyhe commented on the issue:

https://github.com/apache/flink/pull/3101
  
Done, thanks for the suggestion @fhueske


> add normalization phase for predicate logical plan rewriting between 
> decorrelate query phase and volcano optimization phase
> ---
>
> Key: FLINK-5406
> URL: https://issues.apache.org/jira/browse/FLINK-5406
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>
> The normalization phase is for predicate logical plan rewriting and is
> independent of the cost module. Unlike the volcano optimization phase, the
> rules in the normalization phase do not need to be applied repeatedly to
> different logical plans, which reduces the running time of the volcano
> planner.
> *ReduceExpressionsRule* can apply various simplifying transformations on
> RexNode trees. Currently, there are two transformations:
> 1) Constant reduction, which evaluates constant subtrees, replacing them with
> a corresponding RexLiteral
> 2) Removal of redundant casts, which occurs when the argument of the cast
> is the same as the type of the resulting cast expression
> The above transformations do not depend on the cost module, so we can move
> the rules in *ReduceExpressionsRule* from
> DATASET_OPT_RULES/DATASTREAM_OPT_RULES to the DataSet/DataStream
> normalization rules.
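
For context, a minimal sketch of such a normalization pass using Calcite's
rule-based HepPlanner; the rule instances and planner calls are Calcite's
public API, while how Flink wires the phase into its optimizer is simplified
away:

    import org.apache.calcite.plan.hep.HepPlanner;
    import org.apache.calcite.plan.hep.HepProgram;
    import org.apache.calcite.plan.hep.HepProgramBuilder;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.rules.ReduceExpressionsRule;

    public class NormalizationSketch {

        // Applies each rule until it no longer matches, with no cost model,
        // then hands the rewritten plan to the volcano phase.
        static RelNode normalize(RelNode decorrelatedPlan) {
            HepProgram program = new HepProgramBuilder()
                .addRuleInstance(ReduceExpressionsRule.FILTER_INSTANCE)  // constant folding in filters
                .addRuleInstance(ReduceExpressionsRule.PROJECT_INSTANCE) // folding / cast removal in projections
                .build();

            HepPlanner planner = new HepPlanner(program);
            planner.setRoot(decorrelatedPlan);
            return planner.findBestExp();
        }
    }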







[jira] [Commented] (FLINK-5791) Resource should be strictly matched when allocating for yarn

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865127#comment-15865127
 ] 

ASF GitHub Bot commented on FLINK-5791:
---

GitHub user shuai-xu opened a pull request:

https://github.com/apache/flink/pull/3304

[FLINK-5791] [runtime] Resource should be strictly matched when allocating 
for yarn

This PR is for JIRA issue
[FLINK-5791](https://issues.apache.org/jira/browse/FLINK-5791).

It has the following changes:
1. The YARN RM will pass the real ResourceProfile to the TM for initializing
the slots.
2. Add an SMRFSlotManager that allocates slots only when the resources match
exactly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shuai-xu/flink jira-5791

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3304


commit 5d00959feed419ade08218a5570b884d88108dcf
Author: shuai.xus 
Date:   2017-02-14T05:48:17Z

[FLINK-5791] [runtime] Resource should be strictly matched when allocating 
for yarn




> Resource should be strictly matched when allocating for yarn
> 
>
> Key: FLINK-5791
> URL: https://issues.apache.org/jira/browse/FLINK-5791
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN
>Reporter: shuai.xu
>Assignee: shuai.xu
>  Labels: flip-6
>
> In YARN mode, resources should be assigned exactly as requested to avoid
> wasting resources and causing OOMs.
> 1. The YarnResourceManager will request containers according to the
> ResourceProfile in slot requests from the JM.
> 2. The RM will pass the ResourceProfile to the TM for initializing its slots.
> 3. The RM should match the slots offered by the TM strictly against the
> SlotRequest from the JM.
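
For illustration, a sketch of the strict-match policy with a hypothetical,
trimmed-down ResourceProfile; the point is exact equality instead of a lenient
"fits into" check:

    import java.util.Objects;

    // Hypothetical, trimmed-down profile: just CPU cores and memory.
    final class ResourceProfileSketch {
        final double cpuCores;
        final int memoryMB;

        ResourceProfileSketch(double cpuCores, int memoryMB) {
            this.cpuCores = cpuCores;
            this.memoryMB = memoryMB;
        }

        /** Lenient policy: the offered slot merely has to be big enough. */
        boolean isMatching(ResourceProfileSketch offered) {
            return offered.cpuCores >= cpuCores && offered.memoryMB >= memoryMB;
        }

        /** Strict policy proposed here: the offered slot must be exactly equal. */
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ResourceProfileSketch)) {
                return false;
            }
            ResourceProfileSketch that = (ResourceProfileSketch) o;
            return cpuCores == that.cpuCores && memoryMB == that.memoryMB;
        }

        @Override
        public int hashCode() {
            return Objects.hash(cpuCores, memoryMB);
        }
    }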







[jira] [Commented] (FLINK-5566) Introduce structure to hold table and column level statistics

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865088#comment-15865088
 ] 

ASF GitHub Bot commented on FLINK-5566:
---

Github user beyond1920 commented on the issue:

https://github.com/apache/flink/pull/3196
  
@fhueske, thanks for your review. I modified the code based on your advice,
including compatibility with Java and the column stats field types.


> Introduce structure to hold table and column level statistics
> -
>
> Key: FLINK-5566
> URL: https://issues.apache.org/jira/browse/FLINK-5566
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: zhangjing
>
> We define two structures to hold statistics:
> 1. TableStats: contains table-level stats; for now only one element: rowCount
> 2. ColumnStats: contains column-level stats:
> for numeric column types: ndv, nullCount, max, min, histogram
> for string types: ndv, nullCount, avgLen, maxLen
> for booleans: ndv, nullCount, trueCount, falseCount
> for date/time/timestamp: ndv, nullCount, max, min, histogram









[jira] [Commented] (FLINK-2168) Add HBaseTableSource

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865068#comment-15865068
 ] 

ASF GitHub Bot commented on FLINK-2168:
---

Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/3149
  
@fhueske - Are you fine with that pom change? If so, we can get this in.


> Add HBaseTableSource
> 
>
> Key: FLINK-2168
> URL: https://issues.apache.org/jira/browse/FLINK-2168
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Add a {{HBaseTableSource}} to read data from an HBase table. The
> {{HBaseTableSource}} should implement the {{ProjectableTableSource}}
> (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces.
> The implementation can be based on Flink's {{TableInputFormat}}.
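
For illustration, a rough skeleton of the requested source with projection
push-down; every method name here is an assumption for the sketch, not the
final TableSource API:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.types.Row;

    public class HBaseTableSourceSketch {

        private final String hbaseTable;   // HBase table to scan
        private String[] projectedFields;  // pushed-down projection, if any

        public HBaseTableSourceSketch(String hbaseTable) {
            this.hbaseTable = hbaseTable;
        }

        // ProjectableTableSource-style hook: narrow the scan to selected columns.
        public HBaseTableSourceSketch projectFields(String[] fields) {
            this.projectedFields = fields;
            return this;
        }

        // Would delegate to Flink's TableInputFormat to produce the rows.
        public DataSet<Row> getDataSet(ExecutionEnvironment env) {
            throw new UnsupportedOperationException("sketch only");
        }
    }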





[jira] [Commented] (FLINK-5133) Add new setResource API for DataStream and DataSet

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865048#comment-15865048
 ] 

ASF GitHub Bot commented on FLINK-5133:
---

GitHub user wangzhijiang999 opened a pull request:

https://github.com/apache/flink/pull/3303

[FLINK-5133][core] Add new setResource API for DataStream and DataSet

This is part of the fine-grained resource configuration.
For **DataStream**, the **setResource** API will be set on
**SingleOutputStreamOperator**, similar to other existing properties like
parallelism, name, etc.
For **DataSet**, the **setResource** API will be set on **Operator**
in a similar way.
The API takes two parameters, a minimum **ResourceSpec** and a maximum
**ResourceSpec**, to allow for dynamic resource resizing in future
improvements.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangzhijiang999/flink FLINK-5133

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3303


commit de7c4c75c9f1c80bf7e529a1ff05fa870c2df1cb
Author: 淘江 
Date:   2017-02-14T04:37:18Z

[FLINK-5133][core] Add new setResource API for DataStream and DataSet




> Add new setResource API for DataStream and DataSet
> --
>
> Key: FLINK-5133
> URL: https://issues.apache.org/jira/browse/FLINK-5133
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataSet API, DataStream API
>Reporter: Zhijiang Wang
>Assignee: Zhijiang Wang
>
> This is part of the fine-grained resource configuration.
> For *DataStream*, the *setResource* API will be set on
> *SingleOutputStreamOperator*, similar to other existing properties like
> parallelism, name, etc.
> For *DataSet*, the *setResource* API will be set on *Operator* in a
> similar way.
> The API takes two parameters, a minimum *ResourceSpec* and a maximum
> *ResourceSpec*, to allow for resource resizing in future improvements.







[jira] [Commented] (FLINK-5566) Introduce structure to hold table and column level statistics

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865033#comment-15865033
 ] 

ASF GitHub Bot commented on FLINK-5566:
---

Github user beyond1920 commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100958745
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv       number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max       max value of column values
+  * @param min       min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
+nullCount: Long,
+avgLen: Long,
+maxLen: Long,
+max: Option[Any],
+min: Option[Any]) {
--- End diff --

It makes sense to add a field to denote whether stats are precise or
approximate, and a field to hold the timestamp when the stats were generated.
But I'm not sure how these two fields would affect the optimized plan, because
we prefer to use the provided stats even if they are estimated or a little
stale. So I didn't add these two fields for now; maybe we will add them later.
What do you think?


> Introduce structure to hold table and column level statistics
> -
>
> Key: FLINK-5566
> URL: https://issues.apache.org/jira/browse/FLINK-5566
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: zhangjing
>
> We define two structures to hold statistics:
> 1. TableStats: contains table-level stats; for now only one element: rowCount
> 2. ColumnStats: contains column-level stats:
> for numeric column types: ndv, nullCount, max, min, histogram
> for string types: ndv, nullCount, avgLen, maxLen
> for booleans: ndv, nullCount, trueCount, falseCount
> for date/time/timestamp: ndv, nullCount, max, min, histogram







[jira] [Created] (FLINK-5792) Improve "UDF/UDTF" to support constructors with parameters

2017-02-13 Thread sunjincheng (JIRA)
sunjincheng created FLINK-5792:
--

 Summary: Improve "UDF/UDTF" to support constructors with parameters
 Key: FLINK-5792
 URL: https://issues.apache.org/jira/browse/FLINK-5792
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: sunjincheng
Assignee: sunjincheng


Improve "UDF/UDTF" to support constructors with parameters.





[jira] [Created] (FLINK-5791) Resource should be strictly matched when allocating for yarn

2017-02-13 Thread shuai.xu (JIRA)
shuai.xu created FLINK-5791:
---

 Summary: Resource should be strictly matched when allocating for 
yarn
 Key: FLINK-5791
 URL: https://issues.apache.org/jira/browse/FLINK-5791
 Project: Flink
  Issue Type: Improvement
  Components: YARN
Reporter: shuai.xu
Assignee: shuai.xu


In YARN mode, resources should be assigned exactly as requested to avoid
wasting resources and causing OOMs.
1. The YarnResourceManager will request containers according to the
ResourceProfile in slot requests from the JM.
2. The RM will pass the ResourceProfile to the TM for initializing its slots.
3. The RM should match the slots offered by the TM strictly against the
SlotRequest from the JM.





[jira] [Commented] (FLINK-5566) Introduce structure to hold table and column level statistics

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864939#comment-15864939
 ] 

ASF GitHub Bot commented on FLINK-5566:
---

Github user beyond1920 commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100951579
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv       number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max       max value of column values
+  * @param min       min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
+nullCount: Long,
+avgLen: Long,
--- End diff --

ok


> Introduce structure to hold table and column level statistics
> -
>
> Key: FLINK-5566
> URL: https://issues.apache.org/jira/browse/FLINK-5566
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: zhangjing
>
> We define two structures to hold statistics:
> 1. TableStats: contains table-level stats; for now only one element: rowCount
> 2. ColumnStats: contains column-level stats:
> for numeric column types: ndv, nullCount, max, min, histogram
> for string types: ndv, nullCount, avgLen, maxLen
> for booleans: ndv, nullCount, trueCount, falseCount
> for date/time/timestamp: ndv, nullCount, max, min, histogram







[jira] [Resolved] (FLINK-5190) ZooKeeperLeaderRetrievalService should not close the zk client when stop

2017-02-13 Thread shuai.xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shuai.xu resolved FLINK-5190.
-
Resolution: Fixed

Fixed via cbfb807d65b68b2b6157e1b1d42606123ea499ad

> ZooKeeperLeaderRetrievalService should not close the zk client when stop
> 
>
> Key: FLINK-5190
> URL: https://issues.apache.org/jira/browse/FLINK-5190
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Reporter: shuai.xu
>Assignee: shuai.xu
>  Labels: flip-6
>
> The ZK client is created outside of the ZooKeeperLeaderRetrievalService and
> passed to it. When the ZooKeeperLeaderRetrievalService stops, it should not
> close the ZK client, as others may still be using it.
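
For illustration, a sketch of the ownership rule using Curator's
CuratorFramework, which Flink uses for ZooKeeper access; listener handling is
elided:

    import org.apache.curator.framework.CuratorFramework;

    // The service borrows the client, so stop() must not close it.
    class LeaderRetrievalServiceSketch {

        private final CuratorFramework client;  // shared; owned by the caller

        LeaderRetrievalServiceSketch(CuratorFramework client) {
            this.client = client;
        }

        void start() {
            // register watchers on the leader path via `client` ...
        }

        void stop() {
            // unregister watchers only -- intentionally no client.close() here;
            // whoever created the client is responsible for closing it.
        }
    }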







[jira] [Commented] (FLINK-5566) Introduce structure to hold table and column level statistics

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864928#comment-15864928
 ] 

ASF GitHub Bot commented on FLINK-5566:
---

Github user beyond1920 commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100950729
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv       number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max       max value of column values
+  * @param min       min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
--- End diff --

@fhueske, there is no need to make all stats optional. If there are no 
statistics for ndv/nullCount/avgLen/maxLen, we can give them an invalid 
value, e.g., -1. But that does not work for max/min, because the max/min 
values can legitimately be negative. 


> Introduce structure to hold table and column level statistics
> -
>
> Key: FLINK-5566
> URL: https://issues.apache.org/jira/browse/FLINK-5566
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: zhangjing
>
> We define two structures to hold statistics:
> 1. TableStats: contains table-level stats; currently only one element: rowCount
> 2. ColumnStats: contains column-level stats. 
> for numeric column types: including ndv, nullCount, max, min, histogram
> for string types: including ndv, nullCount, avgLen, maxLen
> for boolean: including ndv, nullCount, trueCount, falseCount
> for date/time/timestamp: including ndv, nullCount, max, min, histogram 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5790) Use list types when ListStateDescriptor extends StateDescriptor

2017-02-13 Thread Xiaogang Shi (JIRA)
Xiaogang Shi created FLINK-5790:
---

 Summary: Use list types when ListStateDescriptor extends 
StateDescriptor
 Key: FLINK-5790
 URL: https://issues.apache.org/jira/browse/FLINK-5790
 Project: Flink
  Issue Type: Improvement
Reporter: Xiaogang Shi
Assignee: Xiaogang Shi


Flink keeps the state serializer in {{StateDescriptor}}, but it is the 
serializer of the list elements that is put in {{ListStateDescriptor}}. The 
implementation is a little confusing: some backends need to construct the state 
serializer from the element serializer by themselves.

We should use an {{ArrayListSerializer}}, which is composed of the element 
serializer, in the {{ListStateDescriptor}}. That way the backend does not have 
to construct the state serializer itself.

If a backend needs customized serialization of the state (e.g. 
{{RocksDBStateBackend}}), it can still obtain the element serializer from the 
{{ArrayListSerializer}}.
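For illustration, a rough sketch of the composition (the serializer interface below is a hypothetical stand-in, not Flink's TypeSerializer):

{code}
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;

// Hypothetical minimal serializer interface, for illustration only.
interface Ser<T> {
  void serialize(T value, DataOutputStream out) throws IOException;
}

// Sketch: the list serializer is composed of the element serializer, so a
// backend can take the state serializer straight from the descriptor and
// still unwrap the element serializer for customized serialization.
final class ArrayListSer<T> implements Ser<ArrayList<T>> {

  private final Ser<T> elementSer;

  ArrayListSer(Ser<T> elementSer) {
    this.elementSer = elementSer;
  }

  Ser<T> getElementSerializer() {
    return elementSer;
  }

  @Override
  public void serialize(ArrayList<T> list, DataOutputStream out) throws IOException {
    out.writeInt(list.size());
    for (T element : list) {
      elementSer.serialize(element, out);
    }
  }
}
{code}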



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5739) NullPointerException in CliFrontend

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864869#comment-15864869
 ] 

ASF GitHub Bot commented on FLINK-5739:
---

Github user clarkyzl commented on a diff in the pull request:

https://github.com/apache/flink/pull/3292#discussion_r100945289
  
--- Diff: 
flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java ---
@@ -842,6 +842,12 @@ protected int executeProgram(PackagedProgram program, 
ClusterClient client, int
program.deleteExtractedLibraries();
}
 
+   if (null == result) {
+   logAndSysout("No JobSubmissionResult returned, please 
make sure you called " +
--- End diff --

Added a space here and will try again.


> NullPointerException in CliFrontend
> ---
>
> Key: FLINK-5739
> URL: https://issues.apache.org/jira/browse/FLINK-5739
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
> Environment: Mac OS X 10.12.2, Java 1.8.0_92-b14
>Reporter: Zhuoluo Yang
> Fix For: 1.3.0
>
>
> I've run a simple program on a local cluster. It always fails with code 
> version 1.3-SNAPSHOT (commit e24a866). 
> {quote}
> Zhuoluos-MacBook-Pro:build-target zhuoluo.yzl$ bin/flink run -c 
> com.alibaba.blink.TableApp ~/gitlab/tableapp/target/tableapp-1.0-SNAPSHOT.jar 
> Cluster configuration: Standalone cluster with JobManager at 
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> 
>  The program finished with the following exception:
> java.lang.NullPointerException
>   at 
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:845)
>   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:259)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1076)
>   at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1123)
>   at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
>   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1120)
> {quote}
> I don't think there should be a NullPointerException here, even if you forgot 
> the "execute()" call.
> The reproducing code looks like the following:
> {code:java}
> public static void main(String[] args) throws Exception {
> ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> DataSource<String> customer = 
> env.readTextFile("/Users/zhuoluo.yzl/customer.tbl");
> customer.filter(new FilterFunction<String>() {
> public boolean filter(String value) throws Exception {
> return true;
> }
> })
> .writeAsText("/Users/zhuoluo.yzl/customer.txt");
> //env.execute();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3292: [FLINK-5739] [client] fix NullPointerException in ...

2017-02-13 Thread clarkyzl
Github user clarkyzl commented on a diff in the pull request:

https://github.com/apache/flink/pull/3292#discussion_r100945289
  
--- Diff: 
flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java ---
@@ -842,6 +842,12 @@ protected int executeProgram(PackagedProgram program, 
ClusterClient client, int
program.deleteExtractedLibraries();
}
 
+   if (null == result) {
+   logAndSysout("No JobSubmissionResult returned, please 
make sure you called " +
--- End diff --

Added a space here and will try again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-5737) Fix the bug when TableSource contains a field of byte[] type

2017-02-13 Thread zhangjing (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangjing closed FLINK-5737.

Resolution: Fixed

> Fix the bug when TableSource contains a field of byte[] type
> 
>
> Key: FLINK-5737
> URL: https://issues.apache.org/jira/browse/FLINK-5737
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: zhangjing
>Assignee: zhangjing
>
> Currently, if a TableSource contains a field of byte[] type, a TableException 
> is thrown when optimizing the RelNode tree.
> If we run the following code, where logBlockTableSource contains one field, 
> f0, of byte[] type:
>  {code}
>   tableEnv.registerTableSource("t1", logBlockTableSource);
>   tableEnv.registerFunction("parse", new BinaryParser());
>   Table ttDatas = tableEnv.sql("select parse(f0) from t1");
>   DataStream<String> result = tableEnv.toDataStream(ttDatas, 
> String.class);
>   result.addSink(new PrintSinkFunction());
>   env.execute();
>   public static class BinaryParser extends ScalarFunction {
>   public String eval(byte[] bytes) {
>   return new String(bytes);
>   }
>   }
>  {code}
> we would get the following exception:
> {code}
> Exception in thread "main" java.lang.AssertionError: Internal error: Error 
> occurred while applying rule StreamTableSourceScanRule
>   at org.apache.calcite.util.Util.newInternal(Util.java:792)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
>   at 
> org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:117)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
>   at 
> org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:264)
>   at 
> org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:231)
>   at 
> org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:259)
>   at 
> org.apache.flink.table.api.java.StreamTableEnvironment.toDataStream(StreamTableEnvironment.scala:148)
>   at 
> com.alibaba.blink.streaming.connectors.tt.examples.TT4TableSourceExample.main(TT4TableSourceExample.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: org.apache.flink.table.api.TableException: Unsupported data type 
> encountered: ARRAY
>   at org.apache.flink.table.api.TableException$.apply(exceptions.scala:51)
>   at 
> org.apache.flink.table.plan.nodes.FlinkRel$$anonfun$estimateRowSize$2.apply(FlinkRel.scala:124)
>   at 
> org.apache.flink.table.plan.nodes.FlinkRel$$anonfun$estimateRowSize$2.apply(FlinkRel.scala:108)
>   at 
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>   at 
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>   at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:47)
>   at 
> org.apache.flink.table.plan.nodes.FlinkRel$class.estimateRowSize(FlinkRel.scala:108)
>   at 
> org.apache.flink.table.plan.nodes.datastream.StreamScan.estimateRowSize(StreamScan.scala:37)
>   at 
> org.apache.flink.table.plan.nodes.datastream.StreamTableSourceScan.computeSelfCost(StreamTableSourceScan.scala:46)
>   at 
> org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:162)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown 
> Source)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown 
> Source)
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:258)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:1128)
>   at 
> org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:336)
>   at 
> org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
>   at 
> org.apache.calcite.

[jira] [Commented] (FLINK-5737) Fix the bug when TableSource contains a field of byte[] type

2017-02-13 Thread zhangjing (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864852#comment-15864852
 ] 

zhangjing commented on FLINK-5737:
--

Hi Fabian, thanks a lot. The bug has already been fixed, so I will close this JIRA.

> Fix the bug when TableSource contains a field of byte[] type
> 
>
> Key: FLINK-5737
> URL: https://issues.apache.org/jira/browse/FLINK-5737
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: zhangjing
>Assignee: zhangjing
>
> Currently, if a TableSource contains a field of byte[] type, a TableException 
> is thrown when optimizing the RelNode tree.
> If we run the following code, where logBlockTableSource contains one field, 
> f0, of byte[] type:
>  {code}
>   tableEnv.registerTableSource("t1", logBlockTableSource);
>   tableEnv.registerFunction("parse", new BinaryParser());
>   Table ttDatas = tableEnv.sql("select parse(f0) from t1");
>   DataStream<String> result = tableEnv.toDataStream(ttDatas, 
> String.class);
>   result.addSink(new PrintSinkFunction());
>   env.execute();
>   public static class BinaryParser extends ScalarFunction {
>   public String eval(byte[] bytes) {
>   return new String(bytes);
>   }
>   }
>  {code}
> we would get the following exception:
> {code}
> Exception in thread "main" java.lang.AssertionError: Internal error: Error 
> occurred while applying rule StreamTableSourceScanRule
>   at org.apache.calcite.util.Util.newInternal(Util.java:792)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
>   at 
> org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:117)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
>   at 
> org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:264)
>   at 
> org.apache.flink.table.api.StreamTableEnvironment.optimize(StreamTableEnvironment.scala:231)
>   at 
> org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:259)
>   at 
> org.apache.flink.table.api.java.StreamTableEnvironment.toDataStream(StreamTableEnvironment.scala:148)
>   at 
> com.alibaba.blink.streaming.connectors.tt.examples.TT4TableSourceExample.main(TT4TableSourceExample.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: org.apache.flink.table.api.TableException: Unsupported data type 
> encountered: ARRAY
>   at org.apache.flink.table.api.TableException$.apply(exceptions.scala:51)
>   at 
> org.apache.flink.table.plan.nodes.FlinkRel$$anonfun$estimateRowSize$2.apply(FlinkRel.scala:124)
>   at 
> org.apache.flink.table.plan.nodes.FlinkRel$$anonfun$estimateRowSize$2.apply(FlinkRel.scala:108)
>   at 
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>   at 
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>   at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:47)
>   at 
> org.apache.flink.table.plan.nodes.FlinkRel$class.estimateRowSize(FlinkRel.scala:108)
>   at 
> org.apache.flink.table.plan.nodes.datastream.StreamScan.estimateRowSize(StreamScan.scala:37)
>   at 
> org.apache.flink.table.plan.nodes.datastream.StreamTableSourceScan.computeSelfCost(StreamTableSourceScan.scala:46)
>   at 
> org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:162)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown 
> Source)
>   at 
> GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown 
> Source)
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:258)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:1128)
>   at 
> org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:336)
>   at 
> org.apa

[jira] [Commented] (FLINK-5710) Add ProcTime() function to indicate StreamSQL

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864833#comment-15864833
 ] 

ASF GitHub Bot commented on FLINK-5710:
---

GitHub user huawei-flink opened a pull request:

https://github.com/apache/flink/pull/3302

[FLINK-5710] Add ProcTime() function to indicate StreamSQL

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huawei-flink/flink FLINK-5710

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3302






> Add ProcTime() function to indicate StreamSQL
> -
>
> Key: FLINK-5710
> URL: https://issues.apache.org/jira/browse/FLINK-5710
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
>Priority: Minor
>
> procTime() is a parameterless scalar function that just indicates processing 
> time mode
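For illustration, a hypothetical query shape this could enable (the syntax is an assumption based on this discussion, not a final design):

{code}
-- Sketch: procTime() only marks the query as running in processing-time
-- mode; its return value carries no meaning.
SELECT userId, COUNT(*)
FROM clicks
GROUP BY userId, FLOOR(procTime() TO HOUR)
{code}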



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3302: [FLINK-5710] Add ProcTime() function to indicate S...

2017-02-13 Thread huawei-flink
GitHub user huawei-flink opened a pull request:

https://github.com/apache/flink/pull/3302

[FLINK-5710] Add ProcTime() function to indicate StreamSQL

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huawei-flink/flink FLINK-5710

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3302






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3285: [FLINK-4997] [streaming] Introduce ProcessWindowFunction

2017-02-13 Thread manuzhang
Github user manuzhang commented on the issue:

https://github.com/apache/flink/pull/3285
  
LGTM. Really excited to see we move further in this direction, and faster 
if possible. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4997) Extending Window Function Metadata

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864819#comment-15864819
 ] 

ASF GitHub Bot commented on FLINK-4997:
---

Github user manuzhang commented on the issue:

https://github.com/apache/flink/pull/3285
  
LGTM. Really excited to see we move further in this direction, and faster 
if possible. 


> Extending Window Function Metadata
> --
>
> Key: FLINK-4997
> URL: https://issues.apache.org/jira/browse/FLINK-4997
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API, Streaming
>Reporter: Ventura Del Monte
>Assignee: Ventura Del Monte
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5624) Support tumbling window on streaming tables in the SQL API

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864797#comment-15864797
 ] 

ASF GitHub Bot commented on FLINK-5624:
---

Github user haohui commented on the issue:

https://github.com/apache/flink/pull/3252
  
The v3 PR gets the best of both worlds -- the code generator will 
throw exceptions if the queries actually execute `rowtime()`.

Essentially it rewrites the project and the aggregate operators before 
passing the operators into the Volcano planner. @fhueske please take another 
look.


> Support tumbling window on streaming tables in the SQL API
> --
>
> Key: FLINK-5624
> URL: https://issues.apache.org/jira/browse/FLINK-5624
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> This is a follow up of FLINK-4691.
> FLINK-4691 adds supports for group-windows for streaming tables. This jira 
> proposes to expose the functionality in the SQL layer via the {{GROUP BY}} 
> clauses, as described in 
> http://calcite.apache.org/docs/stream.html#tumbling-windows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink issue #3252: [FLINK-5624] Support tumbling window on streaming tables ...

2017-02-13 Thread haohui
Github user haohui commented on the issue:

https://github.com/apache/flink/pull/3252
  
The v3 PR gets the best of both worlds -- the code generator will 
throw exceptions if the queries actually execute `rowtime()`.

Essentially it rewrites the project and the aggregate operators before 
passing the operators into the Volcano planner. @fhueske please take another 
look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5723) Use "Used" instead of "Initial" to make taskmanager tag more readable

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864795#comment-15864795
 ] 

ASF GitHub Bot commented on FLINK-5723:
---

Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
@zentol Is it ok to go?


> Use "Used" instead of "Initial" to make taskmanager tag more readable
> -
>
> Key: FLINK-5723
> URL: https://issues.apache.org/jira/browse/FLINK-5723
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Tao Wang
>Priority: Trivial
>
> Now in the JobManager web frontend, the used memory of the task managers is 
> presented as "Initial" in the table header, which, judging from the code, 
> actually means "memory used".
> I'd like to change it to be more readable, even though it is a trivial change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-13 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
@zentol Is it ok to go?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-5486) Lack of synchronization in BucketingSink#handleRestoredBucketState()

2017-02-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-5486:
--
Description: 
Here is related code:
{code}
  
handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);

  synchronized (bucketState.pendingFilesPerCheckpoint) {
bucketState.pendingFilesPerCheckpoint.clear();
  }
{code}

The handlePendingFilesForPreviousCheckpoints() call should be enclosed inside 
the synchronization block. Otherwise during the processing of 
handlePendingFilesForPreviousCheckpoints(), some entries of the map may be 
cleared.

  was:
Here is related code:
{code}
  
handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);

  synchronized (bucketState.pendingFilesPerCheckpoint) {
bucketState.pendingFilesPerCheckpoint.clear();
  }
{code}
The handlePendingFilesForPreviousCheckpoints() call should be enclosed inside 
the synchronization block. Otherwise during the processing of 
handlePendingFilesForPreviousCheckpoints(), some entries of the map may be 
cleared.


> Lack of synchronization in BucketingSink#handleRestoredBucketState()
> 
>
> Key: FLINK-5486
> URL: https://issues.apache.org/jira/browse/FLINK-5486
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Reporter: Ted Yu
>
> Here is related code:
> {code}
>   
> handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);
>   synchronized (bucketState.pendingFilesPerCheckpoint) {
> bucketState.pendingFilesPerCheckpoint.clear();
>   }
> {code}
> The handlePendingFilesForPreviousCheckpoints() call should be enclosed inside 
> the synchronization block. Otherwise during the processing of 
> handlePendingFilesForPreviousCheckpoints(), some entries of the map may be 
> cleared.
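For illustration, a minimal sketch of the suggested fix, using the identifiers from the snippet above (the enclosing method is omitted):

{code}
// Sketch: processing and clearing happen under the same lock, so no entries
// can be cleared while they are still being processed.
synchronized (bucketState.pendingFilesPerCheckpoint) {
  handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);
  bucketState.pendingFilesPerCheckpoint.clear();
}
{code}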



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-5488) yarnClient should be closed in AbstractYarnClusterDescriptor for error conditions

2017-02-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-5488:
--
Description: 
Here is one example:
{code}
if(jobManagerMemoryMb > maxRes.getMemory() ) {
  failSessionDuringDeployment(yarnClient, yarnApplication);
  throw new YarnDeploymentException("The cluster does not have the 
requested resources for the JobManager available!\n"
+ "Maximum Memory: " + maxRes.getMemory() + "MB Requested: " + 
jobManagerMemoryMb + "MB. " + NOTE);
}
{code}
yarnClient implements Closeable.
It should be closed in situations where an exception is thrown.

  was:
Here is one example:
{code}
if(jobManagerMemoryMb > maxRes.getMemory() ) {
  failSessionDuringDeployment(yarnClient, yarnApplication);
  throw new YarnDeploymentException("The cluster does not have the 
requested resources for the JobManager available!\n"
+ "Maximum Memory: " + maxRes.getMemory() + "MB Requested: " + 
jobManagerMemoryMb + "MB. " + NOTE);
}
{code}

yarnClient implements Closeable.
It should be closed in situations where exception is thrown.


> yarnClient should be closed in AbstractYarnClusterDescriptor for error 
> conditions
> -
>
> Key: FLINK-5488
> URL: https://issues.apache.org/jira/browse/FLINK-5488
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Reporter: Ted Yu
>
> Here is one example:
> {code}
> if(jobManagerMemoryMb > maxRes.getMemory() ) {
>   failSessionDuringDeployment(yarnClient, yarnApplication);
>   throw new YarnDeploymentException("The cluster does not have the 
> requested resources for the JobManager available!\n"
> + "Maximum Memory: " + maxRes.getMemory() + "MB Requested: " + 
> jobManagerMemoryMb + "MB. " + NOTE);
> }
> {code}
> yarnClient implements Closeable.
> It should be closed in situations where an exception is thrown.
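One possible shape of the fix, sketched with the identifiers from the snippet above (the actual placement in AbstractYarnClusterDescriptor may differ):

{code}
// Sketch: close the client on the failure path as well.
try {
  if (jobManagerMemoryMb > maxRes.getMemory()) {
    failSessionDuringDeployment(yarnClient, yarnApplication);
    throw new YarnDeploymentException("The cluster does not have the "
        + "requested resources for the JobManager available!\n"
        + "Maximum Memory: " + maxRes.getMemory() + "MB Requested: "
        + jobManagerMemoryMb + "MB. " + NOTE);
  }
} catch (YarnDeploymentException e) {
  try {
    yarnClient.close(); // yarnClient implements Closeable
  } catch (IOException ignored) {
    // best-effort cleanup; the deployment failure is the relevant error
  }
  throw e;
}
{code}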



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5624) Support tumbling window on streaming tables in the SQL API

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864614#comment-15864614
 ] 

ASF GitHub Bot commented on FLINK-5624:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100918614
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/calls/FunctionGenerator.scala
 ---
@@ -290,6 +291,15 @@ object FunctionGenerator {
 Seq(),
 new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
 
+  // Make ROWTIME() return the local timestamp
+  // The function has to be executable as in windowed queries it is used
+  // in the GroupBy expression. The results of the function, however, does
+  // not matter.
+  addSqlFunction(
+EventTimeExtractor,
+Seq(),
+new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
--- End diff --

Ah, yes. You are right. It is still called in the DataStreamCalc and cannot 
be easily removed, as you noted.

Alright, then I'd suggest just emitting a cast `null`. This is not very 
nice, as it might also be called at other places, but since we will remove 
the marker function soon, it should not be a big issue.


> Support tumbling window on streaming tables in the SQL API
> --
>
> Key: FLINK-5624
> URL: https://issues.apache.org/jira/browse/FLINK-5624
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> This is a follow up of FLINK-4691.
> FLINK-4691 adds supports for group-windows for streaming tables. This jira 
> proposes to expose the functionality in the SQL layer via the {{GROUP BY}} 
> clauses, as described in 
> http://calcite.apache.org/docs/stream.html#tumbling-windows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3252: [FLINK-5624] Support tumbling window on streaming ...

2017-02-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100918614
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/calls/FunctionGenerator.scala
 ---
@@ -290,6 +291,15 @@ object FunctionGenerator {
 Seq(),
 new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
 
+  // Make ROWTIME() return the local timestamp
+  // The function has to be executable as in windowed queries it is used
+  // in the GroupBy expression. The results of the function, however, does
+  // not matter.
+  addSqlFunction(
+EventTimeExtractor,
+Seq(),
+new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
--- End diff --

Ah, yes. You are right. It is still called in the DataStreamCalc and cannot 
be easily removed, as you noted.

Alright, then I'd suggest just emitting a cast `null`. This is not very 
nice, as it might also be called at other places, but since we will remove 
the marker function soon, it should not be a big issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5406) add normalization phase for predicate logical plan rewriting between decorrelate query phase and volcano optimization phase

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864607#comment-15864607
 ] 

ASF GitHub Bot commented on FLINK-5406:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/3101
  
Thanks for the update @godfreyhe!
PR looks good to me and can be merged IMO.

@godfreyhe, can you rebase the PR to the current master?
@twalthr, do you want to have another look as well?


> add normalization phase for predicate logical plan rewriting between 
> decorrelate query phase and volcano optimization phase
> ---
>
> Key: FLINK-5406
> URL: https://issues.apache.org/jira/browse/FLINK-5406
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: godfrey he
>Assignee: godfrey he
>
> The normalization phase is for predicate logical plan rewriting and is 
> independent of the cost module. Unlike the volcano optimization phase, the 
> rules in the normalization phase do not need to be repeatedly applied to 
> different logical plans. The benefit of the normalization phase is to reduce 
> the running time of the volcano planner.
> *ReduceExpressionsRule* can apply various simplifying transformations on 
> RexNode trees. Currently, there are two transformations:
> 1) Constant reduction, which evaluates constant subtrees, replacing them with 
> a corresponding RexLiteral
> 2) Removal of redundant casts, which occurs when the type of the cast argument 
> is the same as the type of the resulting cast expression
> the above transformations do not depend on the cost module,  so we can move 
> the rules in *ReduceExpressionsRule* from 
> DATASET_OPT_RULES/DATASTREAM_OPT_RULES to DataSet/DataStream Normalization 
> Rules.
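For illustration, the two transformations on hypothetical queries:

{code}
-- 1) Constant reduction: the constant subtree is evaluated at planning time.
SELECT a + (1 + 2) FROM t      -- rewritten to: SELECT a + 3 FROM t

-- 2) Redundant cast removal: casting a column to its own type is dropped.
SELECT CAST(b AS INT) FROM t   -- if b is already INT: SELECT b FROM t
{code}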



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink issue #3101: [FLINK-5406] [table] add normalization phase for predicat...

2017-02-13 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/3101
  
Thanks for the update @godfreyhe!
PR looks good to me and can be merged IMO.

@godfreyhe, can you rebase the PR to the current master?
@twalthr, do you want to have another look as well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5624) Support tumbling window on streaming tables in the SQL API

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864603#comment-15864603
 ] 

ASF GitHub Bot commented on FLINK-5624:
---

Github user haohui commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100917108
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/calls/FunctionGenerator.scala
 ---
@@ -290,6 +291,15 @@ object FunctionGenerator {
 Seq(),
 new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
 
+  // Make ROWTIME() return the local timestamp
+  // The function has to be executable as in windowed queries it is used
+  // in the GroupBy expression. The results of the function, however, does
+  // not matter.
+  addSqlFunction(
+EventTimeExtractor,
+Seq(),
+new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
--- End diff --

It turns out that the function is used all the way through to the runtime -- the 
translated plan looks like the following:

```
LogicalAggregate(group={0}, ...)
  LogicalProject($0=FLOOR(ROWTIME() TO HOURS)))
```

The expression is used in the projection. Unfortunately there is no trivial 
way to exclude it in Calcite as mentioned in the last comments.

The result of the expression is not used in the query, though.


> Support tumbling window on streaming tables in the SQL API
> --
>
> Key: FLINK-5624
> URL: https://issues.apache.org/jira/browse/FLINK-5624
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> This is a follow up of FLINK-4691.
> FLINK-4691 adds supports for group-windows for streaming tables. This jira 
> proposes to expose the functionality in the SQL layer via the {{GROUP BY}} 
> clauses, as described in 
> http://calcite.apache.org/docs/stream.html#tumbling-windows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3252: [FLINK-5624] Support tumbling window on streaming ...

2017-02-13 Thread haohui
Github user haohui commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100917108
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/calls/FunctionGenerator.scala
 ---
@@ -290,6 +291,15 @@ object FunctionGenerator {
 Seq(),
 new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
 
+  // Make ROWTIME() return the local timestamp
+  // The function has to be executable as in windowed queries it is used
+  // in the GroupBy expression. The results of the function, however, does
+  // not matter.
+  addSqlFunction(
+EventTimeExtractor,
+Seq(),
+new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
--- End diff --

It turns out that the function is used all the way through to the runtime -- the 
translated plan looks like the following:

```
LogicalAggregate(group={0}, ...)
  LogicalProject($0=FLOOR(ROWTIME() TO HOURS)))
```

The expression is used in the projection. Unfortunately there is no trivial 
way to exclude it in Calcite as mentioned in the last comments.

The result of the expression is not used in the query, though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3301: [FLINK-5788] [docs] Improve documentation of FileS...

2017-02-13 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/3301

[FLINK-5788] [docs] Improve documentation of FileSystem and spell out the 
data persistence contract

This writes down the contract that the Flink `FileSystem` and 
`FSDataOutputStream` implementations have to adhere to in order to support 
proper consistency and failure recovery. The contract has so far been only 
implicitly defined and adhered to by the checkpointing and high-availability 
code.

## Contract

Data written to an `FSDataOutputStream` created from a `FileSystem` is 
considered persistent if two requirements are met:

  1. **Visibility Requirement:** It must be guaranteed that all other 
processes, machines,
 virtual machines, containers, etc. that are able to access the file 
see the data consistently
 when given the absolute file path. This requirement is similar to the 
*close-to-open*
 semantics defined by POSIX, but restricted to the file itself (by its 
absolute path).

  2. **Durability Requirement:** The file system's specific 
durability/persistence requirements
 must be met. These are specific to the particular file system. For 
example the
 `LocalFileSystem` does not provide any durability guarantees for 
crashes of both
 hardware and operating system, while replicated distributed file 
systems (like HDFS)
     typically guarantee durability in the presence of concurrent failures 
 of up to *n* nodes, where *n* is the replication factor.

Updates to the file's parent directory (such as the file showing up when 
listing the directory contents) are not required to be complete for the data in 
the file stream to be considered persistent. This relaxation is important for 
file systems where updates to directory contents are only eventually consistent 
(like S3).
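As an illustration, a sketch of a writer relying on this contract (which call establishes the persistence point is implementation-specific; the method names on the stream are assumptions for this sketch):

```java
import org.apache.flink.core.fs.FSDataOutputStream;

public final class PersistSketch {

	// Sketch: after the assumed persistence point, the data must be readable
	// by any process that has the absolute path, and durable per the file
	// system's own guarantees -- even if the parent directory listing lags
	// behind (as it may on S3).
	static void persist(FSDataOutputStream out, byte[] data) throws Exception {
		out.write(data);
		out.flush();
		out.sync(); // assumed persistence point for this sketch
		out.close();
	}
}
```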

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink filesystem_docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3301






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5788) Document assumptions about File Systems and persistence

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864489#comment-15864489
 ] 

ASF GitHub Bot commented on FLINK-5788:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/3301

[FLINK-5788] [docs] Improve documentation of FileSystem and spell out the 
data persistence contract

This writes down the contract that the Flink `FileSystem` and 
`FSDataOutputStream` implementations have to adhere to in order to support 
proper consistency and failure recovery. The contract has so far been only 
implicitly defined and adhered to by the checkpointing and high-availability 
code.

## Contract

Data written to an `FSDataOutputStream` created from a `FileSystem` is 
considered persistent if two requirements are met:

  1. **Visibility Requirement:** It must be guaranteed that all other 
processes, machines,
 virtual machines, containers, etc. that are able to access the file 
see the data consistently
 when given the absolute file path. This requirement is similar to the 
*close-to-open*
 semantics defined by POSIX, but restricted to the file itself (by its 
absolute path).

  2. **Durability Requirement:** The file system's specific 
durability/persistence requirements
 must be met. These are specific to the particular file system. For 
example the
 `LocalFileSystem` does not provide any durability guarantees for 
crashes of both
 hardware and operating system, while replicated distributed file 
systems (like HDFS)
     typically guarantee durability in the presence of concurrent failures 
 of up to *n* nodes, where *n* is the replication factor.

Updates to the file's parent directory (such as the file showing up when 
listing the directory contents) are not required to be complete for the data in 
the file stream to be considered persistent. This relaxation is important for 
file systems where updates to directory contents are only eventually consistent 
(like S3).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink filesystem_docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3301






> Document assumptions about File Systems and persistence
> ---
>
> Key: FLINK-5788
> URL: https://issues.apache.org/jira/browse/FLINK-5788
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.3.0
>
>
> We should add some description about the assumptions we make for the behavior 
> of {{FileSystem}} implementations to support proper checkpointing and 
> recovery operations.
> This is especially critical for file systems like {{S3}} with a somewhat 
> tricky contract.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-4997) Extending Window Function Metadata

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864476#comment-15864476
 ] 

ASF GitHub Bot commented on FLINK-4997:
---

Github user VenturaDelMonte commented on the issue:

https://github.com/apache/flink/pull/3285
  
I checked the code and it looks fine to me. I see you are dealing 
with RichFunction inheritance in the wrapper classes; that is something that 
bothered me too when I was working on the previous PR. There could be a 
similar minor issue in RichProcessWindowFunction due to duplicated code, but I 
cannot see any easy workaround for it. 
Regarding the overall feature, are you planning to do anything similar for 
#2946, or should I reflect these changes there as well? 


> Extending Window Function Metadata
> --
>
> Key: FLINK-4997
> URL: https://issues.apache.org/jira/browse/FLINK-4997
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API, Streaming
>Reporter: Ventura Del Monte
>Assignee: Ventura Del Monte
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink issue #3285: [FLINK-4997] [streaming] Introduce ProcessWindowFunction

2017-02-13 Thread VenturaDelMonte
Github user VenturaDelMonte commented on the issue:

https://github.com/apache/flink/pull/3285
  
I checked the code and it looks fine to me. I see you are dealing 
with RichFunction inheritance in the wrapper classes; that is something that 
bothered me too when I was working on the previous PR. There could be a 
similar minor issue in RichProcessWindowFunction due to duplicated code, but I 
cannot see any easy workaround for it. 
Regarding the overall feature, are you planning to do anything similar for 
#2946, or should I reflect these changes there as well? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-5789) Make Bucketing Sink independent of Hadoop's FileSystem

2017-02-13 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5789:
---

 Summary: Make Bucketing Sink independent of Hadoop's FileSystem
 Key: FLINK-5789
 URL: https://issues.apache.org/jira/browse/FLINK-5789
 Project: Flink
  Issue Type: Bug
  Components: Streaming Connectors
Affects Versions: 1.1.4, 1.2.0
Reporter: Stephan Ewen
 Fix For: 1.3.0


The {{BucketingSink}} is hard-wired to Hadoop's FileSystem, bypassing Flink's 
file system abstraction.

This causes several issues:
  - The bucketing sink behaves differently from other file sinks with respect 
to configuration
  - Directly supported file systems (not going through Hadoop), like the MapR 
File System, do not work in the same way with the BucketingSink as other file 
systems
  - The previous point is all the more problematic in the effort to make Hadoop 
an optional dependency and to run in other stacks (Mesos, Kubernetes, AWS, GCE, 
Azure) with ideally no Hadoop dependency.

We should port the {{BucketingSink}} to use Flink's FileSystem classes.

To support the *truncate* functionality that is needed for the exactly-once 
semantics of the Bucketing Sink, we should extend Flink's FileSystem 
abstraction to have the following methods (a sketch follows the list):
  - {{boolean supportsTruncate()}}
  - {{void truncate(Path, long)}}
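A sketch of what this extension could look like (the two method names are from this issue; the defaults and the class name are assumptions):

{code}
import java.io.IOException;
import org.apache.flink.core.fs.Path;

// Sketch of the proposed additions to Flink's FileSystem abstraction.
public abstract class TruncatableFileSystem {

  /** Whether this file system can truncate files to a given length. */
  public boolean supportsTruncate() {
    return false; // conservative default for file systems without truncate
  }

  /** Truncates the file at the given path to the given length. */
  public void truncate(Path path, long length) throws IOException {
    throw new UnsupportedOperationException("truncate is not supported");
  }
}
{code}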







--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5092) Add integration with Sonarqube and code coverage

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864356#comment-15864356
 ] 

ASF GitHub Bot commented on FLINK-5092:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2836#discussion_r100888724
  
--- Diff: flink-test-utils-parent/flink-test-utils-coverage/pom.xml ---
@@ -0,0 +1,393 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+	<parent>
+		<artifactId>flink-test-utils-parent</artifactId>
+		<groupId>org.apache.flink</groupId>
+		<version>1.3-SNAPSHOT</version>
+	</parent>
+	<modelVersion>4.0.0</modelVersion>
+
+	<artifactId>flink-test-utils-coverage</artifactId>
+
+

> Add integration with Sonarqube and code coverage
> 
>
> Key: FLINK-5092
> URL: https://issues.apache.org/jira/browse/FLINK-5092
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Boris Osipov
>Assignee: Boris Osipov
>
> It would be good to have the opportunity to generate test coverage reports 
> for Flink and analyze code by SonarQube.
> Parts of tasks:
> - add generate test coverage reports for Flink with new maven profile
> - implement integration with https://analysis.apache.org/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5092) Add integration with Sonarqube and code coverage

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864355#comment-15864355
 ] 

ASF GitHub Bot commented on FLINK-5092:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2836#discussion_r100888652
  
--- Diff: flink-examples/pom.xml ---
@@ -69,4 +69,12 @@ under the License.
 	<module>flink-examples-batch</module>
 	<module>flink-examples-streaming</module>
 	</modules>
+	<profiles>
+		<profile>
+			<id>coverage</id>
+			<modules>
+				<module>flink-examples-batch</module>
--- End diff --

Why are the streaming examples not in the module list?


> Add integration with Sonarqube and code coverage
> 
>
> Key: FLINK-5092
> URL: https://issues.apache.org/jira/browse/FLINK-5092
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Boris Osipov
>Assignee: Boris Osipov
>
> It would be good to have the opportunity to generate test coverage reports 
> for Flink and analyze code by SonarQube.
> Parts of tasks:
> - add generate test coverage reports for Flink with new maven profile
> - implement integration with https://analysis.apache.org/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #2836: [FLINK-5092] Add maven profile with code coverage ...

2017-02-13 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2836#discussion_r100888724
  
--- Diff: flink-test-utils-parent/flink-test-utils-coverage/pom.xml ---
@@ -0,0 +1,393 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+	<parent>
+		<artifactId>flink-test-utils-parent</artifactId>
+		<groupId>org.apache.flink</groupId>
+		<version>1.3-SNAPSHOT</version>
+	</parent>
+	<modelVersion>4.0.0</modelVersion>
+
+	<artifactId>flink-test-utils-coverage</artifactId>
+
+

[GitHub] flink pull request #2836: [FLINK-5092] Add maven profile with code coverage ...

2017-02-13 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2836#discussion_r100888652
  
--- Diff: flink-examples/pom.xml ---
@@ -69,4 +69,12 @@ under the License.
 	<module>flink-examples-batch</module>
 	<module>flink-examples-streaming</module>
 	</modules>
+	<profiles>
+		<profile>
+			<id>coverage</id>
+			<modules>
+				<module>flink-examples-batch</module>
--- End diff --

Why are the streaming examples not in the module list?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2836: [FLINK-5092] Add maven profile with code coverage ...

2017-02-13 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2836#discussion_r100888521
  
--- Diff: flink-connectors/flink-connector-cassandra/pom.xml ---
@@ -65,6 +68,7 @@ under the License.
 					<goal>shade</goal>
 				</goals>
 				<configuration>
+					<shadedArtifactAttached>${shade.shadedArtifactAttached}</shadedArtifactAttached>
--- End diff --

Why can't we define this in the parent pom globally for all shade plugin 
instances?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5092) Add integration with Sonarqube and code coverage

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864354#comment-15864354
 ] 

ASF GitHub Bot commented on FLINK-5092:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2836#discussion_r100888521
  
--- Diff: flink-connectors/flink-connector-cassandra/pom.xml ---
@@ -65,6 +68,7 @@ under the License.
 					<goal>shade</goal>
 				</goals>
 				<configuration>
+					<shadedArtifactAttached>${shade.shadedArtifactAttached}</shadedArtifactAttached>
--- End diff --

Why can't we define this in the parent pom globally for all shade plugin 
instances?


> Add integration with Sonarqube and code coverage
> 
>
> Key: FLINK-5092
> URL: https://issues.apache.org/jira/browse/FLINK-5092
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Boris Osipov
>Assignee: Boris Osipov
>
> It would be good to have the opportunity to generate test coverage reports 
> for Flink and analyze code by SonarQube.
> Parts of tasks:
> - add generate test coverage reports for Flink with new maven profile
> - implement integration with https://analysis.apache.org/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5631) [yarn] Support downloading additional jars from non-HDFS paths

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864315#comment-15864315
 ] 

ASF GitHub Bot commented on FLINK-5631:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3202


> [yarn] Support downloading additional jars from non-HDFS paths
> --
>
> Key: FLINK-5631
> URL: https://issues.apache.org/jira/browse/FLINK-5631
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 1.3.0
>
>
> Currently the {{YarnResourceManager}} and {{YarnApplicationMasterRunner}} 
> always register the additional jars using the YARN filesystem object. This is 
> problematic as the paths might require another filesystem.
> To support localizing from non-HDFS paths (e.g., s3, http or viewfs), the 
> cleaner approach is to get the filesystem object from the path.
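
A rough sketch of the described approach (method and object names are 
assumptions, not taken from the actual patch):

{code}
// Resolve the FileSystem from the path's own scheme instead of always using
// the YARN (HDFS) filesystem object. Path.getFileSystem(Configuration) returns
// the implementation registered for s3://, viewfs://, http://, hdfs://, etc.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

object FsResolver {
  def statusOf(pathStr: String, conf: Configuration): FileStatus = {
    val path = new Path(pathStr)
    val fs = path.getFileSystem(conf) // scheme-specific, not necessarily HDFS
    fs.getFileStatus(path)            // length/mod-time feed the YARN LocalResource
  }
}
{code}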



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5690) protobuf is not shaded properly

2017-02-13 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864312#comment-15864312
 ] 

Robert Metzger commented on FLINK-5690:
---

Feel free to review my pull request.

> protobuf is not shaded properly
> ---
>
> Key: FLINK-5690
> URL: https://issues.apache.org/jira/browse/FLINK-5690
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.4, 1.3.0
>Reporter: Andrey
>Assignee: Robert Metzger
>
> Currently the distribution contains the com/google/protobuf package. Without 
> proper shading, client code can fail with:
> {code}
> Caused by: java.lang.IllegalAccessError: tried to access method 
> com.google.protobuf.
> {code}
> Steps to reproduce:
> * create a job class "com.google.protobuf.TestClass"
> * call the com.google.protobuf.TextFormat.escapeText(String) method from this 
> class
> * deploy the job to a Flink cluster (using the web console, for example)
> * run the job; the logs show an IllegalAccessError.
> The issue is caused by a package-protected method and different classloaders: 
> TestClass is loaded by FlinkUserCodeClassLoader, but the TextFormat class is 
> loaded by sun.misc.Launcher$AppClassLoader.
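
A minimal repro sketch of the steps above (Scala for illustration; the class 
and method names are taken from the report, everything else is assumed):

{code}
// TextFormat.escapeText(String) is package-private, so this compiles only
// because the class is declared inside the com.google.protobuf package. At
// runtime, TestClass is loaded by FlinkUserCodeClassLoader while TextFormat
// comes from the app classloader, so the package-private access check fails.
package com.google.protobuf

object TestClass {
  def main(args: Array[String]): Unit = {
    // compiles fine; throws java.lang.IllegalAccessError on the cluster
    println(TextFormat.escapeText("line1\nline2"))
  }
}
{code}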



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5788) Document assumptions about File Systems and persistence

2017-02-13 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5788:
---

 Summary: Document assumptions about File Systems and persistence
 Key: FLINK-5788
 URL: https://issues.apache.org/jira/browse/FLINK-5788
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.3.0


We should add some description of the assumptions we make about the behavior 
of {{FileSystem}} implementations in order to support proper checkpointing and 
recovery operations.

This is especially critical for file systems like {{S3}}, which have a 
somewhat tricky contract.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3300: [FLINK-5690][docs] Add note on shading to best pra...

2017-02-13 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/3300

[FLINK-5690][docs] Add note on shading to best practices guide



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink5690

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3300


commit 504cf17d28932921da9b27ea924d777107b248ed
Author: Robert Metzger 
Date:   2017-02-13T19:50:23Z

[FLINK-5690][docs] Add note on shading to best practices guide




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3283: [FLINK-5729][EXAMPLES]add hostname option to be mo...

2017-02-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3283


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-5631) [yarn] Support downloading additional jars from non-HDFS paths

2017-02-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-5631.
---

> [yarn] Support downloading additional jars from non-HDFS paths
> --
>
> Key: FLINK-5631
> URL: https://issues.apache.org/jira/browse/FLINK-5631
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 1.3.0
>
>
> Currently the {{YarnResourceManager}} and {{YarnApplicationMasterRunner}} 
> always register the additional jars using the YARN filesystem object. This is 
> problematic as the paths might require another filesystem.
> To support localizing from non-HDFS paths (e.g., s3, http or viewfs), the 
> cleaner approach is to get the filesystem object from the path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5729) add hostname option in SocketWindowWordCount example to be more convenient

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864314#comment-15864314
 ] 

ASF GitHub Bot commented on FLINK-5729:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3283


> add hostname option in SocketWindowWordCount example to be more convenient
> --
>
> Key: FLINK-5729
> URL: https://issues.apache.org/jira/browse/FLINK-5729
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Tao Wang
>Priority: Minor
> Fix For: 1.3.0
>
>
> "hostname" option will help users to get data from the right port, otherwise 
> the example would fail due to connection refused.
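
A minimal sketch of how the option is typically wired in (assuming 
ParameterTool, which the bundled examples use; the actual code in the PR may 
differ):

{code}
import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala._

object SocketWindowWordCount {
  def main(args: Array[String]): Unit = {
    // Read host and port from the command line instead of hard-coding them.
    val params = ParameterTool.fromArgs(args)
    val hostname = params.get("hostname", "localhost")
    val port = params.getInt("port")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val text = env.socketTextStream(hostname, port, '\n')
    // ... windowed word count over `text` as in the example; print as a sink
    text.print()
    env.execute("Socket Window WordCount")
  }
}
{code}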



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5729) add hostname option in SocketWindowWordCount example to be more convenient

2017-02-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-5729.
---

> add hostname option in SocketWindowWordCount example to be more convenient
> --
>
> Key: FLINK-5729
> URL: https://issues.apache.org/jira/browse/FLINK-5729
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Tao Wang
>Priority: Minor
> Fix For: 1.3.0
>
>
> "hostname" option will help users to get data from the right port, otherwise 
> the example would fail due to connection refused.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (FLINK-5631) [yarn] Support downloading additional jars from non-HDFS paths

2017-02-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-5631.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in 186b12309b540f82a055be28f3f005dce4b8cf46

Thank you for the contribution!

> [yarn] Support downloading additional jars from non-HDFS paths
> --
>
> Key: FLINK-5631
> URL: https://issues.apache.org/jira/browse/FLINK-5631
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 1.3.0
>
>
> Currently the {{YarnResourceManager}} and {{YarnApplicationMasterRunner}} 
> always register the additional jars using the YARN filesystem object. This is 
> problematic as the paths might require another filesystem.
> To support localizing from non-HDFS paths (e.g., s3, http or viewfs), the 
> cleaner approach is to get the filesystem object from the path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3202: [FLINK-5631] [yarn] Support downloading additional...

2017-02-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3202


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-5729) add hostname option in SocketWindowWordCount example to be more convenient

2017-02-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-5729.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed via 30c5b771a7943e981dd5f67131c932fdb204fbc2

Thank you for the contribution!

> add hostname option in SocketWindowWordCount example to be more convenient
> --
>
> Key: FLINK-5729
> URL: https://issues.apache.org/jira/browse/FLINK-5729
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Tao Wang
>Priority: Minor
> Fix For: 1.3.0
>
>
> "hostname" option will help users to get data from the right port, otherwise 
> the example would fail due to connection refused.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5690) protobuf is not shaded properly

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864303#comment-15864303
 ] 

ASF GitHub Bot commented on FLINK-5690:
---

GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/3300

[FLINK-5690][docs] Add note on shading to best practices guide



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink5690

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3300


commit 504cf17d28932921da9b27ea924d777107b248ed
Author: Robert Metzger 
Date:   2017-02-13T19:50:23Z

[FLINK-5690][docs] Add note on shading to best practices guide




> protobuf is not shaded properly
> ---
>
> Key: FLINK-5690
> URL: https://issues.apache.org/jira/browse/FLINK-5690
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.4, 1.3.0
>Reporter: Andrey
>Assignee: Robert Metzger
>
> Currently the distribution contains the com/google/protobuf package. Without 
> proper shading, client code can fail with:
> {code}
> Caused by: java.lang.IllegalAccessError: tried to access method 
> com.google.protobuf.
> {code}
> Steps to reproduce:
> * create a job class "com.google.protobuf.TestClass"
> * call the com.google.protobuf.TextFormat.escapeText(String) method from this 
> class
> * deploy the job to a Flink cluster (using the web console, for example)
> * run the job; the logs show an IllegalAccessError.
> The issue is caused by a package-protected method and different classloaders: 
> TestClass is loaded by FlinkUserCodeClassLoader, but the TextFormat class is 
> loaded by sun.misc.Launcher$AppClassLoader.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5690) protobuf is not shaded properly

2017-02-13 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864266#comment-15864266
 ] 

Robert Metzger commented on FLINK-5690:
---

1) I'll open a pull request.

Regarding 2) This is something we could look into. 
However, so far, almost all complaints from users were about the dependencies 
we inherit from Hadoop. That's why I've tried relocating all of Hadoop's 
dependencies: 
https://issues.apache.org/jira/browse/FLINK-5297?focusedCommentId=15815050&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15815050
 (with mixed results).

Also, I'm planning to provide a "Hadoop-free" version of Flink. More and more 
of our users run without anything from Hadoop (for example on AWS with S3 and 
Docker), and these users don't need all the dependencies we inherit from it.
The libraries you've mentioned are pretty good at maintaining API 
compatibility, so they can be mixed more safely on a single classpath.

> protobuf is not shaded properly
> ---
>
> Key: FLINK-5690
> URL: https://issues.apache.org/jira/browse/FLINK-5690
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.1.4, 1.3.0
>Reporter: Andrey
>Assignee: Robert Metzger
>
> Currently the distribution contains the com/google/protobuf package. Without 
> proper shading, client code can fail with:
> {code}
> Caused by: java.lang.IllegalAccessError: tried to access method 
> com.google.protobuf.
> {code}
> Steps to reproduce:
> * create a job class "com.google.protobuf.TestClass"
> * call the com.google.protobuf.TextFormat.escapeText(String) method from this 
> class
> * deploy the job to a Flink cluster (using the web console, for example)
> * run the job; the logs show an IllegalAccessError.
> The issue is caused by a package-protected method and different classloaders: 
> TestClass is loaded by FlinkUserCodeClassLoader, but the TextFormat class is 
> loaded by sun.misc.Launcher$AppClassLoader.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3196: [FLINK-5566] [Table API & SQL]Introduce structure ...

2017-02-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100851830
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv   number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max   max value of column values
+  * @param min   min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
+nullCount: Long,
+avgLen: Long,
+maxLen: Long,
+max: Option[Any],
+min: Option[Any]) {
--- End diff --

Does it make sense to denote whether stats are precise or approximate? Also, 
an optional field could hold a timestamp of when the stats were generated.
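
Taken together with the other comments on this class, a sketch of the 
suggested shape (the `isExact` and `collectedAt` fields are assumptions, not a 
settled API):

{code}
// Sketch: every statistic optional, Int lengths, plus metadata on precision
// and collection time, per the review comments.
case class ColumnStats(
    ndv: Option[Long] = None,         // number of distinct values
    nullCount: Option[Long] = None,   // number of nulls
    avgLen: Option[Int] = None,       // average length of column values
    maxLen: Option[Int] = None,       // max length of column values
    max: Option[Any] = None,          // max value of column values
    min: Option[Any] = None,          // min value of column values
    isExact: Boolean = true,          // precise vs. approximate statistics
    collectedAt: Option[Long] = None) // epoch millis when stats were collected
{code}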


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3196: [FLINK-5566] [Table API & SQL]Introduce structure ...

2017-02-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100851572
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv   number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max   max value of column values
+  * @param min   min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
--- End diff --

Make all stats optional?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5566) Introduce structure to hold table and column level statistics

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864061#comment-15864061
 ] 

ASF GitHub Bot commented on FLINK-5566:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100851830
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv   number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max   max value of column values
+  * @param min   min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
+nullCount: Long,
+avgLen: Long,
+maxLen: Long,
+max: Option[Any],
+min: Option[Any]) {
--- End diff --

Does it make sense to denote whether stats are precise or approximate? Also, 
an optional field could hold a timestamp of when the stats were generated.


> Introduce structure to hold table and column level statistics
> -
>
> Key: FLINK-5566
> URL: https://issues.apache.org/jira/browse/FLINK-5566
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: zhangjing
>
> We define two structures to hold statistics:
> 1. TableStats: contains table-level stats; currently only one element: rowCount
> 2. ColumnStats: contains column-level stats. 
> for numeric column types: ndv, nullCount, max, min, histogram
> for string types: ndv, nullCount, avgLen, maxLen
> for booleans: ndv, nullCount, trueCount, falseCount
> for date/time/timestamp: ndv, nullCount, max, min, histogram 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5566) Introduce structure to hold table and column level statistics

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864060#comment-15864060
 ] 

ASF GitHub Bot commented on FLINK-5566:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100851572
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv   number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max   max value of column values
+  * @param min   min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
--- End diff --

Make all stats optional?


> Introduce structure to hold table and column level statistics
> -
>
> Key: FLINK-5566
> URL: https://issues.apache.org/jira/browse/FLINK-5566
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: zhangjing
>
> We define two structures to hold statistics:
> 1. TableStats: contains table-level stats; currently only one element: rowCount
> 2. ColumnStats: contains column-level stats. 
> for numeric column types: ndv, nullCount, max, min, histogram
> for string types: ndv, nullCount, avgLen, maxLen
> for booleans: ndv, nullCount, trueCount, falseCount
> for date/time/timestamp: ndv, nullCount, max, min, histogram 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3196: [FLINK-5566] [Table API & SQL]Introduce structure ...

2017-02-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100851670
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv   number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max   max value of column values
+  * @param min   min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
+nullCount: Long,
+avgLen: Long,
--- End diff --

I think `Int` should be sufficient for value length.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5566) Introduce structure to hold table and column level statistics

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864062#comment-15864062
 ] 

ASF GitHub Bot commented on FLINK-5566:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3196#discussion_r100851670
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/stats/ColumnStats.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.plan.stats
+
+/**
+  * column statistics
+  *
+  * @param ndv   number of distinct values
+  * @param nullCount number of nulls
+  * @param avgLen    average length of column values
+  * @param maxLen    max length of column values
+  * @param max   max value of column values
+  * @param min   min value of column values
+  */
+case class ColumnStats(
+ndv: Long,
+nullCount: Long,
+avgLen: Long,
--- End diff --

I think `Int` should be sufficient for value length.


> Introduce structure to hold table and column level statistics
> -
>
> Key: FLINK-5566
> URL: https://issues.apache.org/jira/browse/FLINK-5566
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Reporter: Kurt Young
>Assignee: zhangjing
>
> We define two structures to hold statistics:
> 1. TableStats: contains table-level stats; currently only one element: rowCount
> 2. ColumnStats: contains column-level stats. 
> for numeric column types: ndv, nullCount, max, min, histogram
> for string types: ndv, nullCount, avgLen, maxLen
> for booleans: ndv, nullCount, trueCount, falseCount
> for date/time/timestamp: ndv, nullCount, max, min, histogram 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (FLINK-1730) Add a FlinkTools.persist style method to the Data Set.

2017-02-13 Thread Kate Eri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863650#comment-15863650
 ] 

Kate Eri edited comment on FLINK-1730 at 2/13/17 5:36 PM:
--

Hello [~StephanEwen], hello [~fhueske].
We would like to implement this ticket to enable [integration of Flink with 
SystemML|https://github.com/apache/incubator-systemml/pull/119#issuecomment-222059794].
Considering the previous discussion, we would like to design this feature first 
and discuss it here.

First of all, I would like to double-check the main statements of this issue:
1.	Datasets should be persisted and cached in memory, or spilled to disk when 
memory is insufficient, to avoid job failures. 
2.	Persisted data sets should be used for failover recovery. If done at the 
network stack level, persistence could cover this requirement. 
3.	Data should be shared across jobs. Operators are expected to return their 
memory when a job is done, otherwise this becomes a memory leak: there is no 
way to free the memory once the job has finished if it did not do so itself.
4.	Because of GPU/CPU -> off-heap/heap memory management, persistence in a 
cache is required to support GPUs.

Did I get these right, or was something lost? 
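
For illustration, a hypothetical sketch of what a persist-style call could 
look like on the DataSet API (the {{persist()}} method and its semantics are 
assumptions for this discussion; no such API exists yet):

{code}
import org.apache.flink.api.scala._

object PersistSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val features = env.readCsvFile[(String, Double)]("hdfs:///data/features.csv")

    // Hypothetical: materialize here, cache in memory, spill to disk on pressure.
    val cached = features.persist()

    // Later executions would reuse the cached result instead of re-reading
    // and re-computing the input.
    cached.filter(_._2 > 0.5).print()
    cached.map { case (k, v) => (k, v * 2) }.print()
  }
}
{code}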





> Add a FlinkTools.persist style method to the Data Set.
> --
>
> Key: FLINK-1730
> URL: https://issues.apache.org/jira/browse/FLINK-1730
> Project: Flink
>  Issue Type: New Feature
>  Components: DataSet API
>Reporter: Stephan Ewen
>Assignee: Evgeny Kincharov
>
> I think this is an operation that will be needed more prominently. Defining a 
> point where one long logical program is broken into different executions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5624) Support tumbling window on streaming tables in the SQL API

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864011#comment-15864011
 ] 

ASF GitHub Bot commented on FLINK-5624:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100845516
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/expressions/streaming.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.expressions
+
+import org.apache.calcite.sql.`type`.{OperandTypes, ReturnTypes, 
SqlTypeName}
+import org.apache.calcite.sql.validate.SqlMonotonicity
+import org.apache.calcite.sql._
+import org.apache.calcite.tools.RelBuilder
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.table.api.TableException
+
+object EventTimeExtractor extends SqlFunction("ROWTIME", 
SqlKind.OTHER_FUNCTION,
+  ReturnTypes.explicit(SqlTypeName.TIMESTAMP), null, OperandTypes.NILADIC,
+  SqlFunctionCategory.SYSTEM) {
+  override def getSyntax: SqlSyntax = SqlSyntax.FUNCTION
+
+  override def getMonotonicity(call: SqlOperatorBinding): SqlMonotonicity =
+SqlMonotonicity.INCREASING
+}
+
+case class RowTime() extends LeafExpression {
--- End diff --

Use `CurrentTimestamp` like PR #3271


> Support tumbling window on streaming tables in the SQL API
> --
>
> Key: FLINK-5624
> URL: https://issues.apache.org/jira/browse/FLINK-5624
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> This is a follow-up to FLINK-4691.
> FLINK-4691 adds support for group-windows for streaming tables. This JIRA 
> proposes to expose the functionality in the SQL layer via the {{GROUP BY}} 
> clause, as described in 
> http://calcite.apache.org/docs/stream.html#tumbling-windows.
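
A hedged sketch of the target syntax (table and column names assumed; the 
exact row-time indicator is what the related PRs are still discussing):

{code}
// Given a StreamTableEnvironment `tableEnv` with a table "Orders" registered:
// a tumbling window expressed via GROUP BY, following the Calcite streaming
// document linked above.
val result = tableEnv.sql(
  "SELECT userId, COUNT(*) AS cnt " +
  "FROM Orders " +
  "GROUP BY TUMBLE(ROWTIME(), INTERVAL '1' HOUR), userId")
{code}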



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3271: [FLINK-5710] Add ProcTime() function to indicate S...

2017-02-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3271#discussion_r100849148
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/validate/FunctionCatalog.scala
 ---
@@ -190,11 +192,14 @@ object FunctionCatalog {
 // array
 "cardinality" -> classOf[ArrayCardinality],
 "at" -> classOf[ArrayElementAt],
-"element" -> classOf[ArrayElement]
+"element" -> classOf[ArrayElement],
 
+"procTime" -> classOf[CurrentTimestamp]
--- End diff --

make `procTime` lowercase, i.e., `proctime` to keep it consistent with 
`rowtime`.
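
The suggested registration, sketched against the diff above:

{code}
// Lowercase key, consistent with the existing "rowtime" naming.
"proctime" -> classOf[CurrentTimestamp]
{code}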


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5710) Add ProcTime() function to indicate StreamSQL

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864030#comment-15864030
 ] 

ASF GitHub Bot commented on FLINK-5710:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3271#discussion_r100849148
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/validate/FunctionCatalog.scala
 ---
@@ -190,11 +192,14 @@ object FunctionCatalog {
 // array
 "cardinality" -> classOf[ArrayCardinality],
 "at" -> classOf[ArrayElementAt],
-"element" -> classOf[ArrayElement]
+"element" -> classOf[ArrayElement],
 
+"procTime" -> classOf[CurrentTimestamp]
--- End diff --

make `procTime` lowercase, i.e., `proctime` to keep it consistent with 
`rowtime`.


> Add ProcTime() function to indicate StreamSQL
> -
>
> Key: FLINK-5710
> URL: https://issues.apache.org/jira/browse/FLINK-5710
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
>Priority: Minor
>
> procTime() is a parameterless scalar function that just indicates 
> processing-time mode.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5624) Support tumbling window on streaming tables in the SQL API

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864012#comment-15864012
 ] 

ASF GitHub Bot commented on FLINK-5624:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100837291
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/calls/FunctionGenerator.scala
 ---
@@ -290,6 +291,15 @@ object FunctionGenerator {
 Seq(),
 new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
 
+  // Make ROWTIME() return the local timestamp
+  // The function has to be executable as in windowed queries it is used
+  // in the GroupBy expression. The result of the function, however, does
+  // not matter.
+  addSqlFunction(
+EventTimeExtractor,
+Seq(),
+new CurrentTimePointCallGen(SqlTimeTypeInfo.TIMESTAMP, local = true))
--- End diff --

This function should not be called. So I would suggest creating a 
`CallGenerator` that throws an exception, if possible when the code is 
generated, or alternatively in the generated code. PR #3271 will need the same 
call generator.
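
A minimal sketch of such a call generator (class name assumed; the 
`CallGenerator` signature follows the one used in flink-table's codegen, so 
treat this as a sketch rather than the final implementation):

{code}
import org.apache.flink.table.api.TableException
import org.apache.flink.table.codegen.{CodeGenerator, GeneratedExpression}
import org.apache.flink.table.codegen.calls.CallGenerator

// Fail fast at code-generation time: ROWTIME()/PROCTIME() are pure indicator
// functions and must be rewritten away before any code is generated for them.
class TimeIndicatorCallGen extends CallGenerator {
  override def generate(
      codeGenerator: CodeGenerator,
      operands: Seq[GeneratedExpression]): GeneratedExpression = {
    throw new TableException(
      "Time indicator functions such as ROWTIME() must not be code-generated; " +
      "they may only appear in group window expressions.")
  }
}
{code}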


> Support tumbling window on streaming tables in the SQL API
> --
>
> Key: FLINK-5624
> URL: https://issues.apache.org/jira/browse/FLINK-5624
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> This is a follow-up to FLINK-4691.
> FLINK-4691 adds support for group-windows for streaming tables. This JIRA 
> proposes to expose the functionality in the SQL layer via the {{GROUP BY}} 
> clause, as described in 
> http://calcite.apache.org/docs/stream.html#tumbling-windows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5624) Support tumbling window on streaming tables in the SQL API

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864010#comment-15864010
 ] 

ASF GitHub Bot commented on FLINK-5624:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100844487
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/expressions/streaming.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.expressions
+
+import org.apache.calcite.sql.`type`.{OperandTypes, ReturnTypes, 
SqlTypeName}
+import org.apache.calcite.sql.validate.SqlMonotonicity
+import org.apache.calcite.sql._
+import org.apache.calcite.tools.RelBuilder
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.table.api.TableException
+
+object EventTimeExtractor extends SqlFunction("ROWTIME", 
SqlKind.OTHER_FUNCTION,
--- End diff --

Can you move this to a `TimeModeIndicatorFunctions` class as suggested on 
`FlinkStreamingFunctionCatalog` of PR #3271. 


> Support tumbling window on streaming tables in the SQL API
> --
>
> Key: FLINK-5624
> URL: https://issues.apache.org/jira/browse/FLINK-5624
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> This is a follow-up to FLINK-4691.
> FLINK-4691 adds support for group-windows for streaming tables. This JIRA 
> proposes to expose the functionality in the SQL layer via the {{GROUP BY}} 
> clause, as described in 
> http://calcite.apache.org/docs/stream.html#tumbling-windows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] flink pull request #3252: [FLINK-5624] Support tumbling window on streaming ...

2017-02-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100845516
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/expressions/streaming.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.expressions
+
+import org.apache.calcite.sql.`type`.{OperandTypes, ReturnTypes, 
SqlTypeName}
+import org.apache.calcite.sql.validate.SqlMonotonicity
+import org.apache.calcite.sql._
+import org.apache.calcite.tools.RelBuilder
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.table.api.TableException
+
+object EventTimeExtractor extends SqlFunction("ROWTIME", 
SqlKind.OTHER_FUNCTION,
+  ReturnTypes.explicit(SqlTypeName.TIMESTAMP), null, OperandTypes.NILADIC,
+  SqlFunctionCategory.SYSTEM) {
+  override def getSyntax: SqlSyntax = SqlSyntax.FUNCTION
+
+  override def getMonotonicity(call: SqlOperatorBinding): SqlMonotonicity =
+SqlMonotonicity.INCREASING
+}
+
+case class RowTime() extends LeafExpression {
--- End diff --

Use `CurrentTimestamp` like PR #3271


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3252: [FLINK-5624] Support tumbling window on streaming ...

2017-02-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3252#discussion_r100844487
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/expressions/streaming.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.table.expressions
+
+import org.apache.calcite.sql.`type`.{OperandTypes, ReturnTypes, 
SqlTypeName}
+import org.apache.calcite.sql.validate.SqlMonotonicity
+import org.apache.calcite.sql._
+import org.apache.calcite.tools.RelBuilder
+import org.apache.flink.api.common.typeinfo.{SqlTimeTypeInfo, 
TypeInformation}
+import org.apache.flink.table.api.TableException
+
+object EventTimeExtractor extends SqlFunction("ROWTIME", 
SqlKind.OTHER_FUNCTION,
--- End diff --

Can you move this to a `TimeModeIndicatorFunctions` class as suggested on 
`FlinkStreamingFunctionCatalog` of PR #3271. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

